Monday, January 14, 2013

Json performance

Robert Haas wanted to know how much slower the recursive descent JSON parser would be than the explicit stack based one it's replacing. It's a reasonable question, and no doubt one close to his heart as he was the author of the original :-)

So we need a test, and I devised one by making a single large piece of json like this:
create temp table myjson as 
    select array_to_json(array_agg(q)) as jsontext 
    from (select * from pg_class) q;
\copy myjson to jsontext.csv csv
and a test script that looks like this:
create temp table myjson(j text);
\copy myjson from  jsontext.csv csv
\timing
do $$ declare lj text; begin select into lj j from myjson; 
  for i in 1 .. 10000 loop perform json_in(textout(lj)); 
  end loop; end; $$;

(In the actual test file the last line is repeated several times).

Robert had also pointed out that the parser was copying some strings in places where it didn't need to, and I quickly fixed that. But even after that change I was a bit dismayed to discover that there was a small but quite consistent performance degradation, So I went looking for things to tweak. It didn't take long to see that some things I had written as "if" statements repeatedly calling the routine that looks at the next look_ahead token could be much better written as a single call to that routine plus a "switch" statement.

The test script and data files are published at https://bitbucket.org/adunstan/pgdevel/downloads/jsontest.sql and https://bitbucket.org/adunstan/pgdevel/downloads/jsontext.csv The latest code with the above fixes can be pulled from https://bitbucket.org/adunstan/pgdevel.git, and you're welcome to play along. When this is all sorted out I will publish a new patch to the mailing list. Right now I can no longer get a consistently better result using release 9.2 than I get with my code. One of my colleagues is testing on a more stable platform than I have available. But I'm quite hopeful not that at worst the performance is fairly close.

3 comments:

  1. Hey FYI, bitbucket doesn't redirect your web requests from .git to a web page like github so making it an href is a bit less useful/more annoying if you're trying to browse to code before doing a git clone.

    I was looking through some of the source code, and curious about parsing Infinity to a string when converting to JSON. That is against the RFC for JSON (section 2.4):

    http://tools.ietf.org/html/rfc4627

    V8 will error out on trying, but other examples like python's json parser stringifies it like your's. Python has a setting to allow that or not though.

    Taras

    ReplyDelete
  2. I have fixed the link.

    When we are converting something to JSON, if the number is not a legal json number we stringify it. That's not remotely contrary to the JSON spec - that's perfectly legal. Mind you, the fact that JSON can't represent common JavaScript values is a sad indictment of the people who composed the spec. This isn't a matter of parsing, incidentally - the JSON parser quite correctly does not allow an unquoted Infinity as a numeric literal. TIAS.

    ReplyDelete
  3. Yes, I think its fine too. Others (google) have obviously decided its better to throw an error on that instead of converting it to a string is all - so its just something interesting to watch for when coding (such as not assuming you can do math on it later on).

    ReplyDelete