JSON Performance in Python

Posted by on November 20, 2009

As part of my ongoing performance work on our CouchDB+Python application, I decided to sit down and profile JSON performance in the different open source libraries available for Python.

I profiled four libraries against a large JSON document: json (the pure-Python simplejson available in the Python stdlib), simplejson compiled with its C speedups, cjson, and jsonlib2. The test encodes and decodes a large JSON object 100 times. It then repeats that test 100 times for each library in succession, to find the average encode/decode time for each library and to minimize the effect of other environmental factors. These numbers were taken on my MacBook Air running Mac OS X 10.6.1 with the default Python 2.6.
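The test loop described above can be sketched with the stdlib timeit module. This is a minimal sketch for modern Python 3, not the script behind the chart: DOC is a made-up stand-in for the large JSON object (the original document isn't available), and bench is a hypothetical helper that accepts any dumps/loads-style pair of functions.

```python
import json
import timeit

# Hypothetical stand-in for the post's "large JSON document",
# which is not available; any sizable nested object will do.
DOC = {
    "rows": [
        {
            "id": i,
            "key": "doc-%d" % i,
            "value": {"tags": ["couchdb", "python", "json"], "score": i * 0.5},
        }
        for i in range(200)
    ]
}
TEXT = json.dumps(DOC)


def bench(dumps, loads, inner=100, outer=100):
    """Time `inner` encodes and `inner` decodes, repeated `outer` times.

    Returns (encode_ms, decode_ms): the wall-clock milliseconds for one
    batch of `inner` calls, averaged over the `outer` batches.
    `dumps` is any object -> str callable, `loads` any str -> object callable.
    """
    encode_runs = timeit.repeat(lambda: dumps(DOC), number=inner, repeat=outer)
    decode_runs = timeit.repeat(lambda: loads(TEXT), number=inner, repeat=outer)
    encode_ms = sum(encode_runs) / len(encode_runs) * 1000.0
    decode_ms = sum(decode_runs) / len(decode_runs) * 1000.0
    return encode_ms, decode_ms
```

For example, `bench(json.dumps, json.loads)` returns the average milliseconds per 100 encodes and per 100 decodes. If cjson is installed, `bench(cjson.encode, cjson.decode)` would time it the same way, since those are the names cjson uses for its encoder and decoder. Timing encode and decode in separate loops keeps the two numbers apart, so encoders and decoders can be ranked independently.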

Times are in milliseconds and show how long it takes to encode/decode this JSON object 100 times.

[Chart: JSONPerf]

I honestly didn’t expect the stdlib json to be this far behind.

Among the C-based libraries there isn't a clear winner. cjson is the best decoder but the slowest encoder; simplejson compiled with C speedups is the fastest encoder but the slowest decoder; jsonlib2 sits somewhere in the middle in both cases.
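Since the numbers depend heavily on whether a library's C code is actually in use, it can help to check that before benchmarking. As a minimal sketch for modern CPython 3 (not the Python 2.6 used above), the stdlib json package keeps its optional C accelerators in module-level attributes that are set to None when the _json extension can't be imported. These attributes are CPython implementation details rather than documented public API, so treat the names as an assumption that may change between versions; the function name stdlib_json_speedups is hypothetical.

```python
import json.decoder
import json.encoder
import json.scanner


def stdlib_json_speedups():
    """Report which optional C accelerators the stdlib json package loaded.

    Each attribute below is None when the _json extension module could not
    be imported, in which case json uses its pure-Python implementation.
    Note: these are CPython internals, not a public API.
    """
    return {
        "scanstring": json.decoder.c_scanstring is not None,
        "make_scanner": json.scanner.c_make_scanner is not None,
        "make_encoder": json.encoder.c_make_encoder is not None,
    }


print(stdlib_json_speedups())
```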

Also, annoyingly, cjson doesn't implement the same API as the other libraries (its equivalents of dumps and loads are named encode and decode), which makes it much more difficult for a library to support all the available JSON libraries. Now, rather than just letting users supply their own json module, I'll need to add support for user-defined parsing and encoding functions to couchdb-pythonviews, couchquery, and couchdb-wsgi.
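One way to support user-defined parsing and encoding functions is to accept a separate encoder and decoder instead of a whole json module, which also sidesteps cjson's different naming. The sketch below is hypothetical (Codec and first_available are not part of couchdb-pythonviews, couchquery, or couchdb-wsgi): it picks the first installed library for each direction independently, following the chart's rankings, and falls back to the stdlib json. It assumes only the function names mentioned in this post: simplejson's dumps/loads and cjson's encode/decode.

```python
import importlib
import json


def first_available(candidates, fallback):
    """Return the first importable function from `candidates`, else `fallback`.

    `candidates` is an ordered list of (module_name, function_name) pairs.
    """
    for module_name, func_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # library not installed; try the next candidate
        func = getattr(module, func_name, None)
        if callable(func):
            return func
    return fallback


class Codec:
    """Hypothetical pair of independently chosen JSON encode/decode functions.

    A library could accept a Codec (or two plain callables) instead of a
    whole json module, letting users mix one library's decoder with
    another library's encoder.
    """

    def __init__(self, encode=None, decode=None):
        # Defaults follow the chart: simplejson encodes fastest and cjson
        # decodes fastest. cjson names its functions encode/decode rather
        # than dumps/loads. The stdlib json is the final fallback.
        self.encode = encode or first_available([("simplejson", "dumps")], json.dumps)
        self.decode = decode or first_available(
            [("cjson", "decode"), ("simplejson", "loads")], json.loads
        )
```

A caller who wants a specific pairing passes it explicitly, for example `Codec(encode=json.dumps, decode=json.loads)`; otherwise each direction independently gets the fastest library that happens to be installed.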

10 Comments on JSON Performance in Python


  1. Bob Ippolito says:

    The stdlib json is so far behind because they wouldn't take the patch with all of the C speedups; it was too close to the release deadline. I also couldn't get it into a point release, since too much code changed and some behavior did too. There's probably still something sitting in the tracker, so hopefully it will make it into 2.7.

  2. [...] Saw a comparison chart in the post "JSON Performance in Python", which tests the speed of four different JSON libraries on Python 2.6: json (stdlib), simplejson, jsonlib2, and cjson. [...]

  3. Maybe I'm not reading that last paragraph right, but couldn't you monkey-patch the json module and force its dump/load to actually use the fast code? Or create a json library that is just a wrapper, does the try/except import dance for the fast APIs, and wraps those accordingly?

  4. Leechael says:

    I wonder why you don't use the timeit module for benchmarking.

  5. Bob Ippolito says:

    I think the no-monkeys-harmed answer to your question is:

    try:
        import simplejson as json
    except ImportError:
        import json

  6. mikeal says:

    I'm not a fan of monkey-patching the method names, and it doesn't solve the case where you want to use one module's decoder and another module's encoder. Ideally, I would want to use cjson's decoder and simplejson's encoder, so just having a module setting isn't enough.

    I've been using that try block since 2.6 was released, but now that I'm using couchdb-pythonviews I'm trying to squeeze every ms of performance I can out of my json parser.

  7. Michal Chruszcz says:

    This might be useful information:
    >>> import json
    >>> json.__version__
    '1.9'
    The most important changes in simplejson (the ones that influence performance the most) were added around the 2.0 release.

    The above chart might suggest that in a situation like yours, i.e. communicating with a database using JSON, where the majority of operations are reading/decoding JSON, the best choice is the cjson library. And this is true, with one caveat, though. The latest release of cjson, 1.0.5, contains a bug in decoding escaped double quotes. It was never fixed by the library's maintainer, but there are some patches around that correct this misbehaviour; however, I don't know their impact on performance.

  8. John Millikin says:

    I'm the author of jsonlib (which jsonlib2 forked off of).

    First, please don’t use cjson for *anything*. It’s got multiple bugs and misfeatures, and is generally unsuited for anything except impressive benchmarks. It was easier to write my own library, from scratch, than try to fix cjson. To quote somebody, “If I want it done fast and wrong I’ll ask my cat”.

    This also means you don’t have to support custom formatters in your libraries, because you don’t need to (and shouldn’t) support cjson.

    Second, purely as a matter of curiosity, could you do an equivalent benchmark in Python 3? The version of jsonlib for Python 3 is even faster, because I got to drop some of the performance-degrading compatibility code.

  9. @mikeal – throw py-yajl in this bucket as well if you like. @agentdero and I have recently collaborated on speed, and here's what it currently looks like on my 64-bit Arch Linux box (Python 2.6.4):

    ParseTime: cjson 99.72539ms
    ParseTime: yajl 141.28672ms
    ParseTime: simplejson 223.61533ms
    DumpTime: cjson 289.44809ms
    DumpTime: yajl 154.87778ms
    DumpTime: simplejson 192.81055ms

    http://github.com/rtyler/py-yajl

    cheers,
    lloyd
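One commenter above mentions that cjson 1.0.5 has a bug decoding escaped double quotes. Before swapping in a faster decoder, a quick consistency check against the stdlib decoder can catch this class of bug. A minimal sketch: check_escaped_quotes and the CASES inputs are hypothetical, and agreeing with json.loads on a handful of strings is a smoke test, not a proof of correctness.

```python
import json

# Hypothetical JSON inputs exercising escaped double quotes and backslashes.
CASES = [
    r'"\""',
    r'"say \"hi\""',
    r'"back\\slash"',
    r'{"k\"ey": "v\\\"al"}',
]


def check_escaped_quotes(decode, reference=json.loads):
    """Return (input, got, expected) for every case where `decode`
    disagrees with `reference`; an empty list means all cases agree."""
    failures = []
    for text in CASES:
        expected = reference(text)
        try:
            got = decode(text)
        except Exception as exc:  # a decoder that raises also counts as a failure
            got = exc
        if got != expected:
            failures.append((text, got, expected))
    return failures
```

Passing a candidate decoder, such as cjson's decode if it is installed, and getting back a non-empty list is a sign not to trust that decoder without a patch.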
