We’re gearing up for some heavy CouchDB usage in a new automation system and it has fallen upon me to do some performance benchmarking.
The most important thing for us to figure out was whether or not the CentOS virtual machine we’re currently running CouchDB on is going to be enough even in the short term. Until today we’ve been running 0.9 and have encountered performance problems.
Our main bottleneck is, and has always been, view generation and update performance. We tend to have medium to large size documents (jobs are relatively small but results from test runs can be incredibly large).
View generation for large documents has typically been our biggest issue. We previously mitigated it by refreshing all views after any large write, but that isn’t going to work for the volume of results we plan on pouring into the new system.
Last weekend I wrote a Python view server for CouchDB. couchdb-python includes a view server, but in the past I’ve heard complaints about its performance (although none recently). In addition, the view server in couchdb-python only supports map and reduce, which is only about 1/5 of the current view server spec; the spec also includes handlers for update, show, list, filter, and validate, which provide the groundwork for CouchDB as an application platform. As of Sunday my view server passes all of the current CouchDB spec, and initial performance tests showed it to be faster than the JavaScript view server.
Below are the performance graphs for CouchDB trunk running on a CentOS virtual machine. I’m using Python 2.6 with the default stdlib json library. The Spidermonkey core is 1.7 (I don’t know what the status of using 1.8 with CouchDB is, but as we’ll see below, this won’t improve performance too much for these tests).
These graphs show view generation time for a given number of documents in a new database. The design doc I used had two views: one does emit(doc['type'], doc), the other emit(doc['_id'], 1).
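As a rough illustration, the two views can be sketched in plain Python like this. The map functions are written as generators that yield key/value pairs instead of calling emit, and the names map_by_type, map_by_id, and run_map, along with the sample documents, are hypothetical; this is not the API of any particular view server.

```python
def map_by_type(doc):
    # Equivalent of emit(doc['type'], doc): the whole document is the value,
    # so every document has to be serialized back out in full.
    yield doc['type'], doc


def map_by_id(doc):
    # Equivalent of emit(doc['_id'], 1): a tiny value per document.
    yield doc['_id'], 1


def run_map(map_fun, docs):
    # Hypothetical helper: apply one map function to every document
    # and collect the emitted (key, value) rows.
    rows = []
    for doc in docs:
        rows.extend(map_fun(doc))
    return rows


# Made-up sample documents, only here to exercise the functions.
docs = [
    {'_id': 'a', 'type': 'job'},
    {'_id': 'b', 'type': 'result'},
]
by_type_rows = run_map(map_by_type, docs)
by_id_rows = run_map(map_by_id, docs)
```

Note that the first view carries the full document as its value, so its output grows with document size, while the second emits a constant.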
The graphs support zooming, mouseover, and all kinds of flot goodness.
JavaScript is the yellow line. Python is the blue line.
This is a test of moderately sized documents, about the size we normally expect for a job or build description. Each document is identical and fairly simple, with a size of ~1,588 bytes.
These documents were incredibly large; they were taken from a full Fennec mochitest run. Each document is identical and, while large, consists mostly of small JSON objects inside a much larger JSON object, coming in at ~139,096 bytes.
I had also intended to chart reduce performance with a simple sum operation, but all the results were sub-second regardless of the number of documents I threw at it, with Python being only a little faster than JavaScript.
The nearly identical reduce time tells me that the actual code processing time inside the view functions is hardly different, which means that the large difference in performance during view generation is most likely due to JSON serialization time. This also explains why larger documents cause an even greater difference in performance between Python and JavaScript.
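One way to sanity-check the serialization theory is to time a JSON decode/encode round trip for a small and a large document. The sketch below is only an approximation of what a view server does per document: make_doc and round_trip_seconds are hypothetical helpers, and the generated documents are stand-ins sized to roughly match the two test cases, not the real job or mochitest data.

```python
import json
from timeit import default_timer


def make_doc(approx_bytes):
    # Hypothetical stand-in document: one large JSON object that holds
    # many small, identical JSON objects.
    item = {'name': 'test', 'passed': True, 'count': 1}
    item_size = len(json.dumps(item)) + 2  # account for the ", " separator
    count = max(1, approx_bytes // item_size)
    return {'_id': 'doc', 'type': 'result',
            'items': [dict(item) for _ in range(count)]}


def round_trip_seconds(doc, repeat=50):
    # Average time to decode and re-encode the document once.
    text = json.dumps(doc)
    start = default_timer()
    for _ in range(repeat):
        json.dumps(json.loads(text))
    return (default_timer() - start) / repeat


small = make_doc(1588)     # roughly the size of the job documents
large = make_doc(139096)   # roughly the size of the mochitest results
small_size = len(json.dumps(small))
large_size = len(json.dumps(large))

if __name__ == '__main__':
    print('small: %d bytes, %.6f s' % (small_size, round_trip_seconds(small)))
    print('large: %d bytes, %.6f s' % (large_size, round_trip_seconds(large)))
```

If serialization really dominates, the per-document time should grow roughly with document size, independent of what the view function does.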
Improving Performance
The Python view server is already as optimized as I can imagine for processing time inside the views. Since CouchDB doesn’t provide a way for the view server to support its own concurrency, we’ve basically hit the wall here on what Python can provide. If we increased the complexity of the view functions, I think that Python would start to show better than Spidermonkey 1.7, but 1.8 with tracing enabled would likely bridge that gap, possibly even showing JavaScript faster than Python.
The big problem is JSON serialization. We can make Python faster by compiling simplejson with C speedups, but using the C-based JSON parser in newer versions of Spidermonkey requires some other changes to CouchDB, since there are differences in the encoding of undefined.
At the end of the day though, this all looks great. CouchDB trunk (pre-0.11) is going to run fast enough for what we need right now, even on a virtual machine. If we start to see view generation bottlenecks on views that aren’t hit as often and have to update a large number of documents, we can just move those views to Python and the performance should be back down to sub-second.
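On the Python side, picking up simplejson when it is available could look something like the sketch below. It assumes simplejson may or may not be installed and falls back to the stdlib json module; the names json_lib, decode, encode, and has_speedups are hypothetical, and checking encoder.c_make_encoder is just one guess at how to detect whether a C encoder is present.

```python
try:
    # simplejson uses its C extension automatically if it was compiled.
    import simplejson as json_lib
except ImportError:
    # Fall back to the standard library if simplejson is not installed.
    import json as json_lib

# Assumption: both libraries expose encoder.c_make_encoder, which is None
# when no C encoder could be loaded. getattr keeps this safe if they don't.
_encoder_module = getattr(json_lib, 'encoder', None)
has_speedups = getattr(_encoder_module, 'c_make_encoder', None) is not None


def decode(line):
    # Parse one JSON line, as a view server would for each incoming command.
    return json_lib.loads(line)


def encode(obj):
    # Serialize a response back to a JSON string.
    return json_lib.dumps(obj)
```

Routing every loads and dumps call through two small functions keeps the choice of JSON library in one place, so swapping it later doesn't touch the view handling code.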

Mikeal,
Great graphs. I like up and to the right!
I’m curious what your CouchDB version is for this. Damien *just* landed some optimizations to the writer which may or may not affect views. Also there was that big 5x speedup from September.
Hey, just looked at your Python view library, looks great. I have yet to use CouchDB, but I am planning on using it as a document storage backend for my new CherryPy based website.
Interesting comparison.
It would be worth you implementing the same views in Erlang to get “native” couchdb performance.
Neville
Seconding @neville here. I’d like to see Erlang views for a comparison. And Spidermonkey 1.8.1 would be nice, too.
I’m curious about SquirrelFish or V8 performance. V8 in particular has a lot of optimisations for using JS as a general-purpose language.
Also, how about psyco with that python view server? I’m curious if it’s worth the memory hit.
@jan I think it goes without saying that the Erlang views would be drastically faster. Since the biggest performance difference appears to be JSON serialization, that entire step would be cut out in the native views.
@Lucian processing time inside the views is almost indistinguishable in these tests and accounts for an incredibly minimal part of the total time. If I were to do a “mega view” test with a lot of logic in it, we would see more of a difference. While SquirrelFish and V8 have some good performance work, the tracing work in Spidermonkey 1.8 (AKA TraceMonkey) fits the expected usage profile for views a lot closer, and I would expect it to beat out any competitors since it’s optimized for almost this *exact* use case.
What could increase performance more than anything else is a faster JSON serializer and I don’t know if SquirrelFish and V8 have optimized C JSON serializers available.
@jchris CouchDB version info
# git rev-list --max-count=1 HEAD
12aa5125b2b774147d6106577075d8f1b0f650be
Which appears to be about 3 days old:
http://github.com/mikeal/couchdb/commit/12aa5125b2b774147d6106577075d8f1b0f650be
As far as JS views are concerned, do you have any idea whether CouchDB is using the trace-based nanoJIT in SpiderMonkey? If not, that could be an interesting avenue to explore…
Oh, hey, I see you already mentioned that in a previous comment. Never mind!