Category: CouchDB

Atomic Increments in CouchDB

Posted by on February 24, 2010

This question has come up a few times: “How do I do atomic increments with CouchDB?”

The existing document update semantics aren’t designed for operations on the “current” document. Instead, every write targets a specific document revision, which is how CouchDB avoids conflicting updates. Update functions, however, always receive the current document along with the request object, so you can perform update operations without supplying a specific rev.
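For contrast, here is roughly what the revision-based approach looks like from a client: read the document, change it, and write it back with the _rev you read, retrying whenever the write is rejected because someone else saved first. This is a minimal Python sketch, not code from CouchDB. FakeCouch, Conflict and increment are hypothetical names. The in-memory FakeCouch models only the revision check (a real server answers a stale write with HTTP 409 Conflict), and its revision strings are made up.

```python
import copy


class Conflict(Exception):
    """Raised when a write names a revision that is no longer current (like HTTP 409)."""


class FakeCouch:
    """Hypothetical in-memory stand-in for one CouchDB database.

    It only models the rule that matters here: a write must carry the
    current _rev, otherwise it is rejected as a conflict.
    """

    def __init__(self):
        self.docs = {}

    def get(self, docid):
        return copy.deepcopy(self.docs[docid])

    def put(self, docid, doc):
        current = self.docs.get(docid)
        if current is not None and doc.get("_rev") != current["_rev"]:
            raise Conflict(docid)
        # Made-up revision format: "<generation>-fake".
        generation = int(current["_rev"].split("-")[0]) + 1 if current else 1
        saved = dict(doc, _id=docid, _rev="%d-fake" % generation)
        self.docs[docid] = saved
        return saved["_rev"]


def increment(db, docid, field, amount=1, max_retries=10):
    """Read-modify-write against a specific revision, retrying on conflict."""
    for _ in range(max_retries):
        doc = db.get(docid)                   # carries the _rev we read
        doc[field] = doc.get(field, 0) + amount
        try:
            db.put(docid, doc)                # rejected if someone wrote in between
            return doc[field]
        except Conflict:
            continue                          # re-read the newer revision and retry
    raise Conflict(docid)
```

Every writer has to loop like this, which is exactly the boilerplate an update function avoids by working on whatever revision is current on the server.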

The update function below increases or decreases an integer value anywhere in a document. The request names the value with lookup: either a single property name, or an array of property names that is followed from the top of the document down to the value.

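function (doc, request) {
  // The request body is JSON such as {"lookup": "name", "increase": 1}
  // or {"lookup": ["attr1", "attr2"], "decrease": 1}.
  var a = JSON.parse(request.body);
  if (!doc) {
    // No document with this id: save nothing and say so.
    return [null, "No such document"];
  }
  var attr = doc;
  var last;
  if (typeof a.lookup == "string") {
    last = a.lookup;
  } else {
    // Walk every name except the last to reach the object holding the value.
    last = a.lookup.pop();
    a.lookup.forEach(function (name) { attr = attr[name]; });
  }
  if (a.increase) { attr[last] += a.increase; }
  if (a.decrease) { attr[last] -= a.decrease; }
  // Returning the doc tells CouchDB to save it; the string is the response body.
  return [doc, "Updated, new value is " + attr[last]];
}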

Save this function under the name atomic in the updates section of a design document, and you can increase or decrease any value through the HTTP API:

PUT /dbname/_design/designdoc/_update/atomic/docid
{"lookup":"name","increase":1}

PUT /dbname/_design/designdoc/_update/atomic/docid
{"lookup":["attr1","attr2"],"decrease":1}
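The same requests can be made from Python with only the standard library. This is a minimal sketch that assumes the function is saved as atomic in the design document designdoc of the database dbname, and that CouchDB serves update functions under the _update path. build_update_request and send are hypothetical helper names.

```python
import json
import urllib.request
from urllib.parse import quote


def build_update_request(server, db, ddoc, func, docid, lookup,
                         increase=None, decrease=None):
    """Build (but do not send) a PUT to /db/_design/ddoc/_update/func/docid."""
    payload = {"lookup": lookup}
    if increase is not None:
        payload["increase"] = increase
    if decrease is not None:
        payload["decrease"] = decrease
    url = "%s/%s/_design/%s/_update/%s/%s" % (
        server.rstrip("/"), quote(db, safe=""), ddoc, func, quote(docid, safe=""))
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request):
    """Send the request; the update function's message comes back as the body."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Against a running server, send(build_update_request("http://localhost:5984", "dbname", "designdoc", "atomic", "docid", "name", increase=1)) should return the update function’s response message.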

For more on update functions, check out the wiki page.

Enjoy!

ch-ch-ch-changes!

Posted by on February 8, 2010

One of the best features in CouchDB is its change notifications.

Basically, it’s a Comet-style push service that informs you of updates to a database. This is great when you’re building browser apps because you get Comet for free!
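Outside the browser the same feed is plain HTTP. Below is a minimal Python sketch of a long-poll consumer. It assumes the feed=longpoll and since query parameters and the results/last_seq fields of the _changes response. changes_url, handle_batch and follow are hypothetical helpers, and retries, timeouts and heartbeats are left out.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode


def changes_url(server, db, since, feed="longpoll"):
    """URL for one long-poll request against the database's _changes feed."""
    query = urlencode({"feed": feed, "since": since})
    return "%s/%s/_changes?%s" % (server.rstrip("/"), quote(db, safe=""), query)


def handle_batch(payload, listener):
    """Pass each change row to listener; return the sequence to resume from."""
    for change in payload["results"]:
        listener(change)
    return payload["last_seq"]


def follow(server, db, listener, since=0):
    """Loop forever: wait for changes, hand them off, then poll again."""
    while True:
        with urllib.request.urlopen(changes_url(server, db, since)) as resp:
            payload = json.loads(resp.read().decode("utf-8"))
        since = handle_batch(payload, listener)
```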

However, one thing that’s missing is a way to write code in your design document that consumes these changes on the backend, without depending on an active client. To remedy this, I wrote a generic change consumer: you point it at CouchDB, it finds the change handlers in all of your design documents, keeps them running and consuming changes, and removes or replaces them whenever a design document changes.
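The discovery-and-replace bookkeeping described above could look roughly like this. It is a hypothetical Python model of the idea, not the node.js service’s code. HandlerRegistry, load and fetch_ddoc are illustrative names. The load callable stands in for whatever turns a design document’s "changes" source into a handler (the real service evaluates JavaScript). The sketch also assumes design document changes only refresh the registry and are not passed on to handlers.

```python
def is_design_change(change):
    """Design documents are the ones whose ids start with _design/."""
    return change["id"].startswith("_design/")


class HandlerRegistry:
    """Hypothetical bookkeeping for change handlers defined in design documents."""

    def __init__(self, load):
        self.load = load        # turns a design doc's "changes" source into a callable
        self.handlers = {}      # design doc id -> handler callable

    def sync(self, ddoc_id, ddoc):
        """Add, replace, or drop the handler for one design document."""
        if ddoc is None or "changes" not in ddoc:
            self.handlers.pop(ddoc_id, None)
        else:
            self.handlers[ddoc_id] = self.load(ddoc["changes"])

    def dispatch(self, change, fetch_ddoc):
        """Refresh on design doc edits; otherwise fan the change out to every handler."""
        if is_design_change(change):
            ddoc = None if change.get("deleted") else fetch_ddoc(change["id"])
            self.sync(change["id"], ddoc)
            return
        for handler in list(self.handlers.values()):
            handler(change)
```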

The best part is that it’s written in node.js, which means your handlers can also make HTTP connections out to the rest of the world and put data back into CouchDB. And since it’s a service intended to stay up indefinitely as a single instance, you can also use setInterval for tasks you would previously have put in cron :)

All the code is over on github.

To start:

node service http://localhost:5984

To define a listener in your design document:

{ ....
  "changes": "var listener = function (change) {\n  console.log(JSON.stringify(change));\n};\nexports.listener = listener;"
}

Moving on

Posted by on January 2, 2010

The new year is bringing some big changes for me. A few weeks back I accepted a position at Relaxed Inc. and notified Mozilla that I would be leaving at the end of the year.

Mozilla

I started working at Mozilla 2 years ago, the day after my employment at the Open Source Applications Foundation ended. By now I already take for granted some of the best parts of working at Mozilla: working for a public benefit organization, spending 100% of my time on Open Source, and working with very smart people in the open (lists, IRC, etc.).

But Mozilla is even more than all that. Succeeding at Mozilla means something more than a pat on the back and a good end of the year review. When you succeed at Mozilla you impact one of the most important products on the internet. You reach hundreds of millions of users and contribute to keeping the web an open and free (as in speech) world. There is no other place in the world you can work where you can conceivably have this kind of impact.

Mozilla as an organization is truly unique. Last year was the hardest I’ve ever had; I suffered a huge loss in my personal life and Mozilla was as supportive during this time as any of my friends or family. There are a lot of places that let you put so much of yourself into the organization to help it attain its goals, but only a handful are there to support you when you need it.

Relaxed Inc.

I started using CouchDB in 2008, right after a great talk by Jan Lehnardt at OSCON, and over the next year it reshaped how I think about web development and applications. In the last 6 months my group at Mozilla has become a heavy CouchDB user, not just because of my own interest but because CouchDB was the only solution for some of the harder problems we needed to solve with our results storage.

As I’ve used CouchDB more and more and become a part of the CouchDB community, I’ve had the pleasure of knowing some of the core contributors, three of whom have decided to found a new startup around CouchDB: Jan Lehnardt, J Chris Anderson, and the creator of CouchDB, Damien Katz. Shortly after they received their funding they made me an offer. It’s an amazing opportunity, and while the decision to leave Mozilla is one of the hardest I’ve ever had to make, I’m very excited about my future at Relaxed.

The Future

I’m really looking forward to working with everyone at Relaxed. It’s an exciting time and I’m not 100% sure yet which projects I currently work on that I will still have time to maintain. In the next week or so I’ll be doing a blog post on all the libraries I currently work on and maintain (it’s a long list) and what their status is moving forward. I still maintain code I wrote long before I worked at Mozilla and have every intention of continuing to work on some of the projects I started at Mozilla.

One thing is certain. I’m not the guy who figures out how to test the browser any more. Windmill and Mozmill are important projects that I intend to keep supporting by making time for code reviews and community support, but I won’t be available to put time into new feature work and refactoring like I have in the past. Luckily there are solid communities behind both of these projects and I’m confident that there are people who can continue to drive them in the future.

I don’t know what is going to happen next; all I know is that it should be fun, it won’t be like anything I’ve done before, and it will certainly continue to include lots of JavaScript and Python.

For everyone who depends on me and the code I’ve written over the last few years, I’ll be sure to keep you all up to date. And one thing I can promise: if you want to fix anything in one of my projects, fork it on github and send me a pull request, and I will always find time to look at it :)

Hosting?

Posted by on November 29, 2009

I’m starting to work on a simple blog to replace this WordPress instance.

I’ve had a great run with WordPress but I have a few ideas I want to experiment with and I also want to dogfood couchdb-pythonviews a little more.

This blog is hosted on Dreamhost. Dreamhost has been a great host for a low-impact blog; the uptime hasn’t been 100%, but all the maintenance has been easy and it has remained dirt cheap for the last few years.

I need to find a new hosting provider. I have one dedicated server but I don’t plan on running a blog there because that server is a little busy.

I need something cheap. I need root (or some kind of sudo jail) where I can run CouchDB and nginx and manage Python. Preferably Debian. Definitely Linux. Decent uptime.

I’ve considered EC2, but for a low-impact site it’s actually quite expensive (~30 dollars a month before bandwidth), and the performance, I’m told, is about 5x slower than a MacBook.

Backups aren’t necessary since I have CouchDB replication for backing up all the important bits.

I’m open to any and all suggestions.

JSON Performance in Python

Posted by on November 20, 2009

In part of my ongoing performance work in our CouchDB+Python application I’ve decided to sit down and profile JSON performance in the different open source libraries available for Python.

I profiled json (the pure Python simplejson available in the Python stdlib), simplejson compiled with C speedups, cjson, and jsonlib2 against a large JSON document. The test decodes and encodes a large JSON object 100 times. It then runs that test 100 times for each library in succession, to find the average encode/decode time for each library and to minimize other environmental factors. These numbers were taken on my MacBook Air running Mac OS X 10.6.1 with the default Python 2.6.
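The measurement loop described above can be approximated with the standard library’s time.perf_counter (Python 3, unlike the Python 2.6 behind the numbers in this post). This is a sketch, not the original benchmark script, and average_ms is a hypothetical name. Taking loads and dumps as parameters lets one harness time libraries whose function names differ.

```python
import json
import time


def average_ms(payload, loads=json.loads, dumps=json.dumps, runs=100, rounds=100):
    """Return (decode_ms, encode_ms) for `rounds` operations, averaged over `runs`.

    loads/dumps are parameters so that libraries with other function
    names (cjson uses decode/encode) can be timed with the same harness.
    """
    obj = loads(payload)
    decode_total = encode_total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        for _ in range(rounds):
            loads(payload)
        decode_total += time.perf_counter() - start

        start = time.perf_counter()
        for _ in range(rounds):
            dumps(obj)
        encode_total += time.perf_counter() - start
    return decode_total / runs * 1000.0, encode_total / runs * 1000.0
```

For example, average_ms(data, cjson.decode, cjson.encode) would time cjson, if it is installed.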

The times are in milliseconds: how long it takes to encode/decode this JSON object 100 times.

[Chart: JSONPerf]

I honestly didn’t expect the stdlib json to be this far behind.

Among the C-based libraries there isn’t a clear winner. cjson is the fastest decoder but the slowest encoder, simplejson compiled with C speedups is the fastest encoder but the slowest decoder, and jsonlib2 is somewhere in the middle in both cases.

Also, annoyingly, cjson doesn’t implement the same API as the other libraries (its equivalents of dumps and loads are named encode and decode), which makes it much more difficult for a library to support all of them. Now, rather than just letting users supply their own json module, I’ll need to add support for user-defined parsing and encoding functions to couchdb-pythonviews, couchquery, and couchdb-wsgi.
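One way a library can accept any of these parsers is to normalize them to a (loads, dumps) pair up front. This is a minimal sketch. pick_json is a hypothetical helper, not part of couchdb-pythonviews, couchquery, or couchdb-wsgi, and the preference order is an arbitrary assumption. jsonlib2 is left out of the default list because its exact function names aren’t confirmed here.

```python
import importlib


def pick_json(preferred=("cjson", "simplejson", "json")):
    """Return (name, loads, dumps) for the first importable library.

    cjson exposes decode/encode; simplejson and the stdlib json expose loads/dumps.
    """
    for name in preferred:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue
        if name == "cjson":
            return name, module.decode, module.encode
        return name, module.loads, module.dumps
    raise ImportError("no JSON library available from %r" % (preferred,))
```

Callers then only ever use the returned loads and dumps, so adding another library means touching one function.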

CouchDB View Performance (Python vs JavaScript)

Posted by on November 4, 2009

We’re gearing up for some heavy CouchDB usage in a new automation system and it has fallen upon me to do some performance benchmarking.

The most important thing for us to figure out was whether or not the CentOS virtual machine we’re currently running CouchDB on is going to be enough even in the short term. Until today we’ve been running 0.9 and have encountered performance problems.

Our main bottleneck is, and has always been, view generation and update performance. We tend to have medium to large size documents (jobs are relatively small but results from test runs can be incredibly large).

View generation of large documents has typically been our biggest issue. We previously mitigated it by refreshing all views after any large write, but that isn’t going to work for the volume of results we plan on pouring into the new system.

Last weekend I wrote a Python view server for CouchDB. couchdb-python includes a view server, but in the past I’ve heard complaints about its performance (although none recently). In addition, the view server in couchdb-python only supports map and reduce, which is only about 1/5 of the current view server spec; the spec also includes handlers for update, show, list, filter, and validate, which provide the groundwork for CouchDB as an application platform. As of Sunday my view server passes the full current CouchDB view server spec, and initial performance tests showed it faster than the JavaScript view server.
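For readers who haven’t seen one, a view server is a separate process that CouchDB talks to over stdin/stdout, one JSON array per line. Here is a minimal, map-only Python sketch of that loop. It is neither the author’s view server nor couchdb-python’s, MiniViewServer is a hypothetical name, and it covers only the reset, add_fun and map_doc commands. Reduce, update, show, list, filter and validate are left out.

```python
import json
import sys


class MiniViewServer:
    """Hypothetical map-only subset of the CouchDB view server line protocol.

    Each input line is a JSON array such as ["reset"], ["add_fun", source]
    or ["map_doc", doc]; each answer is one JSON value on its own line.
    Assumes every function source defines exactly one Python generator
    function that yields (key, value) pairs.
    """

    def __init__(self):
        self.functions = []

    def compile(self, source):
        namespace = {}
        exec(source, namespace)
        funcs = [v for k, v in namespace.items()
                 if k != "__builtins__" and callable(v)]
        return funcs[0]

    def handle(self, command):
        name, args = command[0], command[1:]
        if name == "reset":
            self.functions = []
            return True
        if name == "add_fun":
            self.functions.append(self.compile(args[0]))
            return True
        if name == "map_doc":
            # One list of [key, value] rows per registered function.
            return [[[k, v] for k, v in fn(args[0])] for fn in self.functions]
        # Placeholder error shape; the real protocol's error format is not modeled.
        return {"error": "unknown_command", "reason": name}

    def run(self, stdin=sys.stdin, stdout=sys.stdout):
        for line in stdin:
            stdout.write(json.dumps(self.handle(json.loads(line))) + "\n")
            stdout.flush()
```

Every document crosses that pipe as JSON in both directions, which is why serialization speed matters so much for view generation.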

Below are the performance graphs for CouchDB trunk running on a CentOS virtual machine. I’m using Python 2.6 with the default stdlib json library. The SpiderMonkey version is 1.7 (I don’t know the status of using 1.8 with CouchDB, but as we’ll see below, it wouldn’t improve performance much for these tests).

These graphs show view generation time for a given number of documents in a new database. The design doc I used had two views, one does emit(doc['type'],doc), the other emit(doc['_id'], 1).
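To make the two test views concrete, here is a plain-Python model of what they compute. It is not code for any particular view server: by_type, by_id and run_map are hypothetical names, and Python’s sort order stands in for CouchDB’s key collation, which only agrees for simple string keys. Note that the first view emits the whole document as its value, so its index holds a serialized copy of every document.

```python
def by_type(doc):
    # Equivalent of emit(doc['type'], doc): the value is the entire document.
    yield doc["type"], doc


def by_id(doc):
    # Equivalent of emit(doc['_id'], 1).
    yield doc["_id"], 1


def run_map(funcs, docs):
    """Apply every map function to every document and sort each view's rows by key."""
    views = []
    for fn in funcs:
        rows = [{"id": d["_id"], "key": k, "value": v}
                for d in docs for k, v in fn(d)]
        views.append(sorted(rows, key=lambda r: (r["key"], r["id"])))
    return views
```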

The graphs support zooming, mouseover and all kinds of flot goodness :)

JavaScript is the yellow line; Python is the blue line.

This is a test of moderately sized documents, about the size we normally expect for a job or build description. Each document is identical and fairly simple, at ~1,588 bytes.

These documents were incredibly large; they were taken from a full fennec mochitest run. Each document is identical and, while large, consists mostly of small JSON objects inside a much larger JSON object, coming in at ~139,096 bytes.

I had also intended to chart reduce performance with a simple sum operation, but all the results were sub-second regardless of the number of documents I threw at it, with Python only a little faster than JavaScript.

The nearly identical reduce times tell me that the actual processing time inside the view functions is hardly different, which means the large difference in view generation performance is most likely due to JSON serialization time. This also explains why larger documents widen the performance gap between Python and JavaScript.

Improving Performance

The Python view server is already as optimized as I can make it for processing time inside the views. Since CouchDB doesn’t provide a way for the view server to manage its own concurrency, we’ve basically hit the wall on what Python can provide. If we increased the complexity of the view functions I think Python would start to outperform SpiderMonkey 1.7, but 1.8 with tracing enabled would likely bridge that gap, possibly even making JavaScript faster than Python.

The big problem is JSON serialization. We can make Python faster by compiling simplejson with C speedups. But using the C-based JSON parser in newer versions of SpiderMonkey requires some other changes to CouchDB, since there are differences in the encoding of undefined.

At the end of the day, though, this all looks great. CouchDB trunk (pre-0.11) is going to run fast enough for what we need right now, even on a virtual machine. If we start to see view generation bottlenecks on views that aren’t hit often and have to update a large number of documents, we can move those views to Python and the performance should be back down to sub-second.

Introducing… couchdb-wsgi

Posted by on October 28, 2009

Last weekend I put together some pretty useful code that converts [CouchDB's external process](http://wiki.apache.org/couchdb/ExternalProcesses) JSON requests and responses to a WSGI-compliant interface.
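The core of such a bridge is a translation in each direction. Below is a minimal sketch of that idea, not couchdb-wsgi’s implementation. The request fields (verb, path, query, headers, body) and response fields (code, headers, body) reflect my understanding of the external process protocol and should be treated as assumptions. request_to_environ and call_app are hypothetical names, and the stdin/stdout loop that couchdb-wsgi’s run() handles is omitted.

```python
import io
import sys
from urllib.parse import urlencode


def request_to_environ(req):
    """Build a minimal WSGI environ from an external-process request dict."""
    data = (req.get("body") or "").encode("utf-8")
    environ = {
        "REQUEST_METHOD": req.get("verb", "GET"),
        "SCRIPT_NAME": "",
        "PATH_INFO": "/" + "/".join(req.get("path", [])),
        "QUERY_STRING": urlencode(req.get("query", {})),
        "CONTENT_LENGTH": str(len(data)),
        "SERVER_NAME": "localhost",           # placeholder values
        "SERVER_PORT": "5984",
        "SERVER_PROTOCOL": "HTTP/1.1",
        "wsgi.version": (1, 0),
        "wsgi.url_scheme": "http",
        "wsgi.input": io.BytesIO(data),
        "wsgi.errors": sys.stderr,
        "wsgi.multithread": False,
        "wsgi.multiprocess": False,
        "wsgi.run_once": False,
    }
    for name, value in req.get("headers", {}).items():
        key = name.upper().replace("-", "_")
        if key == "CONTENT_TYPE":
            environ["CONTENT_TYPE"] = value
        elif key != "CONTENT_LENGTH":
            environ["HTTP_" + key] = value
    return environ


def call_app(app, req):
    """Run a WSGI app for one request and return a response dict for CouchDB."""
    captured, written = {}, []

    def start_response(status, headers, exc_info=None):
        captured["code"] = int(status.split(" ", 1)[0])
        captured["headers"] = dict(headers)
        return written.append                 # the WSGI write() callable

    chunks = app(request_to_environ(req), start_response)
    try:
        body = b"".join(written) + b"".join(chunks)
    finally:
        if hasattr(chunks, "close"):
            chunks.close()
    return {"code": captured["code"], "headers": captured["headers"],
            "body": body.decode("utf-8")}
```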

This means you should be able to run any modern Python web framework in an external process :)

The simplest example:

#!/usr/bin/python
import couchdb_wsgi
 
def application(environ, start_response):
    start_response('200 Ok', [('content-type', 'text/plain')])
    return ['Hello World']
 
couchdb_wsgi.CouchDBWSGIHandler(application).run()

But a far more interesting example is running a django app :)

#!/usr/bin/python
import os, sys
import couchdb_wsgi
 
django_project = os.path.join(os.path.dirname(__file__), 'mysite')
sys.path.append(django_project)
os.environ['DJANGO_SETTINGS_MODULE'] = 'mysite.settings'
 
import django.core.handlers.wsgi
 
application = django.core.handlers.wsgi.WSGIHandler()
 
couchdb_wsgi.CouchDBWSGIHandler(application).run()

All the code is [up on github](http://github.com/mikeal/couchdb-wsgi) and I’ve written up some solid [Sphinx docs that are up on gh-pages](http://mikeal.github.com/couchdb-wsgi/). I also pushed an [initial release to PyPI](http://pypi.python.org/pypi/couchdb-wsgi).

Up for a Pint?

Posted by on July 2, 2009

I’m in London for the next few days and would love to grab a drink with any community members be you Mozilla, CouchDB, Python, Windmill, JavaScript or just plain old coffee, whisky or beer geeks :)

Heading to EuroPython

Posted by on June 26, 2009

I’m getting all packed up and leaving Sunday for [EuroPython](http://www.europython.eu/) in Birmingham, UK.

This will be my first time at EuroPython and my first time in Europe!

I’ll be giving two talks: one on [Windmill](http://www.getwindmill.com) and one about [CouchDB](http://couchdb.apache.org/) and Python. The Windmill talk will be more or less the talk that I gave at [Open Source Bridge](http://opensourcebridge.org/) last week, which went very well. This will be the first time I talk about CouchDB, the most exciting new technology on the web. The talk will mostly be about breaking the data modeling habits we developed to deal with SQL, and about the libraries and tools available for working with CouchDB from Python.

I will also be in London for a few extra days after the conference so anyone interested in a meetup should ping me.

RiP: Annotations Remix

Posted by on May 13, 2009

I had some fun this weekend with Python, <video>, CouchDB and Brett Gaylor’s [RiP: A Remix Manifesto](http://www.ripremix.com). In just a few hours I was able to crank out a little [annotations remix](http://ripannotations.pythonesque.org) which allows anyone to add annotations to the film that are displayed as people view it.

I’m hosting it on my little mac mini (currently hidden in a data-center) so hopefully it doesn’t fall over pushing so much video :)

I’ve posted all the code up on [github](http://github.com/mikeal/ripannotations/tree/master). The more I use <video> and CouchDB the more excited I get about the future of web applications. This entire project was done in little chunks of spare time over the weekend and most of that was me messing around with styling. To get the data stored, queried, and displayed took less than 2 hours.

Hope you all enjoy the annotations remix, and if you haven’t already, go and pay what you want for a terrific copy of [RiP: A Remix Manifesto](http://www.ripremix.com). It’s worth it.