Moving on

Posted by on January 2, 2010

The new year is bringing some big changes for me. A few weeks back I accepted a position at Relaxed Inc. and notified Mozilla that I would be leaving at the end of the year.

Mozilla

I started working at Mozilla 2 years ago, the day after my employment at the Open Source Applications Foundation ended. At this point I already take for granted some of the best parts of working at Mozilla: working for a public benefit organization, spending 100% of my time working on Open Source, and working with very smart people in the open (lists, IRC, etc.).

But Mozilla is even more than all that. Succeeding at Mozilla means something more than a pat on the back and a good end of the year review. When you succeed at Mozilla you impact one of the most important products on the internet. You reach hundreds of millions of users and contribute to keeping the web an open and free (as in speech) world. There is no other place in the world you can work where you can conceivably have this kind of impact.

Mozilla as an organization is truly unique. Last year was the hardest I’ve ever had. I suffered a huge loss in my personal life and Mozilla was as supportive during this time as any of my friends or family. There are a lot of places that let you put so much of yourself into the organization to help it attain its goals but there are only a handful that are there to support you when you need it.

Relaxed Inc.

I started using CouchDB in 2008, right after a great talk by Jan Lehnardt at OSCON, and over the next year it re-shaped how I think about web development and applications. In the last 6 months my group at Mozilla has become a heavy CouchDB user, not just because of my own interest but because CouchDB was the only solution for some of the harder problems we needed to solve with our results storage.

As I’ve used CouchDB more and more and become a part of the CouchDB community I’ve had the pleasure of knowing some of the core contributors, three of whom have decided to found a new startup around CouchDB: Jan Lehnardt, J Chris Anderson, and the creator of CouchDB, Damien Katz. Shortly after they received their funding they made an offer. It’s an amazing opportunity and while the decision to leave Mozilla is one of the hardest I’ve ever had to make I’m very excited about my future at Relaxed.

The Future

I’m really looking forward to working with everyone at Relaxed. It’s an exciting time and I’m not 100% sure yet which projects I currently work on that I will still have time to maintain. In the next week or so I’ll be doing a blog post on all the libraries I currently work on and maintain (it’s a long list) and what their status is moving forward. I still maintain code I wrote long before I worked at Mozilla and have every intention of continuing to work on some of the projects I started at Mozilla.

One thing is certain. I’m not the guy who figures out how to test the browser any more. Windmill and Mozmill are important projects that I have every intention of supporting by making time for code reviews and community support, but I won’t be available to put time into new feature work and refactoring like I have in the past. Luckily there are solid communities behind both of these projects and I’m confident that there are people who can continue to drive them in the future.

I don’t know what is going to happen next. All I know is that it should be fun, it won’t be like anything I’ve done before, and it will certainly continue to include lots of JavaScript and Python.

For everyone who depends on me and the code I’ve written over the last few years, I’ll be sure to keep you all up to date. And one thing I can promise is that if you want to fix anything in one of my projects, fork it on GitHub and send me a pull request and I will always find time to look at it :)


Hosting?

Posted by on November 29, 2009

I’m starting to work on a simple blog to replace this WordPress instance.

I’ve had a great run with WordPress but I have a few ideas I want to experiment with and I also want to dogfood couchdb-pythonviews a little more.

This blog is hosted on Dreamhost. Dreamhost has been a great host for a low impact blog. The uptime hasn’t been 100% but all the maintenance has been easy and it has remained dirt cheap for the last few years.

I need to find a new hosting provider. I have one dedicated server but I don’t plan on running a blog there because that server is a little busy.

I need something cheap. I need root (or some kind of sudo jail) where I can run CouchDB and nginx and manage Python. Preferably Debian. Definitely Linux. Decent uptime.

I’ve considered EC2 but for a low impact site it’s actually quite expensive (~30 dollars a month before bandwidth) and the performance I’m told is about 5x slower than a Macbook.

Backups aren’t necessary since I have CouchDB replication for backing up all the important bits.

I’m open to any and all suggestions.


JSON Performance in Python

Posted by on November 20, 2009

As part of my ongoing performance work in our CouchDB+Python application I’ve decided to sit down and profile JSON performance in the different open source libraries available for Python.

The test profiles four libraries against a large JSON document: json (pure Python simplejson) from the Python stdlib, simplejson compiled with C speedups, cjson, and jsonlib2. The test decodes and encodes a large JSON object 100 times. It then runs that test 100 times in each library in succession in order to find the average encode/decode time for each library and minimize the effect of other environmental factors. These numbers were taken on my MacBook Air running Mac OS X 10.6.1 with the default Python 2.6.

Times are in milliseconds and represent how long it takes to encode/decode this JSON object 100 times.
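A rough sketch of a harness like the one described above might look like the following. It is not the original benchmark script: the document shape in `make_document`, the function names, and the use of `time.perf_counter` (Python 3) are all assumptions made for illustration, and only the stdlib `json` module is wired in. Other libraries would be added to the `libraries` dict as `(encode, decode)` pairs.

```python
import json
import time


def make_document(n_items=1000):
    """Build a stand-in for the large JSON document (hypothetical shape)."""
    return {"results": [{"id": i, "name": "test-%d" % i, "passed": i % 2 == 0,
                         "tags": ["a", "b", "c"]} for i in range(n_items)]}


def time_roundtrips(dumps, loads, text, iterations=100):
    """Return (decode_ms, encode_ms) for `iterations` decode/encode passes."""
    decode_ms = 0.0
    encode_ms = 0.0
    for _ in range(iterations):
        start = time.perf_counter()
        obj = loads(text)
        middle = time.perf_counter()
        dumps(obj)
        end = time.perf_counter()
        decode_ms += (middle - start) * 1000.0
        encode_ms += (end - middle) * 1000.0
    return decode_ms, encode_ms


def benchmark(libraries, text, runs=100, iterations=100):
    """Average the per-run decode/encode times for each library."""
    averages = {}
    for name, (dumps, loads) in libraries.items():
        decode_total = 0.0
        encode_total = 0.0
        for _ in range(runs):
            decode_ms, encode_ms = time_roundtrips(dumps, loads, text, iterations)
            decode_total += decode_ms
            encode_total += encode_ms
        averages[name] = (decode_total / runs, encode_total / runs)
    return averages


if __name__ == "__main__":
    text = json.dumps(make_document())
    libraries = {"json": (json.dumps, json.loads)}
    # Fewer runs than the post describes, to keep the demo quick.
    for name, (decode_ms, encode_ms) in benchmark(libraries, text, runs=5).items():
        print("%s: decode %.1f ms, encode %.1f ms" % (name, decode_ms, encode_ms))
```

Timing decode and encode separately matters here because, as the results show, a library can be fast at one and slow at the other.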

[Chart: JSONPerf, encode and decode time in milliseconds per library]

I honestly didn’t expect the stdlib json to be this far behind.

Among the other C based libraries there isn’t a clear winner. cjson is the best decoder but the slowest encoder, simplejson compiled with C speedups is the fastest encoder but the slowest decoder while jsonlib2 is somewhere in the middle for both cases.

Also, annoyingly, cjson doesn’t implement the same API as the other libraries (dump and load functions are named encode and decode) making it much more difficult for a library to include support for all available libraries. Now rather than just being able to add a user defined json module I’ll need to add support for user defined parsing and encoding functions to couchdb-pythonviews, couchquery, and couchdb-wsgi.
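One way to support user defined parsing and encoding functions is to pass around a pair of callables rather than a module. The sketch below is an assumption about how that could look, not the code that went into couchdb-pythonviews, couchquery, or couchdb-wsgi; `JSONCodec` and `codec_from_module` are hypothetical names.

```python
import json


class JSONCodec(object):
    """Pair of callables: encode(obj) -> str and decode(str) -> obj."""

    def __init__(self, encode, decode):
        self.encode = encode
        self.decode = decode


def codec_from_module(module):
    """Build a codec from a module exposing dumps/loads or encode/decode."""
    if hasattr(module, "dumps") and hasattr(module, "loads"):
        return JSONCodec(module.dumps, module.loads)
    if hasattr(module, "encode") and hasattr(module, "decode"):
        return JSONCodec(module.encode, module.decode)
    raise TypeError("module has neither dumps/loads nor encode/decode")


# Library code then only ever calls codec.encode and codec.decode.
codec = codec_from_module(json)
round_tripped = codec.decode(codec.encode({"type": "job"}))
```

A caller who wants something more exotic can skip `codec_from_module` and construct `JSONCodec` directly from any two functions.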


CouchDB View Performance (Python vs JavaScript)

Posted by on November 4, 2009

We’re gearing up for some heavy CouchDB usage in a new automation system and it has fallen upon me to do some performance benchmarking.

The most important thing for us to figure out was whether or not the CentOS virtual machine we’re currently running CouchDB on is going to be enough even in the short term. Until today we’ve been running 0.9 and have encountered performance problems.

Our main bottleneck is, and has always been, view generation and update performance. We tend to have medium to large size documents (jobs are relatively small but results from test runs can be incredibly large).

View generation of large documents has typically been our biggest issue. We have previously mitigated it by refreshing all views after any large write, but that isn’t going to work for the number of results that we plan on pouring into the new system.

Last weekend I wrote a Python view server for CouchDB. couchdb-python includes a view server but in the past I’ve heard complaints about performance (although none recently). In addition, the view server in couchdb-python only supports map and reduce, which is only about 1/5 of the current view server spec. The spec also includes handlers for update, show, list, filter, and validate, which provide the groundwork for CouchDB as an application platform. As of Sunday my view server passes all of the current CouchDB spec and initial performance tests showed it faster than the JavaScript view server.

Below are the performance graphs for CouchDB trunk running on a CentOS virtual machine. I’m using Python 2.6 with the default stdlib json library. The spidermonkey core is 1.7 (I don’t know what the status of using 1.8 with CouchDB is but as we’ll see below, this won’t improve performance too much for these tests).

These graphs show view generation time for a given number of documents in a new database. The design doc I used had two views: one does emit(doc['type'],doc), the other emit(doc['_id'], 1).
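For readers who want to see the two views as code, here is a minimal Python sketch. The `(doc, emit)` signature and the `run_map` helper are hypothetical and exist only so the example runs on its own; they are not the API of the view server described in this post.

```python
def run_map(map_function, doc):
    """Run one map function over one document and collect emitted rows."""
    rows = []

    def emit(key, value):
        rows.append((key, value))

    map_function(doc, emit)
    return rows


def by_type(doc, emit):
    # Emits the whole document, so the emitted row grows with document size.
    emit(doc['type'], doc)


def by_id(doc, emit):
    # Emits a constant, so the emitted row stays small for any document.
    emit(doc['_id'], 1)


example_doc = {'_id': 'abc123', 'type': 'job', 'payload': list(range(5))}
type_rows = run_map(by_type, example_doc)
id_rows = run_map(by_id, example_doc)
```

The first view sends every document back out in full, while the second sends almost nothing back, so together they exercise both large and small view output.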

The graphs support zooming, mouseover and all kinds of flot goodness :)

JavaScript is the yellow line. Python is the blue line.

This is a test of moderately sized documents, roughly what we expect the size of a job or build description to be. Each document is identical and fairly simple, with a size of ~1,588 bytes.

These documents were incredibly large. They were taken from a full fennec mochitest run. Each document is identical and, while large, consists mostly of small JSON objects inside a much larger JSON object, coming in at ~139,096 bytes.

I had also intended to chart the reduce performance with a simple sum operation but all the results were sub-second regardless of the number of documents I threw at it, with Python being only a little faster than JavaScript.

The nearly identical reduce time tells me that the actual code processing time inside the view functions is hardly different, which means that the large difference in performance during view generation is most likely due to JSON serialization time. This also explains why larger documents cause an even greater difference in performance between Python and JavaScript.

Improving Performance

The Python view server is already as optimized as I can imagine for processing time inside the views. Since CouchDB doesn’t provide a way for the view server to support its own concurrency we’ve basically hit the wall here on what Python can provide. If we increased the complexity of the view functions I think that Python would start to show better than Spidermonkey 1.7, but 1.8 with tracing enabled would likely bridge that gap, possibly even showing JavaScript faster than Python.

The big problem is JSON serialization. We can make Python faster by compiling simplejson with C speedups. But using the C based JSON parser in newer versions of Spidermonkey requires some other changes to CouchDB since there are differences in the encoding of undefined.

At the end of the day though, this all looks great. CouchDB trunk (pre-0.11) is going to run fast enough for what we need right now even on a virtual machine and if we start to see view generation bottlenecks on views that aren’t hit as often and have to update a large number of documents we can just move those views to Python and the performance should be back down to sub-second.


Introducing… couchdb-wsgi

Posted by on October 28, 2009

Last weekend I put together some pretty useful code that converts [CouchDB's external process](http://wiki.apache.org/couchdb/ExternalProcesses) JSON request/responses to a WSGI compliant interface.

This means you should be able to run any modern Python web framework in an external process :)

The simplest example:

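#!/usr/bin/python
import couchdb_wsgi

# A minimal WSGI application that always answers with a plain-text greeting.
def application(environ, start_response):
    start_response('200 OK', [('content-type', 'text/plain')])
    return ['Hello World']

# Wrap the WSGI application in the handler and run it as the external process.
couchdb_wsgi.CouchDBWSGIHandler(application).run()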

But a far more interesting example is running a django app :)

#!/usr/bin/python
import os, sys
import couchdb_wsgi

# Make the Django project importable and point Django at its settings module.
django_project = os.path.join(os.path.dirname(__file__), 'mysite')
sys.path.append(django_project)
os.environ['DJANGO_SETTINGS_MODULE'] = 'mysite.settings'

# Import Django's WSGI handler only after the settings module has been set.
import django.core.handlers.wsgi

application = django.core.handlers.wsgi.WSGIHandler()

couchdb_wsgi.CouchDBWSGIHandler(application).run()

All the code is [up on github](http://github.com/mikeal/couchdb-wsgi) and I’ve written up some solid [Sphinx docs that are up on gh-pages](http://mikeal.github.com/couchdb-wsgi/). I also pushed an [initial release to PyPI](http://pypi.python.org/pypi/couchdb-wsgi).
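To give a feel for what converting a JSON request/response into a WSGI call involves, here is a simplified sketch. It is not the couchdb-wsgi implementation. The request field names (`method`, `path`, `query`, `headers`, `body`) and response field names (`code`, `headers`, `body`) are assumptions that should be checked against the external process documentation for your CouchDB version, and the sketch is written for Python 3, where WSGI bodies are bytes.

```python
import io
import sys
from urllib.parse import urlencode


def request_to_environ(req):
    """Translate a decoded JSON request (a dict) into a WSGI environ dict."""
    body = (req.get("body") or "").encode("utf-8")
    environ = {
        "REQUEST_METHOD": req.get("method", "GET"),  # assumed field name
        "SCRIPT_NAME": "",
        "PATH_INFO": "/" + "/".join(req.get("path", [])),
        "QUERY_STRING": urlencode(req.get("query", {})),
        "CONTENT_LENGTH": str(len(body)),
        "SERVER_NAME": "localhost",   # placeholder values for the sketch
        "SERVER_PORT": "5984",
        "SERVER_PROTOCOL": "HTTP/1.1",
        "wsgi.version": (1, 0),
        "wsgi.url_scheme": "http",
        "wsgi.input": io.BytesIO(body),
        "wsgi.errors": sys.stderr,
        "wsgi.multithread": False,
        "wsgi.multiprocess": False,
        "wsgi.run_once": False,
    }
    for name, value in req.get("headers", {}).items():
        key = name.upper().replace("-", "_")
        # WSGI carries Content-Type without the HTTP_ prefix.
        environ[key if key == "CONTENT_TYPE" else "HTTP_" + key] = value
    return environ


def call_application(application, req):
    """Run a WSGI app against one request and return a JSON-ready dict."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        # The legacy write() callable from the WSGI spec is omitted here.
        captured["code"] = int(status.split(" ", 1)[0])
        captured["headers"] = dict(headers)

    chunks = application(request_to_environ(req), start_response)
    captured["body"] = b"".join(chunks).decode("utf-8")
    return captured


def hello(environ, start_response):
    start_response("200 OK", [("content-type", "text/plain")])
    return [b"Hello World"]


response = call_application(hello, {"method": "GET", "path": ["db", "_hello"]})
```

A real handler would also loop over requests arriving from CouchDB and write each response back as a line of JSON, which this sketch leaves out.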


Mutual Benefit

Posted by on August 5, 2009

I have started objecting to the description of a contributor as a “volunteer”.

Volunteers are people who give their time/effort to an institution or group at some cost to them for little benefit in return. They usually do this as a labor of love, and therefore the argument can be made that they receive some kind of emotional satisfaction from the exchange, but it’s fundamentally categorized as a one way exchange where the volunteer is **giving** and the institution or group is **taking**.

Open source contributions are not market transactions. In a market there is a producer and a consumer, and the transactions between them are to the benefit of either the producer or the consumer or both. Producer makes something, consumer evaluates the product and decides to give capital to the producer. Volunteer is a term used to describe an actor that is working for the benefit of the producer without reciprocal benefit to themselves to the extent that they benefit the producer.

Capital is the driving force in a market; it’s what enables the transactions. Workers are paid for contributions to product so that institutions can enable transactions where consumers are given a product in return for more capital. Capital is certainly a factor in open source, but it’s auxiliary. Capital may drive one side of the transaction, an actor paid to contribute to a product or consumption of a product used by the consumer to generate capital, but it does not drive each side of the transaction.

Contributors are not driven by a need to benefit a particular producer and are rarely driven by capital. In fact I can’t think of a way to describe open source contributions in terms of a market. In open source there is the **product**. The product exists almost as its own entity outside of the producers that created it or the consumers that use it. Because of its transparency and its ease of access and manipulation it cannot be viewed as a unit in a transaction within a market. Instead, all interactions need to be described in **relation to the product**. Contributors, institutions and individuals, that take part in production and those that take part in consumption take part in transactions with the **product**. This is an open source community, a group of actors taking part in transactions with a product.

The communities that thrive are the ones that remove barriers to these transactions and create tools that enable new transactions for more diverse contributions to the product. The transactions are not one-way; each transaction is two-way, benefiting both the product and the actor.

Rather than capital, mutual benefit seems to drive open source transactions. The product and the actor benefit from every transaction, with only a small portion of those transactions seeing capital as the benefit. The notion of a volunteer simply doesn’t exist in this model because there are rarely, if ever, transactions that only benefit a product at a cost to the actor. Actors **must** be motivated, and products are not “owned” in the traditional sense of ownership since their production is taken on by a community motivated by mutual benefit, which tears down the relationship traditional market producers have with products.

Tools that are built to enable one sided transactions to a product usually fail because actors aren’t motivated by transactions that aren’t mutually beneficial.

A quick look at Firefox shows a very broad and diverse number of tools that enable mutually beneficial transactions. Although we often think about the new kinds of contributions that additions to Firefox itself will enable like [Personas](http://www.getpersonas.com/) or [Jetpack](https://jetpack.mozillalabs.com/) we also have a variety of tools that enable non-code contributions. Everything from that little button that reports a crash (benefits the product’s stability and improves your browser experience) to [SUMO](http://support.mozilla.com/en-US/kb/) (users seek resolution to support issues while providing the product with immeasurable usage feedback and bugs) are examples of tools that enable new transactions with the product that are mutually beneficial to the product and the actor.

One of my favorite things about working at Mozilla is being able to think about new kinds of contributions and how to enable them. There are few products with such a large and diverse ecosystem of users, so the opportunity for new contributions is uniquely large so long as we create tools that are mutually beneficial.


Economics of web fonts

Posted by on July 21, 2009

Let’s say there is this market called “web content creators” and we are trying to sell them a product, a font. We want to sell them something that is going to help make their web content a little better.

A really good way to sell them something would be to compare the dollar value of our product to the increased dollar value that will be added to their final product. Damn, their final product is free. Scratch that.

Since there is some labor and tools cost to creating their product we could compare our superior *professionally* created product to all the other tools they are using. Hrm…. it turns out all the tools and technologies used to create web content are pretty impressive and at a very low cost if not completely free.

Alright, let’s just hope these guys really like fonts. Let’s research the price of fonts of comparable quality and price our fonts below theirs. Uh oh, it turns out there are a lot of great quality free fonts to choose from already and the number is increasing every day.

In this market creating a font once and selling it a number of times does not seem like a smart business model. Maybe an alternative business model is more suited to this market.

There **are** other business models for individual font creators. The number of people that **can** use newly created fonts is now increasing exponentially. Surely the creation of new custom fonts will still get some business as long as you’re willing to give them the font in an open format they can use anywhere.

Maybe the revenues aren’t enough to sustain an office and a half a dozen employees but it’s certainly enough for freelance font creators. I know a few web content creators that would love a one-on-one relationship with a font author and would pay a decent stipend for a custom tailored font.

Font creators will be fine. Font creation is sure to increase. But maybe the institutions that used to house font creation, “the foundries”, don’t have a sustainable model in this market.


Dead Font Walkin’

Posted by on July 20, 2009

I’m about to go on a tear so it’s worth saying that all of what I’m writing are my own opinions and in no way whatsoever reflect the opinions or policies of my employer.

Any new technology can have a side effect of making an entire industry irrelevant. Every time an industry is on the brink they try to plead for their own survival. Their tone and message is predictable: “you’ll miss our profession when it’s gone”. But of course the profession never disappears, only the institutions that used to enable it, which are no longer relevant. These institutions cannot survive in a new world so they scream to save their business model and claim that their profession is what is at risk.

I love journalism, and music, and I’m a total font geek but I just can’t stand newspapers, record labels, and these god forsaken font ["foundries"](http://blog.mozilla.com/rob-sayre/2009/07/19/broken-record/).

Let me get this straight. Your business model is to create fonts, **once**, and then license them on a per-use basis. I can’t be the first one to tell you **THAT ISN’T GONNA FUCKING WORK ANYMORE**.

@font-face has the side effect of invalidating your whole business model by driving demand for ubiquitous free and open fonts. I’m sorry. I wish it didn’t have to be this way. The world is cruel, and you’re fucked.

Print publications bought your fonts because they had fairly high production costs already and adding a little on top wasn’t going to break them. But this is the web, content is ubiquitous and free so the tools to create it have to be ubiquitous and free as well.

To steal a little from [Clay Shirky](http://www.shirky.com/), your industry is now going to suffer from mass amateurization. Font creation will not die with the foundries. There is going to be more font creation than there ever has been in human history; it’s just going to be open and free. The cost of producing a font is nearly zero. There is labor involved but the tools are ubiquitous and mostly free while distribution is a non-issue. Now that **anyone** can create a font and quality is determined solely by the creator’s talent, and we can actually **use** open fonts in the web content we create, it is safe to expect an explosion in the creation of fonts.

There is no technology that can save your business model because it pre-dates the web. The web changed the world you live in and you don’t get to change it back. If you don’t believe me you should have a chat with some former stock photographers.


GitHub is the winner

Posted by on July 20, 2009

I’m not lucky enough to get to choose one source control manager and use it exclusively. On a daily basis I use git, svn, and hg. Every week or so I also use bzr. Luckily, I no longer have to touch darcs.

I haven’t dug into the internals of these tools enough to say which one has the superior technical merits, although I will say that I’ve never seen a git conflict resolution interface even across unbelievably hairy merges.

I write a lot of small libraries and a couple big ones. I care far more about the social effects and contribution workflows a tool provides than any other features. There are different public web applications that try to provide infrastructure for the social effects of DVCS and after months of working with different approaches I have to say that GitHub is the winner by a mile.

At the end of the day there are two factors that make GitHub such a clear winner. The first is zero friction publishing. The second is the democratizing effect of scrapping any notion of a “central” repository.

Nearly a year ago I hit the Google Code project limit and had to call in some favors to get the limit pushed up for my account. I have to push lots of small libraries so having simple and seamless publishing of repositories has made my life much easier. The fact that I can just push my repository and worry about turning that repo on GitHub into a “project” later, instead of the other way around, means that I have no reason **not** to publish every little thing I do.

The second and more controversial feature of GitHub, and possibly of git itself, is that there is never a clear central repository. There is my repository, and your repository, and every other **person’s** repository. This throws off FLOSS projects that have always relied on a “committer” hierarchy to manage the influx of work into a project. Nearly every book on community driven open source focuses on the creation of a class of contributors with special write permissions to the repository. There has been a huge discussion on how to translate that process to DVCS and some tools, in particular hg, make it fairly easy to simulate older workflows with a central repository.

After living with GitHub for a while and seeing the potential for new collaboration I think the answer to translating the “committer” model to DVCS is to **not translate it at all**. GitHub makes **everyone** a committer and that enables a new class of contribution that the old model totally excluded.

Since code can travel seamlessly through different developers’ repositories each change takes on a life of its own. People who made what they thought were small changes for their own personal use easily share them with other developers and those changes can move around repositories, hopefully making it into an official release. New contributors don’t have to worry about this giant wall of process behind getting a patch in. They simply write the patch and push it, then send pull requests to other relevant contributors and module owners, eventually getting those changes pushed up into the repository that gets packaged and distributed.

Someone is always going to be responsible for releasing a product, someone owns the keys to the distribution mechanisms, so I find the notion that some amount of authority over the project’s direction is lost by not centralizing the repository to be exaggerated. Although there is some authority that is lost to the previously defined class of committers the democratization of write permissions encourages a bigger class of lost contribution that is excluded by the laborious process of patches in bugs and the required upstream process to get the work committed. This also means that a number of contributors can live with changesets for an extended period of time before they get packaged in a release which increases confidence in large changesets that many projects reject outright for fear of instability.

GitHub solves the social problems of open source collaboration by taking a much more anarchist approach to the contribution process and while this is certainly shaking the foundation of traditional contribution models I’m loving it :)


Duke Nukem OS

Posted by on July 15, 2009

We have a lot of great operating systems out there but they were all created before the web and high performance 3D gaming. To try and move technology forward I am announcing the Duke Nukem Operating System.

Duke Nukem OS is a lightweight high performance open source operating system. The first release, codename “Forever”, is due out in late 2010. A promotional comic book will accompany the release and to ensure it is available by the delivery date I plan to hire [Kevin Smith](http://en.wikipedia.org/wiki/Kevin_Smith) and [Alex Ross](http://en.wikipedia.org/wiki/Alex_Ross) to write and illustrate it.

Although no screenshots, specifications, or source code will be available until release, consider this a call to action for the open source community to get involved in the project.

This new operating system will be built on the Linux kernel but I will be throwing out the bloated window managers Linux is currently known for and building a next generation interface. I’ll be using XULRunner, the Mozilla runtime used to create Firefox, as the basis for this window manager but the tools to develop applications are exclusively web3.0 “semantic web” standards. [RDFa](http://www.w3.org/TR/xhtml-rdfa-primer/), a next generation semantic markup language known for its simplicity and rapid rate of adoption, will be the primary language for building in this next generation operating system.

More updates in the future.
