Sunday, August 15

Geek

Desperately Seeking The Last Six Months

About six months ago, I was looking for a database that didn't suck.

At the time, there weren't any, though some showed promise.  Redis, for example, had a great feature set but was limited to datasets that fit in memory.

This has now been fixed.

Fortunately (or... otherwise) my day job ate all my time for six months, and now that I'm finally returning to this project, Redis is ready for me.

While it's not the solution for every problem (the main data engine is single-threaded and blocking, so your overall throughput is limited by CPU speed), it now supports virtual memory and multi-threaded I/O, so it can scale to significantly larger datasets.
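
For reference, the virtual memory support is configured in redis.conf rather than at runtime.  In the VM-era releases the directives look something like this (the values here are purely illustrative):

vm-enabled yes
vm-swap-file /var/lib/redis/redis.swap
vm-max-memory 268435456
vm-max-threads 4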

The big advantage of Redis is that it provides lightweight implementations of a variety of different data storage methods - not just key/value stores, but lists, hashes, sets, ordered sets, and message queues.  You can also set expiry times on entries to use it as a cache.  So with a bit of lateral thinking one server can replace a regular database like MySQL, a messaging system like ActiveMQ, a caching system like Memcached, and a document store like CouchDB.
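
To make that concrete, here's roughly what a few of those structures look like from Python.  This is just a sketch assuming a reasonably recent redis-py client; the key names and values are made up:

import redis

r = redis.Redis(host='localhost', port=6379)

# Plain key/value with an expiry, so it doubles as a cache entry
r.setex('session:42', 3600, 'some session blob')

# A list used as a simple message queue: producers push, consumers pop
r.lpush('queue:mail', 'message 1')
job = r.rpop('queue:mail')

# A hash as an extensible record
r.hset('user:1', 'name', 'pixy')
r.hset('user:1', 'posts', 778)
record = r.hgetall('user:1')

# Sets and ordered sets
r.sadd('tags:geek', 'redis', 'python')
r.zadd('scores', {'redis': 100, 'mysql': 60})
top = r.zrange('scores', 0, -1, withscores=True)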

It doesn't support strict transactions, but it has transactional features that are good enough for most tasks.  It also has a nice simple replication model.
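
Those transactional features are MULTI/EXEC command batches rather than full ACID transactions.  In redis-py they're exposed as a pipeline; a sketch (key names made up):

import redis

r = redis.Redis()

# Commands queued on the pipeline are sent as a single MULTI/EXEC block,
# so they execute atomically - but there's no rollback if one of them fails.
pipe = r.pipeline(transaction=True)
pipe.incr('counter:pageviews')
pipe.lpush('log:recent', 'viewed page 42')
pipe.expire('log:recent', 86400)
results = pipe.execute()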

You're not going to build the next Facebook on Redis - it simply won't scale - but if you're working on something a bit smaller (like a multi-user blogging app) it might be just what you need.

Not that I'm thinking of rewriting Minx - but the new module I'm developing is going to use Redis for most of its data, and if that experiment works well, I'll naturally look at extending it further.

Update: Huh.

So, as is my wont (and as is the wont of any programmer who wishes to retain what's left of his or her sanity), I ran a benchmark on Redis before going any further.  My own benchmark.  Redis comes with a benchmark that cheerfully reports ~100,000 reads or writes per second.  I do not trust it.

So:

[andrew@synclavier ~]$ python redistest.py
String write * 10000: 0.58s 17097.4/s
String read * 10000: 0.54s 18436.4/s
JSON write * 10000: 1.13s 8869.5/s
JSON read * 10000: 0.72s 13799.1/s
Hash write * 10000: 2.24s 4455.1/s
Hash read * 10000: 5.42s 1845.1/s

For the string benchmark, I am creating records each containing ~1kb of static text.  For the JSON benchmark, a Python dict is converted to ~1kb of JSON text.  For the hash benchmark, the same dict is stored as a Redis hash - basically, an extensible record structure.
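
For the curious, the shape of the test is roughly this - not the actual redistest.py, just a sketch of the same idea against the redis-py client, with a placeholder ~1kb payload:

import time, json, redis

r = redis.Redis()
payload = {'author': 'pixy', 'posted': '2010-08-15', 'body': 'x' * 1000}

def bench(label, op, n=10000):
    start = time.time()
    for i in range(n):
        op(i)
    elapsed = time.time() - start
    print('%s * %d: %.2fs %.1f/s' % (label, n, elapsed, n / elapsed))

bench('String write', lambda i: r.set('str:%d' % i, 'x' * 1000))
bench('String read', lambda i: r.get('str:%d' % i))
bench('JSON write', lambda i: r.set('json:%d' % i, json.dumps(payload)))
bench('JSON read', lambda i: json.loads(r.get('json:%d' % i)))
bench('Hash write', lambda i: r.hset('hash:%d' % i, mapping=payload))
bench('Hash read', lambda i: r.hgetall('hash:%d' % i))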

String performance is not too bad for single-threaded client/server code with no optimisations - approaching 20k ops/s for both reads and writes with a meaningful payload.  Good.

The JSON benchmark is reading and writing exactly the same amount of data, showing the overhead of the JSON (actually, SimpleJSON) library.  We still get on the order of 10k ops/s.  With MySQL on the production server, that's what I see on cached queries.  So far, so good.

The hash benchmark is not so hot, particularly the reads.  An order of magnitude slower than strings; most of an order of magnitude slower than JSON for the same data.  Won't be using hashes for the high-volume stuff.

Then, curious, I tried one more thing:

Object write * 10000: 0.84s 11846.0/s
Object read * 10000: 0.54s 18473.1/s

I took that same Python dict and wrote it to Redis.  Without converting it to JSON.  Without mapping it to a hash.  Without doing anything.  And it worked.

That's kind of scary.  I'm not sure how it's doing that.

If you pass the Python Redis library something that's not a string, it turns it into a string.  No magic.
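
In other words, the "Object" numbers above are really just string writes of the dict's repr.  A sketch of what's effectively happening (the client of the day coerced non-strings with str(); newer versions are stricter and will refuse a dict outright):

import redis

r = redis.Redis()
payload = {'author': 'pixy', 'body': 'hello'}

# "Writing a dict" amounts to storing its repr as a plain string...
r.set('obj:1', str(payload))

# ...and a plain string (bytes, with newer clients) is what comes back out.
raw = r.get('obj:1')

# There's no dict on the other end - to recover the structure you'd have to
# parse the repr yourself, which is why JSON or pickle is the sane approach.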

Update: Tweaked and re-run, with some interesting results:

Running native
String write * 10000: 0.66s 15075.6/s
String read * 10000: 0.52s 19051.1/s
JSON write * 10000: 1.07s 9359.8/s
JSON read * 10000: 0.72s 13905.8/s
JSON multi-read * 10000: 0.50s 19878.3/s
Hash write * 10000: 2.64s 3788.8/s
Hash read * 10000: 6.11s 1636.9/s
Pickle write * 10000: 0.88s 11335.1/s
Pickle read * 10000: 0.68s 14810.8/s

Running with Psyco
String write * 10000: 0.68s 14742.0/s
String read * 10000: 0.41s 24369.1/s
JSON write * 10000: 0.88s 11398.6/s
JSON read * 10000: 0.69s 14493.1/s
JSON multi-read * 10000: 0.39s 25882.6/s
Hash write * 10000: 2.06s 4845.8/s
Hash read * 10000: 2.66s 3752.5/s
Pickle write * 10000: 0.89s 11212.0/s
Pickle read * 10000: 0.63s 15953.6/s

With Psyco, hash write performance is up 20% and hash read performance is up over 100%.  That certainly suggests the bottleneck is not Redis itself.
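
For reference, wiring Psyco in is a one-liner at the top of the test script (Psyco only works under Python 2; a sketch matching the labels in the output above):

try:
    import psyco
    psyco.full()  # JIT-compile everything it can
    print('Running with Psyco')
except ImportError:
    print('Running native')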

Posted by: Pixy Misa at 06:00 PM | Comments (1)

1 Here is a benchmark that you can probably code pretty easily in Python / Java against any key-value database: http://docs.google.com/View?id=dd5f3337_24gcvprmcw

I have the code for an implementation in GT.M that you can use as a template (if any construct is not clear, please ask): http://docs.google.com/View?id=dd5f3337_25fv4vnnfw

Posted by: K.S. Bhaskar at Monday, August 16 2010 07:49 AM (NkkQ+)
