Monday, August 08

Geek

Is CouchDB The Anti-Redis?

By which I don't mean, since Redis is cool, CouchDB is uncool. More like is CouchDB Yuri to Redis's Kei? Uh, do they complement each other nicely?

Because it sure looks that way to me.

Inspired by this very handy comparison of some of the top NoSQL databases, I've compiled a simpler item-by-item comparison of CouchDB and Redis, and it appears that CouchDB is strong precisely where Redis is weak (storing large amounts of rarely-changing but heavily indexed data), and Redis is strong precisely where CouchDB is weak (storing moderate amounts of fast-changing data).

That is, CouchDB seems to make a great document store (blog posts and comments, templates, attachments), where Redis makes a great live/structured data store (recent comment lists, site stats, spam filter data, sessions, page element cache).

Redis keeps all data in memory so that you can quickly update complex data structures like sorted sets or individual hash elements, and logs updates to disk sequentially for robust, low-overhead persistence (as long as you don't need to restart often).
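To make the trade-off concrete, here is a minimal sketch of the general idea (an in-memory structure plus a sequential update log), not Redis's actual implementation or file format. The `LoggedStore` class, its `hset` method, and the JSON-lines log are all hypothetical names chosen for illustration.

```python
import json
import os
import tempfile


class LoggedStore:
    """Toy in-memory store that appends every update to a log file."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()
        self.log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        # On startup, rebuild memory by re-applying every logged update in order.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                key, field, value = json.loads(line)
                self.data.setdefault(key, {})[field] = value

    def hset(self, key, field, value):
        # Update one hash field in memory, then append the change sequentially.
        self.data.setdefault(key, {})[field] = value
        self.log.write(json.dumps([key, field, value]) + "\n")
        self.log.flush()

    def close(self):
        self.log.close()


def demo():
    path = os.path.join(tempfile.mkdtemp(), "updates.log")
    store = LoggedStore(path)
    store.hset("stats", "hits", 1)
    store.hset("stats", "hits", 2)
    store.close()
    # "Restart": a new instance recovers its state purely from the log.
    recovered = LoggedStore(path)
    state = recovered.data
    recovered.close()
    return state
```

Writes are cheap because they only touch memory and the tail of one file, but a restart has to replay the whole log, which is why frequent restarts hurt.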

CouchDB uses an append-only single-file (per database) model - including both B-tree and R-tree indexes - so again, it offers very robust persistence, but will grow rapidly if you update your documents frequently.

With Redis, since the data is all in memory, you can run a snapshot at regular intervals and drop the old log files. With CouchDB you need to run a compaction process, which reads data back from disk and rewrites it, a slower process.
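The growth-and-compaction cycle can be sketched with a toy append-only file. This is only an illustration under simplifying assumptions: the real CouchDB file stores B-tree nodes rather than JSON lines, and `AppendOnlyDocs`, `put`, `latest`, and `compact` are hypothetical names, not CouchDB's API.

```python
import json
import os
import tempfile


class AppendOnlyDocs:
    """Toy append-only document file: every update appends a full new revision."""

    def __init__(self, path):
        self.path = path
        open(path, "a", encoding="utf-8").close()  # create the file if missing

    def put(self, doc_id, doc):
        # Old revisions are never overwritten, so the file grows with each update.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"id": doc_id, "doc": doc}) + "\n")

    def latest(self):
        # Read everything back; the last revision of each id wins.
        docs = {}
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                rec = json.loads(line)
                docs[rec["id"]] = rec["doc"]
        return docs

    def compact(self):
        # Compaction: read the live revisions back from disk, rewrite them
        # into a fresh file, then swap that file into place.
        live = self.latest()
        tmp = self.path + ".compact"
        with open(tmp, "w", encoding="utf-8") as f:
            for doc_id, doc in live.items():
                f.write(json.dumps({"id": doc_id, "doc": doc}) + "\n")
        os.replace(tmp, self.path)


def demo():
    path = os.path.join(tempfile.mkdtemp(), "blog.db")
    db = AppendOnlyDocs(path)
    for views in range(100):
        db.put("post-1", {"title": "Hello", "views": views})
    before = os.path.getsize(path)
    db.compact()
    after = os.path.getsize(path)
    return before, after, db.latest()
```

A hundred updates to one document leave a hundred revisions on disk until compaction rewrites the file, and that rewrite has to read the data back from disk first.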

Redis provides simple indexes and complex structures; CouchDB provides complex indexes and simple structures.  Redis is all about live data, while CouchDB is all about storing and retrieving large numbers of documents.

Now, MongoDB offers both a document store and high-performance update-in-place, but its persistence model is fling it at the wall and hope that it sticks, with a recovery log tacked on since 1.7. It's not intrinsically robust, you can't perform backups easily, and its write patterns aren't consumer-SSD-friendly. I do not trust MongoDB with my data.

One of the most unhappy elements of Minx is its interface with MySQL - writing the complex documents Minx generates back to SQL tables is painful. I've tried a couple of different ORMs, and they've proven so slow that they're completely impractical for production use (for me, anyway).

MongoDB offered me most of the features I needed with the API I was looking for, but it crashed unrecoverably early in testing and permanently soured me on its persistence model.

CouchDB is proving to be great for the document side of things, but less great for the non-document side. But I was looking at deploying Redis as a structured data cache, and it makes an even better partner with CouchDB than it does with MySQL.
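As a rough sketch of that division of labour, the example below uses plain Python stand-ins rather than real CouchDB or Redis clients. `DocumentStore`, `LiveData`, `post_comment`, and `recent_comments` are hypothetical names, and the capped list only imitates the kind of recent-comments list a live data store would hold.

```python
from collections import deque


class DocumentStore:
    """Stand-in for a document database: whole documents keyed by id."""

    def __init__(self):
        self.docs = {}

    def save(self, doc_id, doc):
        self.docs[doc_id] = doc

    def get(self, doc_id):
        return self.docs[doc_id]


class LiveData:
    """Stand-in for a live data store: a capped recent list and hit counters."""

    def __init__(self, max_recent=3):
        self.recent = deque(maxlen=max_recent)  # newest first, old ids fall off
        self.hits = {}

    def push_recent(self, doc_id):
        self.recent.appendleft(doc_id)

    def incr(self, doc_id):
        self.hits[doc_id] = self.hits.get(doc_id, 0) + 1
        return self.hits[doc_id]


def post_comment(docs, live, comment_id, text):
    # The comment body is written once to the document side...
    docs.save(comment_id, {"text": text})
    # ...while the fast-changing "recent comments" list lives on the live side.
    live.push_recent(comment_id)


def recent_comments(docs, live):
    # Resolve the short id list against the document store.
    return [docs.get(cid)["text"] for cid in live.recent]
```

The documents are written rarely and read by id, while the list and counters change on every request, which is the split the two databases seem suited to.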

It's really looking like I've got a winning team here.

Anyway, here's the feature matrix I mentioned:


| Feature | CouchDB | Redis |
|---|---|---|
| Written in | Erlang | C |
| License | Apache | BSD |
| Release | 1.1.0, 2.0 preview | 2.2.12, 2.4.0RC5 |
| API | REST | Telnet-style |
| API Speed | Slow | Fast |
| Data | JSON documents, binary attachments | Text, binary, hash, list, set, sorted set |
| Indexes | B-tree, R-tree, Full-text (with Lucene), any combination of data types via map/reduce | Hash only |
| Queries | Predefined view/list/show model, ad-hoc queries require table scans | Individual keys |
| Storage | Append-only on disk | In-memory, append-only log |
| Updates | MVCC | In-place |
| Transactions | Yes, all-or-nothing batches | Yes, with conditional commands |
| Compaction | File rewrite | Snapshot |
| Threading | Many threads | Single-threaded, forks for snapshots |
| Multi-Core | Yes | No |
| Memory | Tiny | Large (all data) |
| SSD-Friendly | Yes | Yes |
| Robust | Yes | Yes |
| Backup | Just copy the files | Just copy the files |
| Replication | Master-master, automatic | Master-slave, automatic |
| Scaling | Clustering (BigCouch) | Clustering (Redis cluster*) |
| Scripting | JavaScript, Erlang, others via plugin | Lua* |
| Files | One per database | One per database |
| Virtual Files | Attachments | No |
| Other | Changes feed, Standalone applications | Pub/Sub, Key expiry |

* Coming in the near future.

Posted by: Pixy Misa at 11:27 AM

1 A big difference between MongoDB and CouchDB also seems to be the memory usage for simple 'single key access' use cases. You can't restrict how much memory MongoDB is going to take. You don't seem to need to restrict the RAM usage of CouchDB because it's barely noticeable as far as I can see.
The ability to combine CouchDB with Elasticsearch is especially great. There is a project for MongoDB and Solr (https://github.com/mikejs/photovoltaic), but it seems to rely heavily on replication internals.

Posted by: Marc Seeger at Tuesday, August 09 2011 07:07 PM (8PLzn)

2 Yes, CouchDB is very light on memory, even with large databases.  (I built a 10GB database as a test; I think CouchDB used no more than 17MB of memory at any point.)

MongoDB really needs its own dedicated server or heavyweight VM.  In my testing with a lightweight VM (OpenVZ) it promptly used up all available memory, crashed, and corrupted the database.   Not cool.

Posted by: Pixy Misa at Tuesday, August 09 2011 09:01 PM (PiXy!)

3 Perhaps the worst thing about MongoDB's memory "management" is how it reduces your ability to analyze the health of the server, and of mongod itself. Because everything is memory-mapped, once your DB is bigger than your RAM, any operation can trigger a page fault, even a simple db.stats() call, leading to very unpredictable performance. On my write-heavy log server, I can't collect performance stats from mongod at regular intervals, because the act of doing so blocks queries and replication for up to N seconds. And, of course, VM size of the mongod process is over a terabyte, and resident size is "everything it can grab", generally 19GB on my 24GB machine.


On a side note, I have had successful crash recoveries, but it took hours, and was only possible because the file system was less than 50% full. These were hard crashes of the "kernel disabled the IRQ for the SATA bus" variety, and I was pleased that I only lost the last 60 seconds or so of data. Plus the two hours while the DB was repairing itself. :-(

Based on my experience over the past year (200-350 million inserts a day), I would not attempt to run a non-trivial MongoDB project on a virtual server. It really needs 2-3 machines with tons of RAM and a fast hardware RAID (that's N+1 times bigger than your database could ever possibly be, where N is the number of snapshots you need for backups).

I would also never share a single mongod between multiple projects, regardless of their size; the global locking is a killer.

-j

Posted by: J Greely at Wednesday, August 10 2011 02:06 AM (2XtN5)

4 Awesome blog, we use CouchDB quite a bit on http://blitz.io (see http://blog.mudynamics.com/2011/08/07/blitz-io-how-we-use-heroku-aws-and-couchdb/) and have been keeping an eye on Redis. When it comes to indexes, I think the sorted sets of Redis are an absolute +1 because you can use them for cron-like jobs with Unix timestamps as the key. What's missing, I think, is syncing CouchDB docs to Redis for ephemeral write-heavy operations and periodically updating CouchDB with the resulting snapshots. But you are right: Redis and CouchDB do seem like a great fit together.

Posted by: kowsik at Friday, August 12 2011 03:14 PM (Q0X13)
