Monday, August 08
By which I don't mean, since Redis is cool, CouchDB is uncool. More like is CouchDB Yuri to Redis's Kei? Uh, do they complement each other nicely?
Because it sure looks that way to me.
Inspired by this very handy comparison of some of the top NoSQL databases, I've compiled a simpler item-by-item comparison of CouchDB and Redis, and it appears to be that CouchDB is strong precisely where Redis is weak (storing large amounts of rarely-changing but heavily indexed data), and Redis is strong precisely where CouchDB is weak (storing moderate amounts of fast-changing data).
That is, CouchDB seems to make a great document store (blog posts and comments, templates, attachments), where Redis makes a great live/structured data store (recent comment lists, site stats, spam filter data, sessions, page element cache).
Redis keeps all data in memory so that you can quickly update complex data structures like sorted sets or individual hash elements, and logs updates to disk sequentially for robust, low-overhead persistence (as long as you don't need to restart often).
CouchDB uses an append-only single-file (per database) model - including both B-tree and R-tree indexes - so again, it offers very robust persistence, but will grow rapidly if you update your documents frequently.
With Redis, since the data is all in memory, you can run a snapshot at regular intervals and drop the old log files. With CouchDB you need to run a compaction process, which reads data back from disk and rewrites it, a slower process.
Redis provides simple indexes and complex structures; CouchDB provides complex indexes and simple structures. Redis is all about live data, while CouchDB is all about storing and retrieving large numbers of documents.
Now, MongoDB offers a both a document store and high-performance update-in-place, but its persistence model is fling it at the wall and hope that it sticks, with a recovery log tacked on since 1.7. It's not intrinsically robust, you can't perform backups easily, and its write patterns aren't consumer-SSD-friendly. I do not trust MongoDB with my data.
One of the most unhappy elements of Minx is its interface with MySQL - writing the complex documents Minx generates back to SQL tables is painful. I've tried a couple of different ORMs, and they've proven so slow that they're completely impractical for production use (for me, anyway).
MongoDB offered me most of the features I needed with the API I was looking for, but it crashed unrecoverably early in testing and permanently soured me on its persistance model.
CouchDB is proving to be great for the document side of things, but less great for the non-document side. But I was looking at deploying Redis as a structured data cache, and it makes an even better partner with CouchDB than it does with MySQL.
It's really looking like I've got a winning team here.
Anyway, here's the feature matrix I mentioned:
|Release||1.1.0, 2.0 preview||2.2.12, 2.4.0RC5|
|Data||JSON documents, binary attachments||Text, binary, hash, list, set, sorted set|
|Indexes||B-tree, R-tree, Full-text (with Lucene), any combination of data types via map/reduce||Hash only|
|Queries||Predefined view/list/show model, ad-hoc queries require table scans||Individual keys|
|Storage||Append-only on disk||In-memory, append-only log|
|Transactions||Yes, all-or-nothing batches||Yes, with conditional commands|
|Threading||Many threads||Single-threaded, forks for snapshots|
|Memory||Tiny||Large (all data)|
|Backup||Just copy the files||Just copy the files|
|Replication||Master-master, automatic||Master-slave, automatic|
|Scaling||Clustering (BigCouch)||Clustering (Redis cluster*)|
|Files||One per database||One per database|
|Other||Changes feed, Standalone applications||Pub/Sub, Key expiry|
* Coming in the near future.
Especially the ability to combine CouchDB with Elasticsearch is great. There is a project for MongoDB and Solr (https://github.com/mikejs/photovoltaic), but it seems to rely heavily on replication internals
Posted by: Marc Seeger at Tuesday, August 09 2011 07:07 PM (8PLzn)
MongoDB really needs its own dedicated server or heavyweight VM. In my testing with a lightweight VM (OpenVZ) it promptly used up all available memory, crashed, and corrupted the database. Not cool.
Posted by: Pixy Misa at Tuesday, August 09 2011 09:01 PM (PiXy!)
On a side note, I have had successful crash recoveries, but it took hours, and was only possible because the file system was less than 50% full. These were hard crashes of the "kernel disabled the IRQ for the SATA bus" variety, and I was pleased that I only lost the last 60 seconds or so of data. Plus the two hours while the DB was repairing itself. :-(
Based on my experience over the past year (200-350 million inserts a day), I would not attempt to run a non-trivial MongoDB project on a virtual server. It really needs 2-3 machines with tons of RAM and a fast hardware RAID (that's N+1 times bigger than your database could ever possibly be, where N is the number of snapshots you need for backups).
I would also never share a single mongod between multiple projects, regardless of their size; the global locking is a killer.
Posted by: J Greely at Wednesday, August 10 2011 02:06 AM (2XtN5)
Posted by: kowsik at Friday, August 12 2011 03:14 PM (Q0X13)
Powered by Minx 1.1.4-pink.