Monday, August 08
Is CouchDB The Anti-Redis?
By which I don't mean, since Redis is cool, CouchDB is uncool. More like is CouchDB Yuri to Redis's Kei? Uh, do they complement each other nicely?
Because it sure looks that way to me.
Inspired by this very handy comparison of some of the top NoSQL databases, I've compiled a simpler item-by-item comparison of CouchDB and Redis, and it appears that CouchDB is strong precisely where Redis is weak (storing large amounts of rarely-changing but heavily indexed data), and Redis is strong precisely where CouchDB is weak (storing moderate amounts of fast-changing data).
That is, CouchDB seems to make a great document store (blog posts and comments, templates, attachments), where Redis makes a great live/structured data store (recent comment lists, site stats, spam filter data, sessions, page element cache).
Redis keeps all data in memory so that you can quickly update complex data structures like sorted sets or individual hash elements, and logs updates to disk sequentially for robust, low-overhead persistence (as long as you don't need to restart often).
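To make the memory-plus-log idea concrete, here's a toy Python sketch. It is not how Redis is actually implemented; `TinyStore`, `hset`, and the JSON log format are names I've made up for illustration, and a plain list stands in for the log file on disk. The point is only that every update touches memory and appends one line to the log, and that a restart means replaying the whole log, which is why frequent restarts hurt.

```python
import json


class TinyStore:
    """Toy in-memory store with an append-only log (hypothetical, not Redis code)."""

    def __init__(self, log=None):
        self.hashes = {}                  # name -> {field: value}
        self.log = []                     # stands in for the on-disk log file
        for line in log or []:            # restart: replay every logged update
            self._apply(json.loads(line))
            self.log.append(line)

    def _apply(self, op):
        name, field, value = op
        self.hashes.setdefault(name, {})[field] = value

    def hset(self, name, field, value):
        """Update one hash field in memory, then append the change to the log."""
        op = [name, field, value]
        self._apply(op)
        self.log.append(json.dumps(op))


store = TinyStore()
store.hset("stats", "hits", 1)
store.hset("stats", "hits", 2)
store.hset("session:abc", "user", "pixy")

restarted = TinyStore(log=store.log)      # replays 3 entries to rebuild 2 hashes
```

Note that the log holds both updates to `stats`/`hits` even though only the last one matters, so the log keeps growing until something trims it.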
CouchDB uses an append-only single-file (per database) model - including both B-tree and R-tree indexes - so again, it offers very robust persistence, but will grow rapidly if you update your documents frequently.
With Redis, since the data is all in memory, you can run a snapshot at regular intervals and drop the old log files. With CouchDB you need to run a compaction process, which reads data back from disk and rewrites it, a slower process.
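Here's a rough Python sketch of why an append-only file grows under frequent updates and what compaction has to do about it. This is a simplified model under my own assumptions, not CouchDB's real file format: `append_doc` and `compact` are hypothetical helpers, a list stands in for the database file, and the indexes are left out entirely.

```python
import json


def append_doc(log, doc_id, body):
    """Append a new revision; older revisions of doc_id stay in the file."""
    log.append(json.dumps({"id": doc_id, "body": body}))


def compact(log):
    """Rewrite the file keeping only the latest revision of each document."""
    latest = {}
    for line in log:                      # read everything back
        rec = json.loads(line)
        latest[rec["id"]] = line          # later revisions overwrite earlier ones
    return list(latest.values())          # the new, smaller file


db_file = []                              # stands in for the single database file
for n in range(100):
    append_doc(db_file, "post-1", {"views": n})   # 100 updates to one document
append_doc(db_file, "post-2", {"views": 0})

compacted = compact(db_file)
print(len(db_file), "records before,", len(compacted), "after")
# prints: 101 records before, 2 after
```

Compaction has to read the entire old file to find the live revisions, which is the slow part; a snapshot of data already in memory skips that read.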
Redis provides simple indexes and complex structures; CouchDB provides complex indexes and simple structures. Redis is all about live data, while CouchDB is all about storing and retrieving large numbers of documents.
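As a rough illustration of what a "complex index" buys you, here's a Python sketch of a CouchDB-style view: a map function emits a key for each document it cares about, the emitted rows are kept sorted by key, and a query is just a key-range lookup. Real CouchDB map functions are normally written in JavaScript and the rows live in a B-tree; here `build_view`, `query`, and `comments_by_date` are hypothetical names, a sorted list stands in for the B-tree, and the sample documents are invented.

```python
from bisect import bisect_left, bisect_right


def build_view(docs, map_fn):
    """Run map_fn over every document and keep the emitted rows sorted by key."""
    rows = []
    for doc in docs:
        for key, value in map_fn(doc):
            rows.append((key, doc["_id"], value))
    rows.sort()
    return rows


def query(view, start_key, end_key):
    """Return rows whose key falls in [start_key, end_key], inclusive."""
    keys = [row[0] for row in view]
    return view[bisect_left(keys, start_key):bisect_right(keys, end_key)]


def comments_by_date(doc):
    # Map function: index only comments, keyed by posting date.
    if doc["type"] == "comment":
        yield doc["date"], doc["author"]


docs = [
    {"_id": "c1", "type": "comment", "date": "2011-08-09", "author": "Marc"},
    {"_id": "p1", "type": "post", "date": "2011-08-08", "author": "Pixy"},
    {"_id": "c2", "type": "comment", "date": "2011-08-12", "author": "kowsik"},
    {"_id": "c3", "type": "comment", "date": "2011-08-10", "author": "J"},
]
view = build_view(docs, comments_by_date)
august_9_to_10 = query(view, "2011-08-09", "2011-08-10")
```

With a plain key-to-value hash, the same question ("comments posted between these dates") would mean either scanning every key or maintaining a second structure by hand.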
Now, MongoDB offers both a document store and high-performance update-in-place, but its persistence model is fling it at the wall and hope that it sticks, with a recovery log tacked on since 1.7. It's not intrinsically robust, you can't perform backups easily, and its write patterns aren't consumer-SSD-friendly. I do not trust MongoDB with my data.
One of the most unhappy elements of Minx is its interface with MySQL - writing the complex documents Minx generates back to SQL tables is painful. I've tried a couple of different ORMs, and they've proven so slow that they're completely impractical for production use (for me, anyway).
MongoDB offered me most of the features I needed with the API I was looking for, but it crashed unrecoverably early in testing and permanently soured me on its persistence model.
CouchDB is proving to be great for the document side of things, but less great for the non-document side. But I was looking at deploying Redis as a structured data cache, and it makes an even better partner with CouchDB than it does with MySQL.
It's really looking like I've got a winning team here.
Anyway, here's the feature matrix I mentioned:
Feature | CouchDB | Redis |
Written in | Erlang | C |
License | Apache | BSD |
Release | 1.1.0, 2.0 preview | 2.2.12, 2.4.0RC5 |
API | REST | Telnet-style |
API Speed | Slow | Fast |
Data | JSON documents, binary attachments | Text, binary, hash, list, set, sorted set |
Indexes | B-tree, R-tree, Full-text (with Lucene), any combination of data types via map/reduce | Hash only |
Queries | Predefined view/list/show model, ad-hoc queries require table scans | Individual keys |
Storage | Append-only on disk | In-memory, append-only log |
Updates | MVCC | In-place |
Transactions | Yes, all-or-nothing batches | Yes, with conditional commands |
Compaction | File rewrite | Snapshot |
Threading | Many threads | Single-threaded, forks for snapshots |
Multi-Core | Yes | No |
Memory | Tiny | Large (all data) |
SSD-Friendly | Yes | Yes |
Robust | Yes | Yes |
Backup | Just copy the files | Just copy the files |
Replication | Master-master, automatic | Master-slave, automatic |
Scaling | Clustering (BigCouch) | Clustering (Redis cluster*) |
Scripting | JavaScript, Erlang, others via plugin | Lua* |
Files | One per database | One per database |
Virtual Files | Attachments | No |
Other | Changes feed, Standalone applications | Pub/Sub, Key expiry |
* Coming in the near future.
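The "Updates" row is probably the least self-explanatory one, so here's a small Python sketch of the difference. With MVCC, a writer has to name the revision it read, and a stale revision is rejected rather than silently overwritten; with in-place updates, the last write simply wins. This is a simplified model: `MvccStore` and `Conflict` are hypothetical names, and the integer revisions are my own simplification rather than CouchDB's actual revision format.

```python
class Conflict(Exception):
    """Raised when a writer supplies a stale revision."""


class MvccStore:
    """Toy revision-checked document store (hypothetical, not CouchDB code)."""

    def __init__(self):
        self.docs = {}                    # doc_id -> (rev, body)

    def put(self, doc_id, body, rev=None):
        current = self.docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:            # writer read an older version
            raise Conflict(f"{doc_id}: expected rev {current_rev}, got {rev}")
        new_rev = (current_rev or 0) + 1
        self.docs[doc_id] = (new_rev, body)
        return new_rev


store = MvccStore()
rev1 = store.put("post-1", {"title": "Draft"})
rev2 = store.put("post-1", {"title": "Final"}, rev=rev1)

try:
    store.put("post-1", {"title": "Oops"}, rev=rev1)   # stale revision
    conflict = False
except Conflict:
    conflict = True                       # the stale write is rejected

in_place = {"post-1": {"title": "Draft"}}
in_place["post-1"]["title"] = "Oops"      # in-place: last writer just wins
```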
Posted by: Pixy Misa at 11:27 AM | Comments (4)
1
A big difference between MongoDB and CouchDB also seems to be the memory usage for simple 'single key access' use cases.
You can't restrict how much memory MongoDB is going to take.
You don't seem to need to restrict the RAM usage of CouchDB because it's barely noticeable as far as I can see.
Especially the ability to combine CouchDB with Elasticsearch is great. There is a project for MongoDB and Solr (https://github.com/mikejs/photovoltaic), but it seems to rely heavily on replication internals
Posted by: Marc Seeger at Tuesday, August 09 2011 07:07 PM (8PLzn)
2
Yes, CouchDB is very light on memory, even with large databases. (I built a 10GB database as a test; I think CouchDB used no more than 17MB of memory at any point.)
MongoDB really needs its own dedicated server or heavyweight VM. In my testing with a lightweight VM (OpenVZ) it promptly used up all available memory, crashed, and corrupted the database. Not cool.
Posted by: Pixy Misa at Tuesday, August 09 2011 09:01 PM (PiXy!)
3
Perhaps the worst thing about MongoDB's memory "management" is how it reduces your ability to analyze the health of the server, and of mongod itself. Because everything is memory-mapped, once your DB is bigger than your RAM, any operation can trigger a page fault, even a simple db.stats() call, leading to very unpredictable performance. On my write-heavy log server, I can't collect performance stats from mongod at regular intervals, because the act of doing so blocks queries and replication for up to N seconds. And, of course, VM size of the mongod process is over a terabyte, and resident size is "everything it can grab", generally 19GB on my 24GB machine.
On a side note, I have had successful crash recoveries, but it took hours, and was only possible because the file system was less than 50% full. These were hard crashes of the "kernel disabled the IRQ for the SATA bus" variety, and I was pleased that I only lost the last 60 seconds or so of data. Plus the two hours while the DB was repairing itself. :-(
Based on my experience over the past year (200-350 million inserts a day), I would not attempt to run a non-trivial MongoDB project on a virtual server. It really needs 2-3 machines with tons of RAM and a fast hardware RAID (that's N+1 times bigger than your database could ever possibly be, where N is the number of snapshots you need for backups).
I would also never share a single mongod between multiple projects, regardless of their size; the global locking is a killer.
-j
Posted by: J Greely at Wednesday, August 10 2011 02:06 AM (2XtN5)
4
Awesome blog, we use CouchDB quite a bit on http://blitz.io (see http://blog.mudynamics.com/2011/08/07/blitz-io-how-we-use-heroku-aws-and-couchdb/) and have been keeping an eye on redis. When it comes to indexes, I think the sorted sets of redis is an absolute +1 because you can use that for cron-like jobs with unix timestamps as the key. What's missing I think is sync'ing CouchDB docs to redis for ephemeral write-heavy operations and periodically updating CouchDB with the resulting snapshots. But, you are right: redis & CouchDB do seem like a great fit together.
Posted by: kowsik at Friday, August 12 2011 03:14 PM (Q0X13)
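For anyone curious about the cron-like trick with sorted sets mentioned in that last comment, here's a rough Python sketch of the idea. In Redis itself the timestamp would be the score and the job name the member (added with ZADD, fetched by score range with ZRANGEBYSCORE); the `SortedSet` class below is only a pure-Python stand-in so the example runs without a server, and the job names and timestamps are invented.

```python
import bisect


class SortedSet:
    """Minimal stand-in for a Redis sorted set: members ordered by numeric score."""

    def __init__(self):
        self._scores = {}                 # member -> score
        self._ordered = []                # sorted list of (score, member)

    def add(self, member, score):
        """Insert a member, or move it if it is already present."""
        if member in self._scores:
            self._ordered.remove((self._scores[member], member))
        self._scores[member] = score
        bisect.insort(self._ordered, (score, member))

    def pop_due(self, now):
        """Remove and return every member whose score is <= now, oldest first."""
        due = [m for s, m in self._ordered if s <= now]
        self._ordered = [(s, m) for s, m in self._ordered if s > now]
        for member in due:
            del self._scores[member]
        return due


jobs = SortedSet()
jobs.add("rebuild-sidebar", 1313161200)   # scores are Unix timestamps
jobs.add("expire-sessions", 1313157600)
jobs.add("send-digest", 1313164800)

due = jobs.pop_due(now=1313161200)        # everything scheduled up to "now"
```

A worker would call something like `pop_due` on a timer and run whatever comes back; jobs scheduled for later stay in the set untouched.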
55kb generated in CPU 0.0207, elapsed 0.1289 seconds.
56 queries taking 0.1193 seconds, 350 records returned.
Powered by Minx 1.1.6c-pink.