Tuesday, November 26

Geek

TokuMX

I started writing a short note on TokuMX and it turned into a history of databases and the relative value of relational vs. non-relational systems vis-a-vis traditional and non-traditional use cases.  Which would be great except that I really don't have time to write a book today.

So, quickly: Tokutek, who brought us the nifty TokuDB engine for MySQL, have done the same thing for the new default database, MongoDB.

TokuMX takes MongoDB and swaps out the rather questionable storage engine for something substantially more scalable and robust, based on fractal tree indexes.  This means three things:
  1. The database doesn't lock itself up when load gets high.  This was a big problem with MySQL in the old days (you can still see it when I do background processing on Minx, because I haven't had a chance to convert the tables to a newer format).  It's still a problem with MongoDB; less so than it was with old MySQL, but more so than it is with MySQL now.

  2. It has transactions.  This is basic database technology.  If I want to take ten dollars from my account and pay it into your account, the last thing I want to see is for the server to go down after the first step but before the second one.  Transactions make sure that either both steps happen, or neither.  MongoDB doesn't have transactions.*  MySQL didn't either, years ago.

  3. It's compressed.  It is, in fact, extraordinarily compressed relative to MongoDB.  My test system at work (which uses real-world, albeit slightly odd, data) shrank from 19GB to 1.9GB. 
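
To make the ten-dollar example in point 2 concrete, here is a minimal sketch of "both steps happen, or neither".  It uses SQLite, purely because it ships with Python; it is not TokuMX's API, and the table, the account names and the crash_midway flag are all made up for illustration.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("mine", 100), ("yours", 50)]
    )
    conn.commit()
    return conn


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


def transfer(conn, src, dst, amount, crash_midway=False):
    """Move money in a single transaction; return True if it committed."""
    try:
        # Used as a context manager, the connection commits if the block
        # finishes and rolls back if an exception escapes from it.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            if crash_midway:
                # Stand-in for the server going down after the first step.
                raise RuntimeError("failure between the two steps")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except RuntimeError:
        return False


if __name__ == "__main__":
    bank = make_bank()
    transfer(bank, "mine", "yours", 10, crash_midway=True)
    print(balances(bank))  # the half-finished transfer left no trace
    transfer(bank, "mine", "yours", 10)
    print(balances(bank))  # both steps applied together
```

Without the transaction, the simulated failure would leave ten dollars missing from one account and never credited to the other.
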

TokuMX lacks two things that MongoDB does have: Full-text search, and geospatial indexing.

I don't see either as a major issue.  MongoDB's full-text search is neat if you really must have just one database for everything, but it's far less powerful than ElasticSearch.  Using a search engine as well as a database means duplicating all your data, but (a) you can set up ElasticSearch so that it automatically indexes your MongoDB / TokuMX data, and (b) since ElasticSearch also compresses data automatically, TokuMX and ElasticSearch combined require a fraction of the disk space of MongoDB alone.

(ElasticSearch also supports geospatial queries; it looks to me as if they've used materialised paths to fudge quadtrees into their inverted indexes.  Clever, and should suffice for most use cases.  I'd never considered multidimensional materialised paths before.)
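
For the curious, here is a rough sketch of what I mean by a multidimensional materialised path.  This is my guess at the general idea, not ElasticSearch's actual code: the function names, the digit scheme and the depth are all assumptions.  Each level splits the current box into four quadrants and appends one digit, so nearby points share a prefix, and every prefix can be stored as an ordinary term in an inverted index.

```python
def quadtree_path(lat, lon, depth=8):
    """Return a string of quadrant digits (0-3), one per quadtree level."""
    south, north = -90.0, 90.0
    west, east = -180.0, 180.0
    digits = []
    for _ in range(depth):
        mid_lat = (south + north) / 2
        mid_lon = (west + east) / 2
        quadrant = 0
        if lat >= mid_lat:   # northern half of the current box
            quadrant |= 2
            south = mid_lat
        else:
            north = mid_lat
        if lon >= mid_lon:   # eastern half of the current box
            quadrant |= 1
            west = mid_lon
        else:
            east = mid_lon
        digits.append(str(quadrant))
    return "".join(digits)


def index_terms(path):
    """Every prefix of the path becomes one term for an inverted index."""
    return [path[:i] for i in range(1, len(path) + 1)]


if __name__ == "__main__":
    # Approximate coordinates, used only as sample points.
    sydney = quadtree_path(-33.87, 151.21, depth=4)
    melbourne = quadtree_path(-37.81, 144.96, depth=4)
    london = quadtree_path(51.5, -0.13, depth=4)
    print(sydney, melbourne, london)
    print(index_terms(sydney))
```

A "what's near here?" query then becomes a plain term lookup on a prefix of the right length, which is exactly what an inverted index is good at.
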

The reduced disk space is more significant than it might seem at first glance.  Smaller databases mean that it's more feasible to keep everything on SSD, which means much better performance.  Also, if your database is one fifth the size (assuming you replace MongoDB with a combination of TokuMX and ElasticSearch) you can cache five times as much data in memory.  

Last week I was working with a 50GB MongoDB database and a 5GB ElasticSearch index, which was a little slow on my 32GB server.  Now I can work with twice as much data and have it all fit in memory, which is a huge win.
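
The arithmetic behind that, as a back-of-the-envelope sketch.  One loud assumption: I'm applying the 10:1 ratio from my test system (19GB down to 1.9GB) to the 50GB database, and assuming the ElasticSearch index stays at 5GB and grows in step with the data.  Your compression ratio will depend on your data.

```python
def combined_size_gb(mongo_gb, es_gb, compression_ratio):
    """Estimated size of TokuMX data plus the ElasticSearch index, in GB."""
    return mongo_gb / compression_ratio + es_gb


# Ratio observed on the test system: 19GB shrank to 1.9GB.
ratio = 19 / 1.9

before = 50 + 5                            # MongoDB plus ElasticSearch index
after = combined_size_gb(50, 5, ratio)     # TokuMX plus ElasticSearch index
doubled = 2 * after                        # twice as much data, same ratio
ram_gb = 32

print(f"before: {before}GB, after: {after:.1f}GB")
print(f"fraction of original: {after / before:.2f}")
print(f"twice the data: {doubled:.1f}GB, fits in {ram_gb}GB RAM: {doubled < ram_gb}")
```

Under those assumptions the combined size comes out at a bit under a fifth of the original, and even double the data fits in memory with room to spare.
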

So, I'm waiting to see if this is going to blow up in my face in some weird way, as shiny new things usually do.  But so far it is all looking very promising.

* It has atomic operations, and you can futz around with those to construct your own transaction manager if you really want to, but it doesn't support transactions out of the box.

Posted by: Pixy Misa at 06:25 PM

1 Databases are a black art to me. I was a programmer for 25 years, but I worked on embedded software, which is an entirely different field.

Posted by: Steven Den Beste at Tuesday, November 26 2013 10:24 PM (+rSRq)

2 If their compression promise holds up, that will make my logging servers a lot happier. I'll have to play with it this weekend and see how it handles a few days of logs. Certainly better than waiting for the long-promised compression support in regular MongoDB.

-j

Posted by: J Greely at Wednesday, November 27 2013 02:50 AM (+cEg2)

3 Did you guys see that blog article by one woman who migrated from Mongo to Postgres? Awesome way to project her narrow experience onto all databases of all times.

Posted by: Pete Zaitcev at Wednesday, November 27 2013 04:07 AM (f0Btc)

4 I skimmed it just now, and when I saw "Rails", the rest was pretty predictable.

-j

Posted by: J Greely at Wednesday, November 27 2013 09:12 AM (fpXGN)

5 I hadn't seen it, but MongoEngine solves the relational issue on Python, and Mongoid does it for Ruby.

Though if you're going to migrate off MongoDB anyway, Postgres is a good choice.

Posted by: Pixy Misa at Wednesday, November 27 2013 09:18 AM (PiXy!)
