Ambient Irony

Saturday, November 30

Toku! Toku! Toku!

My test app (at my day job) continues to burble along happily under TokuMX.

Some more observations:

Compression is fantastic. We have an application with fine-grained access control, and in my test database I'm creating millions of users so that we can find scaling issues before those millions of users actually start hitting our servers.

Under MongoDB, the user permissions table is one of the largest in the database, because each user has a couple of hundred settings. Under TokuMX, we get a compression ratio of 30:1, turning it from a problem to be solved into just another table.

Compression ratios on the other tables are lower, but are still at least 7:1 on data and 3:1 on indexes on my test dataset.

The collection.stats command provides all sorts of useful information on data and index size and index utilisation. You can see not only how much space each index takes on disk, but how often it is being used by queries. In real time. That's a brilliant tool for devops.
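As a rough sketch of how you might pull those numbers out from Python: PyMongo's db.command("collstats", name) hands back the stats document as a dict, and a few lines will boil it down. The field names used here (size, storageSize, and an indexDetails list carrying name, storageSize and queries) are my assumption about what TokuMX reports, so check them against your own output before trusting the result.

```python
def summarise_stats(stats):
    """Return (data compression ratio, [(index name, bytes on disk, queries), ...]).

    `stats` is the dict returned by the collstats command. The keys read
    here are assumptions about the TokuMX output, not a documented contract.
    """
    size = stats.get("size", 0)            # uncompressed data size (assumed)
    storage = stats.get("storageSize", 0)  # size on disk (assumed)
    ratio = float(size) / storage if storage else None
    indexes = [
        (ix.get("name"), ix.get("storageSize", 0), ix.get("queries", 0))
        for ix in stats.get("indexDetails", [])
    ]
    # Largest on-disk index first.
    indexes.sort(key=lambda row: row[1], reverse=True)
    return ratio, indexes
```

Usage would be something like summarise_stats(db.command("collstats", "permissions")), where "permissions" is a made-up collection name.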
Performance seems to suffer when running mongodump backups.

Oh, just on that subject: TokuMX comes in two versions, the Enterprise edition with hot backup support, and the Community edition without. That doesn't mean you can't do online backups on the Community edition - the mongodump and mongoexport utilities work just fine. The Enterprise version comes with a high-performance snapshot backup utility - that is, you get a transactional, consistent point-in-time backup, which mongodump won't do.

You could probably do that with a simple script if your database size is modest, using TokuMX's transaction support. Just start a transaction, iterate through the collections and their contents, and write them out in your preferred format.
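Here's a minimal sketch of what such a script might look like. It assumes a PyMongo-style database object (command, collection_names, item access) and TokuMX's beginTransaction / commitTransaction / rollbackTransaction commands; the function name, the "mvcc" isolation argument and the JSON-lines output format are my own choices, and the whole thing only works if every operation goes over the same connection.

```python
import json


def dump_database(db, out, isolation="mvcc"):
    """Write every document in `db` to `out` as JSON lines, in one transaction.

    Returns the number of documents written. Assumes all calls on `db`
    share a single connection, since the transaction lives on the connection.
    """
    written = 0
    db.command("beginTransaction", isolation=isolation)
    try:
        for name in sorted(db.collection_names()):
            if name.startswith("system."):
                continue  # skip internal collections
            for doc in db[name].find():
                # default=str is a crude fallback for ObjectId, datetime, etc.
                line = json.dumps({"collection": name, "doc": doc}, default=str)
                out.write(line + "\n")
                written += 1
    except Exception:
        db.command("rollbackTransaction")
        raise
    else:
        # Nothing was modified, so commit just ends the snapshot.
        db.command("commitTransaction")
    return written
```

Something like dump_database(db, open("backup.jsonl", "w")) would do for a modest database; the file name is just an example.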

Anyway, my test app slows down significantly when I run a backup; I need to investigate further to determine whether this is due to resource starvation (I only have a small server to run these tests, and it doesn't have a lot of I/O or CPU), or whether it's contention within TokuMX.

TokuMX only stores indexes. Each index gets written to a separate file in your data directory. This may lead you to ask, where the hell is my data? The answer is that the _id index for each collection is a clustered index - that is, the data payload of each record is stored alongside the key in the fractal tree.

You can create additional clustered indexes if you need them; this could be a significant win in data warehousing applications, particularly if you are writing to disk rather than SSD. If your application reads records in index order, performance from disk could approach performance from SSD at a much lower cost.

Transactions are a little fiddly due to choices made by driver developers. Many of the MongoDB drivers (in this case, PyMongo) use transparent connection pools to support multi-threaded applications without requiring huge numbers of database connections.

That doesn't work at all for transactions, because the transaction is bound to the connection, and if you have a connection pool, there is no guarantee (in fact, pretty much the opposite) that you will get the same connection for all the operations within your transaction.

The approach we've taken for our web application is a multi-process / single thread model, with one connection per process. We're using Green Unicorn at the moment, but the same would work with other servers as long as they are configured appropriately. For non-web apps, just make sure you create a connection pool per thread, and limit the pool to one connection.

Update: Actually, PyMongo has a mechanism that helps with this. Though not specifically designed for transaction support, requests bind your thread to a specific connection. So you can leave the connection pool alone and just write a little wrapper to handle transactions.
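The little wrapper might look something like this. It's a sketch, not what we run in production: it leans on the start_request / end_request methods of the PyMongo 2.x client (they were removed in PyMongo 3) and on TokuMX's beginTransaction, commitTransaction and rollbackTransaction commands, and the function name and "mvcc" default are mine.

```python
from contextlib import contextmanager


@contextmanager
def transaction(client, db, isolation="mvcc"):
    """Run the body of a `with` block inside one TokuMX transaction.

    start_request() pins this thread to a single pooled connection, so
    every command below reaches the connection that owns the transaction.
    """
    client.start_request()
    try:
        db.command("beginTransaction", isolation=isolation)
        try:
            yield db
        except BaseException:
            # Any failure in the body undoes the whole transaction.
            db.command("rollbackTransaction")
            raise
        else:
            db.command("commitTransaction")
    finally:
        # Hand the connection back to the pool whatever happened.
        client.end_request()
```

With that in place, application code becomes "with transaction(client, db) as tx:" followed by ordinary reads and writes on tx.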

I haven't yet tested replication or sharding. Tokutek say that their replication model generates an even lower I/O load on the replica than on the primary. That's a nice thing if you're running many MongoDB databases (or shards) because you could potentially combine multiple replicas onto a single server. (Assuming that not all your primary servers die at the same time.)

There are a few things that TokuMX doesn't do yet that I'd like to see:

As I noted, TokuMX stores all your indexes as individual files in your data directory. I strongly prefer this to MongoDB's indivisible database blobs, but if you have a lot of databases, each with a lot of collections, each with several indexes, you end up with a whole bunch of files piled into that one directory. Having it organised with a subdirectory for each database would be nice. (I might even suggest a subdirectory for each collection.)

Since we can see the individual indexes, a thought arises. Sooner or later, something horrible is going to happen to your database. If you have replicas or good backups, you can recover with little pain. But if something horrible happens to your replica and your backup at the same time, the worst thing in the world is to be left with a database that is right there on your disk but won't start up because one record somewhere is corrupted.

I'd like to see a use-at-own-risk utility that can dump out the contents of a TokuMX clustered index file as a BSON object file (in the same format as mongodump), and just skip over damaged blocks with a warning.

mongoexport is oddly slow. I'm pretty sure I could write a Python script that outruns it. This is not Tokutek's fault, though; they've inherited that from MongoDB. I'd love a faster way to dump to JSON.

TokuMX doesn't currently support MongoDB's geospatial or full-text indexing. I don't see the lack of full-text indexing as a big deal; ElasticSearch offers a much more powerful search engine than MongoDB and is very easy to install and manage.

I would like to see geospatial support - it's not critical for my applications, but having it available would allow us to develop new functionality. Full-text search is something of an afterthought in MongoDB; you're better off with ElasticSearch. But geospatial support is something of an afterthought in ElasticSearch, so having it in TokuMX would potentially save deploying a third database to provide that one requirement.

(Actually, reading up what MongoDB does, it also looks like quadtrees hacked into 2d materialised paths on top of B-trees, but with some intelligence on top of that to handle distances in spherical geometry. So adding that sort of geospatial indexing to TokuMX shouldn't be very difficult.)
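To make the "quadtree on top of an ordinary ordered index" idea concrete, here's a minimal sketch of the general technique (bit-interleaved geohashing). It is not MongoDB's actual implementation; the function name and the 26-bit default are my own choices for illustration. Each pair of bits picks one quadrant at the next level of the quadtree, so points in the same cell share a key prefix and the key can live in any sorted index.

```python
def quadtree_key(lon, lat, bits=26):
    """Interleave longitude and latitude bits into one sortable integer key."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    key = 0
    for _ in range(bits):
        # One bit of longitude: which half of the current cell, east or west?
        lon_mid = (lon_lo + lon_hi) / 2
        if lon >= lon_mid:
            key = (key << 1) | 1
            lon_lo = lon_mid
        else:
            key = key << 1
            lon_hi = lon_mid
        # One bit of latitude: north or south half?
        lat_mid = (lat_lo + lat_hi) / 2
        if lat >= lat_mid:
            key = (key << 1) | 1
            lat_lo = lat_mid
        else:
            key = key << 1
            lat_hi = lat_mid
    return key
```

A bounding-box search then turns into a handful of range scans over key prefixes, which is exactly the kind of query an ordered index is good at. The spherical distance handling would still have to sit on top.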
Counting records is kind of slow. I'd love to see TokuMX's indexes implement counted trees so that indexed count operations would be lightning fast. (I don't know if that's feasible with fractal tree indexes, but I don't know any reasons why it wouldn't be.)
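For what counted trees would buy you, here's a toy sketch. It's a plain unbalanced binary tree and says nothing about how fractal trees are laid out on disk. Every node keeps the number of keys beneath it, so counting the keys in a range takes one walk down the tree per endpoint instead of a scan over every matching entry.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.count = 1  # keys in this subtree, including this node


def insert(node, key):
    """Insert key (duplicates allowed) and return the subtree root."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    node.count += 1  # one more key somewhere below this node
    return node


def count_less(node, key):
    """Number of stored keys strictly less than key."""
    total = 0
    while node is not None:
        if key <= node.key:
            node = node.left
        else:
            # This node and its whole left subtree are below key.
            total += 1 + (node.left.count if node.left else 0)
            node = node.right
    return total


def count_range(root, lo, hi):
    """Number of stored keys k with lo <= k < hi."""
    return count_less(root, hi) - count_less(root, lo)
```

The cost is that every insert and delete has to update the counts on the path to the root, which is presumably where the interesting questions for a write-optimised structure lie.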
A default compression setting. If you're using an ODM like MongoEngine, it's not necessarily easy to set the compression on each table and index.
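Until there's a server-side default, one workaround is to funnel collection and index creation through a small helper. This is only a sketch: it assumes TokuMX accepts a compression option at creation time, that the PyMongo 2.x create_collection and ensure_index calls pass extra keyword arguments through to the server, and that "zlib" is a valid setting. The helper names are mine.

```python
DEFAULT_COMPRESSION = "zlib"  # assumed to be a valid TokuMX compression name


def create_collection(db, name, **options):
    """Create a collection, filling in compression unless the caller set it."""
    options.setdefault("compression", DEFAULT_COMPRESSION)
    return db.create_collection(name, **options)


def ensure_index(collection, keys, **options):
    """Create an index, filling in compression unless the caller set it."""
    options.setdefault("compression", DEFAULT_COMPRESSION)
    return collection.ensure_index(keys, **options)
```

With an ODM you would presumably have to run these up front, before the ODM gets a chance to create the collections and indexes itself.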

My conclusion: If you are using MongoDB, and don't depend on full-text search or geospatial indexing, you should definitely look into moving to TokuMX. (If you use MongoDB's full-text search, you should look at moving to TokuMX and ElasticSearch. ElasticSearch has data compression too, so the two combined are still going to use less disk space than MongoDB by itself.)

I first looked at MongoDB early in 2010. Half an hour after I started testing it, my database was a smoking wreck, due to the OOM behaviour of OpenVZ and Mongo's storage engine, which at that time was frankly not ready for use.

OpenVZ and MongoDB have since fixed that, so that MongoDB runs under OpenVZ without crashing, and MongoDB doesn't destroy your data if it does crash, but my reservations over the fundamental architecture of the MongoDB storage engine remain.

TokuMX isn't perfect (yet), but it delivers a serious, production-quality storage engine with performance at least as good as vanilla MongoDB while requiring a small fraction of the disk space, and fine-grained locking that provides far greater potential scalability. (My test server is too small to really test that.) And transactions. It's what I was looking for when I first tested MongoDB.

TokuMX gets my coveted Doesn't Suck award.

Oh, and here are the slides to a talk given by John Schulz of AOL on their testing of TokuMX vs. MongoDB. His conclusions:

• Space per document for MongoDB databases will be reduced by at least 66%. Likely as much as 75%

• Host memory while important is no longer a serious resource constraint. Now CPUs and to a lesser extent disk I/O bandwidth are the principle constrained resources.

• We should be able to make full use of the available persistent storage on each host.

• It is reasonable to assume that we can put 3X to 4X the amount of data and associated workload on a host compared to MongoDB.

• TokuMX provides more consistent operation times than MongoDB does, improving the customer experience.

• TokuMX has the potential to save significant hardware cost

Posted by: Pixy Misa at 11:38 AM | Comments (7)

1 FYI, Starbound's first beta (the Progenitor phase) drops Dec 4, if you haven't already heard.

Posted by: RickC at Sunday, December 01 2013 09:27 AM (swpgw)

2 Nooooo.... There goes my productivity!

Posted by: Pixy Misa at Sunday, December 01 2013 11:26 AM (PiXy!)

3 Now, I'm not sure I believe this, because the very idea is ridiculous, but I hear that you could just not play the game.

Posted by: RickC at Sunday, December 01 2013 12:22 PM (swpgw)

4 Unfortunately I'm already busy not playing Kerbal Space Program, Civ V, and all the new Sims 3 expansion packs. I don't have time to not play Starbound.

Posted by: Pixy Misa at Sunday, December 01 2013 12:45 PM (PiXy!)

5 Got any vacation saved up?

Posted by: RickC at Sunday, December 01 2013 03:45 PM (swpgw)

6 At my day job we'll be mostly shutting down for a couple of weeks over Christmas, but I was planning to spend that time migrating Minx to TokuMX. Starbound could derail that a little.

Posted by: Pixy Misa at Sunday, December 01 2013 05:09 PM (PiXy!)

7 SSSSSSSSSSSssssssssssssstarbound!
Yes, I did wind up playing for like 5 hours last night, but I was less than 5 minutes late to work this morning in spite of being up two hours past my usual bed time.

This game is either not balanced yet or just harder than normal. Kill a mob with a sword, it drops pixels (it's an in-game currency, used, among other things, as a material for most armor and weaponry, and also for 3d printers.) If you want to get food or stuff, you have to shoot it with a bow, and the first bow you can make does half the damage a sword does. Oh, and like Minecraft, you have to hold the button down to draw back on the bow or it won't shoot far or do much damage. And it slows you down, so you have to learn how to do a little dance to keep from being eaten.
Still a lot of fun!

Posted by: RickC at Friday, December 06 2013 12:56 AM (A9FNw)
