Sunday, May 24


So I Was Wondering

Note to self: Implement auto-save, dammit.

I already knew that LMDB supported multiple values per key.  Reading the docs last week, I noticed that values within a key were stored in sorted order.  This evening, I was wondering if it were possible to seek to a particular value within a key.

It is.

This is neat.  It means you can use LMDB as an engine to implement a two-dimensional database like Cassandra, or a data structure server like Redis, with the elements of lists, sets, and hashes individually addressable.

Plus the advantage that unlike Redis, it can live on disk and have B-tree indexes rather than just a hash table.  (Though of course Redis has the advantage of predictable performance - living in memory and accessed directly, it's very consistent.)

The other big advantage of LMDB (for me, anyway) is that it natively supports multiple processes - not just multiple threads, but independent processes - handling transactions and locking automatically.  I love Python, but it has what is known as the Global Interpreter Lock - while you can have many threads, only one of them can be executing Python code at any time.  The other threads can be handling I/O, or calling C libraries that don't access shared data, but can't actually be running your code at the same time.

That puts a limit on the performance of any single Python application, and breaking out into multiple processes means you need to find a way to share data between those processes, which is a lot more fiddly than it is with threads.

LMDB don't care.  Thread, process, all the same, just works.


It does have limitations - it's a single-writer/multiple-reader design, so it will only scale so far unless you implement some sort of sharding scheme on top of it.  But I've clocked it at 100,000 transactions per second, and a million bulk writes per second, so it's not bad at all.

Admittedly that was with the write safety settings disabled, so  server crash could have corrupted my database.  But my main use for LMDB is as a smart distributed data structure cache, so if one node dies it can just be flushed and restarted.  In practical use, as a robust database, the numbers are somewhat lower (though with a smart RAID controller you should still be able to do pretty well).

It also supports a rather nice hot backup facility, where the backup format is either a working LMDB database ready to go (without needing to restore) or a cdbmake format backup (which is plain text if you're using JSON for keys and values), and it can back up around 1GB per second - if you have the I/O bandwidth - and only about 20% slower if the database is in heavy use at the time.


Posted by: Pixy Misa at 01:08 AM | No Comments | Add Comment | Trackbacks (Suck)
Post contains 472 words, total size 3 kb.

Apple pies are delicious. But never mind apple pies. What colour is a green orange?

49kb generated in CPU 0.0123, elapsed 0.094 seconds.
56 queries taking 0.0848 seconds, 332 records returned.
Powered by Minx 1.1.6c-pink.