Why did you say six months? He's coming. This matters. This is important. Why did you say six months? Why did you say five minutes?
Saturday, November 30
Toku! Toku! Toku!
My test app (at my day job) continues to burble along happily under TokuMX.
Some more observations:
Compression is fantastic. We have an application with fine-grained access control, and in my test database I'm creating millions of users so that we can find scaling issues before those millions of users actually start hitting our servers.
Under MongoDB, the user permissions table is one of the largest in the database, because each user has a couple of hundred settings. Under TokuMX, we get a compression ratio of 30:1, turning it from a problem to be solved into just another table.
Compression ratios on the other tables are lower, but still at least 7:1 on data and 3:1 on indexes in my test dataset.
The collection.stats command provides all sorts of useful information on data size, index size, and index utilisation. You can see not only how much space each index takes on disk, but how often it is being used by queries - in real time. That's a brilliant tool for devops.
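From Python it's one command away. A minimal sketch with PyMongo - the database and collection names here are made up:

```python
from pprint import pprint
from pymongo import MongoClient

db = MongoClient()["mydb"]                # hypothetical database name
pprint(db.command("collstats", "users"))  # per-index sizes and usage stats
```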
Performance seems to suffer when running mongodump backups.
Oh, just on that subject: TokuMX comes in two versions, the Enterprise edition with hot backup support, and the Community edition without. That doesn't mean you can't do online backups on the Community edition - the mongodump and mongoexport utilities work just fine. The Enterprise version comes with a high-performance snapshot backup utility - that is, you get a transactional, consistent point-in-time backup, which mongodump won't do.
You could probably do that with a simple script if your database size is modest, using TokuMX's transaction support. Just start a transaction, iterate through the collections and their contents, and write them out in your preferred format.
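A rough sketch of what I mean, assuming TokuMX's beginTransaction/rollbackTransaction commands and a single dedicated connection (the transaction is tied to the connection - more on that below; paths and names are made up):

```python
import bson
from pymongo import MongoClient

db = MongoClient()["mydb"]          # one connection, one transaction

db.command("beginTransaction")      # MVCC snapshot, as I understand it
try:
    for name in db.collection_names(include_system_collections=False):
        with open("backup/%s.bson" % name, "wb") as out:
            for doc in db[name].find():
                out.write(bson.BSON.encode(doc))  # same format as mongodump
finally:
    db.command("rollbackTransaction")  # read-only; nothing to commit
```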
Anyway, my test app slows down significantly when I run a backup; I need to investigate further to determine whether this is due to resource starvation (I only have a small server to run these tests, and it doesn't have a lot of I/O or CPU), or whether it's contention within TokuMX.
TokuMX only stores indexes. Each index gets written to a separate file in your data directory. This may lead you to ask, where the hell is my data? The answer is that the _id index for each collection is a clustered index - that is, the data payload of each record is stored alongside the key in the fractal tree.
You can create additional clustered indexes if you need them; this could be a significant win in data warehousing applications, particularly if you are writing to disk rather than SSD. If your application reads records in index order, performance from disk could approach performance from SSD at a much lower cost.
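Creating one is just an extra option on the index. A sketch assuming TokuMX's clustering flag, with a hypothetical events collection:

```python
from pymongo import MongoClient

events = MongoClient()["mydb"]["events"]

# The clustering option is TokuMX-specific: the whole document lives in
# this tree, so a range scan in timestamp order reads sequentially from
# disk instead of doing a random lookup per record.
events.ensure_index([("timestamp", 1)], clustering=True)
```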
Transactions are a little fiddly due to choices made by driver developers. Many of the MongoDB drivers (in this case, PyMongo) use transparent connection pools to support multi-threaded applications without requiring huge numbers of database connections.
That doesn't work at all for transactions, because the transaction is bound to the connection, and if you have a connection pool, there is no guarantee (in fact, pretty much the opposite) that you will get the same connection for all the operations within your transaction.
The approach we've taken for our web application is a multi-process / single thread model, with one connection per process. We're using Green Unicorn at the moment, but the same would work with other servers as long as they are configured appropriately. For non-web apps, just make sure you create a connection pool per thread, and limit the pool to one connection.
Update: Actually, PyMongo has a mechanism that helps with this. Though not specifically designed for transaction support, requests bind your thread to a specific connection. So you can leave the connection pool alone and just write a little wrapper to handle transactions.
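Something along these lines - a sketch assuming PyMongo 2.x requests and TokuMX's transaction commands:

```python
from contextlib import contextmanager

@contextmanager
def transaction(db):
    # Pin this thread to one socket for the life of the transaction,
    # since TokuMX binds the transaction to the connection.
    with db.connection.start_request():
        db.command("beginTransaction")
        try:
            yield
        except Exception:
            db.command("rollbackTransaction")
            raise
        db.command("commitTransaction")
```

Then wrapping a group of operations in "with transaction(db):" commits them as a unit or not at all.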
I haven't yet tested replication or sharding. Tokutek say that their replication model generates an even lower I/O load on the replica than on the primary. That's a nice thing if you're running many MongoDB databases (or shards) because you could potentially combine multiple replicas onto a single server. (Assuming that not all your primary servers die at the same time.)
There are a few things that TokuMX doesn't do yet that I'd like to see:
As I noted, TokuMX stores all your indexes as individual files in your data directory. I strongly prefer this to MongoDB's indivisible database blobs, but if you have a lot of databases, each with a lot of collections, each with several indexes, you end up with a whole bunch of files piled into that one directory. Having it organised with a subdirectory for each database would be nice. (I might even suggest a subdirectory for each collection.)
Since we can see the individual indexes, a thought arises. Sooner or later, something horrible is going to happen to your database. If you have replicas or good backups, you can recover with little pain. But if something horrible happens to your replica and your backup at the same time, the worst thing in the world is to be left with a database that is right there on your disk but won't start up because one record somewhere is corrupted.
I'd like to see a use-at-own-risk utility that can dump out the contents of a TokuMX clustered index file as a BSON object file (in the same format as mongodump), and just skip over damaged blocks with a warning.
mongoexport is oddly slow. I'm pretty sure I could write a Python script that outruns it. This is not Tokutek's fault, though; they've inherited that from MongoDB. I'd love a faster way to dump to JSON.
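The sort of script I have in mind - a straight cursor-to-JSON loop using PyMongo's json_util, with made-up names (and no timing claims until I actually race it):

```python
from bson import json_util
from pymongo import MongoClient

coll = MongoClient()["mydb"]["users"]
with open("users.json", "w") as out:
    for doc in coll.find():
        out.write(json_util.dumps(doc))  # Mongo extended JSON, one doc per line
        out.write("\n")
```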
TokuMX doesn't currently support MongoDB's geospatial or full-text indexing. I don't see the lack of full-text indexing as a big deal; full-text search is something of an afterthought in MongoDB anyway, and ElasticSearch offers a much more powerful search engine that is very easy to install and manage.
I would like to see geospatial support - it's not critical for my applications, but having it available would allow us to develop new functionality. Geospatial support is something of an afterthought in ElasticSearch, so having it in TokuMX would potentially save deploying a third database to provide that one requirement.
(Actually, reading up what MongoDB does, it also looks like quadtrees hacked into 2d materialised paths on top of B-trees, but with some intelligence on top of that to handle distances in spherical geometry. So adding that sort of geospatial indexing to TokuMX shouldn't be very difficult.)
Counting records is kind of slow. I'd love to see TokuMX's indexes implement counted trees so that indexed count operations would be lightning fast. (I don't know if that's feasible with fractal tree indexes, but I don't know any reasons why it wouldn't be.)
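For anyone unfamiliar with counted trees: each node stores the size of its subtree, so a count becomes a logarithmic descent rather than a scan of the index. A toy illustration using a plain unbalanced BST - nothing to do with fractal trees, just the counting idea:

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.size = key, None, None, 1

def insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    node.size += 1          # every node knows its subtree size
    return node

def count_less_than(node, bound):
    if node is None:
        return 0
    if node.key < bound:    # whole left subtree qualifies; skip scanning it
        left = node.left.size if node.left else 0
        return left + 1 + count_less_than(node.right, bound)
    return count_less_than(node.left, bound)
```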
A default compression setting. If you're using an ODM like MongoEngine, it's not necessarily easy to set the compression on each table and index.
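For reference, per-collection compression is set at creation time - something like this, if I'm remembering the TokuMX option names correctly - and that explicit create step is exactly what an ODM tends to hide from you:

```python
from pymongo import MongoClient

db = MongoClient()["mydb"]                           # hypothetical database
db.create_collection("events", compression="zlib")  # TokuMX-specific option
```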
My conclusion: If you are using MongoDB, and don't depend on full-text search or geospatial indexing, you should definitely look into moving to TokuMX. (If you use MongoDB's full-text search, you should look at moving to TokuMX and ElasticSearch. ElasticSearch has data compression too, so the two combined are still going to use less disk space than MongoDB by itself.)
I first looked at MongoDB early in 2010. Half an hour after I started testing it, my database was a smoking wreck, due to the OOM behaviour of OpenVZ and Mongo's storage engine, which at that time was frankly not ready for use.
OpenVZ and MongoDB have since fixed that, so that MongoDB runs under OpenVZ without crashing, and MongoDB doesn't destroy your data if it does crash, but my reservations over the fundamental architecture of the MongoDB storage engine remain.
TokuMX isn't perfect (yet), but it delivers a serious, production-quality storage engine with performance at least as good as vanilla MongoDB while requiring a small fraction of the disk space, and fine-grained locking that provides far greater potential scalability. (My test server is too small to really test that.) And transactions. It's what I was looking for when I first tested MongoDB.
• Space per document for MongoDB databases will be reduced by at least 66%, and likely as much as 75%.
• Host memory, while important, is no longer a serious resource constraint. CPUs and, to a lesser extent, disk I/O bandwidth are now the principal constrained resources.
• We should be able to make full use of the available persistent storage on each host.
• It is reasonable to assume that we can put 3X to 4X the amount of data and associated workload on a host compared to MongoDB.
• TokuMX provides more consistent operation times than MongoDB does, improving the customer experience.
• TokuMX has the potential to save significant hardware cost.
Posted by: Pixy Misa at Sunday, December 01 2013 11:26 AM (PiXy!)
3
Now, I'm not sure I believe this, because the very idea is ridiculous, but I hear that you could just not play the game.
Posted by: RickC at Sunday, December 01 2013 12:22 PM (swpgw)
4
Unfortunately I'm already busy not playing Kerbal Space Program, Civ V, and all the new Sims 3 expansion packs. I don't have time to not play Starbound.
Posted by: Pixy Misa at Sunday, December 01 2013 12:45 PM (PiXy!)
Posted by: RickC at Sunday, December 01 2013 03:45 PM (swpgw)
6
At my day job we'll be mostly shutting down for a couple of weeks over Christmas, but I was planning to spend that time migrating Minx to TokuMX. Starbound could derail that a little.
Posted by: Pixy Misa at Sunday, December 01 2013 05:09 PM (PiXy!)
7
SSSSSSSSSSSssssssssssssstarbound!
Yes, I did wind up playing for like 5 hours last night, but I was less than 5 minutes late to work this morning in spite of being up two hours past my usual bedtime.
This game is either not balanced yet or just harder than normal. Kill a mob with a sword and it drops pixels (the in-game currency, used among other things as a material for most armor and weaponry, and for 3D printers). If you want food or other drops, you have to kill it with a bow instead, and the first bow you can make does half the damage a sword does. Oh, and like Minecraft, you have to hold the button down to draw the bow or it won't shoot far or do much damage. And it slows you down, so you have to learn to do a little dance to keep from being eaten.
Still a lot of fun!
Posted by: RickC at Friday, December 06 2013 12:56 AM (A9FNw)
In fact, the new version allows you to access BSON (MongoDB format) data from SQL, and access SQL tables transparently using the MongoDB API.
It has the same limitations as TokuMX (no full-text or geospatial indexes on BSON data, even though Informix itself supports full-text and geospatial indexes), but it does support indexing on arrays and nested fields.
I don't know if I'll ever use it, but it's great to have another option if you're deploying applications on MongoDB.
Edit: And DB2 as well. Very interesting. Wonder what the pricing is for a low-end DB2 deployment these days.
Edit: Well, DB2 Express-C is free. It used to be limited to 4GB of memory, though that's not a major issue since the operating system can still use free memory to cache the filesystem. That's been increased to 16GB. Still only supports two cores, but two 3.5GHz Ivy Bridge or Haswell cores can get a lot of work done. It supports databases up to 15TB, which is pretty big by mee.nu standards. (And not small by the standards of my day job, either, for a single instance. We have 1.5PB of data, but that's spread across many servers.)
DB2 Express is $2210 per server per year, and supports 8 cores and 64GB of memory per instance. The previous version was limited to 8GB, so that's a huge increase. Again, 15TB per database, but I don't see that as a problem; managing a 15TB production database is going to cost you a lot more than $2210 a year.
Microsoft rolled out IE-11 for Win7-64 yesterday, and it slightly breaks some things. Control-V doesn't paste into the post/comment composition window any more, although right-click-"paste" works.
That's what I've noticed so far but I imagine there are other things, too.
Posted by: Steven Den Beste at Thursday, November 28 2013 07:53 AM (+rSRq)
Posted by: Steven Den Beste at Thursday, November 28 2013 05:01 PM (+rSRq)
6
Glad to hear that; the latest version of the editor has the same problems in IE11.
I have another editor, but I won't be able to get to that until the weekend, because it requires changes to Minx.
Posted by: Pixy Misa at Thursday, November 28 2013 09:58 PM (PiXy!)
7
I think I know what's going on. There are two versions of the editor, one for IE and one for everything else. It looks like IE11 supports standards to the extent that it's now incompatible with the non-standard earlier versions.
I'll see if I can tweak the browser detection code to switch IE11 to use the "everything else" version of the code.
Posted by: Pixy Misa at Friday, November 29 2013 12:09 AM (PiXy!)
I started writing a short note on TokuMX and it turned into a history of databases and the relative value of relational vs. non-relational systems vis-a-vis traditional and non-traditional use cases. Which would be great except that I really don't have time to write a book today.
So, quickly: Tokutek, who brought us the nifty TokuDB engine for MySQL, have done the same thing for the new default database, MongoDB.
TokuMX takes MongoDB and swaps out the rather questionable storage engine for something substantially more scalable and robust, based on fractal tree indexes. This means three things:
The database doesn't lock itself up when load gets high. This was a big problem with MySQL in the old days (you can still see it when I do background processing on Minx, because I haven't had a chance to convert the tables to a newer format). It's still a problem with MongoDB; less so than MySQL was, but more so than MySQL is now.
It has transactions. This is basic database technology. If I want to take ten dollars from my account and pay it into your account, the last thing I want to see is for the server to go down after the first step but before the second one. Transactions make sure that either both steps happen, or neither (there's a sketch of that transfer below). MongoDB doesn't have transactions.* MySQL didn't either, years ago.
It's compressed. It is, in fact, extraordinarily compressed relative to MongoDB. My test system at work (which uses real-world, albeit slightly odd, data) shrank from 19GB to 1.9GB.
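Here's the ten-dollar transfer from the second point, sketched with TokuMX's transaction commands from PyMongo. The commands are TokuMX-specific, and the database and account ids are made up:

```python
from pymongo import MongoClient

db = MongoClient()["bank"]  # hypothetical database

db.command("beginTransaction")
try:
    db.accounts.update({"_id": "mine"},  {"$inc": {"balance": -10}})
    db.accounts.update({"_id": "yours"}, {"$inc": {"balance": 10}})
except Exception:
    db.command("rollbackTransaction")  # neither update happens
    raise
db.command("commitTransaction")        # both updates become visible together
```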
TokuMX lacks two things that MongoDB does have: Full-text search, and geospatial indexing.
I don't see either as a major issue. MongoDB's full-text search is neat if you really must have just one database for everything, but it's far less powerful than ElasticSearch. Using a search engine as well as a database means duplicating all your data, but (a) you can set up ElasticSearch so that it automatically indexes your MongoDB / TokuMX data, and (b) since ElasticSearch also compresses data automatically, TokuMX and ElasticSearch combined require a fraction of the disk space of MongoDB alone.
(ElasticSearch also supports geospatial queries; it looks to me as if they've used materialised paths to fudge quadtrees into their inverted indexes. Clever, and should suffice for most use cases. I'd never considered multidimensional materialised paths before.)
The reduced disk space is more significant than it might seem at first glance. Smaller databases mean that it's more feasible to keep everything on SSD, which means much better performance. Also, if your database is one fifth the size (assuming you replace MongoDB with a combination of TokuMX and ElasticSearch) you can cache five times as much data in memory.
Last week I was working with a 50GB MongoDB database and a 5GB ElasticSearch index, which was a little slow on my 32GB server. Now I can work with twice as much data and have it all fit in memory, which is a huge win.
So, I'm waiting to see if this is going to blow up in my face in some weird way, as shiny new things usually do. But so far it is all looking very promising.
* It has atomic operations, and you can futz around with those to construct your own transaction manager if you really want to, but it doesn't support transactions out of the box.
1
Databases are a black art to me. I was a programmer for 25 years, but I worked on embedded software, which is an entirely different field.
Posted by: Steven Den Beste at Tuesday, November 26 2013 10:24 PM (+rSRq)
2
If their compression promise holds up, that will make my logging servers a lot happier. I'll have to play with it this weekend and see how it handles a few days of logs. Certainly better than waiting for the long-promised compression support in regular MongoDB.
-j
Posted by: J Greely at Wednesday, November 27 2013 02:50 AM (+cEg2)
3
Did you guys see that blog article by one woman who migrated from Mongo to Postgres? Awesome way to project her narrow experience onto all databases of all time.
Posted by: Pete Zaitcev at Wednesday, November 27 2013 04:07 AM (f0Btc)
4
I skimmed it just now, and when I saw "Rails", the rest was pretty predictable.
-j
Posted by: J Greely at Wednesday, November 27 2013 09:12 AM (fpXGN)
5
I hadn't seen it, but MongoEngine solves the relational issue on Python, and Mongoid does it for Ruby.
Though if you're going to migrate off MongoDB anyway, Postgres is a good choice.
Posted by: Pixy Misa at Wednesday, November 27 2013 09:18 AM (PiXy!)
Posted by: Pixy Misa at 11:00 PM
Night And Day
Clara: I think there's three of them now.
Kate: There's a precedent for that.
Posted by: Pixy Misa at 09:32 PM
Wednesday, November 20
Mission Accomplished-ish
One of my goals for the next version of Minx was to reduce build times by a factor of ten, from around 100ms to 10ms for simple pages, and for larger pages from 1s to 100ms.
Here's a post with 1400+ comments. Before (worst case, with neither the element cache nor the MySQL query cache in effect):
573kb generated in CPU 4.09, elapsed 3.662 seconds.
66 queries taking 1.0599 seconds, 1591 records returned.
After:
573kb generated in CPU 0.02, elapsed 0.0534 seconds.
41 queries taking 0.0396 seconds, 55 records returned.
The new element cache eliminates the slow database queries, comment text filtering, and much of the template processing, and the performance improvement is dramatic.
Here's my main page, before:
97kb generated in CPU 0.88, elapsed 0.5679 seconds.
31 queries taking 0.2722 seconds, 121 records returned.
And after:
97kb generated in CPU 0.01, elapsed 0.0064 seconds.
14 queries taking 0.0029 seconds, 25 records returned.
What I need to do now is make the cache smarter, so that I can deliver more from the cache rather than rebuilding it. The cached performance is fantastic, but we're only hitting the cache about 40% of the time. I want to get that up above 90%.
Posted by: Steven Den Beste at Thursday, November 21 2013 01:27 AM (+rSRq)
2
One of the bonuses of this work is that I can use simple and robust database queries rather than worrying about micro-optimisation. (Proper indexing is still critical, of course; nothing will fix an unindexed query.)
I'm planning to switch from MySQL to MongoDB in the following version. I want to use MongoEngine, a very nice Python library that lets you treat MongoDB pretty much as native Python data, with automatic data validation to precisely the degree you need it. The only problem is that it imposes a pretty significant processing overhead.
The element cache will work to eliminate that overhead just as effectively as it does with MySQL and the comment formatter, so I'm free to go ahead and use MongoEngine.
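For the curious, this is the flavour of the thing - a minimal MongoEngine sketch, with made-up class and field names rather than Minx's actual schema:

```python
from mongoengine import Document, IntField, StringField, connect

connect("minx")  # hypothetical database name

class Post(Document):
    title = StringField(required=True, max_length=200)  # validated on save()
    words = IntField(min_value=0)

Post(title="Mission Accomplished-ish", words=9).save()
print(Post.objects(words__lt=100).count())  # queries read as native Python
```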
Posted by: Pixy Misa at Thursday, November 21 2013 06:15 PM (PiXy!)
The American midwest ("Tornado Alley") gets the vast majority of the world's tornadoes, but it ain't 100%.
Here in the Pacific Northwest they're virtually unknown. But when I was in high school, one time a bowling alley in Vancouver was destroyed by a sudden high wind. It's long been conjectured that it was a tornado.
Posted by: Steven Den Beste at Tuesday, November 19 2013 01:29 PM (+rSRq)
2
Apparently this is the fourth confirmed tornado in New South Wales this year, but the only one in a populated area. (Western New South Wales is huge and incredibly empty; everyone lives on the coast.)
The Bureau of Meteorology seems to be spending a lot of time today saying "Yes, it was a tornado. Yes, they do happen in Australia."
Posted by: Pixy Misa at Tuesday, November 19 2013 01:48 PM (PiXy!)
3
Tornadoes have been seen on every continent except Antarctica, which probably means that nobody ever goes outside in Antarctica because it's so cold, so they never see 'em.
Glad you're all right, Boss!
Posted by: Wonderduck at Wednesday, November 20 2013 12:50 PM (Izt1u)
4
Apparently there was a tornado in the suburbs of Asquith and North Turramurra - right next to Hornsby - in April last year. I remember the storm but didn't know that there was a tornado involved. When it was investigated afterwards they found a narrow trail of destruction extending for several miles, and the Doppler radar confirmed a tornado.
That was actually a bigger storm, but it didn't get as much attention as this week's event because most of the tornado's path was through open forest, rather than making a direct landing on a busy commercial district in the middle of the day.
Posted by: Pixy Misa at Wednesday, November 20 2013 04:11 PM (PiXy!)
I didn't know about it until I got back to Hornsby station this evening and tried to go to the shops to pick up some groceries, only to find the whole shopping centre taped off with police and emergency services in attendance.
No-one killed, though six people were inside a portable building at the railway station that flipped over, and they must have had a hell of a fright.
My train home was just a few minutes late. Awesome work by NSWGR and the SES, given that there was a tree on the tracks this afternoon.
Update: Found another picture of that portable building from a different angle, which allowed me to identify it. Two observations: First, it didn't just tip over, it travelled a good twenty feet and landed completely upside down. Second, I was standing right next to it three hours earlier.
It wasn't just my suburb that caught it today, either; this view from the Manly ferry looks more like a fishing trawler in a storm in the North Atlantic.
On the other hand, at least the fires are out.
Update Two: From the sound of things, the mini-tornado/storm cell took a path right through the centre of town. It hit the big Westfield shopping centre, blowing out the roof of the cinema multiplex (and trashing the cinemas pretty badly), took part of the roof off the hotel across the mall, crossed over the public library (no reports of damage there), hit the railway station where it flipped that portable building and at least one car, then hit the local technical college, the police station, and the council offices.
The small number of injuries can probably be attributed to two things: First, it was a miserable day here and few people were standing around outside to get hit by debris, and second, where the glass roof blew off in two places in the shopping centre, it sounds like the wind came in through the doors and blew the panes of glass up and out rather than inwards. In some pictures some of the misplaced panes are visible, resting on the intact ones.
Update Three: News report. Apparently the library was damaged, possibly badly.
* Possibly; witnesses have described a funnel and a debris vortex. Whatever it was, it was highly localised and strong enough to flip cars over. Possibly an earth elemental. Or a really cranky stick insect.
Posted by: Steven Den Beste at Tuesday, November 19 2013 03:21 AM (+rSRq)
2
In the states we call those "microbursts". They cause extensive damage in a very small area. Scary stuff. Glad you're okay!
Posted by: Teresa at Tuesday, November 19 2013 09:35 AM (KhgAG)
3
I was wondering about that, since there were only a couple of videos of the event, and neither one showed the usual distinctive funnel. But the Australian Bureau of Meteorology has now confirmed that it was a tornado, estimated to be an F1.
Posted by: Pixy Misa at Tuesday, November 19 2013 12:55 PM (PiXy!)
4
Glad you weren't hurt.
This sounds like the type of tornado we get most often in the southeastern US. They're small compared to the Midwestern variety but still dangerous and very sudden. They can really sneak up on you, drop a tree on your trailer and then they're gone.
Posted by: The Brickmuppet at Wednesday, November 20 2013 08:34 AM (DnAJl)
Posted by: Pixy Misa at 09:33 AM
Sunday, November 17
Fastly
Test version:
Hello Pixy Misa, you are logged in to Minx.
98kb generated in CPU 0.0, elapsed 0.0054 seconds.
14 queries taking 0.003 seconds, 25 records returned.
Powered by Minx 1.1.6c-pink.
The CPU timer has a ~10ms resolution, so it's not seeing anything. 3ms for database access, 5ms total to generate my home page.
Only issue is that posts and comments take up to a minute to percolate up to the home page and the sidebar.
How this will work:
When including a template [include ...] or invoking an applet [applet ...] you will be able to set cache directives:
cache - cache with the default system TTL (time-to-live, currently 60 seconds)
nocache - do not cache
ttl=N - set a custom TTL
Example: [include Posts ttl=30]
Applets are cached by default; regular template includes are not. You have to take care; if a template is included within a loop, and you cache it, it will evaluate once, and then repeat the content over and over. Probably not what you want.
Now I need to work on smarter cache eviction.
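For those wondering what a cache directive turns into under the hood, here's a minimal sketch of a TTL element cache - illustrative only, not Minx's actual implementation:

```python
import time

class ElementCache:
    def __init__(self, default_ttl=60):     # the system default TTL above
        self.default_ttl = default_ttl
        self.store = {}                     # key -> (expires, rendered_html)

    def render(self, key, build, ttl=None): # build() regenerates the element
        now = time.time()
        hit = self.store.get(key)
        if hit and hit[0] > now:            # still fresh: skip the rebuild
            return hit[1]
        html = build()
        self.store[key] = (now + (ttl or self.default_ttl), html)
        return html
```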
Another example, from one of Ace's crazy comment threads:
Hello Pixy Misa, you are logged in to Minx.
572kb generated in CPU 0.02, elapsed 0.0207 seconds.
41 queries taking 0.0086 seconds, 55 records returned.
Powered by Minx 1.1.6c-pink.
20 milliseconds to display 1408 comments. Not too shabby. Before caching, it took 1-2 seconds.
For Ace I wrote a custom template; the template is kind of clunky, but it works great. What I do is check the number of comments against 100-comment chunks, caching the full chunks, and leaving the final chunk uncached.
That way, the bulk of a long thread is cached, but the last few comments aren't, so new comments show up instantly.
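In Python terms the logic looks something like this - format_comments and the cache object are hypothetical stand-ins (the real version is template code using the directives above):

```python
CHUNK = 100

def render_comments(cache, thread_id, comments):
    parts = []
    full = len(comments) // CHUNK
    for i in range(full):                    # full chunks rarely change: cache them
        chunk = comments[i * CHUNK:(i + 1) * CHUNK]
        key = "thread:%s:chunk:%d" % (thread_id, i)
        parts.append(cache.render(key, lambda c=chunk: format_comments(c)))
    tail = comments[full * CHUNK:]           # final partial chunk stays uncached,
    parts.append(format_comments(tail))      # so new comments show up instantly
    return "".join(parts)
```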
1
Something's broken. The timestamp at the bottom of the article should be a link to the article, right? For my last two posts, though, it's "1312645". It's the same number for Wonderduck's last post.
Posted by: Steven Den Beste at Monday, November 18 2013 06:33 AM (+rSRq)
2
Right now I can't leave a comment on my "Noddare" post. When I click to do so, it goes to my F1 post instead.
Posted by: Steven Den Beste at Monday, November 18 2013 07:04 AM (+rSRq)
3
That's really, really, weird. I'll take a look now.
Posted by: Pixy Misa at Monday, November 18 2013 09:16 AM (PiXy!)
Posted by: Steven Den Beste at Monday, November 18 2013 09:33 AM (+rSRq)
5
Fixed. I'm not 100% sure how it happened, but I suspect it had to do with the bulk import-and-fix I ran on Ace's site; that does include code to find and fix paths, and it looks like it went wrong somehow. Only 34 posts were affected (out of 900,000+) and it's not happening now, so fingers crossed!
Posted by: Pixy Misa at Monday, November 18 2013 09:34 AM (PiXy!)