Saturday, January 22

Geek

La Di Da

Just hid my earlier post because I found a bug in one of my benchmarks, and went on a bugsplatting and optimising spree.

I'm back to working on Minx 1.2.  There are two main things that I've wanted to fix for ages.

First, the common database queries for blog index pages often require complex joins and sorts. They're very well optimised joins and sorts, hence the speed, but common queries should run in index order wherever possible and not sort at all.

Second, the tag engine needs to populate the tag table with vastly more data than you will ever use - approximately 300 tags for each comment, for example, of which the typical comment template uses 6.

The first problem is what's had me looking into every kind of database under the Sun.*  Since all databases suck, and since it turns out that Python is too slow to take over the job,** I'm back to MySQL and its relatives for the main datastore - with the likelihood of offloading search to Xapian and using Redis as a sort of structured cache.

Speaking of structured caches, that's the fix for the query problem.  It's called the Stack Engine, and basically it prebuilds and maintains all the standard queries for all your folders and threads and stores them so that you can page through them without ever having to do an index scan, much less a sort.  The initial version uses MySQL to store the stacks, but it can just as easily (and more efficiently) be handled by Redis.

This will add a few milliseconds when you post a comment, but significantly reduce the query time for displaying a page.
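
A minimal sketch of the stack idea, not the actual Minx code: keep one pre-ordered list per folder or thread, pay a small cost to insert into it when something is posted, and serve pages by slicing. The class name `StackStore`, its methods and the in-memory storage are all assumptions for illustration; the real thing stores its stacks in MySQL (or potentially Redis).

```python
import bisect
from collections import defaultdict


class StackStore:
    """Hypothetical in-memory stand-in for the prebuilt stacks."""

    def __init__(self):
        # One sorted list of (timestamp, post_id) per folder or thread key.
        self._stacks = defaultdict(list)

    def add(self, key, timestamp, post_id):
        # The small cost paid at write time: keep the stack in order.
        bisect.insort(self._stacks[key], (timestamp, post_id))

    def remove(self, key, timestamp, post_id):
        stack = self._stacks.get(key, [])
        i = bisect.bisect_left(stack, (timestamp, post_id))
        if i < len(stack) and stack[i] == (timestamp, post_id):
            del stack[i]

    def page(self, key, page_number, page_size=10):
        # Newest first: slice from the end, with no sort at read time.
        stack = self._stacks.get(key, [])
        end = len(stack) - page_number * page_size
        if end <= 0:
            return []
        start = max(end - page_size, 0)
        return [post_id for _, post_id in reversed(stack[start:end])]


store = StackStore()
for n in range(1, 26):
    store.add("folder:main", timestamp=n, post_id=n)

first_page = store.page("folder:main", 0)
last_page = store.page("folder:main", 2)
```

Reading a page here is just a slice of an already ordered list, which is the whole point: the ordering work happens once, when the post is added.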

The other major change is in the Template Engine.  Minx has a lot of tags - a lot of a lot of tags; as I mentioned, there are currently about 300 tags just for a comment, and that's set to double in the new release.

However, I've finally had the breakthrough I needed and found an elegant way to make most of those tags vanish.  In 1.1.1 I set many of the sub-tags to lazily evaluate - there's just a placeholder in the tag table until you actually use that tag.  If you never use it, the function never gets called.
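
As a rough illustration of lazy evaluation in a tag table (a sketch under assumptions, not the 1.1.1 code): the table stores a placeholder holding a function, and the function only runs the first time the tag is read. `LazyTagTable`, `set_lazy` and the tag names are hypothetical.

```python
class LazyTagTable(dict):
    """Hypothetical tag table whose values may be unevaluated placeholders."""

    class _Lazy:
        def __init__(self, func):
            self.func = func

    def set_lazy(self, name, func):
        # Store a placeholder instead of computing the value now.
        dict.__setitem__(self, name, self._Lazy(func))

    def __getitem__(self, name):
        value = dict.__getitem__(self, name)
        if isinstance(value, self._Lazy):
            # First use: call the function and replace the placeholder.
            value = value.func()
            dict.__setitem__(self, name, value)
        return value


calls = []


def word_count():
    calls.append("called")  # record that the function actually ran
    return len(tags["comment.text"].split())


tags = LazyTagTable()
tags["comment.text"] = "hello lazy world"
tags.set_lazy("comment.wordcount", word_count)
```

If a template never touches `comment.wordcount`, `word_count` is never called; if it does, the result is cached in place of the placeholder.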

The new engine takes this three steps further.  First, for the most part it automatically maps the data from the database into the tag table without me having to write hundreds of lines of fiddly code - or even set up lists of field names.  And it automatically copes with schema changes too, where the old version needed code changes and schema changes to be carefully synchronised.
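
One way this kind of automatic mapping can work, sketched with assumptions: take the column names from the cursor itself rather than from a hand-maintained list. The example uses `sqlite3` from the standard library purely as a stand-in for MySQL, and `map_rows`, the table and the tag prefix are hypothetical.

```python
import sqlite3


def map_rows(cursor, prefix):
    """Yield one dict of tags per row, named from the cursor's own column list."""
    # cursor.description is part of the Python DB-API; the first item of each
    # entry is the column name, so no hand-written field list is needed.
    names = [prefix + "." + col[0] for col in cursor.description]
    for row in cursor:
        yield dict(zip(names, row))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER, author TEXT, body TEXT)")
conn.execute("INSERT INTO comments VALUES (1, 'pixy', 'first')")

tags_before = list(map_rows(conn.execute("SELECT * FROM comments"), "comment"))

# A schema change needs no matching code change: the new column just appears.
conn.execute("ALTER TABLE comments ADD COLUMN score INTEGER DEFAULT 0")
tags_after = list(map_rows(conn.execute("SELECT * FROM comments"), "comment"))
```

Because the names come from the query result, adding a column to the table adds a tag without touching the mapping code.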

Second, it completely virtualises about 80% of the existing tags.  There's not even a placeholder anymore; the tag engine looks, finds that the tag you are using doesn't exist but that it can be calculated from a value that does, calls the appropriate function and pops the result into your page.  This cuts down the size of the tag table by 80% - and cuts down the time spent building it by 80% as well.
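
A small sketch of what a virtual tag lookup might look like, assuming a dict-based tag table: nothing is stored for the derived tag, and the lookup falls through to a function only when the key is missing. `VirtualTagTable`, the `base.suffix` naming scheme and the two derivations are invented for illustration and are not the actual Minx tag names.

```python
class VirtualTagTable(dict):
    """Hypothetical tag table that derives missing tags on demand."""

    # suffix -> function applied to the base tag's value (illustrative only)
    derivations = {
        "upper": lambda value: str(value).upper(),
        "length": lambda value: len(value),
    }

    def __missing__(self, name):
        # dict calls this only when the key is absent. Split "base.suffix"
        # and compute the value if the base tag exists and the suffix is known.
        base, _, suffix = name.rpartition(".")
        if base in self and suffix in self.derivations:
            return self.derivations[suffix](dict.__getitem__(self, base))
        raise KeyError(name)


tags = VirtualTagTable({"comment.author": "pixy"})
shout = tags["comment.author.upper"]
size = tags["comment.author.length"]
```

The table still holds a single real entry after both lookups, so building it costs nothing for tags that can be calculated.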

The third part is the code I've been hacking on most of the night, the new data mapper.  It can now pull more than eight million*** fields a second out of the database and into the tag table ready for use - effectively forty million, since 80% of the tags are now virtual.  The tag table itself is also significantly improved, so that adding and removing records (as your template runs through a list of posts or comments, for example) is - um, I'll have to go back and benchmark that part, but a hell of a lot faster.

So (a) these two modules make things run a whole lot faster, especially for big sites like Ace's, which I want to bring across to Minx ASAP, and (b) they make the code much cleaner, wiping out two existing modules full of boilerplate and making the rest of the code much easier to maintain.

So.  Good.  Now I just need to change all the other code to use the new modules....

* Well, that and my day job, where I need to store and index a hundred million posts a day.
** Which will be the subject of another post, you can bet.
*** Latest benchmark run is set at 4,458,841 fields per second in pure Python, 8,515,711 with Psyco.

Posted by: Pixy Misa at 11:46 AM