Sunday, March 22

Geek

The Other Nextwave*

* No, still not the Warren Ellis comic.  Just go and buy the darn thing.  It's got drop bears in it.  What more can you ask?

I had to run a little cleanup script at mee.nu yesterday.  I'd done a bulk zap of CJKV spam via phpMyAdmin, and that left the comment counts out of sync (because they're maintained by the application, rather than by a trigger).  The script took 84 seconds to scan 2.7 million posts and comments across 480,000 threads, and fix everything up nice and neat.

Which isn't bad at all, but for those 84 seconds, mee.nu was locked up.  You couldn't load a page or post a comment or pretty much anything at all.  Some of the admin functions probably worked, and that's about it.

That's a problem of MySQL's MyISAM tables.  They're fast, they're compact, they're easy to implement, and they have all the robust concurrency of a unicycle parade going backwards through a broken turnstile.

Uphill, in a thunderstorm, at night.

So I'm looking at ways to fix that.  If the cleanup script took twice as long - but didn't lock anything up while it was running - then that would make life a lot easier for me.

One way is to use InnoDB tables.  These are less compact and slower than MyISAM, but provide better concurrency.  They don't provide full-text indexing, but at the moment we're not using full-text indexing.  We do full-text searches, but using brute-force scans.  Because, for our purposes, it's more efficient.

But MySQL isn't the only kitten in the sea.  There's PostgreSQL, which is open source and more powerful and flexible than MySQL.  I didn't use it for Minx originally because full-text indexing in PostgreSQL is hideous, but then I didn't end up using full-text indexing.

There's Firebird, the open source version of Borland Interbase.  I don't know much about Firebird, and it doesn't seem to be used much, and the documentation is somewhat lacking, so I'm not inclined towards it at all, really.

There's Ingres.  Yep, Ingres is open source these days.  I'm not sure of the power or the feature set, since I've never used it, but I'm downloading it right now.  (Actually, it would appear that I'm not, since their download server has fallen over.  Ah, there we go.  Documentation is still downloadable; just the software download server is down.)

There's Sybase SQLAnywhere.  It's not open source, but it's free for web applications.  It's your typical solid workgroup database.  Not seriously enterprisey, but without all the random limitations of MySQL.

And there's DB2.  Unfortunately, the free version, DB2 Express-C, is limited to 2 CPUs and 2GB of memory, which is plenty for now (the rest of the memory can go to filesystem cache, which works well enough).  But the next step up - 4 CPUs and 4GB of memory - costs $3000 a year.  So I'm looking at DB2, but it's not my first choice.

Oh, and MaxDB, formerly SAPDB, a database system developed by SAP for running, um, SAP.  It's open source.  Doesn't have full-text indexing at all, but then, we're not using full-text indexing.

I want to get those Ingres docs, darn it.  Open source and professionally developed sounds like a good combination to me.

Update: Ingres has no full-text indexing.  That's a little disappointing, but as I say, we currently don't use full-text indexing because a brute-force search within a site is more efficient.  And I'm looking at dedicated full-text indexing solutions like Xapian in any case; they're a lot more sophisticated than what you typically get in an RDBMS.

It does have R-Trees.  That's a bit of a surprise, and potentially useful.

It appears to have INTERSECT and EXCEPT.  Those are really useful, and can dramatically simplify and accelerate certain queries.  I say appears to, because they're in the keyword list but not described in any detail that I've found so far.

Does come with a Python DB-API compliant interface.  Kind of important, that!

Posted by: Pixy Misa at 04:27 PM | No Comments | Add Comment | Trackbacks (Suck)
Post contains 664 words, total size 4 kb.

Comments are disabled. Post is locked.
48kb generated in CPU 0.0475, elapsed 0.1424 seconds.
54 queries taking 0.1352 seconds, 342 records returned.
Powered by Minx 1.1.6c-pink.