Monday, November 28

Geek

A Grasshopper Achieves Enlightenment

The way to consistently achieve acceptable performance on full-text searches using MySQL* is to avoid full-text indexes at all costs.

The problem is threefold. Full-text indexes generally treat your text as one big splodge of data. Minx** is structured into recursive directories of sites containing recursive directories of folders containing recursive directories of threads, which contain posts and comments and various other type thingies, all of which are crosslinked like the great polymer of doom. There's all the structure you could possibly ask for when it comes to narrowing down a search. But if you use a full-text index, it searches the entire database first, and then looks at your selection. This wouldn't be a problem, though, were it not for the other two points.

MySQL somehow scatters its full-text index data all over the disk in bite-size pieces. If all the index data is in memory (and you can force that situation using a special SQL query), it is very fast. If it's not, then MySQL will gather up all its little pieces before doing anything with your search. This can take (literally) a minute. Once it's in memory, your search might take a tenth of a second, but the first time, it's likely to suck.

Finally, what with the time taken to scatter all those little pieces about, building the index in the first place takes forever. Add that one index and your database takes ten times longer to load.

How to avoid all this ugliness? Simple. Use brute-force searches. There are still some tricks there, for instance, that only indexed critera seem to be used to narrow the search before MySQL does the text match, so a carelessly defined selection can end up scanning the entire database. Also, MySQL is pretty darn slow at doing brute-force searches.***

Beyond that, the solution is to build your own search engine. Or use Google. But since I can search all of Ace's posts in about a second and all his comments in four, and since I can easily get it to restrict the search to, say, the last six months or whatever (proportionally reducing the search time) or expand it to multiple blogs or narrow it to specific categories... I think it's good enough to be getting on with.

And since that was the only problem I really still had with Minx** there is now nothing in the way of rolling out a preview release... Except that I have to move house first.****

* In the context of a large-scale blogging system.
** Which I am not working on. Not at all.
*** As it turns out, not really any slower than selecting all the text in the first place. So the problem isn't in the text search itself, which is something of a relief.
**** Yes, again. I don't want to talk about it. In fact, I don't know why I brought it up in the first place. Bah.

Posted by: Pixy Misa at 11:10 PM | Comments (2) | Add Comment | Trackbacks (Suck)
Post contains 498 words, total size 3 kb.

1 I'm sure you must have some boxes that are still not unpacked? That should make it go a bit smoother...

Posted by: Susie at Tuesday, November 29 2005 08:17 AM (a0oF7)

2 Yes, that will help. Sigh.

Posted by: Pixy Misa at Tuesday, November 29 2005 08:25 AM (QriEg)

Hide Comments | Add Comment

Comments are disabled. Post is locked.
43kb generated in CPU 0.03, elapsed 0.1584 seconds.
52 queries taking 0.138 seconds, 283 records returned.
Powered by Minx 1.1.6c-pink.