This wouldn't have happened with Gainsborough or one of those proper painters.

Saturday, June 10

Geek

You Think You've Got Problems

Not only do I have a denial-of-service attack to worry about, I've got 3000 incoming trackbacks per minute.

And that's after I firewalled off the worst offenders. Don't know what it was before, because it made Apache seize up.

And my notebook, which is where I keep, well, pretty much everything, BSOD'd on me earlier, and is now giving me random Unknown Hard Errors.

Yes, I know. BACKUP NOW. What do you think I'm doing?

Update: Whoops, there it goes again. Okay, time for rsync.

Posted by: Pixy Misa at 08:21 AM | Comments (2) | Add Comment | Trackbacks (Suck)
Post contains 95 words, total size 1 kb.

Geek

Trackbacks Are Dead

We are running a very non-standard trackback system here at munu. The standard trackback script is disabled, and our custom one simply logs the trackback request in a text file, taking a small fraction of a second. Another process comes along once a minute, scoops up the log file, filters out the crud, and posts whatever remains. But that happens entirely in the background, and since 99.8% of trackbacks are spam, and it can detect and reject a spam trackback in 50 microseconds, the processing is very, very efficient.
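The split between the web-facing logger and the background filter might be sketched like this (the spam patterns, field layout, and function names are invented for illustration — the real munu script and its filter rules aren't shown here):

```python
# Hypothetical sketch of the log-then-filter trackback scheme described above.
# The web-facing script only appends a line; all filtering happens later.
import re
import time

# Placeholder spam patterns -- the real filter rules are not public.
SPAM_PATTERNS = [re.compile(p, re.I) for p in (r"poker", r"viagra", r"casino")]

def log_trackback(logfile, url, title, excerpt):
    # Fast path: one append, no parsing, no database work.
    with open(logfile, "a") as f:
        f.write("%f\t%s\t%s\t%s\n" % (time.time(), url, title, excerpt))

def is_spam(line):
    # A pattern scan like this is microseconds per trackback.
    return any(p.search(line) for p in SPAM_PATTERNS)

def process_log(lines):
    # The once-a-minute background pass: filter out the crud,
    # return whatever is worth actually posting.
    return [line for line in lines if not is_spam(line)]
```

The point of the design is that the expensive part (filtering, posting) never blocks an Apache process; the request handler's only job is the append.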

Nevertheless, we are getting enough trackbacks right now to tie up fifty Apache processes. That's over half a gigabyte of memory dedicated to returning 404's to spammers.

Posted by: Pixy Misa at 02:43 AM | Comments (1) | Add Comment | Trackbacks (Suck)
Post contains 122 words, total size 1 kb.

Friday, June 09

Geek

Fiddlicreepi

When tasked with building a huge and complex database application, it is valuable to have already spent half your life doing exactly that. Because then, when faced with a seemingly intractable problem, you can simply cast your mind back to how you solved it last time.

Having said that, multi-master replication still poses problems. Having said that, we're not running a bank here. We can say the order of transactions is not guaranteed. The detail lines are in a different order in Japan as compared to the Netherlands? Doesn't matter. As long as they're all present and correct, and the ordering isn't too badly screwed up (minutes matter; seconds don't), we can get away with it. It's a bit annoying that we need an extra field (the original server number) to guarantee uniqueness on some tables, but that's life.

And for the tables that need to be centrally controlled, well, we centrally control those ones. That makes up 0.01% of transactions and 0.0001% of database operations. No biggie.

Look for it on a website near you, probably around September. I can't divulge the details just yet, but don't worry, you'll know it when you see it.

Posted by: Pixy Misa at 06:30 PM | Comments (1) | Add Comment | Trackbacks (Suck)
Post contains 197 words, total size 1 kb.

Geek

Um...

From the O'Reilly book, Ajax Hacks:
For example, if you have ever used Google Maps, the way you can drag outlying regions into your view conveys the impression that you have all of the maps stored locally on your computer, for your effortless manipulation. Imagine how unpopular this application would be if every time you tried to "drag" the map the page disappeared for a few (long) moments while the browser waited for another server response.
Imagine living somewhere other than the United States.

Actually, the screen doesn't go blank; instead you see the wrong map for a while as it downloads the tiles, blip... blip... blip... blip... blip...

The application would be so sluggish that no one would use it.
Yeah.

Posted by: Pixy Misa at 03:55 AM | Comments (1) | Add Comment | Trackbacks (Suck)
Post contains 120 words, total size 1 kb.

Wednesday, June 07

Geek

Right

50 milliseconds to start Python and load the cgi, cgitb, MySQLdb, os, psyco, sgmllib, string, sys, and time libraries. (It's not currently using psyco, because it has no benefit for such a short program, but I left it in.)

50 milliseconds to connect to MySQL. (CPU time. Elapsed time is roughly the same, I think.)

7 milliseconds (elapsed) to return the 50 most recent matches from Ace of Spades for the word "bush".

10 milliseconds to process the results.
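A rough sketch of how those phases separate out when you time them (the module list is a stand-in, and the MySQL connect is commented out because it needs a live server):

```python
# Bracket each phase with time.time() to get the per-phase cost.
import time

t0 = time.time()
import os, string, sys, sqlite3   # stand-ins for the batch of libraries above
t1 = time.time()

# The connect phase would be bracketed the same way:
# conn = MySQLdb.connect(host=..., user=..., passwd=...)

import_ms = (t1 - t0) * 1000.0    # the "start Python and load libraries" cost
```

For a CGI script, both phases are paid on every single request, which is exactly why a persistent process makes them vanish.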

I'm going to set up a miniminx to get rid of the first two — the interpreter start-up and the MySQL connect. Just by way of experiment.

Posted by: Pixy Misa at 01:27 AM | Comments (3) | Add Comment | Trackbacks (Suck)
Post contains 100 words, total size 1 kb.

Tuesday, June 06

Geek

Huh?

I was forced to kill the MT 2.6 search routine at Munu because it was (a) taking 180MB of memory, (b) taking a couple of minutes, and (c) because of (b) people were clicking on it multiple times until it, essentially, killed the server.

I just rewrote it.

I tested it on Ace's blog, by searching for "Bush". It's currently set to only search the last 500 entries, but for my test I set it to scan the last 5000.

It takes one second. And 19MB of memory. About 350ms for the SQL query and 650ms for the program itself.

It sure ain't optimised. It selects the last 5000 entries, sorts them (because God forbid there should be a useful compound index), yanks the entire result set into a list, scans them for each of the search terms, uses an SGML parser to remove HTML tags, and kicks the result out.
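The tag-stripping and scanning steps might look something like this — sketched with Python 3's html.parser standing in for the old sgmllib parser, and with the scan reduced to a plain substring match:

```python
# Strip HTML tags by collecting only the text nodes, then scan entries
# for a search term. A simplified stand-in for the pipeline described above.
from html.parser import HTMLParser

class TagStripper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Only text between tags ends up in the output.
        self.chunks.append(data)

def strip_tags(html):
    stripper = TagStripper()
    stripper.feed(html)
    return "".join(stripper.chunks)

def search(entries, term):
    # Scan every entry for the term, return the matches with tags removed.
    term = term.lower()
    return [strip_tags(e) for e in entries if term in e.lower()]
```

Doing this over 5000 full entries per query is exactly the brute force the post describes — workable, but every byte gets touched.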

For more reasonable searches, like searching for "test" on the last 500 entries at Munuviana, it takes 35ms for the query and 130ms for the program.

About 120ms of that is start-up: Launching the Python interpreter and loading the seven or eight libraries involved.

Now, I'm not using a template system for this. Still, one-tenth the memory, one hundredth(?) the processing time. I can't be sure about the processing difference, because I can't run the original script right now: I configured Apache with a 100MB memory limit for CGI scripts, and the original needed 180.

For Minx, the 120ms start-up time will disappear because the application runs as a multi-threaded server itself, not as individual CGI (or PHP or ASP) scripts. Can only do so much about the query time, but I'll play around with it. And I'll pre-store the excerpts rather than create them on the fly. Well, probably. I might be able to live with an average search time of 45ms.

Hmm.

What if I get MySQL to do the matching? Let's see...

Okay, not good. Hmm.

Ah, there we go. Don't use regexps unless you need them. "LIKE" is nice and brisk. With a 500-result limit, the Ace/Bush search is 125ms for the SQL query and 300ms for the program. And doing it that way, I could actually page it, so 50 results at a time. Hmm. And when I'm taking the 50-word excerpts, rather than whitespace-split the entire entry, I'll just look at the first 400 bytes.
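The LIKE-with-paging idea, plus the 400-byte excerpt trick, sketched with sqlite3 standing in for MySQL (table and column names are invented — they're not the real MT schema):

```python
# Paged LIKE search: LIMIT/OFFSET hands back 50 results at a time,
# and excerpts only ever look at the first 400 bytes of an entry.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, blog_id INT, text TEXT)")
conn.executemany("INSERT INTO entries (blog_id, text) VALUES (?, ?)",
                 [(1, "post number %d mentioning bush" % i) for i in range(120)])

def page(conn, blog_id, term, page_no, per_page=50):
    cur = conn.execute(
        "SELECT id, text FROM entries WHERE blog_id = ? AND text LIKE ? "
        "ORDER BY id DESC LIMIT ? OFFSET ?",
        (blog_id, "%" + term + "%", per_page, page_no * per_page))
    return cur.fetchall()

def excerpt(text, words=50):
    # Whitespace-split only the first 400 bytes, not the whole entry.
    return " ".join(text[:400].split()[:words])
```

With 120 matching rows, page 0 and page 1 each return 50 and page 2 returns the remaining 20 — which is the point: the program never formats more than one screenful.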

And let's go back and add that compound index while we're at it...
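What that compound index might look like (again with sqlite3 as a stand-in and assumed column names): an index on (blog_id, created_on) lets the "most recent entries for this blog" query walk the index instead of sorting the whole table.

```python
# A compound index covering both the filter column and the sort column.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, blog_id INT, "
             "created_on TEXT, text TEXT)")

# Queries of the form
#   SELECT ... WHERE blog_id = ? ORDER BY created_on DESC LIMIT 50
# can now be satisfied directly from the index, no sort step needed.
conn.execute("CREATE INDEX idx_blog_created ON entries (blog_id, created_on)")

index_names = [row[1] for row in conn.execute("PRAGMA index_list(entries)")]
```

The same CREATE INDEX statement works in MySQL; the drop from 125ms to 7ms for the query is about what eliminating a filesort over thousands of rows buys you.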

Okay, now we're cooking. 7ms for the query, 100ms for the search script. Since the resolution of the timer seems to be 10ms, and the search script takes 100ms if you feed it an invalid blog id, that's less than 10ms or so for the actual work.

That'll do.

Posted by: Pixy Misa at 03:47 AM | Comments (1) | Add Comment | Trackbacks (Suck)
Post contains 461 words, total size 3 kb.
