Saturday, July 22
Last October/November, I built a statistical trackback spam filter called Snark. It worked very well, blocking about 99.8% of spam without requiring any attention from me, until its data files got wiped by accident during the DDoS attacks last month.
When I got Snark up and running, we were getting on the order of 10,000 trackbacks per day. Almost all spam, obviously, but Snark made short work of those.
In May, the last full month of Snark operations, we received nearly two million trackbacks.
So far this month, we have received four million trackbacks. It's gotten so bad that at the peak of a spam flood, just firing off a CGI script to log the requests was enough to melt Apache.
So I was scratching my head, wondering why my POST-Redirect-GET wasn't working. All I should have to do is set the Location header, set the return status to 303, and go. But all I got was a blank page, no matter what I tried.
Okay, yeah, it might help to actually set the status field rather than creating a new "Status" header.
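For illustration, here is a minimal sketch of that distinction using plain WSGI from the standard library, not whatever framework produced the blank page (the post doesn't say). The names `app`, `call`, and the `/thanks` path are hypothetical. The point is only that the 303 belongs in the response status, alongside the Location header, and not in a header named "Status".

```python
def app(environ, start_response):
    """Tiny WSGI app: answer a POST with a 303 redirect (POST-Redirect-GET)."""
    if environ["REQUEST_METHOD"] == "POST":
        # ... handle the submitted form here ...
        # The status is the first argument to start_response, not a header.
        start_response("303 See Other",
                       [("Location", "/thanks"), ("Content-Length", "0")])
        return [b""]
    body = b"Thanks!\n"
    start_response("200 OK",
                   [("Content-Type", "text/plain"),
                    ("Content-Length", str(len(body)))])
    return [body]


def call(wsgi_app, method, path="/"):
    """Invoke a WSGI app directly and return (status, headers dict, body)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    environ = {"REQUEST_METHOD": method, "PATH_INFO": path}
    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Calling `call(app, "POST", "/submit")` returns the 303 status with the Location header set, and the follow-up GET is an ordinary 200.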
Friday, July 21
And Saudi Arabia. And Pakistan. And at one point, South Korea. As Vinnie says:
Why couldn't we have been banned in Turkey instead? It would have saved alot of headache.
Thursday, July 20
Condemnuum, n. A spectrum of activities which at one end would hardly raise an eyebrow at a debutante ball and at the other would make a hyena blush.
Wednesday, July 19
To add insult to injury, the only browser that seems to work properly in this case is IE. That means my CSS is broken, of course.
Update: When starting with a "known good" version of something and attempting to develop a new system from there, it may prove worthwhile to verify that the "known good" version is, in fact, good. (Which it wasn't, though I'm not yet sure precisely where the problem lies. However, my cleanroom CSS doesn't exhibit the poopy behaviour of the borrowed CSS, so it's in there somewhere.)
Thursday, July 13
Not to be using CherryPy for static files.
Well, they do tell you that in the documentation. It works just fine, and makes things a lot easier to set up. But it imposes the same level of overhead for static files as it does for dynamic pages.
6ms of overhead when a page takes 10 to 30ms to generate isn't a big problem.
6ms of overhead when a page takes 2ms to fetch from the cache is more of a problem, but it's better than 160ms of overhead.
6ms of overhead for a static file is... not so good.
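To put those numbers side by side, here is a quick sketch of overhead as a share of total request time. The page times are the ones quoted above; the 0.5ms figure for serving a static file is a made-up placeholder, since no actual number is given.

```python
def overhead_share(overhead_ms, work_ms):
    """Fraction of the total request time that is framework overhead."""
    return overhead_ms / (overhead_ms + work_ms)


cases = [
    ("dynamic page, 10ms to generate", 6, 10),
    ("dynamic page, 30ms to generate", 6, 30),
    ("cached page, 2ms to fetch", 6, 2),
    ("cached page with the old 160ms overhead", 160, 2),
    ("static file, 0.5ms (assumed)", 6, 0.5),
]

for label, overhead, work in cases:
    print(f"{label:42s} {overhead_share(overhead, work):6.1%}")
```

The cached-page case already spends three quarters of its time on overhead, and anything cheaper than that only gets worse.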
So now I get to play with mod_rewrite. Because I don't have a live crocodile to shove down my pants.
Update: Oh look, mod_proxy isn't enabled. So I have to recompile Apache before I can use mod_rewrite with the [P] flag. I'll leave it for another day, I think.
Minx is now pie-ified. That's reduced the overhead per page from about 160ms to something like 6ms. Since a page fetched from the cache takes about 2ms to process, and an individual entry page about 12ms, that makes the whole thing just a little bit zippier.
Need to do some more bug testing and performance testing, but it seems stable in terms of speed and memory after coughing up 60,000 pages. With 10 threads, it uses 11MB of memory, though its virtual memory footprint is 114MB. Not entirely sure why it is allocating all that memory and never using it, but since the only real problem that causes is that I can't have more than about 350 threads running in any one Minx instance (119MB real, 2935MB virtual), I can probably live with it. And that only applies on 32-bit platforms anyway.
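A back-of-the-envelope sketch using only the two measurements above gives the memory cost per additional thread. The function name is hypothetical. The result, a bit over 8MB of virtual memory per thread, is consistent with each thread reserving a default-sized stack (8MB is a common default on Linux), but that is an assumption and not something measured here.

```python
def per_thread_mb(total_a, threads_a, total_b, threads_b):
    """Memory added per extra thread, estimated from two measurements."""
    return (total_b - total_a) / (threads_b - threads_a)


# (MB at 10 threads, MB at roughly 350 threads), figures from the post
virtual_per_thread = per_thread_mb(114, 10, 2935, 350)
real_per_thread = per_thread_mb(11, 10, 119, 350)

print(f"virtual: {virtual_per_thread:.1f} MB per thread")
print(f"real:    {real_per_thread:.2f} MB per thread")
```

If the stack reservation really is the cause, Python versions that provide `threading.stack_size()` allow a smaller stack to be requested before the threads are created, which would raise the thread ceiling on 32-bit platforms.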
One thing I'm not doing right now is running with Psyco. Even in the worst case (cached pages), it gives a performance boost of 20%. But it also leaks memory like Netscape 4.5.
Python doesn't assign values to variables, it binds names to values.
Need to write that on a stickynote and attach it to my monitor.
Tuesday, July 11
We hates Python scoping rules. We hates them forever.
(Currently going for a doc crawl on the theory that it can't be that broken.)
Okay, the problem I'm having involves modules, threads, and thread-specific global data. I haven't solved the problem with modules yet, but it turns out there was some magic added to Python 2.4 for thread-specific globals (threading.local). That's a comfort, because I knew that worked, but I couldn't figure out how. CherryPy, the web framework I'm using, supports this under 2.3, but it turns out that it's a hack and it's very slow. So I'm not going mad. Or at least, no more than usual.
I'm moving Minx from the test design, where it is a single CGI program (and so each request is perfectly isolated and I can slap the code together any which way) to production, as a multi-threaded persistent server. Which is much more fiddly in terms of structuring the code and variables, but is twenty to thirty times faster.
Up to 95% of the time taken by the CGI version is overhead: starting a shell, then starting a Python interpreter, loading the twenty or so libraries used, opening a connection to MySQL, and so on. The multi-threaded version does all of that once. (Or at worst, once per thread, for a persistent thread pool.) It also uses Psyco, the Python compiler, which adds a 30% to 50% speed boost for this sort of app. For the CGI version, Psyco takes long enough to do the compile that the overall performance is worse in most cases...
Only, because the threads weren't actually isolated from one another, it didn't work at all. I could either add an extra parameter to all the roughly 100 functions I've written so far, or I could work out how to do thread-specific globals.
Update: Okay, all is forgiven. The threading.local trick works flawlessly, even with modules. Thread-local global data for one module is not visible in another, but even if that complicates things for me, that's right. They're modules, not include files. So I have thread-local module-local global variables... Yay!
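A minimal sketch of the trick, not Minx's actual code: one module-level `threading.local()` object, with hypothetical helper names (`set_request`, `current_request`, `handle`, `run`). Each thread sees only its own attributes on that object, so functions can read "global" state without taking an extra parameter. It is written for current Python 3; `threading.Barrier` is used only to force the threads to overlap for the demonstration.

```python
import threading

# Module-level, but every thread gets its own private set of attributes.
_ctx = threading.local()


def set_request(name):
    _ctx.request = name


def current_request():
    # No extra parameter: the value comes from the calling thread's storage.
    return getattr(_ctx, "request", None)


def handle(name, results, barrier):
    set_request(name)
    barrier.wait()  # every thread has stored its value before any thread reads
    results[name] = current_request()


def run(names):
    results = {}
    barrier = threading.Barrier(len(names))
    threads = [threading.Thread(target=handle, args=(n, results, barrier))
               for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each worker reads back exactly the value it stored, even though all of them wrote to the same module-level name at the same time. The main thread, which never called `set_request`, sees nothing.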
Update: And it works perfectly with CherryPy. I expected that, because it only makes sense that CherryPy would be using the standard threading module, but there's a difference between being the only sensible way to do something and actually testing it.
Saturday, July 08
You are Wonder Woman
You are a beautiful princess
with great strength of character.
Yes, I cheated. Of course I cheated. I got Spider-Man. Twice.
(via Shamus T. Young)