Wednesday, November 30
This blog now appears at ai.mu.nu. Of course, it still appears at ambientirony.mu.nu and even at ambientirony.com.
I changed the default domain because there are an increasing number of blogs filtering referrers on the word "ambien". Which sucks.
It makes an artform out of awfulness. Every page you stumble across scales unexpected pinnacles of unfriendliness. Apparently they have an online store. I sure as hell can't find it.
Look at this, which is a subsite of the above train wreck. There is exactly one link on the page, and it does nothing.
What the hell? I mean, seriously, what the hell? If this had somehow languished on someone's server since 1996 and I'd just now stumbled across it, I could understand, but the copyright dates are this year (and often, next year).
International Illustrator brings you all the best tubes, tuts, images, fonts, filters, tags and more!
Now you're just making things up. You sound like my granddaughter, and I know she makes that stuff up.
Minus thirty trillion Pixy Points. Reformat your server, install Linux and, say, Joomla, and start again from the beginning, because what we have here is a failure to communicate. [You were going to say something with the word "fuck" in it, weren't you? You could tell? Well, yeah.]
So, I have this little program that converts Movable Type blogs, singly or en masse, into Minx blogs.* And I am trying out various queries to see how the database performs when it is actually using the indexes, and now and then adding a new index.
One of the indexes I added was on the number of comments on posts, so you can quickly see where the action is (or was). And the number one post, with 1633 comments, can be found here. I'm surprised the poor system survived.
MySQL takes 0.11 seconds to bring up those comments the first time; 0.03 seconds after they've been cached in memory. Whether this is a worthwhile achievement or not I will leave to Madfish's readers to decide.
* Not that I am working on Minx.
Tuesday, November 29
Voters: We've got Giuliani and Rice; Allen and Rice; Romney and Rice; Rice, McCain and Rice; Rice, Rice, Thompson and Rice; and Frist.
Hugh: I'll take the Frist.
Update: Blargh. Links to Hewitt fail due to broken referrer-spam filter. Click here for working-link version.
How long has that stupid picture been sitting there anyway? Two weeks now?
Note to self: WHM account transfer function trashes symbolic links.
After loading a fresh copy of the database, you must analyse the tables before MySQL will do anything remotely sensible with the indexes.
If you fail to do this... DOH!
(Visualise that "DOH!" in 40-foot-high flashing red neon, with searchlights and helicopters flying overhead and police cars and fire engines and so on and so forth.)
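For the record, the statement in question is presumably MySQL's ANALYZE TABLE, run once per table after the load. Here is a minimal sketch that only builds those statements; the table names are invented for illustration and are not the real schema, and you would feed the output to whatever MySQL client you normally use.

```python
def analyze_statements(tables):
    """Build one MySQL ANALYZE TABLE statement per table name."""
    statements = []
    for name in tables:
        # Backtick-quote the identifier, doubling any embedded backticks.
        quoted = "`" + name.replace("`", "``") + "`"
        statements.append(f"ANALYZE TABLE {quoted};")
    return statements


if __name__ == "__main__":
    # Hypothetical table names, used only as an example.
    for statement in analyze_statements(["threads", "posts", "comments"]):
        print(statement)
```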
Why exactly have you provided a structure for monthly archives in your database when the system is entirely dynamic and is already indexed by date?
Because the sticky field overrides the date ordering.
But you could add a new index?
And it would add, what, 10% to the thread table size?
And it would allow monthly archives by category and stuff like that?
Monday, November 28
The way to consistently achieve acceptable performance on full-text searches using MySQL* is to avoid full-text indexes at all costs.
The problem is threefold. Full-text indexes generally treat your text as one big splodge of data. Minx** is structured into recursive directories of sites containing recursive directories of folders containing recursive directories of threads, which contain posts and comments and various other type thingies, all of which are crosslinked like the great polymer of doom. There's all the structure you could possibly ask for when it comes to narrowing down a search. But if you use a full-text index, it searches the entire database first, and then looks at your selection. This wouldn't be a problem, though, were it not for the other two points.
MySQL somehow scatters its full-text index data all over the disk in bite-size pieces. If all the index data is in memory (and you can force that situation using a special SQL query), it is very fast. If it's not, then MySQL will gather up all its little pieces before doing anything with your search. This can take (literally) a minute. Once it's in memory, your search might take a tenth of a second, but the first time, it's likely to suck.
Finally, what with the time taken to scatter all those little pieces about, building the index in the first place takes forever. Add that one index and your database takes ten times longer to load.
How to avoid all this ugliness? Simple. Use brute-force searches. There are still some tricks there; for instance, only indexed criteria seem to be used to narrow the search before MySQL does the text match, so a carelessly defined selection can end up scanning the entire database. Also, MySQL is pretty darn slow at doing brute-force searches.***
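A rough sketch of that approach: narrow by indexed columns, then do a plain substring match on whatever is left. It uses Python's built-in sqlite3 as a stand-in for MySQL so that it runs anywhere, and the posts table, its columns and the sample rows are all made up for the example rather than taken from the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        id      INTEGER PRIMARY KEY,
        blog_id INTEGER NOT NULL,
        posted  TEXT    NOT NULL,   -- ISO date, so string order is date order
        body    TEXT    NOT NULL
    );
    -- The index that does the narrowing; there is no full-text index.
    CREATE INDEX posts_blog_posted ON posts (blog_id, posted);
""")
conn.executemany(
    "INSERT INTO posts (blog_id, posted, body) VALUES (?, ?, ?)",
    [
        (1, "2005-03-01", "An old post about spam"),
        (1, "2005-11-20", "Trackback spam is an education"),
        (1, "2005-11-25", "Nothing to see here"),
        (2, "2005-11-26", "Another blog complaining about spam"),
    ],
)


def search(conn, blog_id, since, needle):
    """Narrow by indexed columns first, then brute-force match the text."""
    rows = conn.execute(
        "SELECT id, posted, body FROM posts "
        "WHERE blog_id = ? AND posted >= ? "   # indexed criteria
        "AND body LIKE ? "                     # plain substring scan
        "ORDER BY posted",
        (blog_id, since, f"%{needle}%"),
    )
    return rows.fetchall()


if __name__ == "__main__":
    for row in search(conn, 1, "2005-06-01", "spam"):
        print(row)
```

Note that LIKE treats % and _ in the search term as wildcards, so a real version would need to escape them.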
Beyond that, the solution is to build your own search engine. Or use Google. But since I can search all of Ace's posts in about a second and all his comments in four, and since I can easily get it to restrict the search to, say, the last six months or whatever (proportionally reducing the search time) or expand it to multiple blogs or narrow it to specific categories... I think it's good enough to be getting on with.
And since that was the only problem I really still had with Minx** there is now nothing in the way of rolling out a preview release... Except that I have to move house first.****
* In the context of a large-scale blogging system.
** Which I am not working on. Not at all.
*** As it turns out, not really any slower than selecting all the text in the first place. So the problem isn't in the text search itself, which is something of a relief.
**** Yes, again. I don't want to talk about it. In fact, I don't know why I brought it up in the first place. Bah.
Sunday, November 27
or, Winning A Battle In The War On Spam
If you're wondering where I've been these past few days, well, I've been busy snarking.
Snark! is the new MuNu trackback filter. It's based on the simple but elegant idea that if someone sends us lots of trackbacks, we don't want them. Unlike most people, I am in the position to collate trackback data from across two hundred blogs in real time. So if all at once someone sends three pings to Little Miss Attila and two to Ace of Spades and another four to, say, the Llamas, I can simply say, "This is Spam, and I shall delete it forthwith", and do so.
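The idea fits in a few lines of Python. This is a sketch of the general technique, not Snark's actual code: the class name, the limit of five pings and the sixty-second window are all assumptions picked for the example.

```python
from collections import defaultdict, deque


class VolumeFilter:
    """Flag a source as spam once it sends too many pings in a time window."""

    def __init__(self, max_pings=5, window_seconds=60.0):
        # Both limits are illustrative guesses, not Snark's real settings.
        self.max_pings = max_pings
        self.window_seconds = window_seconds
        # Source -> timestamps of its recent pings, across every blog.
        self.recent = defaultdict(deque)

    def is_spam(self, source, now):
        """Record one ping and report whether the source is over the limit."""
        stamps = self.recent[source]
        stamps.append(now)
        # Forget pings that have aged out of the window.
        while stamps and now - stamps[0] > self.window_seconds:
            stamps.popleft()
        return len(stamps) > self.max_pings


if __name__ == "__main__":
    volume = VolumeFilter(max_pings=5, window_seconds=60.0)
    # Three pings to one blog, two to another, four to a third.
    blogs = ["attila"] * 3 + ["ace"] * 2 + ["llamas"] * 4
    for second, blog in enumerate(blogs):
        spam = volume.is_spam("spammer.example", now=float(second))
        print(second, blog, "spam" if spam else "ok")
```

The count is kept per source and deliberately ignores which blog was pinged, which is the whole point: no single blog sees enough pings to be suspicious, but the host does.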
We get a lot of spam. Tens of thousands of trackbacks a day. Thousands of comments. We are running MT Blacklist, and most of it gets summarily rejected. But. Movable Type is not the most sprightly of applications. It's a dynamically configured CGI app written in Perl. That's not a recipe for sparkling performance, and indeed, sparkle it does not. It chugs along like a diesel engine, a plough horse rather than a thoroughbred. It can take close to half a second, sometimes more, for Movable Type to decide to reject a trackback.
And when the spammers really get to work, we can receive a thousand trackbacks a minute.
Snark stops 99.8% of trackback spam before it even gets to Movable Type, and it does it very very efficiently. How efficiently? This efficiently:
Blacklist Entries: 23 (plus 61 manual entries)
Session Uptime: 6 hours 0 minutes
Pings Received: 3045
Processing Time: 0.50 seconds
3000 trackbacks stopped, 360 web pages updated, 360 blacklists exported, in the same time it takes Movable Type to do one.
This is not a slam on Movable Type itself. The Perl script I use to simply log the incoming trackbacks takes 40 milliseconds, 0.04 seconds, to run. Snark can process the trackbacks a hundred times faster than the system can record them.
What I'm saying is that there are better ways to do things than CGI and Perl. PHP is a significant improvement in terms of performance, but not so much in terms of the language itself.
Python and persistent application servers are where the action is. I tried writing a blogging system exactly that way, but I was unfortunate in my choice of databases (I used Metakit, and it simply doesn't scale). Fortunately, Python SQL programming isn't as bad as all that - it's at least comparable with Perl or PHP.* CherryPy is a very neat way to organise such a system without needing any sort of CGI or PHP front end. And Psyco speeds up even text-processing applications by a good 50%.
Which is not to say that I am busy working on Minx again and hope to have something to show before the end of the year. Not at all.
Update: I've cleaned up the code a little - although it still makes multiple passes through the trackback list - and changed the order in which the filters are applied so that the volume filter comes first (that is, after the whitelist) and the blacklist comes last. That should make things even more efficient since the volume and age filters are O(1) and the blacklist is O(n). Now I just need another 10,000 trackbacks so I can do a comparison. Come in spammer!
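The reasoning behind that ordering can be sketched like so: the cheap constant-time checks go first, and the linear scan only runs for pings nothing else has decided. The function name, the fields on the ping and the meaning of the age filter (taken here to be the age of the post being pinged) are assumptions for illustration, not the real code.

```python
def classify(ping, whitelist, flagged_sources, max_age_days, blacklist):
    """Return the name of the first filter that decides this ping's fate."""
    # 1. Whitelist: one set lookup, and trusted sources skip everything else.
    if ping["source"] in whitelist:
        return "whitelist"
    # 2. Volume: one set lookup against sources already flagged as flooding.
    if ping["source"] in flagged_sources:
        return "volume"
    # 3. Age: one comparison (assumed to mean the age of the target post).
    if ping["post_age_days"] > max_age_days:
        return "age"
    # 4. Blacklist: the only O(n) step, a scan over every pattern.
    for pattern in blacklist:
        if pattern in ping["url"]:
            return "blacklist"
    return "accepted"


if __name__ == "__main__":
    ping = {
        "source": "flood.example",
        "post_age_days": 3,
        "url": "http://flood.example/cheap-pills",
    }
    # Caught by the volume check, so the blacklist scan never runs.
    print(classify(ping, {"friend.example"}, {"flood.example"}, 30, ["pills"]))
```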
Update: I finished the code cleanup and optimisation, and the spammers obliged:
Blacklist Entries: 24 (plus 60 manual entries)
Session Uptime: 13 hours 3 minutes
Pings Received: 11888
Processing Time: 0.50 seconds
I think this one's a keeper.
*In other words, about twenty years behind commercial systems.
Saturday, November 26
It's a where, not a what.
Spam - it's an education.
Friday, November 25
Pixy's First Law of Economics: Spam is whatever you have too much of.
If you are trying to identify what is and isn't spam, forget blacklists and Bayesian filters. Go by volume. Of course, you have to be in a position to measure the volume, but if you are, that's it for the spammers.
According to Snark!™ duepunti.net currently has a spam ranking of 48. Even if they send me no trackbacks at all for the next hour, they will still be considered a spammer and anything that comes from them will be automatically deleted (and bump their rank up).
Now they're up to 62.8. Slow learners. Of course, I don't provide any feedback, I just null-route the bastards.
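One way a ranking like that could behave, as a sketch only: every ping adds to the source's score, the score decays over time, and anything at or above a threshold counts as a spammer. The exponential decay, the thirty-minute half-life and the threshold of 5 are my assumptions for the example; the real formula may be quite different.

```python
class SpamRank:
    """Per-source score: each ping adds to it, and it halves every half-life."""

    def __init__(self, threshold=5.0, half_life_minutes=30.0):
        # Both numbers are illustrative guesses.
        self.threshold = threshold
        self.half_life_minutes = half_life_minutes
        self.scores = {}  # source -> (score, minute it was last updated)

    def current(self, source, now):
        """Score for a source after decaying it to the present minute."""
        score, updated = self.scores.get(source, (0.0, now))
        return score * 0.5 ** ((now - updated) / self.half_life_minutes)

    def ping(self, source, now, weight=1.0):
        """Record a ping; return True if the source ranks as a spammer."""
        score = self.current(source, now)
        spam = score >= self.threshold
        # Every ping bumps the rank, including the ones being deleted.
        self.scores[source] = (score + weight, now)
        return spam


if __name__ == "__main__":
    ranks = SpamRank()
    # Seed a rank of 48 at minute 0, then stay quiet for an hour.
    ranks.scores["duepunti.net"] = (48.0, 0.0)
    print(ranks.current("duepunti.net", now=60.0))
    print(ranks.ping("duepunti.net", now=60.0))
```

With these numbers a rank of 48 is still well above the threshold after a quiet hour, so the next ping is deleted and pushes the rank back up.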
I was just thinking - I could post the Snark!™ stats as a public service. Make it (ugh) XML and people could import it directly. Real-time dynamic spammer detection.
First I have to stop Snark!™ going mad and dropping the ball. It did that last night and generated a gigabyte of error messages. I think a leetle bit more tweaking is in order.
Update: See the link above. It still falls over now and then, so you can expect the values to suddenly get reset to zero on occasion until I (a) get that fixed and/or (b) get it to store the spam rankings.
Update: I did (b), 'cause it's easier to add code than to fix what's already there. Not better, just easier.
Update: Okay, I think I've managed (a) as well. Turned out to be a couple of bugs that only occurred when there were no trackbacks to be processed. This didn't show up in my testing, because that would mean going an entire minute without getting spammed.
Update: The spammers have gone quiet for now. This is probably the first time I've ever wanted to get spamflooded. The point is, the more spam we get, the better the filter works, and the better the data we can provide to others. We now have an IP address list as well, but because the spam died down just as I implemented that function, it presently contains exactly one address.
We receive well over a million trackbacks a month, so I'm sure we'll have a nice set of sample data coming down the wire soon enough.
Update: Change log sort of thingy. Though I really just added that link to test the whitelist.
Wednesday, November 23
Brush my toothy-pegs,
Put on my piggy jim-jams,
And say "I'm off to Sleepy Bobo's".
OSM, we hardly knew ye.
Monday, November 21
Sunday, November 20
The best liveblogging of the withdrawal debate I've seen:
So, the entire last 6 hours in a nutshell is:
“Hell no! We won’t vote! oh, wait. We have to vote? Well, in that case, Hell no! You’re all wrong! We object! Do we still have to vote? Okay. We all vote on the same side you do.”
Part 1
At Euphoric Reality.
Original story by Liz Sidoti for Associated Press. Additional editing for accuracy by Pixy Misa.
WASHINGTON - The House on Friday overwhelmingly rejected calls for an immediate troop withdrawal from Iraq, a vote engineered by the Republicans that was intended to fail. Democrats derided the vote as a political stunt, although it was exactly what they had wanted.
"Our troops have become the enemy. We need to change direction in Iraq," said Rep. John Murtha of Pennsylvania, a Democratic hawk whose call a day earlier for pulling out troops
sparked stirred Republicans to respond to a nasty, personal debate season of Democrat attacks over the war pretty much everything.
The House voted 403-3 to reject a nonbinding resolution calling for an immediate troop withdrawal, after Democrats had failed in desperate attempts to stop the resolution coming to a vote.
"We want to make sure that we support our troops that are fighting in Iraq and Afghanistan. We will not retreat," Speaker Dennis Hastert, R-Ill., said as the GOP leadership pushed the issue to a vote over the protest of Democrats. [Hey, that paragraph didn't need any editing!]
It was the second time in less than a week that President Bush's Iraq policy stirred heated debate in Congress. On Tuesday, the Senate defeated a Democratic push for Bush to lay out a timetable for withdrawal, and then scored an own goal by submitting their own bill for the same thing.
Murtha, a 73-year-old Marine veteran decorated for combat service in Vietnam, issued his call for a
troop withdrawal unconditional surrender and the abandonment of the Iraqi people at a news conference on Thursday. In little more than 24 hours, Hastert and Republicans decided to put the question to the House.
Democrats, aghast that their bluff had been called, said it was a political stunt and quickly decided to vote against it in an attempt to drain it of significance.
"A disgrace," declared House Minority Leader Nancy Pelosi, D-Calif. "The rankest of politics and the absence of any sense of shame," added Rep. Steny Hoyer of Maryland, the No. 2 House Democrat, "not that there's anything wrong with that."
Republicans hoped to place Democrats in an unappealing position -
either supporting a withdrawal that critics said would be precipitous or opposing it and angering voters who want an end to the conflict living up to the reality of their own demands. They also hoped the vote could restore GOP momentum on an issue - the war - that has seen plummeting public support in recent weeks according to the same polls that predicted a comfortable win for John Edwards last November. [Kerry. What? Kerry was the presidential candidate, not Edwards. You're kidding. No, really, he was. Does it matter now? Guess not.]
Democrats claimed Republicans were changing the meaning of Murtha's withdrawal proposal. He has said a smooth withdrawal would take six months, although Murtha's own proposal called for an "immediate redeployment".
At one point in the emotional debate, Rep. Jean Schmidt, R-Ohio, told of a phone call she received from a Marine colonel.
"He asked me to send Congress a message - stay the course. He also asked me to send Congressman Murtha a message - that cowards cut and run, Marines never do," Schmidt said. Murtha is a 37-year Marine veteran.
Democrats booed and shouted her down - causing the House to come to a standstill. However, no pies were thrown.
Rep. Harold Ford, D-Tenn., charged across the chamber's center aisle screaming that Republicans were making uncalled-for personal attacks. "You guys are pathetic! Pathetic!" yelled Rep. Marty Meehan, D-Mass. Speaker of the House, Dennis Hastert, apologised to the nation for the behaviour of the House Democrats, explaining that they were a bit tired and would "feel better after a nap".
"It's just heinous," Rep. Ellen Tauscher, D-Calif., said of the Republican move. "Whatever that means. It's a good word, though. Heinous. I think it means they pulled this out of their ass."
"This is a personal attack on one of the best members, one of the most respected members of this House and it is outrageous," said Rep. Jim McGovern, D-Mass. "We never intended it to come to a vote."
A growing number of House members and senators, looking ahead to off-year elections next November, are publicly worrying about a quagmire in Vietnam. [Iraq! What? The war is in Iraq, not Vietnam. Iraq? Isn't that a desert? Well, yes, mostly. So how does it become a quagmire? Isn't that a swamp or something? Oh, never mind.] They have been staking out new positions on a war that is increasingly unpopular with the American public according to the latest opinion polls, which we both know aren't worth diddly, has resulted in more than 2,000 U.S. military deaths - far fewer than any other major war - and has cost more than $200 billion, which would be enough money to rebuild half of New Orleans, at least until next year.