Friday, November 25
Pixy's First Law of Economics: Spam is whatever you have too much of.
If you are trying to identify what is and isn't spam, forget blacklists and Bayesian filters. Go by volume. Of course, you have to be in a position to measure the volume, but if you are, that's it for the spammers.
According to Snark!™, duepunti.net currently has a spam ranking of 48. Even if they send me no trackbacks at all for the next hour, they will still be considered a spammer, and anything that comes from them will be automatically deleted (and bump their rank up).
Now they're up to 62.8. Slow learners. Of course, I don't provide any feedback, I just null-route the bastards.
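To make the go-by-volume idea concrete, here is a rough sketch of a decaying volume score. This is not Snark!™'s actual code: the class name, the one-hour half-life, and the threshold of 10 are all assumptions picked for illustration. Every trackback bumps the sender's score, rejected ones included, and the score fades only slowly when the sender goes quiet.

```python
class VolumeScorer:
    """Hypothetical volume-based spam scorer (illustrative, not Snark's code)."""

    def __init__(self, half_life=3600.0, threshold=10.0):
        self.half_life = half_life  # seconds for a score to halve (assumed value)
        self.threshold = threshold  # score at or above this means spammer (assumed value)
        self.scores = {}            # source -> (score, time of last update)

    def score(self, source, now):
        """Return the source's score at time `now`, decayed since its last update."""
        if source not in self.scores:
            return 0.0
        value, last = self.scores[source]
        return value * 0.5 ** ((now - last) / self.half_life)

    def is_spammer(self, source, now):
        return self.score(source, now) >= self.threshold

    def record(self, source, now):
        """Count one trackback. Returns True to accept it, False to delete it.

        The score is bumped either way, so a blocked sender who keeps
        sending only digs itself in deeper.
        """
        self.scores[source] = (self.score(source, now) + 1.0, now)
        return not self.is_spammer(source, now)


scorer = VolumeScorer()
for _ in range(48):
    scorer.record("duepunti.net", now=0.0)

# One silent hour later the score has halved but is still over the threshold.
still_blocked = scorer.is_spammer("duepunti.net", now=3600.0)
```

With these made-up numbers, a sender sitting at 48 would need a bit over two quiet hours to drop back under the threshold.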
I was just thinking - I could post the Snark!™ stats as a public service. Make it (ugh) XML and people could import it directly. Real-time dynamic spammer detection.
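For the curious, an XML export along those lines might look something like this. The element and attribute names (spamrankings, source, name) are invented for the example and are not the format of any real feed.

```python
import xml.etree.ElementTree as ET


def rankings_to_xml(rankings):
    """Serialise a {source: rank} mapping to XML, highest rank first.

    The schema here is hypothetical, chosen only to illustrate the idea.
    """
    root = ET.Element("spamrankings")
    ordered = sorted(rankings.items(), key=lambda item: item[1], reverse=True)
    for source, rank in ordered:
        entry = ET.SubElement(root, "source", name=source)
        entry.text = f"{rank:.1f}"
    return ET.tostring(root, encoding="unicode")


feed = rankings_to_xml({"duepunti.net": 62.8})
```

Anyone importing the feed could parse it with the same module and block whatever sits above a threshold of their own choosing.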
First I have to stop Snark!™ going mad and dropping the ball. It did that last night and generated a gigabyte of error messages. I think a leetle bit more tweaking is in order.
Update: See the link above. It still falls over now and then, so you can expect the values to suddenly get reset to zero on occasion until I (a) get that fixed and/or (b) get it to store the spam rankings.
Update: I did (b), 'cause it's easier to add code than to fix what's already there. Not better, just easier.
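Storing the rankings so a crash doesn't reset them to zero can be as simple as the sketch below. It assumes a JSON file on disk, which is a guess for illustration only and says nothing about how Snark!™ really stores them; the function names are made up too.

```python
import json
import os
import tempfile


def save_rankings(rankings, path):
    """Write rankings to `path` atomically, so a crash mid-write can't corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(rankings, handle)
    # os.replace swaps the finished file into place in a single step.
    os.replace(tmp_path, path)


def load_rankings(path):
    """Read rankings back; start from an empty table if nothing was saved yet."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {}
```

Call load_rankings once at startup and save_rankings after each processing pass, and a restart picks up where the last pass left off.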
Update: Okay, I think I've managed (a) as well. Turned out to be a couple of bugs that only occurred when there were no trackbacks to be processed. This didn't show up in my testing, because that would mean going an entire minute without getting spammed.
Update: The spammers have gone quiet for now. This is probably the first time I've ever wanted to get spamflooded. The point is, the more spam we get, the better the filter works, and the better the data we can provide to others. We now have an IP address list as well, but because the spam died down just as I implemented that function, it presently contains exactly one address.
We receive well over a million trackbacks a month, so I'm sure we'll have a nice set of sample data coming down the wire soon enough.
Update: Change log sort of thingy. Though I really just added that link to test the whitelist.
Posted by: Susie at Friday, November 25 2005 10:48 AM (a0oF7)
Posted by: Wonderduck at Friday, November 25 2005 12:18 PM (mAAjO)
Posted by: Steven Den Beste at Friday, November 25 2005 12:43 PM (CJBEv)
Posted by: Pixy Misa at Friday, November 25 2005 11:49 PM (AIaDY)
Posted by: Pixy Misa at Tuesday, November 29 2005 08:55 AM (QriEg)