Friday, March 25
It took nearly 24 hours all told - not including the backups, which took 48 hours themselves - but I'm on Windows 7 now, and it's working fine.
One critical point: If you have a Realtek network controller (either a card or built in to your motherboard), download the Windows 7 driver for it from the manufacturer's site before upgrading, because your network will be seriously dysfunctional afterwards. The driver that ships with Windows 7 delivers only slightly better average speeds than dial-up - even on your local network - and frequently stops working entirely for several seconds at a time.
Posted by: Pixy Misa at
12:30 PM
| No Comments
Post contains 104 words, total size 1 kb.
Thirteen hours into my Windows 7 upgrade now.
Still going.
The progress indicator has, thankfully, moved from where it was six hours ago, and is now at 2,099,020 of 2,777,119.
Posted by: Pixy Misa at
02:07 AM
| No Comments
Post contains 37 words, total size 1 kb.
Thursday, March 24
Is that Microsoft outsourced the job of writing progress indicators to a trained hamster.
Which died in 1982.
And has yet to be replaced.
Posted by: Pixy Misa at
09:49 PM
| Comments (4)
Post contains 26 words, total size 1 kb.
Transferring files, settings, and programs (608,859 of 2,777,119 transferred)
This is the first time I've ever upgraded a Windows system. Usually I'll hang onto them until they're old enough to need replacing or the operating system gets corrupted and dies.*
Nagi is a quad-core machine with 8GB of RAM, and until AMD's new Bulldozer chips arrive later this year there's no upgrade that's worth bothering with. Not that I can reasonably afford, anyway.
So after carefully backing up 2.2TB of miscellaneous stuffs, I kicked off the upgrade at about 2 o'clock this afternoon. It's just gone 9 o'clock now, and the status is exactly as I gave above.
It's not a quick process, not when you start with a 2.5-year-old Vista system with 748 applications installed.
And it's telling me that The Sims 2 may not work afterwards.
Also my IDE controller, but I don't think that's even in use.
* Which has happened to me twice, both times due to memory problems of one sort or another.
Posted by: Pixy Misa at
08:02 PM
| No Comments
Post contains 173 words, total size 1 kb.
Tuesday, March 22
When you change the user interface, unless it was obviously broken before, 90% of the time 90% of your users will hate the new version. It doesn't matter if you think the new way is right. Your users will hate it.
Since you collectively seem to have all the user experience intuition of a dead possum, you should always always always provide an option to revert to the previous behaviour, even if - no, especially if - you think the previous behaviour was buggy, broken, politically incorrect and caused rats in laboratory cancer.*
Since the Minx user interface is entirely driven by the Minx templates, and completely user-configurable, I can't break it without breaking Minx itself, so I'm relatively safe from this disease. But only relatively safe.
* Today I'm ranting about the "switch to tab" idiocy in Firefox 4, but it's always the same:
Devs: Look at our great new feature!
Users: We looked at it and we don't like it. How do we turn it off?
Devs: You can't. You should love this feature. All of us love this feature. What's wrong with you that you don't love this feature?
Users: You suck.
User X: Here's a patch that turns it off!
Users: Thank you user X!
Devs: Why would anyone want to turn this feature off? It's a great feature. We all love this feature. Oh, and anyone know why our market share is in free fall?
This problem is hardly limited to the Firefox developers, of course, or even UI developers, as witness the great swappiness=0 debate. Though I guess that was an inversion of the trope, with the users jumping with glee on a feature the developer thought was a bad idea.
Posted by: Pixy Misa at
07:24 PM
| Comments (7)
Post contains 289 words, total size 3 kb.
Monday, March 21
Running my little Python benchmark again:
| | AMD 3.0GHz | Intel 2.93GHz | AMD 2.6GHz | Intel 3.3GHz | Psyco |
| Loop | 0.613 | 0.690 | 0.707 | 0.613 | 0.013 |
| String | 1.103 | 0.987 | 1.273 | 0.876 | 0.180 |
| Scan | 0.540 | 0.453 | 0.623 | 0.402 | 0.547 |
| Call | 1.383 | 1.140 | 1.596 | 1.012 | 0.100 |
| Mean | 3.639 | 3.270 | 4.199 | 2.903 | 0.840 |
| Score | 275 | 306 | 238 | 344 | 1190 |
| Mark | 1000 | 1113 | 867 | 1253 | 4332 |
After a little work to eliminate as many of the variables as possible, this is what I get. These scores are from my little Python benchmark, run on Fedora 13 under OpenVZ on my development machine, a 3GHz AMD Phenom II, and the main production server, a 2.93GHz dual Xeon 5670.
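The Score and Mark rows are not explained in the post, but the numbers are consistent with a simple derivation: the row labelled Mean matches the sum of the four test times, Score matches 1000 divided by that sum, and Mark matches the same figure rescaled so the 3.0GHz AMD machine comes out at 1000. A minimal sketch under that assumption (the function name `summarise` is made up for illustration):

```python
# Times in seconds for the three measured columns, copied from the table.
times = {
    "AMD 3.0GHz": [0.613, 1.103, 0.540, 1.383],
    "Intel 2.93GHz": [0.690, 0.987, 0.453, 1.140],
    "Psyco": [0.013, 0.180, 0.547, 0.100],
}


def summarise(times, baseline="AMD 3.0GHz"):
    """Derive total, score, and mark for each column (inferred formulas)."""
    totals = {name: sum(vals) for name, vals in times.items()}
    base = totals[baseline]
    return {
        name: {
            "total": round(total, 3),            # the row labelled Mean
            "score": round(1000 / total),        # higher is faster
            "mark": round(1000 * base / total),  # baseline column = 1000
        }
        for name, total in totals.items()
    }


summary = summarise(times)
for name, row in summary.items():
    print(name, row)
```

Run against the measured columns, this reproduces the Score and Mark values shown in the table.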
One tricky factor is that the Xeon 5670 can actually run at up to 3.33GHz when lightly loaded. I can't see directly what clock speed each core is running at, but by comparing results between busy and quiet times, and taking the best of ten scores for each test when the CPU was lightly loaded, I'm pretty sure I got a snapshot of it running at top speed, and the difference is about 7%. Intel's newer Xeons also have turbo boost, so I've left the numbers unchanged as averages measured on a moderately busy system.
When it comes to new server hardware, I'm projecting these scores to the Opteron 4180, a 2.6GHz $200 chip, and the Xeon E3-1245, a $280 3.3GHz chip. The Opteron clock speed is slower and the Xeon E3 somewhat faster than my test systems, making the difference much more significant. On the other hand, the Opteron has six cores vs. the Xeon E3's four. On the third hand, the Xeon has hyperthreading, which gives a small but measurable boost as well. All that means that the throughput is likely to be pretty much the same between the two chips.
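The projected columns in the table are consistent with plain linear clock scaling: each measured time multiplied by the measured clock speed and divided by the target clock speed. That is an inference from the numbers rather than something the post states, and it ignores turbo boost, cache, and memory differences. A rough sketch (the helper name `project` is hypothetical):

```python
def project(times, measured_ghz, target_ghz):
    """Scale measured times to a different clock speed, assuming
    run time is inversely proportional to clock frequency."""
    return [round(t * measured_ghz / target_ghz, 3) for t in times]


# Measured on the 3.0GHz Phenom II, projected to the 2.6GHz Opteron 4180.
amd_26 = project([0.613, 1.103, 0.540, 1.383], 3.0, 2.6)

# Measured on the 2.93GHz Xeon 5670, projected to the 3.3GHz Xeon E3-1245.
intel_33 = project([0.690, 0.987, 0.453, 1.140], 2.93, 3.3)

# Ratio of total times: how much longer the projected Opteron takes.
ratio = sum(amd_26) / sum(intel_33)
print(amd_26, intel_33, round(ratio, 2))
```

The ratio of the two projected totals works out to roughly 1.45, which is where a per-core gap of about 45% comes from.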
And the Xeon E3 has a downside in that you can't put more than 16GB of RAM on it: It only supports unbuffered memory, and only four modules. The Opteron 4180 supports both unbuffered and registered memory, and up to six modules of the latter, so it can easily take 48GB. (More is possible, but requires more expensive high-density DIMMs.)
Also, the Xeon E3 got side-swiped by the Great Sandy Bridge Chipset Disaster, and isn't actually available.
So the new low-end Intel chips will be measurably faster than the current low-end AMD server chips, about 45%, in response times if not overall throughput.
On the other hand, there's that 16GB limit. Memory is dirt cheap and you want to put as much of it in a server as you can, and being able to put three times as much in the AMD system is pretty significant. (Oh, and the Opteron is a dual-socket CPU, so you can easily scale to 96GB and a dozen cores if you want.)
The Psyco numbers are from my dev environment, and point out once again what a nifty bit of work Psyco is, and that it should have been rolled into the Python core years ago.
Posted by: Pixy Misa at
10:20 PM
| No Comments
Post contains 459 words, total size 6 kb.
I missed my daily Billy action.
On my alt, anyway.
My other alt.
Posted by: Pixy Misa at
08:14 PM
| Comments (1)
Post contains 19 words, total size 1 kb.
A little box from Sony just arrived at my door. That was quick.
It turns out that the regular (as opposed to budget Classic) loops come in rather nice jewel cases. It actually makes me miss physical packaging for music. And games. And stuff. Not DVDs so much; DVD cases aren't pretty or well-designed, and I have twenty of them on my desk as it is.
Posted by: Pixy Misa at
08:11 AM
| No Comments
Post contains 69 words, total size 1 kb.
Is actually pretty darn good.
I was expecting to post a mild rant (mild because I got into the problem myself, rant because they set things up so it was impossible to get myself out of it).*
But instead they gave me a freebie that solves the problem. Not quite 100% the way I'd like it solved, but that's just my OCD; it solves the problem.
So, well done Electronic Arts.
* Relating to one-use untransferable DLC codes, saved games that lock unless you have all the necessary DLC activated, the fact that you can't buy the same game on Steam twice even if you wanted to, and that with their purchase of Bioware I now have multiple EA accounts.
Posted by: Pixy Misa at
04:14 AM
| No Comments
Post contains 124 words, total size 1 kb.
Sunday, March 20
It's possibly the least safe-for-work (safe-for-anywhere) site that anyone sane would have any reason to visit, but this was too cute not to repost.

Posted by: Pixy Misa at
03:39 AM
| Comments (3)
Post contains 29 words, total size 1 kb.
Friday, March 18
Rain. Rain rain rainity rain. Rain with cloudy periods and patchy rain. With more rain, and occasional showers, sprinkles, and storms.
For at least the next week.
Good.
Posted by: Pixy Misa at
06:24 PM
| Comments (7)
Post contains 34 words, total size 1 kb.
Thursday, March 17
Okay, looks like I'm going to have to either write some more music or wallow in guilt. Sony Creative Software is going download only, so they're clearing out their stock of loops on physical media at 75% off.
I thought that the delivery charges to Australia would be prohibitive, but I guess CDs and DVDs in cardboard sleeves don't cost much to ship, because it's a flat $30 for FedEx Priority shipping.
So I went through the sale catalog, ticked just about everything I had ever wanted to buy but couldn't quite justify previously, and ordered the whole lot. Whee!
Should land here late next week, which is perfect.
Posted by: Pixy Misa at
04:31 AM
| No Comments
Post contains 113 words, total size 1 kb.
Wednesday, March 16
I'm busy working on the new (and much needed) spam filter for mu.nu and mee.nu.
The old filter was based on heuristics and blacklists and a couple of security-by-obscurity tricks (a honeypot, a secret question).
The new filter is purely Bayesian.
It's more than a simple text analyser, though. Some of the things I'm doing:
- Contextual analysis: A comment about designer shoes might be fine on a fashion blog, but on a politics blog it's almost certainly spam.
- Language analysis: A comment in Chinese may or may not be spam, but a comment in Chinese replying to a post in French almost certainly is.
- Geographic analysis: Are you in a spam hotspot? Are you in the same part of the world as the blogger?
- Content analysis: Is the comment full of crappy Microsoft markup?
- Metadata analysis: You can put a name, URL, and email address on your comments. The system treats those specifically as names, URLs, and email addresses, not just more comment text.
- Trend analysis: How many comments have you posted in the last ten minutes? How many total? How about under that name vs. that IP? What's the average spam score for comments from that IP?
SMACK
The key understanding here is that Bayesian analysis makes that problem go away. You don't feed the Bayesian score into a calculation along with a bunch of numbers generated by other heuristics. That just makes more work and reduces the reliability of the core mechanism.
What you do is you simplify the numbers in some way (rounding, logarithms, square roots), turn them into tokens, and throw them into the pool. You want to simplify the numbers so that there's a good chance of a match; for example, a five-digit ratio of content:markup isn't going to get many hits, but one or two digits will.
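To make the idea of turning numbers into tokens concrete, here is a minimal sketch. The token names, the one-digit rounding, and the power-of-two bucketing are all assumptions picked for illustration, not the filter's actual scheme:

```python
def ratio_token(name, value, digits=1):
    """Round a ratio (e.g. content:markup) to a few digits so that
    similar comments produce the same token."""
    return f"{name}:{round(value, digits)}"


def count_token(name, count):
    """Bucket a non-negative integer count logarithmically:
    0 -> 0, 1 -> 1, 2-3 -> 2, 4-7 -> 3, 8-15 -> 4, and so on."""
    return f"{name}:bucket{int(count).bit_length()}"


# Two slightly different ratios collapse to the same token...
a = ratio_token("content_markup", 0.73412)
b = ratio_token("content_markup", 0.68)

# ...and so do two comment counts of similar magnitude.
c = count_token("comments_10min", 5)
d = count_token("comments_10min", 7)
print(a, b, c, d)
```

Coarser buckets mean more matches per token but less discrimination between cases, so the bucket size is a tuning choice.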
So what we do is we parse, compute, and calculate all these different tokens for a given post, and then we look for the most interesting ones in our database - the ones that, based on our training data, vary the most from the neutral point.
Then we just take the scores for each of those interesting elements, positive or negative, and throw them at Bayes' formula.
And out pops the probability that the comment is spam. (Not just an arbitrary score, but an actual, very realistic, probability.)
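The selection and combination steps can be sketched as follows. The post doesn't give the exact formula, so this assumes the common naive-Bayes combination with equal priors (as popularised by Paul Graham's spam filtering essay), a neutral point of 0.5, and per-token probabilities kept strictly between 0 and 1. The token names and the limit of 15 are hypothetical:

```python
import math

NEUTRAL = 0.5


def most_interesting(tokens, spam_prob, limit=15):
    """Pick the known tokens whose spam probability is furthest from neutral."""
    known = {t for t in tokens if t in spam_prob}
    return sorted(known, key=lambda t: (-abs(spam_prob[t] - NEUTRAL), t))[:limit]


def combine(probs):
    """Combine per-token probabilities: prod(p) / (prod(p) + prod(1 - p)).
    Computed in log space to avoid underflow with many tokens."""
    log_spam = sum(math.log(p) for p in probs)
    log_ham = sum(math.log(1.0 - p) for p in probs)
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


# Hypothetical trained values; unseen tokens are simply ignored.
spam_prob = {
    "lang:zh-on-fr": 0.99,
    "word:shoes": 0.80,
    "word:the": 0.50,
    "geo:same-region": 0.10,
}
tokens = ["word:the", "word:shoes", "lang:zh-on-fr", "geo:same-region", "word:unseen"]

picked = most_interesting(tokens, spam_prob, limit=3)
probability = combine([spam_prob[t] for t in picked])
print(picked, round(probability, 3))
```

Note that a strongly non-spam token (here the same-region one) counts as interesting too; it pulls the result down rather than being discarded.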
And then, based on that, we go and update the scores in the database for every token we pulled from the comment. So if it works out that a comment is spam using one set of criteria, it can train itself to recognise spam using the other identifiable criteria in the comment - based on how distinct those criteria are from non-spam.
Automatically. Which means I don't have to come back and tweak weights or add items to blacklists; it works it all out from context.
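A minimal sketch of that feedback loop, assuming the database stores simple per-token spam and non-spam counts and that token probabilities are estimated with add-one smoothing. The class name, the smoothing, and the example tokens are illustrative assumptions, not the real schema:

```python
from collections import defaultdict


class TokenStats:
    """In-memory stand-in for the token score tables."""

    def __init__(self):
        self.spam = defaultdict(int)
        self.ham = defaultdict(int)

    def train(self, tokens, is_spam):
        """Record every distinct token in a comment under the verdict it got."""
        counts = self.spam if is_spam else self.ham
        for token in set(tokens):
            counts[token] += 1

    def spam_prob(self, token):
        """Smoothed estimate; tokens never seen before come out neutral (0.5)."""
        s = self.spam.get(token, 0)
        h = self.ham.get(token, 0)
        return (s + 1) / (s + h + 2)


stats = TokenStats()
# A comment judged spam on other evidence also trains its remaining tokens.
stats.train(["url:shoes.example", "lang:zh-on-fr"], is_spam=True)
stats.train(["lang:zh-on-fr"], is_spam=True)
stats.train(["geo:same-region"], is_spam=False)

learned = stats.spam_prob("lang:zh-on-fr")
unseen = stats.spam_prob("word:never-seen")
print(learned, unseen)
```

In a self-training setup it is probably wise to feed back only verdicts that are well clear of neutral, so that borderline guesses don't reinforce themselves.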
The framework is done; I need to write some database code now, load up some tables (like the GeoIP data), and then start training and testing it. If that goes well, I should have it in place early next week.
I have a ton (4 gigabytes) of known spam to train against, but I need to identify a similar amount of known good comments, and that alone is going to take me a day or two.
I looked at just using a service like Akismet. That, all by itself, would cost me more than all the other expenses for keeping the system running put together. Just filtering what's been filtered by the current edition of the spam filter would have cost upwards of $50,000.
A week or two of fiddly coding and training looks like it should pay for itself very quickly.
Posted by: Pixy Misa at
04:16 PM
| Comments (15)
Post contains 659 words, total size 4 kb.
Friday, March 11
As an online discussion grows longer, the probability of someone citing the Huffington Post approaches one.
Depending on local statute, you may be allowed to shoot the offender. In Texas, this is actually mandatory.
Posted by: Pixy Misa at
05:44 PM
| Comments (1)
Post contains 36 words, total size 1 kb.
Thursday, March 10
Is that even correct?
Anyway, if you're trying to read this from Turkey, that's probably not working out too well.
Posted by: Pixy Misa at
06:08 PM
| No Comments
Post contains 23 words, total size 1 kb.
Powered by Minx 1.1.4-pink.









