Sunday, February 22
Sakura, the third server in the mu.nu cluster, used for offsite backups and for offloading some sites during the election overload, just dropped dead.
Now the two main servers, Aoi and Midori, do daily backups and then exchange backup files. I'd intended to apply that to Sakura as well, but Sakura went live about the time I went insane, and it never happened.
The first sign of trouble was earlier today, when Protein Wisdom started running slow, and then stopped entirely. I logged in and poked around a bit, but couldn't see any problem. Restarted Apache. Ran a check on the database. All fine.
Took a look at the system log. Reporting 6 bad sectors on the disk. Okay, that's not great, but not the end of the world. I'll take a backup, copy it offsite, and get them to replace the disk.
Start the backup to the second disk, and everything's fine for a few minutes, and then it's nothing but I/O errors. Uh-oh. The system disk has gone offline, taking with it the databases and the operating system.
I'm still logged into the server, mind you. But I've lost /bin, /usr/bin, everything bin. What still works? Well, let's see. I have cd. I have echo, because that's built into the shell. And I have cat.
And that's about it.
And using cd, echo, and cat, I managed to rescue the main Protein Wisdom database from a disk that I couldn't access, because the entire thing was cached in RAM.
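For anyone who ends up in the same hole, here is a minimal sketch of that kind of rescue, assuming a POSIX-style shell such as bash. The function names (list_dir, copy_file, rescue_dir) are hypothetical, made up for illustration, and this is not a transcript of what was actually typed. It also assumes the destination directory already exists, since mkdir is one of the things that lives in /bin.

```shell
# Hypothetical helpers: the names are illustrative, not from the original session.

# List a directory without ls. The shell expands the glob itself and
# echo is a builtin, so nothing has to be loaded from /bin.
# (In an empty directory this prints a literal "*".)
list_dir() {
    ( cd "$1" && echo * )
}

# Copy a file without cp. cat streams the file to stdout and the shell
# handles the redirection into the destination file.
copy_file() {
    cat "$1" > "$2"
}

# Copy every regular file in one directory into an existing directory.
# Assumption: "[" is a shell builtin, as it is in bash and dash.
rescue_dir() {
    src=$1
    dst=$2
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        # ${f##*/} strips the directory part, so basename is not needed.
        copy_file "$f" "$dst/${f##*/}"
    done
}
```

Something like `list_dir /some/db/dir` followed by `rescue_dir /some/db/dir /some/dir/on/the/second/disk` would be the usage; both paths are placeholders. It only helps if the files being read are still cached in RAM or on a disk that still answers.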
The reason why I only almost dodged the nuke is that the copy came up with a bunch of I/O errors, and I thought it hadn't worked, so I had tech support reboot the server.
Mistake. The server wouldn't boot from the drive, so they had to replace it. They put the old drive in an external caddy, but the server still wouldn't recognise it. So all I had was a corrupted backup of one database.
But I was able - thanks to myisamchk and REPAIR TABLE and a spare copy of the WordPress database structure - to rebuild that database and merge it into the last off-site backup.
If I hadn't rebooted at that point, I might have been able to do the same for the other databases.
And then I got a call from my business partner. He had a site - a business site - hosted on that server as well. And I'd just lost all his data... Except not, I realised, after my blood pressure had hit 180/140. He's running a hosted app that was linked from the site he has on that server, so all he needed was for me to restore the static content of his site, which was backed up safely. Twice, in fact.
The third site on there wasn't so lucky, and lost a couple of months' data. Fortunately, it's a less busy site, and a couple of hours spent trawling Google's caches let me dig out most of the missing posts; the site owner will just need to cut-and-paste those and re-load the associated media files.
I just spent five solid days - as in, 120 consecutive hours - panicking about the servers at my day job. And the minute I get that fixed (and in the end, a lot better than it ever was before) and everyone is happy, boom!
And now I'm dead. I'm so dead my deads have dead on them. I hadn't even been grocery shopping for two weeks, until I hauled myself up the hill to the shops just now at 8:30 on a Sunday night.
I need some Popotan.
Posted by: Comrade Tovya at Monday, February 23 2009 06:08 AM (DAaYy)
Posted by: Pete Zaitcev at Monday, February 23 2009 07:04 AM (/ppBw)
The good thing about my day job is that I'm doing some pretty advanced stuff with MySQL and SSDs, and doing some fancy server configurations, so I'm learning new stuff about Linux as well.
The bit that drives me crazy is that it's my job to deliver 24x7 operations on a pre-beta application.
Posted by: Pixy Misa at Monday, February 23 2009 12:10 PM (PiXy!)
If you lose your own stuff, it's sad and annoying, but you kick yourself for being an idiot and not doing/checking your backups and you move on.
When you lose someone else's stuff because you forgot to do something that you know you have to do, well, you really feel like a jerk.
Posted by: Pixy Misa at Monday, February 23 2009 12:12 PM (PiXy!)
Posted by: Andrew at Monday, February 23 2009 05:02 PM (/uGTr)
Posted by: Pixy Misa at Monday, February 23 2009 08:13 PM (PiXy!)
I tried home delivery of groceries one time. I have no idea whether it would be the same for Pixy, but it wasn't too bad.
However, delivery was scheduled for a 4-hour window during banker's hours and I had to be here to receive it. That's no problem for me; I'm always here anyway. But as busy as Pixy is, I don't think it would work. It'd take less time for him to go to the store, plus he could do it in the evening or on a weekend.
Posted by: Steven Den Beste at Wednesday, February 25 2009 11:33 AM (+rSRq)
Posted by: Space monkey at Wednesday, February 25 2009 01:26 PM (Yx/od)
Sakura is "cherry" often used as "Cherry Blossom,"
Midori is easy cuz of the liqueur, which is green.
Posted by: wickedpinto at Thursday, March 05 2009 01:53 PM (hsFNJ)
Yes, the three current servers are blue, green, and cherry-blossom-pink.
They'll be replaced next month with Akane (crimson) (from Ranma, of course) and Mikan (orange) (from Gakuen Alice).
The server naming convention is "female leads in anime whose names refer to colours". And for the main server cluster, they also have to start with A or M. So the next two after Akane and Mikan are Ai (indigo) (Popotan) and Momoko (peach) (Sumomomomomomo).
Virtual servers are, of course, named after the supporting female cast of the respective shows.
Posted by: Pixy Misa at Thursday, March 05 2009 02:44 PM (PiXy!)