Sunday, February 22

Life

I Think I Just Dodged A Tactical Nuke

Almost.

Sakura, the third server in the mu.nu cluster, used for offsite backups and for offloading some sites during the election overload, just dropped dead.

Now the two main servers, Aoi and Midori, do daily backups and then exchange backup files.  I'd intended to apply that to Sakura as well, but Sakura went live about the time I went insane, and it never happened.

The first sign of trouble was earlier today, when Protein Wisdom started running slow, and then stopped entirely.  I logged in and poked around a bit, but couldn't see any problem.  Restarted Apache.  Ran a check on the database.  All fine.

Took a look at the system log.  Reporting 6 bad sectors on the disk.  Okay, that's not great, but not the end of the world.  I'll take a backup, copy it offsite, and get them to replace the disk.

Start the backup to the second disk, and everything's fine for a few minutes, and then it's nothing but I/O errors.  Uh-oh.  The system disk has gone offline, taking with it the databases and the operating system.

I'm still logged into the server, mind you.  But I've lost /bin, /usr/bin, everything bin.  What still works?  Well, let's see.  I have cd.  I have echo, because that's built into the shell.  And I have cat.

And that's about it.

And using cd, echo, and cat, I managed to rescue the main Protein Wisdom database from a disk that I couldn't access, because the entire thing was cached in RAM.
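For anyone wondering what that looks like in practice, here is a minimal sketch. The post doesn't record the exact commands, so the function names and any paths are made up for illustration. It assumes, as above, that `cat` still runs (it is normally an external program, so it only works if it is still cached in memory), and it uses shell globbing in place of `ls`.

```shell
# List a directory without ls: the shell expands the glob itself,
# and echo is a shell builtin.
list_dir() {
    for entry in "$1"/*; do
        # Skip the unexpanded pattern when the directory is empty.
        [ -e "$entry" ] || continue
        echo "$entry"
    done
}

# Copy a file without cp: cat reads it and shell redirection writes it.
# Assumes cat is still runnable and the destination disk is healthy.
copy_with_cat() {
    cat "$1" > "$2"
}
```

If the file's blocks are still in the page cache, the read can succeed even though the underlying disk is gone, which is the situation described above.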

The reason why I only almost dodged the nuke is that the copy came up with a bunch of I/O errors, and I thought it hadn't worked, so I had tech support reboot the server.

Mistake.  The server wouldn't boot from the drive, so they had to replace it.  They put the old drive in an external caddy, but the server still wouldn't recognise it.  So all I had was a corrupted backup of one database.

But I was able - thanks to myisamchk and repair table and a spare copy of the Wordpress database structure - to rebuild that database and merge it into the last off-site backup.
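The exact repair commands aren't given here, so the following is only a sketch of the myisamchk step. It assumes MyISAM tables sitting in a directory you pass in (a hypothetical path), and that the MySQL server is not using those files while the repair runs. The helper name `repair_plan` is made up, and it only prints the commands so they can be reviewed before anything touches the files. `--recover` is myisamchk's standard repair option; `REPAIR TABLE` is the equivalent SQL statement run from inside MySQL.

```shell
# Print (not run) a myisamchk repair command for every MyISAM index file
# in a directory. The directory is supplied by the caller.
repair_plan() {
    table_dir=$1
    for index in "$table_dir"/*.MYI; do
        # Skip the unexpanded pattern when no .MYI files exist.
        [ -e "$index" ] || continue
        echo "myisamchk --recover $index"
    done
}
```

Working on a copy of the rescued files rather than the only copy is the safer choice, since a repair rewrites the table files in place.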

If I hadn't rebooted at that point, I might have been able to do the same for the other databases.

And then I got a call from my business partner.  He had a site - a business site - hosted on that server as well.  And I'd just lost all his data...  Except not, I realised, after my blood pressure had hit 180/140.  He's running a hosted app that was linked from the site he has on that server, so all he needed was for me to restore the static content of his site, which was backed up safely.  Twice, in fact.

The third site on there wasn't so lucky, and lost a couple of months' data.  Fortunately, it's a less busy site, and a couple of hours spent trawling Google's caches let me dig out most of the missing posts; the site owner will just need to cut-and-paste those and re-load the associated media files.

I just spent five solid days - as in, 120 consecutive hours - panicking about the servers at my day job.  And the minute I get that fixed (and in the end, a lot better than it ever was before) and everyone is happy, boom!

And now I'm dead.  I'm so dead my deads have dead on them.  I hadn't even been grocery shopping for two weeks, until I hauled myself up the hill to the shops just now at 8:30 on a Sunday night.

I need some Popotan.

Posted by: Pixy Misa at 08:15 PM

1 that would suck to lose all your old posts.  that happened to me before, and it took me weeks of working around the clock to restore them all!

Posted by: Comrade Tovya at Monday, February 23 2009 06:08 AM (DAaYy)

2 I'm glad you have a day job.

Posted by: Pete Zaitcev at Monday, February 23 2009 07:04 AM (/ppBw)

3 So am I Pete, though it drives me crazy at times.

The good thing about my day job is that I'm doing some pretty advanced stuff with MySQL and SSDs, and doing some fancy server configurations, so I'm learning new stuff about Linux as well.

The bit that drives me crazy is that it's my job to deliver 24x7 operations on a pre-beta application.

Posted by: Pixy Misa at Monday, February 23 2009 12:10 PM (PiXy!)

4 Comrade Tovya - yeah, only this is worse, I lost someone else's posts.

If you lose your own stuff, it's sad and annoying, but you kick yourself for being an idiot and not doing/checking your backups and you move on.

When you lose someone else's stuff because you forgot to do something that you know you have to do, well, you really feel like a jerk.

Posted by: Pixy Misa at Monday, February 23 2009 12:12 PM (PiXy!)

5 Home delivery ??

Posted by: Andrew at Monday, February 23 2009 05:02 PM (/uGTr)

6 It's not that far to the shops, it's just I haven't had any time.  But I don't know in advance that I won't have any time, and don't know when I'll be home, so I end up living on stale Saladas and tinned peaches.

Posted by: Pixy Misa at Monday, February 23 2009 08:13 PM (PiXy!)

7 I tried home delivery of groceries one time. I have no idea whether it would be the same for Pixy, but it wasn't too bad.

However, delivery was scheduled for a 4 hour window during banker's hours and I had to be here to receive it. That's no problem for me; I'm always here anyway. But as busy as Pixy is, I don't think it would work. It'd take less time for him to go to the store, plus he could do it in the evening or on a weekend.

Posted by: Steven Den Beste at Wednesday, February 25 2009 11:33 AM (+rSRq)

8 There needs to be some sort of proxy device that would allow you to receive/inspect things while not at home.

Posted by: Space monkey at Wednesday, February 25 2009 01:26 PM (Yx/od)

9 Aoi, that slips my mind, I know it. . . is it yellow? or is it red?  I think it's red.

Sakura is "cherry" often used as "Cherry Blossom,"

Midori is easy cuz of the liqueur, which is green.

Posted by: wickedpinto at Thursday, March 05 2009 01:53 PM (hsFNJ)

10 Aoi is blue.

Yes, the three current servers are blue, green, and cherry-blossom-pink.

They'll be replaced next month with Akane (crimson) (from Ranma, of course) and Mikan (orange) (from Gakuen Alice).

The server naming convention is "female leads in anime whose names refer to colours".  And for the main server cluster, they have to also start with A or M.  So the next two after Akane and Mikan are Ai (indigo) (Popotan) and Momoko (peach) (Sumomomomomomo).

Virtual servers are, of course, named after the supporting female cast of the respective shows.

Posted by: Pixy Misa at Thursday, March 05 2009 02:44 PM (PiXy!)
