Saturday, May 02
Lost A Day
Oops. Looks like the server died just before the daily backups, not just after. (I pasted yesterday's post in from the version I cross-posted over on Ace's blog.)
If you didn't see the notice I put up while the server was dead, well, the server died. It's up and running but all the LXC containers where the actual work happens are completely frozen, and I was worried that if I touched anything it would just get worse, so I grabbed all the backups and moved them to the new server I already had set up for that purpose.
Took about twelve hours from the old server failing to the new one being operational, but the move I've been planning for months finally happened, so there's that.
If you're missing anything major or having any other problems, please comment here.
Update: Found the problem. Disk errors threw the ZFS pool on the second SSD (where the containers lived) into "faulted" state, so the server was responding but the load average was around 600 because anything in those containers that tried to write to disk was hanging indefinitely.
I've recovered it (which was easy) but it's still warning about data corruption. Backups were intact because they were in a partition on the boot SSD - the idea being that a disk failure of either one would leave us with intact data. I also have offsite backups but they weren't as up to date.
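For anyone hitting the same thing, the diagnosis and recovery described above roughly corresponds to the standard `zpool` workflow. This is a sketch, not the exact commands used here; `tank` is a placeholder pool name.

```shell
# Show pool health; a pool knocked into FAULTED state by disk errors
# will list the failing device and any files with detected corruption.
zpool status -v tank

# Clear the error counters so the pool can resume I/O
# (this is the "easy" recovery step, not a repair of bad data).
zpool clear tank

# Scrub to re-verify all checksums and surface remaining corruption.
zpool scrub tank
zpool status tank   # re-run to watch scrub progress and error counts
```

Note that `zpool clear` only resets the error state; if the scrub still reports corrupted files afterwards, those need to come from backup.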
Since we're already on the new server, I'll take a final set of backups and then clear and cancel the old server.
The new server has a single SSD, but it's in a cluster and backups are synced daily to a storage server with RAID-Z3, so we'd have to lose the main server and four drives on the backup server before we lost data. So we're fine unless there's a datacenter fire.
Another datacenter fire. We survived the last one but that server was down for three weeks while they cleaned up.
Outage Message
Sorry, our server decided to stop serving. We have full backups from just a few hours ago and are restoring them on a new server.
Hold tight, it's just a tiny bit fiddly.
Update 1:43 AM UTC: Application container has been restored - there was an issue with the backup file and I needed to hand-edit it, but that's done and it all worked.
Database is restoring and that will likely take a couple more hours.
Update 3:54 UTC: Database container restored. Configuring things now.
Also, the database needs some repairs since it was a live snapshot.
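Restoring from a live snapshot leaves the database in roughly the state it would be in after a power cut, so it needs a consistency pass. Assuming a MySQL/MariaDB backend (not confirmed in the post), the usual check-and-repair sketch looks like this; credentials are placeholders.

```shell
# Check every database for tables left inconsistent by the live snapshot.
mysqlcheck --all-databases --check -u root -p

# Repair any tables flagged as corrupt. This works for MyISAM tables;
# InnoDB instead runs its own crash recovery automatically at startup
# and cannot be repaired through mysqlcheck.
mysqlcheck --all-databases --repair -u root -p
```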
-- Pixy
Posted by: Pixy Misa at
06:25 PM
| Comments (3)
| Add Comment
| Trackbacks (Suck)
Post contains 437 words, total size 3 kb.
51kb generated in CPU 0.0587, elapsed 0.1642 seconds.
58 queries taking 0.1448 seconds, 367 records returned.
Powered by Minx 1.1.6c-pink.