Thursday, May 12

Geek

Self-Similar Loads And The Deaths Of Cloud Computing

The recent collapse of not one, but multiple entire Amazon Availability Blooples* into a smoking crater caused a certain amount of buzz in the webosphere.  It would have caused more of a buzz if it hadn't reduced a fair chunk of the webosphere to a smoking crater along with it.

What happened?

Well, someone at Amazon threw the wrong switch during a network upgrade.  Effectively, instead of rerouting traffic onto a carefully planned detour, they rerouted traffic onto the sidewalk.

This did not go over terribly well with all the servers trying to send data to their storage pigs* further along the sidewalk.  Since there's significant variability in the performance of Amazon storage pigs*, many servers were set up to take any slowdown as an indication of a bad pig* and automatically try to set up a new pig* to replace it.  To do this, the data had to be replicated....

Along the sidewalk, which was already jammed beyond capacity.

To say that the problem snowballed at this point would be to waste a perfectly good video involving mousetraps and ping-pong balls.
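For the curious, here is a toy sketch of that snowball.  It is not Amazon's actual mechanism; every number in it (link capacity, volume count, the cost of a re-mirror, the fraction of volumes that panic each step) is made up purely to show the shape of the feedback loop: overload triggers re-mirroring, and re-mirroring adds more load.

```python
def simulate(capacity, volumes, remirror_cost, trigger_fraction, steps):
    """Return the traffic demand on a shared link at each step.

    All parameters are hypothetical illustration values:
    capacity         - units of traffic the link can carry
    volumes          - storage volumes, each sending 1 unit of normal traffic
    remirror_cost    - extra units of traffic per volume that is re-mirroring
    trigger_fraction - share of still-healthy volumes that give up on their
                       storage each step while the link is overloaded
    """
    remirroring = 0
    history = []
    for _ in range(steps):
        demand = volumes + remirroring * remirror_cost
        history.append(demand)
        if demand > capacity:
            # Overloaded link: slow I/O looks like a bad volume, so some of
            # the remaining healthy volumes start re-mirroring as well.
            healthy = volumes - remirroring
            remirroring += int(healthy * trigger_fraction)
    return history


# The planned detour: plenty of room, nothing happens.
detour = simulate(capacity=2000, volumes=1000, remirror_cost=5,
                  trigger_fraction=0.25, steps=6)

# The sidewalk: slightly too small, and demand climbs every step.
sidewalk = simulate(capacity=800, volumes=1000, remirror_cost=5,
                    trigger_fraction=0.25, steps=6)

print(detour)
print(sidewalk)
```

On the detour, demand sits flat.  On the sidewalk it more than doubles after a single step and keeps climbing, even though normal traffic never changed.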



You see, the idea of setting up a huge hosting cloud thingy like Amazon has done is that most servers run mostly idle most of the time.  (Ours, for example, has 12 cores and uses, on average, slightly less than one.)

So if you aggregate a whole lot of servers together into one huge bloople* you can get far more sites running on far less hardware and make a huge amount of money in the process.  Until someone drops a ping-pong ball; once that happens there's no way to stop the process.  It's far too big and complicated to control manually.  The entire bloople* is set to burn down, fall over, and sink into the swamp and all you can do is watch.

/images/AChannelCloudFailure1.jpg
All you can do is watch...


Because traffic (and hence load) doesn't neatly average out when you aggregate lots of different services together.  Instead, it piles up.  Internet activity levels are self-similar - everything everywhere tending to follow the same pattern of spikes and dips at the same time.
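A minimal sketch of why that matters, with the caveat that this is a toy model of correlated load and not a proper model of self-similar traffic.  The service count, tick count and `shared_weight` knob are all invented for illustration.  Each service's load per tick is a blend of a burst factor shared by everyone and its own private noise; the function reports how far the busiest tick sits above the average.

```python
import random


def peak_to_mean(n_services, n_ticks, shared_weight, seed=0):
    """Peak-to-mean ratio of the total load across all services.

    shared_weight = 0.0 means every service spikes independently;
    shared_weight near 1.0 means they all spike together.
    (Hypothetical toy model, not measured traffic.)
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_ticks):
        shared = rng.random()  # one burst factor hitting every service this tick
        total = 0.0
        for _ in range(n_services):
            own = rng.random()  # this service's private fluctuation
            total += shared_weight * shared + (1.0 - shared_weight) * own
        totals.append(total)
    mean = sum(totals) / len(totals)
    return max(totals) / mean


independent = peak_to_mean(200, 1000, shared_weight=0.0)
correlated = peak_to_mean(200, 1000, shared_weight=0.8)

print(f"independent services: peak is {independent:.2f}x the mean")
print(f"correlated services:  peak is {correlated:.2f}x the mean")
```

With independent services the spikes mostly cancel out, so the aggregate needs very little headroom.  Once the spikes line up, the aggregate needs nearly as much headroom as each dedicated server did, which is exactly the headroom the business model removed.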

When one service spikes, it's likely that everything else is spiking at exactly the same moment.  And since cloud computing gains efficiency by eliminating the huge amount of headroom you would traditionally plan into a dedicated server (or server farm, depending on how many shoestrings you have to throw around), this leads to everyone looking for extra capacity at the same moment.  And that puts more strain on everything right when it's at its busiest, and....

Splat.*

In Amazon's case, the splat* was triggered by someone dropping a ping-pong ball.  But that's just the proximate cause.  People drop ping-pong balls every day.  It's only a drama if you happen to have covered every level surface of your home including the ceiling with fully-armed spring-loaded ping-pong ball launchers.

But that's what every cloud provider, almost without exception, has done.  That's the entire business model.  It is cheap, but it's intrinsically flaky.

/images/AnoHanaCloudFailure.jpg
Intrinsically flaky.


It's no accident either that the piece of the puzzle - uh, bloople* - that flaked out in this was the flakiest flake of all, the network-attached storage.  Amazon's EBS gives you disks attached across a network. 

Disks suck.  There's no gentler way to put it.  At my day job, we have SSDs all over the place, because we'd be dead without them.  (We know, because we tried doing without them at the start.  We died.  Then we went out and bought a bunch of SSDs and tried again.)  A random disk access takes on the order of ten million CPU cycles, and modern servers typically have more CPUs than disks.

Even so, when your disks are right there in your server, at least you can see how busy they are (too busy) and who's using them (you).  When the disks are abstracted away to free-roaming data pigs*, all you have is an end result.  Pig* too slow?  Don't try to investigate the problem.  You can't investigate the problem; it's been abstracted to such a degree that there's simply no information available.  People tried mounting new pigs* because that was the only thing they could do.  They were throwing gasoline onto a bonfire, but when you build a bonfire and hand everyone a free can of gasoline, you really shouldn't be surprised at the result.

So, how do we fix this?

Well, first, everyone everywhere who has anything to do with anything at all should be nailed to the floor and forced to read J. B. S. Haldane's On Being the Right Size.

Second, anyone planning to deploy a new server with disks used for anything other than backups and log files should be lightly shot.

Third, watch Ano Hana.

* The technical term.

Pictures from A Channel and Ano Hana via RandomC.

Posted by: Pixy Misa at 10:11 PM

1 The question is who's going to pay the price of your suggestions. What's actually interesting is that the costs of most cloud providers are not far removed from dedicated. And for storage, dedicated is better: the same price that buys 10GB at Rackspace Cloud (which is what yukiho.zaitcev.us is) buys 400GB at Pacific Rack (where mitsuki.animeblogger.net used to be). Your operation may be just big enough to have a few servers fully utilized, which puts you in the sweet spot. Anyone who's too small or too big has to go cloud. I pay $11 for yukiho, but $80 is the smallest box available otherwise (Dreamhost? Don't make me laugh. It's pure cloud too, only without cloud's convenience and programmatic access, and with mafia customer service).

Posted by: Pete Zaitcev at Friday, May 13 2011 02:23 AM (9KseV)

2 I was actually going to title this piece Self-Similar Loads And The Death And Death And Death Of Cloud Computing, but it was kind of long.  Maybe Self-Similar Loads And The Deaths Of Cloud Computing.  Yeah, that'll work.

The point isn't that cloud computing as a paradigm is bad, but that we're going to see repeats of this outage until everyone understands that (a) there are always diseconomies of scale to contend with, and (b) if you need robust shared storage, you can't use a big pool of network-attached disks.  You just can't.

Joyent seem to have recognised these points; their blog can be both interesting and entertaining.

Amazon's AWS, on the other hand, is the classic beautiful implementation of a terrible idea.

Posted by: Pixy Misa at Friday, May 13 2011 02:55 AM (PiXy!)

3 Where was I?

Oh, yeah.

The brilliant thing about modern SSDs is that accesses are fungible.  With a decent controller, it doesn't matter whether you're reading or writing 4KB blocks or 1MB blocks; you get similar performance.  (Early commodity SSDs - just a few years ago - were mind-bogglingly awful at random writes, but that has since been resolved.)

With disks, accesses are most definitely not fungible.  Reading random 4KB blocks you might get 0.5 MB/s off a typical low-end drive, and only twice that off a high-end server drive.  Reading sequentially, you can easily get 100MB/s off a low-end disk.  Only...  Not if someone is trying to read random blocks off it at the same time.
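A back-of-the-envelope sketch of where those numbers come from.  The 8 ms and 4 ms average access times are my assumptions, picked because they reproduce the figures above; they are not measurements of any particular drive.  The arithmetic is just one block per access, so throughput is accesses per second times block size.

```python
def random_read_throughput_mb_s(access_time_ms, block_kb):
    """Throughput in MB/s (decimal) when every read pays a full random access."""
    accesses_per_second = 1000.0 / access_time_ms
    return accesses_per_second * block_kb / 1000.0


# Assumed average access times, chosen to match the figures in the text.
low_end = random_read_throughput_mb_s(access_time_ms=8, block_kb=4)
high_end = random_read_throughput_mb_s(access_time_ms=4, block_kb=4)

sequential_mb_s = 100.0  # the sequential figure quoted above
penalty = sequential_mb_s / low_end

print(f"low-end random 4KB reads:  {low_end} MB/s")
print(f"high-end random 4KB reads: {high_end} MB/s")
print(f"sequential is {penalty:.0f}x faster than random on the low-end drive")
```

On those assumptions, one tenant doing random reads is costing the drive something like two hundred times more per byte than a tenant reading sequentially, and the drive can't serve both patterns at once.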

With the right design - like Apache's Cassandra database, for example - disk I/O can still be screamingly fast.  But when you share disk, you're counting on every one of your customers having well-designed software.  You might as well count on every one of your customers being an 18-year-old natural redhead with 36-24-36 curves, i.e., unless you live in a cartoon it ain't going to happen.

While SSDs are significantly more expensive, the fact that they don't give a damn about access patterns actually makes it cheaper to build robust storage systems.

Now, for you, this doesn't matter a whole lot.  A small, smartly run provider is perfect for you, and on that scale they can actually have an idea of what's going on with the storage systems.

It's the people who are running multiple Quadruple Extra Large EC2 Instances with Cheese - at $1500+ per month plus bandwidth plus storage - who are getting burned, and will keep getting burned until the cloud providers change their approach.

Posted by: Pixy Misa at Friday, May 13 2011 03:11 AM (PiXy!)
