Late at night last weekend, as Hurricane Sandy was beating the crap out of the eastern seaboard, I received an e-mail message from lower Manhattan. You may have received this message, too, or one just like it. It felt to me like getting a radiogram from the sinking Titanic. An Internet company was running out of diesel fuel for its generator and would shortly be dropping off the net. The identity of the company doesn’t matter. What matters is what we can learn from their experience.
The company had weathered power outages before and had four days of diesel fuel stored onsite. They had felt ready for Sandy. But most of their fuel wasn’t at the generator; it was stored in tanks in the building basement, a basement that soon flooded, the transfer pumps destroyed by incoming seawater. It was like a miniature Fukushima Daiichi, not far from Wall Street.
The company felt prepared but wasn’t. With no way to get fuel from the basement to the generator, they were staying online as long as possible and then shutting down.
There are obvious lessons here, like don’t put the pumps below ground level. Good pumps could draw well enough to be placed above the maximum historical flood stage. But that’s not all that’s wrong with this scenario. Why was the company dependent on a single data center?
It makes little sense for any Internet business to be dependent on a single data center. With server virtualization it is possible to put images of your server here and there to cover almost any failover problem. Not just multiple servers but multiple servers on multiple backbones in multiple cities supported by multiple power companies and backed by multiple generators. We do that even here at I, Cringely and we’re known to be idiots.
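If you want to hold your own setup up against that list, here is a minimal sketch in Python. Everything in it is made up for illustration: the `Site` record, its field names, and the sample values are assumptions, not anybody's real inventory or any provider's API. The idea is simply to write down where each copy of your server lives and flag any dimension (city, backbone, power company, generator) that every copy has in common.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Site:
    """One place a copy of the server runs (hypothetical record layout)."""
    name: str
    city: str
    backbone: str
    power_company: str
    generator: str


def single_points_of_failure(sites):
    """Return {dimension: value} for every dimension that all sites share."""
    shared = {}
    for field in fields(Site):
        if field.name == "name":
            continue  # the name is just a label, not a failure domain
        values = {getattr(site, field.name) for site in sites}
        if len(values) == 1:
            # Every site depends on the same city, carrier, utility, etc.
            shared[field.name] = values.pop()
    return shared


# Hypothetical inventory: two sites that look redundant but are not.
sites = [
    Site("primary", "New York", "carrier-1", "utility-1", "generator-a"),
    Site("backup", "New York", "carrier-2", "utility-1", "generator-b"),
]

for dimension, value in single_points_of_failure(sites).items():
    print(f"All sites share the same {dimension}: {value}")
```

With the sample data above, the two sites have different backbones and generators but sit in the same city on the same power company, so those are the two dimensions that get flagged. An empty result doesn't prove you are safe; it only means none of the dimensions you thought to write down is shared.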
Or you could rely on the cloud. The simple idea here is that your application is deployed across tens or hundreds or thousands of server instances in data centers all over the world, so even a nuclear attack on Wall Street would have little to no effect on un-irradiated users.
But are you sure about that?
The problem with clouds is that some of them aren’t very cloudy at all. Cloud computing is for some providers more of a marketing term than anything else. What if your cloud is really a single data center in a single city and the pumps in the basement have failed? How cloudy is that?
Last summer Amazon had a major EC2 (cloud services) outage that affected many customers, including Netflix. There are long and convoluted stories about how this outage came to happen, involving fires and generators and load balancers and mishandled database updates, stories that sound to me a lot like the dog ate my homework. But whatever went wrong, it was isolated to a single data center yet somehow took down Amazon’s entire cloud for the U.S. east coast.
That cloud wasn’t a cloud at all but a data center — a single point of failure.
For some companies Sandy was a validation of good network design and emergency planning. For others it was a data disaster. And for the rest it was probably a stroke of luck that they, too, didn’t go down.
Even if nothing happened at your company as Sandy blew through, if you don’t know exactly why you were unscathed, now might be a good time to investigate.