Friday, 27 June 2014

Data Centre Failures: Literally Anything Can Happen

So, you thought things were bad the last time the data centre where you work went down? Perhaps they were, compared with other problems your facility has faced in the past. Nevertheless, some truly epic data centre failures make the others look fairly routine. Data Center Knowledge recently published its top 10 list of epic failures, and we would like to share four of them with you:

1. The Yahoo! Squirrel

From our perspective, one qualification for an epic data centre failure is that it be rooted in a seemingly harmless event that causes far more damage than one would expect. Such is the case with the 2010 failure of a Yahoo! facility in Santa Clara, California.

Squirrel problems at data centres are not all that rare. After all, the furry little rodents love to chew. In this particular case, however, a single squirrel took out half of the Santa Clara data centre by chewing through some very important wires. Knowing that it is so easy to disrupt Yahoo! operations, one wonders what some of the other search engine companies are up to.

2. No Smoking, Please!

We already know that cigarette smoking carries serious health risks, including lung cancer, emphysema and heart disease. Apparently, it can also spark one of those memorable data centre events that qualify as epic. A case in point is an Australian data centre that was brought down by a smouldering cigarette and a bed of mulch.

The Perth iX data centre was down for about an hour after the facility's smoke detection equipment caught a whiff of burning mulch outside the building. The system mistakenly concluded there was a fire inside the building and responded exactly as it was designed to.

3. A Raging Storm

Superstorm Sandy, the 2012 hurricane that became the second most expensive storm in American history, caused data centre failures up and down the US eastern seaboard. Although few Americans are surprised by hurricane damage in the southern states, no one was prepared for how severe the storm would be in the Northeast. Need we say more?

4. It Only Takes a Second

Perhaps the most epic failure to interrupt data communications happened in 2012 as a result of the 'Leap Second Bug'. When a leap second was added to Coordinated Universal Time (UTC), the time standard the world relies on for accurate timekeeping, systems in numerous data centres around the world did not know how to handle it. Social media sites went down, torrent sites were affected and even a number of flights in Australia were disrupted. It's amazing what one second can do.

Lessons Learned

What can we learn from these four epic data centre failures? Mainly that a failure can happen literally anywhere, and at any time. You do not have to be operating a big commercial facility to be brought down by the smallest of things. In the end, it pays to be diligent at all times.

