Thursday, 18 September 2014

Facebook Runs Stress Test – Shuts Down Entire Data Centre

We have a rare piece of data centre news to share with you today courtesy of the people over at Facebook.  Apparently, the social media giant recently ran a system-wide stress test by completely shutting down an entire data centre for a full day.  The idea was to see how the remainder of the company's systems would respond in the event of a complete data centre failure.

Facebook is not saying which of its data centres was shut down for the test.  That said, it operates facilities in Sweden and five US states: California, Iowa, North Carolina, Oregon and Virginia.  Had the shutdown resulted in a massive failure, it could have become one of those data centre events that make headlines the world over.  However, the apparent success of the test has largely kept it off the news radar.

According to Facebook's global head of engineering Jay Parikh, the company did undertake a few ‘fire drills’ to prepare for the eventual test.  He spoke about the plans and results of the test at a recent San Francisco conference.  He told the assembled crowd that, when the day finally came to pull the plug, they shut down an entire region by turning off tens of megawatts of power.

All signs indicate that the test was successful.  Although there were some minor glitches, all of the important components of Facebook remained active around the world.  The main thing users may have noticed is that some of their favourite applications were not working.  By and large, however, the disruption was not significant enough for people to suspect there was a problem.

With the test now behind it, Facebook has developed a number of improvements designed to address the shortcomings observed during the shutdown.  These will be implemented in due course.  Parikh says Facebook was pleased enough with the results that it is planning more stress tests in the future.

Will Others Follow?

To our knowledge, the complete shutdown of a major data centre for the purposes of conducting a stress test is not normal industry practice.  The fact that Facebook was willing to do it demonstrates the confidence it has in the integrity and management of its data systems.  We would be lying if we said we were not impressed by it.

Parikh told the San Francisco conference that the philosophy of Facebook is to embrace both risk-taking and potential failure.  He encourages the company's engineers to take risks in order to push Facebook further, as long as those risks are not unnecessarily reckless.  Company officials believe this to be necessary if Facebook is to remain a dominant industry player.  At this point, we have no reason to argue.

Our question now is whether other Internet giants will follow suit.  Could Facebook have started a landslide of major data centre shutdowns in the future?  We will know soon enough…
