So, let’s face it, Facebook is having a really bad week.
Three major players in the social media sphere – Facebook, Instagram and WhatsApp – all had outages that lasted about six hours on Monday, which felt like an eternity to devotees. “We’re aware that some people are having trouble accessing our apps and products,” Facebook said on Twitter. “We’re working to get things back to normal as quickly as possible, and we apologize for any inconvenience.”
Facebook’s family of apps, which includes Instagram, WhatsApp and Messenger, was offline for almost six hours as employees scrambled to repair the damage. More than 3.5 billion people around the world use Facebook’s services to communicate with friends and family, distribute political messaging, and expand their businesses through advertising and outreach.
The initial problem occurred in a network Facebook calls its “backbone,” which connects its data centers around the world, Santosh Janardhan, a vice president of infrastructure at Facebook, wrote in a blog post. During maintenance of the network, a command was issued to assess how much capacity was available. But the command backfired, disconnecting the network and blocking Facebook’s data centers from communicating with one another, Mr. Janardhan said. An audit tool designed to catch mistaken commands failed to detect the error, he added.
As the scope of the outage became clear, Facebook engineers struggled to restore access because the company’s data centers are heavily protected and employees could not gain immediate entry, the company said. “We’ve done extensive work hardening our systems to prevent unauthorized access, and it was interesting to see how that hardening slowed us down as we tried to recover from an outage caused not by malicious activity but an error of our own making,” Mr. Janardhan wrote.
Once the engineers were inside Facebook’s data centers and began to work, they were able to restore the network. But they had to bring servers back online gradually so as not to overwhelm the system, Mr. Janardhan said. The company planned to study how the outage occurred and to create drills that would allow employees to practice fixing Facebook’s systems more quickly, he added.
—
Photo Credit: tanuha2001 / Shutterstock.com