According to Facebook, Its Outage on Monday Was Caused by “A Cascade of Errors”

So, let’s face it, Facebook is having a really bad week.

Three major players in the social media sphere – Facebook, Instagram and WhatsApp – all suffered outages that lasted about six hours on Monday, which felt like an eternity to devotees. “We’re aware that some people are having trouble accessing our apps and products,” Facebook said on Twitter. “We’re working to get things back to normal as quickly as possible, and we apologize for any inconvenience.”

Outage tracking site Down Detector logged tens of thousands of reports for each of the services. Facebook’s own site would not load at all for about an hour on Monday; Instagram and WhatsApp were accessible, but could not load new content or send messages. The reason for the outage was not immediately clear. However, multiple security experts quickly pointed to a Domain Name System (DNS) problem as a possible culprit. Around 1 pm ET, Cisco’s internet analysis division ThousandEyes said on Twitter that its tests indicated the outage was due to an ongoing DNS failure. The DNS translates website names into the numeric IP addresses that computers use to find each other. It’s often called the “phonebook of the internet.”
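For readers who want to see the “phonebook” lookup in action, here is a minimal Python sketch that asks the operating system’s resolver to translate a name into IP addresses. The helper name `resolve` and the hostnames in the demo are our own illustrative choices, not anything taken from Facebook’s or ThousandEyes’ tooling. When the lookup fails, as it did for Facebook’s domains during the outage, the function simply returns an empty list.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the unique IP addresses the system resolver reports for hostname.

    `resolve` is a hypothetical helper written for this article.
    """
    try:
        # getaddrinfo hands the name to the operating system's resolver,
        # which consults DNS (and local files such as /etc/hosts).
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # The name could not be translated into an address.
        return []

    addresses = []
    for info in infos:
        address = info[4][0]  # sockaddr tuple: the first field is the IP
        if address not in addresses:
            addresses.append(address)
    return addresses


if __name__ == "__main__":
    # Example hostnames only; results depend on your network and resolver.
    for name in ("localhost", "example.com"):
        print(name, resolve(name) or "lookup failed")
```

A browser performs the same kind of lookup before it can connect to a site, which is why a DNS failure can make a service look as if it has vanished from the internet even when its servers are still running.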
More than four hours after the outage started, Facebook CTO Mike Schroepfer tweeted: “We are experiencing networking issues and teams are working as fast as possible to debug and restore as fast as possible.”

“I don’t know if I’ve seen an outage like this before from a major internet firm,” said Doug Madory, director of internet analysis at network monitoring firm Kentik. For a lot of people, Madory told CNN, “Facebook is the internet to them.”

Now that the crisis is over, Facebook has offered an explanation: a “cascade of mistakes” made during maintenance on its network caused the outage that took its services offline Monday, the company said in a blog post published on Tuesday.

Facebook’s family of apps, which includes Instagram, WhatsApp and Messenger, was offline for almost six hours as employees scrambled to repair the damage. More than 3.5 billion people around the world use Facebook’s services to communicate with friends and family, distribute political messaging, and expand their businesses through advertising and outreach.

The initial problem occurred in a network Facebook calls its “backbone,” which connects its data centers around the world, Santosh Janardhan, a vice president of infrastructure at Facebook, wrote in the blog post. During maintenance of the network, a command was issued to assess how much capacity was available. But the command backfired, disconnecting the network and blocking Facebook’s data centers from communicating, Mr. Janardhan said. An audit tool designed to catch mistaken commands failed to detect the error, he added.

As the scope of the outage became clear, Facebook engineers struggled to restore access because its data centers are heavily protected and the employees could not gain immediate entry, the company said. “We’ve done extensive work hardening our systems to prevent unauthorized access, and it was interesting to see how that hardening slowed us down as we tried to recover from an outage caused not by malicious activity but an error of our own making,” Mr. Janardhan wrote.

Once the engineers were inside Facebook’s data centers and began to work, they were able to restore the network. But they needed to be gradual when bringing servers online so as not to overwhelm the system, Mr. Janardhan said. The company planned to study how the outage occurred and to create drills that would allow employees to practice fixing Facebook’s systems more quickly, he added.
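To illustrate the general idea of bringing servers back gradually, here is a minimal Python sketch that splits a list of servers into batches that grow over time. This is not Facebook’s actual procedure, which the blog post does not describe at this level of detail. The function name `staged_batches`, its parameters, and the server names are all hypothetical.

```python
def staged_batches(servers, first_batch=1, growth=2):
    """Yield successive groups of servers, each up to `growth` times larger
    than the one before it.

    Hypothetical illustration only: a real rollout would also check that
    each group is healthy before starting the next one.
    """
    if first_batch < 1 or growth < 1:
        raise ValueError("first_batch and growth must be at least 1")

    start = 0
    size = first_batch
    while start < len(servers):
        yield servers[start:start + size]
        start += size
        size *= growth


if __name__ == "__main__":
    # Made-up server names for demonstration.
    servers = [f"server-{i}" for i in range(10)]
    for step, batch in enumerate(staged_batches(servers), start=1):
        print(f"step {step}: bring up {batch}")
```

Starting with a small group and widening it step by step keeps the sudden surge of traffic and power demand in check, which is the concern Mr. Janardhan described.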

The outage came the morning after “60 Minutes” aired a segment in which Facebook whistleblower Frances Haugen claimed the company is aware of how its platforms are used to spread hate, violence and misinformation, and that Facebook has tried to hide that evidence. Facebook has pushed back on those claims. The interview followed weeks of reporting about and criticism of Facebook after Haugen released thousands of pages of internal documents to regulators and the Wall Street Journal. Haugen is set to testify before a Senate subcommittee on Tuesday.

Shares of Facebook were down more than 5% in midday trading Monday, putting it on pace for its worst trading day in nearly a year.


Photo Credit: tanuha2001 / Shutterstock.com