The AWS Outage – What Happened, and Why is it a Big Deal?

Magnifying glass over the Amazon Web Services website

A service disruption at Amazon Web Services (AWS), Amazon’s popular cloud hosting and data service, caused massive problems for internet users starting their workweek on Monday. Since AWS powers huge portions of the internet, the list of services and sites that suffered outages on Monday was pretty staggering.

As Mashable reports, services affected, according to user-reported issues, included United Airlines, AT&T, Fortnite, Disney+, HBO Max, Signal, Snapchat, McDonald’s, Verizon, Venmo, and many more. Amazon services like Prime and Alexa were affected, too. In short: Almost anyone could’ve been affected in some way. Why? Because nearly everything we own these days is internet-connected (our fridges are WiFi-enabled billboards), meaning an AWS outage can disrupt large swaths of daily life.

Nearing midday, it appeared the issue was over. But then Amazon’s AWS Health Dashboard indicated problems had resurfaced. “We have confirmed multiple AWS services experienced network connectivity issues in the US-EAST-1 Region,” read an update around 10:30 a.m. ET. “We are seeing early signs of recovery for the connectivity issues and are continuing to investigate the root cause.”

It appeared AWS was seeing issues again, though not on the scale of the outage in the earlier hours. Some services, such as Venmo and Boost Mobile, saw a corresponding jump in user-reported issues on Downdetector.

Amazon had previously said that the problem had either been fully resolved or was being resolved. Mashable reached out for comment and was directed to the AWS Health Dashboard. At about 6:35 a.m. ET, the AWS Health Dashboard indicated the main issue was resolved, though problems might persist as services came back online. That could, perhaps, hint at the new problems that surfaced.

“The underlying DNS issue has been fully mitigated, and most AWS Service operations are succeeding normally now,” the 6:35 a.m. ET update read. “Some requests may be throttled while we work toward full resolution.”

Finally, as of 8:20 p.m. ET, Amazon provided more updates on how it repaired its AWS services and noted, “By 3:01 PM [PT, or 6:01 p.m. ET], all AWS services returned to normal operations. Some services such as AWS Config, Redshift, and Connect continue to have a backlog of messages that they will finish processing over the next few hours. We will share a detailed AWS post-event summary.”

What caused the AWS outage?

The exact reason AWS initially went down remains unknown, but according to Mashable, services using AWS were unable to access DynamoDB, an Amazon-run database, because the Domain Name System (DNS) had a problem. The DNS effectively translates website names into IP addresses. So when Amazon wrote on its Health Dashboard that the DNS issue had been “fully mitigated,” it was saying the underlying problem had been fixed.
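To make that translation step concrete, here is a minimal Python sketch of what an app does before it can talk to a service: ask DNS for the addresses behind a name. It is an illustration only, not a reconstruction of what failed inside AWS. The function name `resolve` and its `lookup` parameter are hypothetical, introduced here so the lookup can be swapped out for a simulated failure.

```python
import socket


def resolve(hostname, lookup=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses for hostname.

    Returns an empty list when the DNS lookup fails. `lookup` is a
    hypothetical hook (it defaults to the standard library's resolver)
    so that a failure can be simulated without touching the network.
    """
    try:
        # Ask the resolver for TCP endpoints on port 443 (HTTPS).
        results = lookup(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # The name could not be translated. The service and its data
        # may be perfectly healthy, but the app has no address to call.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # the first element of sockaddr is the IP address.
    return sorted({entry[4][0] for entry in results})
```

Calling `resolve("dynamodb.us-east-1.amazonaws.com")` on a normal day should return one or more IP addresses. If the lookup fails, the function returns an empty list, which is roughly the position apps were in on Monday: the data still existed, but they had no address at which to reach it.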

“Amazon had the data safely stored, but nobody else could find it for several hours, leaving apps temporarily separated from their data,” Mike Chapple, an IT professor at the University of Notre Dame, told CNN. “It’s as if large portions of the internet suffered temporary amnesia.”

Rafe Pilling, the director of threat intelligence at the cybersecurity firm Sophos, told The Guardian that the incident didn’t appear to be a cyberattack or anything nefarious, which is aligned with Amazon’s statements. “When anything like this happens the concern that it’s a cyber incident is understandable,” he told the U.K. outlet. “AWS has a far-reaching and intricate footprint, so any issue can cause a major upset.”

It’s likely Amazon will, at a later time, explain further what happened Monday. It’s unclear how the “network connectivity issues” reported around 10:30 a.m. ET are related, if at all, to the initial DNS issue, though it seems reasonable to assume issues could arise as services worked to return to normal.

Why is an AWS outage such a big deal?

In short: AWS is a central pillar of the modern internet. Without it, things crash. As a few major companies gobbled up market share, the internet’s infrastructure became remarkably fragile: an issue with AWS, or Google, or Microsoft, or CrowdStrike means issues for tons of users.

Advocates even argue that such reliance on these big players is a free speech issue. “We urgently need diversification in cloud computing,” said Dr. Corinne Cath-Speth, head of digital at the human rights organization Article 19, according to The Guardian. “The infrastructure underpinning democratic discourse, independent journalism, and secure communications cannot be dependent on a handful of companies.”

The long and short of it: If something goes wrong with AWS, a lot goes wrong everywhere else. And that’s really the biggest problem.


Photo Credit: Gil C / Shutterstock.com