The October AWS Outage

In the early morning hours of 20 October, problems with a single cloud provider, Amazon Web Services (AWS), caused major disruptions to the basic services that keep our lives running.  Canvas crashed, disrupting learning nationwide.  Lloyds Bank customers lost access to their accounts.  Some United Airlines flyers could not check in or view their reservations.  People's alarms didn't go off.  The examples are too many to list; it was a full meltdown.  To some, what happened on 20 October was an example of Big Tech being too big: if an AWS outage can cause such widespread issues, that may be a problem.[1]  "If a company can break the entire internet, they are too big. Period," wrote Democratic Sen. Elizabeth Warren on X.  "It's time to break up Big Tech."  Political overreach?  Maybe.

Canvas, a widely used online learning management system, was among the most visible casualties of the outage. As AWS services faltered, the platform became inaccessible to students and educators across the United States, leading to canceled classes and disrupted academic schedules. The abrupt loss of access to Canvas highlighted just how deeply cloud infrastructure is woven into the daily functioning of modern education.

20 October, Monday morning: By late morning, Amazon had confirmed that AWS was experiencing issues.  The company wrote that it was investigating "the root cause for the network connectivity issues that are impacting AWS services such as DynamoDB, SQS, and Amazon Connect."

Two steps forward, one step back: Nearing midday, it appeared the issue was over.  But then Amazon's AWS Health Dashboard indicated that problems had resurfaced.  "We have confirmed multiple AWS services experienced network connectivity issues in the US-EAST-1 Region," read an update around 10:30 a.m. ET.  "We are seeing early signs of recovery for the connectivity issues and are continuing to investigate the root cause."

It then appeared that AWS was seeing issues again, though not on the scale of the outage in the earlier hours.  Some services, such as Venmo and Boost Mobile, saw a corresponding jump in user-reported issues on Downdetector.

Monday afternoon: Amazon indicated its AWS services were well on the way to full recovery.  "We continue to observe recovery across all AWS services," the company wrote.  It did note that customers might still face "intermittent function errors" with Lambda, its serverless compute service.

Monday evening: The latest updates from Amazon indicated its AWS services were progressing toward full resolution.  "Service recovery across all AWS services continues to improve," the company wrote.  It noted it was continuing to "reduce throttles" on certain affected tools.

It was a remarkably chaotic Monday for AWS.  The popular cloud platform saw a major outage in the early morning hours, briefly recovered, and then experienced new problems around midday.

Put simply, any problem with AWS means major issues for large swaths of the internet.  Sites and services such as United Airlines, Snapchat, McDonald's, Verizon, Venmo, and countless others all saw spikes in user-reported issues on Downdetector.

While the internet is vast, it rests on a few pillars that can cause large, disruptive downstream effects when they experience problems.  AWS is perhaps chief among them.

During the incident, Amazon said its continued efforts to remedy issues with its AWS services appeared to be working, noting in an update on its status page that it saw "decreasing networking connectivity issues."  But users were still reporting a relatively high number of problems with AWS on Downdetector, even as many third-party services apparently affected by the outage appeared to be recovering.

Throughout the day, the company reported that its "mitigations to resolve launch failures" were progressing and that it expected "launch errors and network connectivity issues to subside" as it applied fixes more widely.  "We continue to apply mitigation steps for network load balancer health and recovering connectivity for most AWS services," read the latest update on the AWS status page.

Mike Chapple, an IT professor at the University of Notre Dame, said that further issues surfacing after the initial outage is not necessarily a surprising development.  "While this is disruptive, it isn't unusual.  The process of fixing a serious IT infrastructure issue often creates new problems and fixes often need to be rolled out across a large number of systems over time," Chapple said in an emailed statement.   "As engineers work to steady the system, operations slowly stabilize and things return to normal.    Think of it like a utility outage that occurs in a large city.  The power might flicker on and off a few times as repair crews do their work.  We're seeing something similar now with AWS."

Despite initial successes in mitigation, widespread service disruptions across the internet continued.  User-reported issues spiked for several popular services, according to Downdetector, including FanDuel, Snapchat, Apple Music, Asana, Verizon, and many more.  The renewed AWS problems appeared significant and once again affected a large number of users.

The event caused massive problems for internet users starting their workweek on a Monday.  Because AWS powers huge portions of the internet, the list of services and sites that suffered outages was alarming.  According to user-reported issues on Downdetector, affected services included United Airlines, AT&T, Fortnite, Disney+, HBO Max, Signal, Snapchat, McDonald's, Verizon, Venmo, and many more.  Amazon's own services, such as Prime and Alexa, were affected too.  Individuals and companies alike felt the impact in some way.

Many of the things people own are now internet-connected (even refrigerators have become Wi-Fi-enabled billboards), so an AWS outage can disrupt large areas of people's lives.

So, what caused the AWS outage?  AWS engineers traced the issue to a DNS resolution problem affecting the DynamoDB API endpoint in the US-EAST-1 Region.  In other words, services using AWS could not reach DynamoDB, an Amazon-run database, because of a problem with the Domain Name System (DNS).  The failure also disrupted other AWS services and global features that depend on that region, including IAM updates and DynamoDB Global Tables.[2]  At the time, why the DNS problem occurred in the first place remained unknown, though some experts had an idea.

The Domain Name System (DNS) is a foundational component of internet infrastructure, acting as the directory for the web.  When users type a website address into their browser, DNS translates that human-readable domain, like www.example.com, into the numerical IP address that computers use to locate servers.  This process enables seamless navigation across the internet, allowing users to access websites, services, and applications without needing to remember complex strings of numbers.

If DNS fails or is disrupted, users may find themselves unable to reach their desired websites or services, even if the underlying servers are functioning correctly.  This makes DNS reliability critical, especially for cloud providers that support vast portions of the internet’s daily operations.
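To make this failure mode concrete, here is a minimal Python sketch of a toy directory lookup. It is not how DNS or AWS work internally; the `ToyResolver` class, the `fetch` helper, and the name and address used (`db.example.com` and `192.0.2.10`, both from reserved documentation ranges) are hypothetical. It illustrates the point made above: once the name record is gone, a client cannot reach a server that is still running.

```python
# Toy model of name resolution (hypothetical; not real DNS or AWS code).
# Clients can only reach a server by first looking up its address.


class NameNotFound(Exception):
    """Raised when the directory has no record for a name."""


class ToyResolver:
    def __init__(self) -> None:
        # Maps human-readable names to IP address strings.
        self.records: dict[str, str] = {}

    def resolve(self, name: str) -> str:
        try:
            return self.records[name]
        except KeyError:
            raise NameNotFound(name) from None


# Servers keyed by IP address; True means the server is up and serving.
servers = {"192.0.2.10": True}


def fetch(resolver: ToyResolver, name: str) -> str:
    ip = resolver.resolve(name)      # step 1: translate the name to an address
    if not servers.get(ip, False):   # step 2: contact the server at that address
        return "server down"
    return f"connected to {ip}"


resolver = ToyResolver()
resolver.records["db.example.com"] = "192.0.2.10"
print(fetch(resolver, "db.example.com"))  # connected to 192.0.2.10

# Failure mode: the record disappears, but the server is still running.
del resolver.records["db.example.com"]
try:
    fetch(resolver, "db.example.com")
except NameNotFound as err:
    print(f"cannot resolve {err}; server still up: {servers['192.0.2.10']}")
```

Real resolvers add caching, record lifetimes, and retries, all of which this sketch leaves out.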

Because DNS translates website names into IP addresses, when Amazon wrote on its Health Dashboard that the DNS issue had been "fully mitigated," it was saying that the core problem had been fixed.  "Amazon had the data safely stored, but nobody else could find it for several hours, leaving apps temporarily separated from their data," said Mike Chapple, the IT professor at the University of Notre Dame.  "It's as if large portions of the internet suffered temporary amnesia."

Rafe Pilling, director of threat intelligence at the cybersecurity firm Sophos, said the incident did not appear to be a cyber-attack or anything nefarious, consistent with Amazon's statements.  "When anything like this happens the concern that it’s a cyber incident is understandable," he told a UK outlet.  "AWS has a far-reaching and intricate footprint, so any issue can cause a major upset."

It's likely Amazon will eventually fully explain what happened.  It's currently unclear how the 10:35 a.m. ET "network connectivity issues" were related, if at all, to the initial issue with the DNS, though it feels reasonable to assume issues could arise as services worked to return to normal.

Why is an AWS outage so serious?  In short: AWS is a central pillar of the modern internet.  Without it, things crash.  As a handful of major companies captured market share, the internet's infrastructure actually became surprisingly fragile: an issue at AWS, Google, Microsoft, or CrowdStrike cascades to many, many users.

"We urgently need diversification in cloud computing," said Dr. Corinne Cath-Speth, head of digital at the human rights organization Article 19.  "The infrastructure underpinning democratic discourse, independent journalism, and secure communications cannot be dependent on a handful of companies."

Article 19 is an international human rights organization that advocates for freedom of expression and access to information.  Named after Article 19 of the Universal Declaration of Human Rights, the group works to defend and promote rights essential for open and democratic societies.  Its efforts include policy work, litigation, and campaigns aimed at protecting journalists, supporting digital rights, and ensuring transparency in government and business.  Monopolies do have repercussions.

The exact technical cause of the 20 October 2025 AWS outage had not been disclosed as of 21 October.  Amazon's statements referenced "network connectivity issues" and "launch failures," indicating that the disruption originated in AWS's networking infrastructure.  Throughout the day, Amazon applied mitigation steps to address network load balancer health and restore connectivity for most services.  As fixes were implemented, some services experienced a backlog of messages and intermittent errors, suggesting that the root cause was a complex infrastructure issue, likely related to network management and service orchestration.

Amazon promised further details in a post-event summary, but at that point in the timeline the company was still working through the recovery and had not released a full technical breakdown of the incident.  The outage and its impact on a wide range of internet services highlight how critical AWS's networking components are to the broader web ecosystem.

Long and short of it: ‘If something goes wrong with AWS, a lot goes wrong everywhere else.’

28 October 2025 - AWS Summary of the Outage: Summary of the Amazon DynamoDB Service Disruption.pdf

Amazon provided a detailed summary of the service disruption that occurred in the N. Virginia (us-east-1) Region on October 19 and 20, 2025.  While the event started at 11:48 PM PDT on October 19 and ended at 2:20 PM PDT on October 20, there were three distinct periods of impact to customer applications.  First, Amazon DynamoDB experienced increased API error rates in the N. Virginia (us-east-1) Region.  Next, on October 20, the Amazon Network Load Balancer (NLB) experienced increased connection errors for some load balancers in the same region; these errors were caused by health check failures in the NLB fleet.  Finally, later on October 20, new EC2 instance launches failed, and although launches began to succeed from late morning, some newly launched instances experienced connectivity issues that were resolved later that day.
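The NLB phase of the summary can be illustrated with a deliberately simplified model. In this Python sketch, which is an assumption for illustration and not AWS's NLB design, a load balancer sends new connections only to targets whose health checks pass; the `Target` class and the `routable` and `connect` functions are hypothetical. When checks fail even though the targets could still serve traffic, the balancer has nowhere to route connections, and callers see connection errors.

```python
# Simplified, hypothetical model of health-check-based routing.
# This is an illustration, not AWS's Network Load Balancer implementation.
from dataclasses import dataclass


@dataclass
class Target:
    address: str
    can_serve: bool     # ground truth: could this target handle traffic?
    check_passes: bool  # what the health checker last observed


def routable(targets: list[Target]) -> list[Target]:
    """Return the targets that will receive new connections."""
    return [t for t in targets if t.check_passes]


def connect(targets: list[Target]) -> str:
    pool = routable(targets)
    if not pool:
        return "connection error: no healthy targets"
    return f"routed to {pool[0].address}"


fleet = [Target("192.0.2.1", True, True), Target("192.0.2.2", True, True)]
print(connect(fleet))  # routed to 192.0.2.1

# Health checks start failing even though every target could still serve.
for t in fleet:
    t.check_passes = False
print(connect(fleet))  # connection error: no healthy targets
print(all(t.can_serve for t in fleet))  # True
```

Production load balancers add safeguards (some, for example, fail open and route to all targets when every check fails), so treat this strictly as an illustration of why spurious health-check failures turn into connection errors.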

This article is shared with permission at no charge for educational and informational purposes only.

Red Sky Alliance is a Cyber Threat Analysis and Intelligence Service organization.  We provide indicators of compromise information via a notification service (RedXray) or an analysis service (CTAC).  For questions, comments or assistance, please contact the office directly at 1-844-492-7225, or feedback@redskyalliance.com    

Weekly Cyber Intelligence Briefings:


REDSHORTS - Weekly Cyber Intelligence Briefings

https://register.gotowebinar.com/register/5207428251321676122

[1] https://mashable.com/article/aws-outage-update-amazon-what-happened-why

[2] https://www.zerohedge.com/technology/internet-outage-sparked-operational-issues-amazon-aws-data-centers-northern-virginia
