
Amazon Web Services (AWS) powers a significant portion of the internet. But even the world’s most dominant cloud provider isn’t immune to outages.
If you’re looking to keep your systems resilient, it’s essential to monitor not only AWS service status, but also the real-time health of third-party providers and even your own product. With tools like StatusGator, you can monitor multiple cloud services, including AWS, Azure, and Cloudflare.
At StatusGator, we have been monitoring AWS outage history since 2015. This article provides a five-year overview of service downtime, and once you sign in, you get access to 10 years of historical uptime data for Amazon Web Services.
For a deeper understanding of AWS as a platform, you can also check out our AWS facts and stats, and our analysis of the least reliable AWS region.
Now, let’s dive into the history of AWS outages.
February 2025: Major Networking Disruption in eu-north-1 (Stockholm)
On February 13, 2025, at approximately 23:40 UTC, AWS customers in the eu-north-1 region (Stockholm) began experiencing elevated error rates and increased latency across a wide range of AWS services.
The disruption primarily affected intra-region traffic, particularly within Availability Zone eun1-az3, although some services in other zones were also impacted.
Affected services included a broad spectrum of core AWS infrastructure, such as:
- Amazon EC2, S3, DynamoDB, and RDS
- AWS Lambda, CloudWatch, API Gateway, and Route 53
- Amazon Redshift, Kinesis Data Streams, EventBridge, Elastic Load Balancing
- And dozens more, spanning networking, compute, storage, and serverless products
AWS engineers quickly identified the root cause as a networking issue internal to the region. While connectivity into and out of eu-north-1 remained operational, internal traffic between services, especially within eun1-az3, was significantly disrupted. AWS advised customers to route workloads away from this Availability Zone where possible.
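For teams that wanted to act on that advice programmatically, here is a minimal sketch, with hypothetical group and subnet names, of shifting an Auto Scaling group onto subnets in the healthy Availability Zones using boto3:

```python
# Minimal sketch: shift an Auto Scaling group away from an impaired AZ
# by restricting it to subnets in healthy zones. The group name and
# subnet IDs below are hypothetical placeholders.
import boto3

asg = boto3.client("autoscaling", region_name="eu-north-1")

HEALTHY_SUBNETS = ["subnet-aaaa1111", "subnet-bbbb2222"]  # e.g., eun1-az1, eun1-az2

asg.update_auto_scaling_group(
    AutoScalingGroupName="my-web-asg",  # hypothetical
    VPCZoneIdentifier=",".join(HEALTHY_SUBNETS),
)
# Instances still running in the impaired AZ can then be drained or
# terminated; the ASG replaces them in the remaining zones.
```

The same idea applies to load balancer subnet mappings and target groups.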
By 00:46 UTC on February 14, AWS reported early signs of recovery, with error rates and latencies gradually improving. Recovery continued over the following hours, with full service restoration confirmed by 04:05 UTC.
This outage highlights how intra-region networking faults, even when external connectivity remains unaffected, can cascade across multiple AWS services that rely on high-speed, low-latency internal communication.
July 2024: Kinesis Outage in us-east-1 Causes Widespread AWS Service Disruptions
On July 30, 2024, AWS suffered a major service disruption in its us-east-1 (Northern Virginia) region. The outage lasted nearly seven hours and was caused by a failure in an internal cell of Amazon Kinesis Data Streams.
The outage began around 2:45 PM PDT and was fully resolved by 9:37 PM PDT. It impacted a broad range of services that depend on Kinesis as a backbone for real-time data processing.
The issue stemmed from a newly upgraded Kinesis architecture designed to improve scalability and fault isolation.
A routine deployment exposed a flaw in how the system handled an unusual workload: a very large number of low-throughput shards. This caused Kinesis’ cell management system to misinterpret the health of hosts, leading to an overload in shard redistribution and ultimately degrading traffic processing.
As a result, core AWS services like CloudWatch Logs, Amazon Data Firehose, Amazon S3 event notifications, AWS Lambda, ECS, Amazon Redshift, and AWS Glue experienced increased latency and error rates. For example:
- CloudWatch Logs suffered delayed log processing, which in turn affected Lambda logs, ECS tasks, and API Gateway logging.
- Lambda customers were unable to view real-time logs and experienced function errors if logging operations were blocked.
- ECS tasks using blocking log drivers failed health checks and were terminated (a non-blocking log configuration, sketched after this list, can mitigate this).
- Redshift, MWAA, and Glue encountered API failures and degraded performance.
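For the ECS failure mode in particular, AWS documents a non-blocking mode for the awslogs log driver that buffers logs in memory instead of blocking the container when CloudWatch Logs is unavailable. Here is a minimal sketch, with hypothetical task and image names, of registering a task definition that uses it:

```python
# Minimal sketch (hypothetical names): register an ECS task definition
# whose awslogs driver runs in non-blocking mode, so a CloudWatch Logs
# outage buffers (and may drop) logs instead of blocking the container.
import boto3

ecs = boto3.client("ecs", region_name="us-east-1")

ecs.register_task_definition(
    family="my-service",                # hypothetical
    containerDefinitions=[{
        "name": "app",
        "image": "my-repo/app:latest",  # hypothetical
        "memory": 512,
        "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
                "awslogs-group": "/ecs/my-service",
                "awslogs-region": "us-east-1",
                "awslogs-stream-prefix": "app",
                "mode": "non-blocking",    # don't block on log writes
                "max-buffer-size": "25m",  # in-memory buffer during outages
            },
        },
    }],
)
```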
Although data was not lost, the incident significantly impacted observability, application behavior, and service availability across a wide swath of AWS’s most commonly used features.
This event underscores the risks of architectural complexity and the challenges of updating core infrastructure in the high-stakes us-east-1 region.
June 2023: AWS us-east-1 (Northern Virginia) Regional Outage
On June 13, 2023, Amazon Web Services (AWS) experienced a widespread outage in its us-east-1 (N. Virginia) region. It affected many services and high-profile organizations, including The Boston Globe, the New York MTA, and the Associated Press.
The incident began around 19:11 UTC, when multiple AWS services began reporting increased error rates and latencies.
Amazon Connect was particularly impacted: callers couldn’t connect, chats failed to initiate, and agents had trouble logging in.
AWS Lambda, along with services like EventBridge, SQS, CloudWatch, IAM, and many more, experienced significant delays, especially with asynchronous invocations.
Although many services recovered within a few hours, AWS worked through a large backlog, with full recovery of some functions stretching into the evening.
When we explored the reliability of AWS regions, Northern Virginia drew particular attention.
December 2022: AWS Outage in us-east-2 Region Raises Cloud Resilience Questions
On December 5, 2022, AWS experienced a cloud outage in its us-east-2 (Ohio) region, affecting internet connectivity and site-to-site VPN services. The outage began at 12:26 p.m. PST and lasted 40 minutes, during which customers reported connectivity issues across applications and services relying on AWS infrastructure in the region.
Although relatively short, this disruption underlined the broader challenge of maintaining service continuity in a cloud computing environment. As AWS is the world’s leading cloud service provider, even localized issues can ripple across the digital ecosystem, interrupting operations for enterprises and users alike.
Notably, AWS declined to share a detailed root cause analysis or a post-event summary. Instead, they cited their policy to only publish such reports for incidents with significant control plane or infrastructure impact.
This lack of transparency leaves customers relying on external tools like StatusGator’s health dashboard to track service availability and understand the broader implications of such events.
Adding to the concern, this outage was not an isolated incident: just months earlier, in July 2022, the same region suffered a major disruption (covered below). The lessons from the us-east-2 events stress the importance of preparing for cloud provider failures and embedding business continuity into both technical and strategic planning.
July 2022: AWS us-east-2 (Ohio) Outage
On July 28, 2022, AWS faced a power outage in Availability Zone 1 (AZ1) of its us-east-2 (Ohio) region. This led to some serious network connectivity problems.
The incident disrupted Amazon EC2 services, causing higher error rates and latency for EC2 APIs, and impacted customers who depended on the Columbus, OH data center.
Even though the power issue lasted around 20 minutes, some services took as long as three hours to get back on track. The outage resulted in partial or complete downtime for various third-party applications hosted on AWS, such as Webex, Okta, Splunk, and BambooHR.
Users of Webex in the affected area reported difficulties with messaging, authentication, and file sharing. Okta experienced widespread access problems due to connectivity issues and SSL/HTTP errors.
This incident serves as a reminder of the ripple effects that AWS outages can have on services relying on EC2, underscoring the importance of keeping an eye on multiple cloud status pages and getting instant alerts about downtime to ensure business continuity.
December 2021: Major Multi-Phase AWS Outages Disrupt Global Services
On December 7, a major outage originating in the us-east-1 region disrupted a wide range of AWS services and third-party applications that rely on them. The incident unfolded in two phases.
The first began around 15:35 UTC, when AWS users observed degraded performance across services like EC2, DynamoDB, and the AWS Console. Although some functionality appeared to return within an hour, a second, more severe wave followed. It lasted over 7 hours and was resolved by 00:44 UTC on December 8.
The root cause was tied to elevated API error rates and latency within AWS API Gateway services, which serve as a backbone for many AWS-hosted applications.
These failures didn’t stem from internet connectivity or network path issues, but rather internal infrastructure degradation. Even during the height of the outage, some users were still able to reach services, illustrating how a distributed service architecture can produce uneven impacts.
Just a few days later, on December 10, AWS suffered a second outage, lasting over an hour from 13:05 UTC. Though less severe than the December 7 event, it still caused widespread disruptions. Users saw intermittent recovery followed by renewed 500 server errors, further highlighting the cascading effects of API service degradation within a cloud environment.
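No client-side code can fix a provider outage, but retries with exponential backoff and jitter can smooth over the kind of intermittent 500 errors users saw here. A minimal sketch, using a hypothetical endpoint:

```python
# Minimal sketch: exponential backoff with jitter for intermittent 5xx
# errors, a common client-side mitigation during incidents like these.
import random
import time
import urllib.error
import urllib.request

def get_with_backoff(url: str, max_attempts: int = 5) -> bytes:
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code < 500 or attempt == max_attempts - 1:
                raise  # client errors and final failures propagate
        except urllib.error.URLError:
            if attempt == max_attempts - 1:
                raise
        # Sleep 1s, 2s, 4s, ... plus jitter to avoid thundering herds.
        time.sleep(2 ** attempt + random.random())

body = get_with_backoff("https://api.example.com/health")  # hypothetical
```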
These incidents demonstrate the growing complexity of cloud infrastructure and the importance of independent monitoring tools. During both outages, AWS’s status page was slow to reflect the scope of the disruptions.
The two significant outages AWS experienced in December 2021 underscored the fragility of cloud infrastructure, and why cloud status monitoring is so important for companies.
September 2021: 8-Hour EBS Outage Causes Cascading Failures in us-east-1
On Sunday, September 26, 2021, AWS experienced a significant service degradation in the Northern Virginia region (us-east-1), caused by a “stuck IO” issue in Amazon Elastic Block Store (EBS). The problem originated in a single Availability Zone but had cascading effects across multiple services that rely on EBS for storage.
The incident lasted approximately eight hours and impaired Amazon EC2 instances: AWS warned that existing instances could experience impairment, while new instance launches might fail altogether.
Because services like Amazon RDS, Amazon ElastiCache, and Amazon Redshift depend on EBS volumes, these too were affected.
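If you suspect an EBS event like this one, EC2 exposes volume status checks that report impaired volumes. A minimal sketch of surfacing them with boto3:

```python
# Minimal sketch: list EBS volumes that EC2 reports as impaired
# (e.g., stuck IO) in us-east-1, so dependent workloads can be failed over.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

paginator = ec2.get_paginator("describe_volume_status")
for page in paginator.paginate(
    Filters=[{"Name": "volume-status.status", "Values": ["impaired"]}]
):
    for vol in page["VolumeStatuses"]:
        print(vol["VolumeId"], vol["AvailabilityZone"],
              vol["VolumeStatus"]["Status"])
```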
This outage highlights the interconnected nature of AWS infrastructure and how a localized storage issue can ripple through dependent services. It also serves as a cautionary tale for over-reliance on the us-east-1 region, a frequent hotspot for high-impact outages.
November 2020: AWS Kinesis Outage Disrupts Dozens of Cloud Services
On November 25, 2020, AWS experienced one of its most far-reaching outages, starting at 6:36 AM PST. The root cause was traced to Amazon Kinesis Data Streams, which suffered increased API error rates and latency in the us-east-1 region. Because many AWS services depend on Kinesis, the incident triggered a cascade of failures across the AWS ecosystem.
By 7:30 AM PST, the disruption had spread to CloudWatch, Cognito, IoT Core, EventBridge, and other critical services. By 8:05 AM, more than two dozen services were affected, including:
- Lambda, API Gateway, ACM, AppStream2, Athena
- CloudFormation, CloudTrail, DynamoDB, SageMaker
- Support Console, Managed Blockchain, and many more
AWS stated that the issue also impaired its ability to post updates to the Service Health Dashboard, delaying communication with customers during the height of the disruption.
As the API failures persisted, customers experienced degraded functionality across a wide range of applications and platforms. Major companies publicly affected by the outage included 1Password, Coinbase, Adobe Spark, Roku, Pocket, Anchor, Glassdoor, and The Washington Post. Reports on Downdetector confirmed widespread disruptions throughout the day.
By 6:23 PM PST, AWS had mitigated the issue affecting Kinesis request processing, although throttling remained in place. Recovery of CloudWatch metrics lagged behind due to backlog delays.
Full restoration was confirmed by 4:18 AM ET on November 26, except for lingering issues with IoT SiteWise.
Although the incident affected only one AWS region, its ripple effect was global. Services depending on real-time data ingestion, authentication, and monitoring were heavily impacted. It highlighted how deep service interdependencies can turn a localized issue into a system-wide failure.
AWS provides powerful cloud infrastructure, but outages like this demonstrate the need for independent monitoring and early warning signals across all your dependencies. With StatusGator, you can:
- Aggregate status pages for all your cloud vendors, SaaS tools, and third-party APIs
- Monitor specific regions (like US-EAST-1) or individual components (e.g., Kinesis, CloudWatch)
- Receive downtime alerts instantly, even before the cloud provider officially confirms the issue
- Reduce support ticket burden by proactively informing your users of service disruptions
By using StatusGator, engineering and IT teams can stay ahead of outages, reduce MTTR, and maintain business continuity during cloud incidents like the AWS Kinesis failure of 2020.
Conclusion
While AWS has built a reputation for redundancy and resilience, these incidents raise critical questions about capacity management subsystems, network paths, and the transparency of AWS incident communications.
Enterprises leveraging AWS should consider architectural safeguards such as multi-cloud strategies, favoring open-source tools over proprietary services like Amazon’s DynamoDB, and proactive outage impact mitigation.
We also saw how a regional AWS outage, such as the June 13, 2023 incident, can ripple across dependent services and cause widespread disruption.
This underscores the importance of monitoring multiple AWS service status pages and setting up instant downtime alerts to stay informed and reduce operational risk during cloud service disruptions.
In today’s cloud-first world, a major AWS disruption, even in a single availability zone, can affect many AWS services and cause cascading issues.
For organizations that depend on AWS, early visibility into service degradation can make all the difference in incident response and customer trust.
Tools like StatusGator provide proactive monitoring with real-time alerts, Early Warning Signals, and historical insights to help you stay ahead of outages.
Whether you’re managing infrastructure in AWS, relying on Azure, or operating on the Cloudflare edge, StatusGator empowers you to detect downtime quickly, reduce support burden, and maintain uptime transparency.
Stay informed. Stay resilient. Monitor smarter with StatusGator.
FAQ
How to check AWS maintenance history?
To check AWS maintenance history, review the AWS Health Dashboard or StatusGator. The official AWS status page shows past events such as service degradations, outages, and maintenance windows. However, AWS provides only limited historical data, and it’s not easy to search or aggregate across services and regions.
A better alternative is using StatusGator, which maintains a searchable history of AWS status changes across all regions and services. You can view historical data on specific services, regions, or your entire AWS footprint in one place.
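If you prefer to query history programmatically, the AWS Health API exposes past and ongoing events, though it requires a Business or Enterprise support plan. A minimal sketch:

```python
# Minimal sketch: pull recent AWS Health events programmatically.
# Note: the AWS Health API requires a Business or Enterprise support
# plan, and its endpoint lives in us-east-1.
import boto3

health = boto3.client("health", region_name="us-east-1")

events = health.describe_events(
    filter={
        "regions": ["us-east-1"],
        "eventTypeCategories": ["issue", "scheduledChange"],
    },
    maxResults=20,
)
for event in events["events"]:
    print(event["startTime"], event["service"], event["eventTypeCode"])
```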
How to check for an AWS outage?
To check if AWS is down or experiencing an outage, you can:
1. Visit the AWS Health Dashboard (the official AWS status page)
2. Use StatusGator to monitor real-time AWS outages, including early warning signals before official announcements are posted.
StatusGator aggregates AWS service status in real-time and can alert you via Slack, Teams, email, or webhook when an AWS region or service experiences downtime. You can also track AWS along with third-party tools like Cloudflare, Azure, and your own SaaS stack.
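If you just need a quick scripted check, AWS has long published a public status RSS feed. A minimal sketch of polling it; note that the feed URL below is an assumption and may change:

```python
# Minimal sketch: poll AWS's public status RSS feed for recent items.
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://status.aws.amazon.com/rss/all.rss"  # assumed legacy feed

with urllib.request.urlopen(FEED_URL, timeout=10) as resp:
    root = ET.parse(resp).getroot()

for item in root.iter("item"):
    title = item.findtext("title", default="")
    published = item.findtext("pubDate", default="")
    print(published, "-", title)
```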
Why do AWS outages happen?
AWS outages stem from a handful of root causes, including:
- Networking issues (e.g., intra-region traffic failures, DNS problems)
- Code bugs or faulty deployments (often from changes in underlying systems)
- Capacity or resource exhaustion
- Dependency failures within AWS or third-party services
- Physical infrastructure issues (like power or cooling failures)
Because AWS operates at a massive scale, even minor glitches can have cascading effects across multiple services and regions. That’s why it’s crucial to have proactive monitoring in place. With StatusGator, you can detect outages before they’re officially acknowledged, helping you respond faster and reduce downtime impact.