StatusGator October 20, 2025 Outage Postmortem

Published: October 24, 2025

Updated: October 24, 2025

On Monday, October 20, 2025, portions of Amazon Web Services went down for 15 hours. StatusGator's Early Warning Signals alerted about the outage roughly 10 minutes before AWS officially acknowledged it.

The outage officially impacted more than 2,000 of the 6,000 services we monitor: everything from Apple to Yahoo. During the incident we sent more than 100,000 notifications to our customers about outages, both acknowledged and unacknowledged.

But StatusGator itself was also impacted during the outage and experienced some downtime. I know a lot of people depend on StatusGator to understand the impact of outages and I am sorry we did not deliver.

StatusGator has been monitoring the world’s infrastructure for 11 years and our platform resiliency has proved vital during many global-scale outages. But in the case of the largest outage in internet history, we were not immune.

We've made some changes to ensure reliable service through the next outage of this scale.

Impact Summary

Impact #1: The StatusGator dashboard and status pages experienced approximately 90 minutes of downtime starting at 3:50 AM ET on October 20th.

Cause: We were a victim of our own success and received an incredible crush of public website traffic. On top of that, the third-party on-call notification service we use was affected by the outage and delayed notifying our on-call team member.

Impact #2: The StatusGator dashboard and status pages experienced approximately 120 minutes of downtime starting at 10:30 AM ET on October 20th.

Cause: We use an upstream provider for multi-AZ container deployment, and its load balancer was affected at this point, approximately 7 hours into the AWS outage. We deployed new infrastructure in a new region, but our provider restored service before traffic had migrated to it.

Completed Improvements

Despite how it may seem at first, StatusGator already has a lot of redundancy and stability built into it. We have a robust and resilient architecture with multiple layers of redundancy, and we use world-leading IaaS and PaaS providers. We are fortunate that this was the first such cloud outage to impact StatusGator since our inception in 2014.

That said, there is always room for improvement and in this case we have made several changes to ensure we can survive a similar outage in the future.

Set up a second, redundant, on-call notification provider.
For obvious reasons, we do not use StatusGator to monitor StatusGator. We use a third-party monitoring and paging service. Our first outage lasted as long as it did primarily because we slept through it: the engineer on call was not notified because the provider we use to alert us was affected by the outage. Once we received the alert, our team woke up and implemented pre-planned traffic mitigations. We've added a second paging service to cover this scenario in the future, and our primary provider has also committed to improving its own resiliency.
Improved public website caching.
The vast majority of traffic that StatusGator receives comes from users browsing our public website for information about ongoing outages. When a large outage hits a single provider, we can serve millions of pages. But when that traffic is spread across thousands of providers, our caching fell short. We've made several changes to our caching strategy, at several layers, to ensure this traffic does not impact customer dashboards and status pages, even during widespread outages like this one.
Deployed a second region of our infrastructure.
Monitoring, notifications, and background processing were largely unaffected during the outage, but our web traffic was affected during the second phase. During that phase, we spun up a second region of StatusGator to serve as a hot standby. We've now rehearsed cutting over to that region and will add this process to our annual disaster recovery (DR) plan, as required by SOC 2. This means that in a similar multi-AZ outage, we can serve web requests again after 3 to 5 minutes of migration time.
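A second paging provider can be wired up in more than one way. One option is to fan each alert out to every provider at once rather than failing over only when the primary returns an error: in this incident the primary was degraded and delayed rather than outright rejecting requests, which a simple error-triggered fallback might not notice. The Python below is a minimal sketch under that assumption, not StatusGator's implementation; the provider callables, their names, and the 10-second timeout are hypothetical stand-ins for real paging APIs.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict

# Hypothetical provider type: sends an alert and returns True if the
# provider's API accepted it. Real paging APIs differ; this is a stand-in.
SendAlert = Callable[[str], bool]


def fan_out_alert(
    message: str, providers: Dict[str, SendAlert], timeout: float = 10.0
) -> Dict[str, bool]:
    """Send the same alert to every paging provider concurrently.

    Returns provider name -> whether that provider accepted the alert.
    A provider that raises, rejects the alert, or misses the shared
    deadline is recorded as False, so one degraded provider cannot
    block or suppress the others.
    """
    results: Dict[str, bool] = {name: False for name in providers}
    if not providers:
        return results

    pool = ThreadPoolExecutor(max_workers=len(providers))
    futures = {pool.submit(send, message): name for name, send in providers.items()}

    # One overall deadline for all providers, not one timeout per provider.
    done, _still_running = wait(futures, timeout=timeout)
    # Don't block on providers that are still hanging; let them finish in the background.
    pool.shutdown(wait=False)

    for future in done:
        try:
            results[futures[future]] = bool(future.result())
        except Exception:  # network error, provider outage, etc.
            results[futures[future]] = False
    return results
```

For example, if the primary provider raises ConnectionError and the secondary returns True, fan_out_alert returns {'primary': False, 'secondary': True}, so the on-call engineer is still paged. The trade-off is occasional duplicate pages when both providers are healthy, which is usually acceptable for a wake-up alert.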
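The caching changes above are described only at a high level. To illustrate the underlying idea, here is a minimal Python sketch of an in-process page cache that combines a time-to-live, so outage pages stay reasonably fresh, with least-recently-used eviction, so traffic spread across thousands of provider pages cannot grow memory without bound. The class name, sizes, and TTL are hypothetical. A production setup would typically also rely on a CDN or reverse proxy honoring Cache-Control headers, and nothing here reflects StatusGator's actual caching layers.

```python
import time
from collections import OrderedDict
from typing import Callable, Tuple


class TTLPageCache:
    """Bounded LRU cache of rendered pages; each entry expires after `ttl` seconds.

    Hypothetical sketch. Not thread-safe: a real server would need a lock
    or per-key request coalescing so concurrent misses don't all render.
    """

    def __init__(
        self, max_entries: int, ttl: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self.max_entries = max_entries
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        # path -> (expires_at, rendered body), ordered least to most recently used
        self._entries: "OrderedDict[str, Tuple[float, str]]" = OrderedDict()

    def get_or_render(self, path: str, render: Callable[[str], str]) -> str:
        now = self.clock()
        entry = self._entries.get(path)
        if entry is not None and entry[0] > now:
            self._entries.move_to_end(path)  # fresh hit: mark as recently used
            return entry[1]

        # Miss or expired entry: render once and store with a new expiry time.
        body = render(path)
        self._entries[path] = (now + self.ttl, body)
        self._entries.move_to_end(path)

        # Evict least recently used pages once over capacity.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)
        return body
```

With this shape, a popular outage page is rendered about once per TTL window per process regardless of request volume, while max_entries caps how many distinct provider pages are held at once.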
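The post doesn't say whether cutover to the standby region is triggered by hand or automatically. If it were driven by health checks, one common pattern is hysteresis: fail over only after several consecutive failed checks, and fail back only after a longer run of successes, so one flaky probe doesn't bounce traffic between regions. The minimal Python sketch below assumes hypothetical region names and thresholds. It only decides which region should be active and leaves the actual traffic shift (DNS, load balancer, and so on) out of scope.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverController:
    """Decide which region should receive web traffic (hypothetical sketch).

    Switches from primary to standby after `failure_threshold` consecutive
    failed primary health checks, and back after `recovery_threshold`
    consecutive successful ones.
    """

    primary: str
    standby: str
    failure_threshold: int = 3
    recovery_threshold: int = 5
    active: str = field(default="", init=False)
    _failures: int = field(default=0, init=False, repr=False)
    _successes: int = field(default=0, init=False, repr=False)

    def __post_init__(self) -> None:
        self.active = self.primary  # traffic starts on the primary region

    def record_primary_check(self, healthy: bool) -> str:
        """Feed one primary-region health check result; return the active region."""
        if healthy:
            self._failures = 0  # any success resets the failure streak
            self._successes += 1
            if self.active == self.standby and self._successes >= self.recovery_threshold:
                self.active = self.primary
        else:
            self._successes = 0  # any failure resets the recovery streak
            self._failures += 1
            if self.active == self.primary and self._failures >= self.failure_threshold:
                self.active = self.standby
        return self.active
```

Requiring more successes to fail back than failures to fail over is a deliberate asymmetry: moving traffic is expensive (the post cites 3 to 5 minutes of migration time), so the controller is more cautious about returning to a region that has only just recovered.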

Looking Ahead

At the end of the day, StatusGator exists to make outages a little less painful for everyone who relies on the cloud. When we go down, we know we’re adding stress during already difficult moments and that’s something I take personally.

The AWS outage was unprecedented in both scale and duration, but it exposed valuable lessons for everyone affected. We’ve turned those lessons into concrete changes that strengthen our ability to serve you through even the most severe global disruptions.

Reliability is never “done”. It’s a continuous process of learning, testing, and improving. We’re committed to transparency in that process and to providing the most reliable status page aggregator available.
