
As one of the world’s leading cloud platforms, Google Cloud Platform (GCP) powers thousands of businesses, applications, and services globally. While Google Cloud is known for its reliability, no cloud provider is immune to service disruptions.
In this post, we dive deep into the history of Google Cloud outages, highlighting the most significant incidents starting from 2020. We’ll explore the scope, causes, durations, and impacts of each major disruption. The timeline of events is based on real data from StatusGator, the platform that monitors and aggregates status pages for 5,000+ cloud services.
Whether you’re a cloud engineer, an IT leader, or part of a site reliability team, understanding the outage history of Google Cloud can help you better prepare for future incidents, improve your incident response, and minimize downtime.
Let’s take a look at the most notable Google Cloud service disruptions—and what they reveal about the evolving complexity of cloud infrastructure.
January 2025: Widespread Service Disruptions Hit Apigee, Vertex Gemini, and Cloud Pub/Sub
In early January 2025, Google Cloud experienced a series of interconnected service disruptions affecting Apigee, Vertex Gemini, and Cloud Pub/Sub, with issues spanning over 18 hours across multiple regions and services.
The first signs of trouble began on January 7 at 12:51 PM PT, when users of Apigee and Apigee Edge Public Cloud reported login issues with SAML-based authentication on their integrated developer portals.
The issue affected both Apigee X and Apigee Edge users. For many customers, the disruption blocked access to critical API developer portals. Google acknowledged the issue and initiated an investigation, eventually resolving the login failures by 5:20 AM UTC on January 8.
However, during the same time window, another problem emerged involving Vertex Gemini APIs. Starting around 10:30 PM UTC on January 7, Google Cloud experienced an elevated rate of HTTP 500 errors when accessing Gemini 1.5 Flash and Gemini 1.5 Pro models.
The errors were intermittent but affected the reliability of inference for AI workloads. These issues were ongoing alongside the Apigee problems and compounded disruption for AI-focused customers who used Gemini for application logic or content generation.
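For intermittent 500s like these, the usual client-side mitigation is retrying with exponential backoff. Below is a minimal sketch using the Vertex AI Python SDK; the project, model choice, attempt counts, and backoff values are illustrative assumptions, not guidance Google published for this incident.

```python
# Minimal sketch: retrying Gemini calls on intermittent HTTP 500 errors.
# Assumes the google-cloud-aiplatform SDK; project and backoff values are illustrative.
import time

import vertexai
from vertexai.generative_models import GenerativeModel
from google.api_core import exceptions as gexc

vertexai.init(project="my-project", location="us-central1")  # hypothetical project
model = GenerativeModel("gemini-1.5-flash")

def generate_with_retry(prompt: str, max_attempts: int = 4) -> str:
    delay = 1.0
    for attempt in range(1, max_attempts + 1):
        try:
            return model.generate_content(prompt).text
        except (gexc.InternalServerError, gexc.ServiceUnavailable):
            # Errors were intermittent during the incident, so back off and retry.
            if attempt == max_attempts:
                raise
            time.sleep(delay)
            delay *= 2  # exponential backoff

print(generate_with_retry("Summarize today's deployment notes."))
```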
Shortly after, on January 8 at 4:00 PM UTC, a new outage affected Google Cloud Pub/Sub. Customers across multiple regions experienced a complete publishing block, rendering them unable to push messages to Pub/Sub topics. Just minutes later, subscription failures occurred, further breaking real-time message flow for applications depending on the service. The outage lasted around 15 minutes, with all Pub/Sub functions restored by 4:15 PM UTC.
Although short, the Pub/Sub incident affected critical real-time pipelines, while the longer outages in Apigee and Gemini APIs had a broader operational impact.
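For applications caught in a publishing block like this, the failure typically surfaces when the publish future is resolved. Here is a minimal sketch, assuming the google-cloud-pubsub Python client and hypothetical project and topic names, of surfacing publish failures instead of silently dropping messages:

```python
# Minimal sketch: surfacing Pub/Sub publish failures instead of dropping messages.
# Assumes the google-cloud-pubsub client; project and topic names are hypothetical.
from concurrent.futures import TimeoutError

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "orders")  # hypothetical names

def publish_event(payload: bytes) -> None:
    future = publisher.publish(topic_path, payload)
    try:
        # result() blocks until the server acknowledges the message.
        message_id = future.result(timeout=30)
        print(f"Published message {message_id}")
    except TimeoutError:
        # During a publishing block, acknowledgements never arrive;
        # buffer the payload for replay rather than losing it.
        print("Publish timed out; buffering payload for retry")
    except Exception as err:
        print(f"Publish failed: {err}")

publish_event(b'{"order_id": 123}')
```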
August 2024: Vertex AI and Cloud TPU Services Disrupted Globally
On August 8, 2024, a widespread outage affected Google Cloud’s Vertex AI platform, impacting both Online Prediction and Training services across nearly every major global region.
Beginning around 00:10 UTC, customers in North America, Europe, Asia Pacific, South America, and the Middle East experienced failures when using Vertex AI, particularly when attempting to upload models or access services backed by Cloud TPU (Tensor Processing Unit) infrastructure.
The root of the issue was tied to Cloud TPU service activation failures, which affected users trying to enable or re-enable TPU APIs either through Vertex AI or directly via the Cloud TPU Service/API.
The problem had reportedly begun as early as August 7 at 12:00 PM PT, though the full scope became evident by the next day. Affected users encountered error messages such as:
“Failed to upload model. Contact Vertex AI.”
This incident disrupted both new deployments and ongoing machine learning workflows, as many users rely on TPUs for scalable training and inference in production environments. Notably, the outage impacted multiple Vertex AI Online Prediction and Training regions simultaneously, a rare global-level disruption for such a specialized service.
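The "Failed to upload model" error surfaced at the model-upload step of typical deployment pipelines. As a rough illustration of where that call sits (not Google's published guidance for the incident), here is a hedged sketch using the google-cloud-aiplatform SDK with hypothetical names and URIs:

```python
# Minimal sketch of the model-upload step where the August 2024 errors surfaced.
# Assumes the google-cloud-aiplatform SDK; all names, URIs, and regions are hypothetical.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

try:
    model = aiplatform.Model.upload(
        display_name="demo-model",
        artifact_uri="gs://my-bucket/model/",  # hypothetical GCS path
        serving_container_image_uri=(
            "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest"
        ),
    )
    print(f"Uploaded model: {model.resource_name}")
except Exception as err:
    # During the incident this step failed with "Failed to upload model. Contact Vertex AI."
    print(f"Model upload failed: {err}")
```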
Google’s engineering team deployed a mitigation strategy shortly after identifying the issue, and by 01:45 UTC on August 8, the outage was marked as resolved.
Although relatively brief, this incident underscores the critical role of TPU services in modern AI pipelines and the potential wide-reaching effects when they go offline.
April 2023: Widespread Disruption Due To Rainy Days in Paris
On April 28, 2023, Google Cloud experienced a significant service outage across multiple regions, beginning with a serious incident in its europe-west9-a zone in Paris. A water intrusion at a Global Switch-operated data center, potentially linked to a fire, triggered a multi-cluster failure, forcing the shutdown of several Google Cloud services.
Among the affected services were Virtual Private Cloud (VPC), Cloud Load Balancing, Cloud NAT, Cloud Key Management Service, Cloud Workflows, and BigQuery Data Transfer Service. Users were unable to access critical infrastructure, and connectivity issues with multiple Google Cloud services were reported.
The status dashboard for Google Cloud listed the incident as ongoing. Customers experienced connectivity issues in multi-region environments, including Europe, Asia, Australia, North America, and South America.
The Google Cloud Networking stack was also impacted, with disruptions in Cloud Load Balancing, Speech-to-Text, and Service Directory services.
For many customers relying on Google Cloud services in zone europe-west9-a, the outage meant prolonged downtime, degraded performance, and emergency failover operations to other zones or regions.
Despite the disruption, Google Cloud published updates detailing the issue, although there was no immediate ETA for recovery. The engineering team advised users to reroute workloads away from the affected zone.
In the days following, residual problems continued, especially for Google Cloud Bigtable, as the company worked to restore availability.
August 2022: Data Center Fire Sparks Cloud Logging Issues
On August 8, 2022, a serious electrical incident at Google’s data center campus in Council Bluffs, Iowa resulted in a fire that injured three employees.
While Google stated that the fire was not directly related to the outages affecting Google Search and Maps on the same day, the company’s cloud service infrastructure did experience disruptions shortly after.
By August 10, Cloud Logging, a core Google Cloud service, began to degrade globally. Status updates posted on the Google Cloud status dashboard indicated a DEADLINE_EXCEEDED error across multiple Logs Coliseum endpoints, causing increased latency and slower response times for logging services across multiple regions.
This affected users’ ability to monitor and manage logs, which are essential for debugging and operational awareness across Google Cloud Platform (GCP) environments.
The Google Cloud engineering team acknowledged the incident and continued the investigation, although no workaround was available at the time. Customers were advised to monitor the situation via official status updates, but many were left in the dark as latency persisted.
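When log writes begin returning DEADLINE_EXCEEDED, applications can at least avoid blocking or crashing their request path. A minimal sketch, assuming the google-cloud-logging Python client and an illustrative logger name, of catching the timeout and falling back to local output:

```python
# Minimal sketch: degrading gracefully when Cloud Logging writes time out.
# Assumes the google-cloud-logging client; the logger name is illustrative.
import sys

from google.cloud import logging as cloud_logging
from google.api_core import exceptions as gexc

client = cloud_logging.Client()
logger = client.logger("checkout-service")  # hypothetical logger name

def write_log(message: str) -> None:
    try:
        logger.log_text(message)
    except gexc.DeadlineExceeded:
        # Matches the DEADLINE_EXCEEDED symptom seen in August 2022:
        # fall back to stderr so the application keeps running and logs locally.
        print(f"[cloud-logging timeout] {message}", file=sys.stderr)

write_log("Order 123 completed")
```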
July 2022: London Heatwave Disrupts Google Cloud Data Center
In mid-July 2022, a historic heatwave swept through the UK, with London temperatures soaring past 40°C (104°F). The extreme weather triggered cooling failures in Google Cloud’s data center located in the europe-west2-a zone, which is based in London. As a result, a wide array of Google Cloud services experienced significant outages, elevated error rates, and latency issues.
The root cause was a cooling system failure in one of the buildings hosting this zone. To protect hardware and ensure system safety, Google shut down some services, causing a ripple effect across dozens of cloud products.
Impacted Services and Timeline
- Google Cloud Storage: Users reported elevated errors when reading objects in the affected region.
- BigQuery: Datasets in europe-west2 were unavailable.
- App Engine & Cloud Functions: High error rates were observed for traffic routed through services like Cloud Pub/Sub, Eventarc, Cloud Tasks, and Cloud Scheduler.
- Persistent Disks (PD): Devices became unavailable in the affected zone, causing IO errors.
- Cloud SQL & Spanner: Some instances were unreachable, particularly non-HA and zonal deployments.
- Cloud Filestore, Cloud Dataproc, and Bigtable: Customers saw degraded performance or complete service unavailability.
- Compute Engine (GCE): Virtual machine instances experienced disruptions; some were unreachable until the issue was mitigated.
- GKE (Kubernetes Engine): Clusters in europe-west2-a suffered partial outages.
- Vertex AI, API Gateway, Cloud Tasks, Looker, and Dataflow: All experienced service degradation, timeouts, or failure to process requests.
Google engineers restored the cooling system on July 19, but full service restoration continued well into July 20, as product teams worked to bring individual services back online. For some products like Cloud Datastore, Looker, and PDs, the full mitigation wasn’t completed until nearly 24 hours later.
Google issued multiple updates and mitigation timelines for each affected product and encouraged customers to fail over to unaffected regions where possible. While the outage was localized to a single zone, the widespread product impact affected businesses and developers across Europe who relied on services tied to europe-west2.
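One practical first step when a single zone degrades is simply to enumerate what is running there. Below is a rough sketch, assuming the google-cloud-compute Python client and a hypothetical project ID, that lists instances in europe-west2-a and flags any that are not RUNNING as candidates for failover:

```python
# Minimal sketch: auditing exposure to an affected zone during an incident.
# Assumes the google-cloud-compute client; the project ID is hypothetical.
from google.cloud import compute_v1

def audit_zone(project: str, zone: str) -> None:
    client = compute_v1.InstancesClient()
    for instance in client.list(project=project, zone=zone):
        # Instances stuck outside RUNNING are candidates for failover elsewhere.
        marker = "OK" if instance.status == "RUNNING" else "CHECK"
        print(f"[{marker}] {instance.name}: {instance.status}")

audit_zone("my-project", "europe-west2-a")
```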
December 2020: Global Outage Disrupts Gmail, YouTube, and Google Cloud Services
On December 14, 2020, a widespread outage struck Google’s services. It disrupted access to core products, including Gmail, YouTube, Google Docs, Google Home, Pokémon GO, and numerous others that rely on Google Cloud infrastructure.
The incident began at approximately 13:35 UTC, with users across the globe reporting failures to log in, send emails, or use integrated Google Workspace tools.
This outage was especially notable for its broad impact across both consumer and enterprise services, illustrating the interconnectedness of Google Cloud’s backend systems.
According to Google’s status updates, the root cause was an internal storage quota issue that prevented authentication services from functioning properly, locking users out of their accounts across many services.
The disruption lasted for roughly 90 minutes, with full restoration reported by 15:20 UTC. Though brief, the outage highlighted the risks of centralized authentication and the potential ripple effects of Google Cloud issues on a wide array of global platforms and applications.
Conclusion
Outages like these highlight the challenges of managing operations during cloud infrastructure incidents. With StatusGator, organizations gain proactive visibility into disruptions affecting Google Cloud services, including real-time alerts, historical uptime, and personalized monitoring of critical services like Cloud Logging, Google App Engine, and Google Compute Engine.
When cloud customers are experiencing an issue, whether due to a service outage, connectivity problem, or unavailable service, StatusGator helps ensure teams are informed without relying solely on the official Google Cloud service health dashboard.