AWS Outage Explained: How a Cooling Failure in Virginia Disrupted Major Apps


Amazon Web Services (AWS) has provided a technical explanation for the widespread service disruption that occurred in early May 2026. The outage, which ran from Thursday into Friday, was triggered by a “thermal event” at a single data center in Northern Virginia, leading to a complete loss of power.

This incident highlights the fragility of cloud infrastructure dependencies, even for systems designed with redundancy. When core physical components fail, the ripple effects can instantly paralyze major consumer-facing platforms, from financial exchanges to sports betting apps.

The Technical Breakdown

According to an official update on the AWS status page, the root cause was a failure in the cooling systems at one specific facility. The overheating prompted a protective response: Amazon shifted traffic away from the affected Availability Zone late Thursday afternoon to prevent further hardware damage.

The resolution process focused primarily on restoring physical infrastructure before digital services could resume.

  • Initial Response: Traffic was rerouted away from the compromised zone.
  • Restoration: By 1:50 PM on Friday, May 8, engineers had restored cooling capacity to pre-event levels.
  • Service Recovery: This stabilization allowed AWS to restore the majority of impaired EC2 instances (virtual servers) and EBS volumes (storage).
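
The first step, steering traffic away from an impaired zone, can be illustrated with a minimal sketch. This is not AWS's actual mechanism, which the status update does not describe; the function name `route_weights` and the zone labels are hypothetical, and the sketch simply assumes a health flag per zone is already available.

```python
def route_weights(zone_health: dict[str, bool]) -> dict[str, float]:
    """Split traffic evenly across healthy zones; impaired zones get weight 0."""
    healthy = [zone for zone, ok in zone_health.items() if ok]
    if not healthy:
        # Nothing to route to: surface the problem instead of guessing.
        raise RuntimeError("no healthy zones available")
    share = 1.0 / len(healthy)
    return {zone: (share if ok else 0.0) for zone, ok in zone_health.items()}


# Hypothetical zone labels; "zone-b" stands in for the overheated facility.
weights = route_weights({"zone-a": True, "zone-b": False, "zone-c": True})
```

Here the impaired zone ends up with a weight of zero and the remaining two zones each take half of the traffic, which is the general idea behind rerouting while the physical fault is repaired.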

“Our main effort during the event mitigation strategy was to bring back our cooling systems capacity. By May 8 1:50 PM, we were able to stabilize cooling system capacity to pre-event levels, which helped us to restore the majority of the impaired EC2 instances and EBS volumes,” Amazon stated.

While the bulk of services were restored, the company noted that a small number of instances and storage volumes remained impaired as recovery efforts continued.

Impact on Users and Businesses

The outage was not just a backend technical issue; it had immediate, tangible consequences for end-users. Several high-profile applications hosted on AWS went offline or experienced significant degradation, including:

  • FanDuel: The sports betting platform was unable to process bets.
  • Coinbase: Cryptocurrency trading was disrupted, preventing users from executing trades.

For users of these platforms, the downtime caused significant frustration and potential financial uncertainty, particularly during active trading hours or live sports events. However, as AWS stabilized its infrastructure, these services gradually returned to normal operation.

Why This Matters

This incident serves as a reminder that cloud computing is still dependent on physical hardware. While AWS and other providers build extensive redundancy across multiple zones and regions, a catastrophic failure in a single zone—such as a cooling system collapse—can still cause significant localized outages.

For businesses relying on these platforms, the takeaway is clear: AWS is robust, but it is not invincible. The rapid shift of traffic and subsequent restoration demonstrate the effectiveness of AWS’s mitigation strategies, but the initial disruption underscores the importance of multi-zone and multi-region architectures for mission-critical applications.
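
As a rough illustration of what a multi-region design means on the client side, the sketch below tries a primary region and falls back to a secondary one when the first is unreachable. Everything here is an assumption for illustration: `call_with_failover` and `fake_request` are hypothetical names, the region identifiers are examples only, and a production setup would more likely rely on DNS or load-balancer failover than on application code like this.

```python
from typing import Callable, Sequence


def call_with_failover(regions: Sequence[str], request: Callable[[str], str]) -> str:
    """Try each region in order and return the first successful response."""
    errors = []
    for region in regions:
        try:
            return request(region)
        except ConnectionError as exc:
            # Record the failure and move on to the next region.
            errors.append(f"{region}: {exc}")
    raise RuntimeError("all regions failed: " + "; ".join(errors))


def fake_request(region: str) -> str:
    """Hypothetical stand-in for a real service call; the primary is 'down'."""
    if region == "us-east-1":
        raise ConnectionError("zone impaired")
    return f"served from {region}"


result = call_with_failover(["us-east-1", "us-west-2"], fake_request)
```

In this simulated run the request to the first region fails and the second region answers, so the application stays available at the cost of running capacity in two places.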

In summary, a cooling failure in Northern Virginia caused a temporary but significant AWS outage, disrupting major apps like FanDuel and Coinbase until physical systems were stabilized and services restored.