THE CLOUD CARTEL

Three companies control 67% of the internet's foundation. This isn't just a market imbalance. It's a single point of failure for the global economy.

The Triumvirate

The global cloud market isn't a diverse ecosystem. It's an oligopoly. Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP) form a triumvirate that holds unprecedented power over the digital world.

Source: Synergy Research Group, Q4 2024

Anatomy of an Outage

The cloud is sold as infinitely resilient. The reality is far more fragile. A single human error, a software bug, or a cooling failure can trigger a catastrophic cascade, shutting down vast swathes of the internet. These aren't hypotheticals; they are history.

Feb 2017: The Typo That Broke The Internet

Provider: Amazon Web Services (AWS)

An engineer entered a command incorrectly, intending to remove a few servers. Instead, the typo took a massive chunk of the S3 storage system offline in the critical US-EAST-1 region.

The Fallout:

$150M

Estimated loss for S&P 500 companies.

Mar 2021: The Lost Key

Provider: Microsoft Azure

An automated process to rotate cryptographic keys went wrong. A crucial key for Azure Active Directory (AD) was deleted, locking millions of users out of Office 365, Teams, and countless other services for 14 hours.

The Blast Radius:

Global Authentication Failure

Impacted services from Xbox Live to enterprise applications.

Jul 2022: The London Heatwave

Provider: Google Cloud Platform (GCP)

During a record-breaking heatwave, cooling systems failed in a London data center. To prevent permanent hardware damage, Google was forced to shut down a portion of its `europe-west2` region, causing multi-day disruptions.

The Vulnerability:

Physical World Intrusion

Proved the cloud is still vulnerable to physical and environmental threats.

The Golden Handcuffs

The Triumvirate's dominance isn't just about scale. It's enforced by strategic business practices designed to make switching providers technically difficult and financially punishing. This is "vendor lock-in."

The Egress Tollbooth

Getting data into the cloud is free. Getting it out? That'll cost you. Providers charge high "egress fees" for transferring data out of their network. This acts as a massive financial barrier, discouraging customers from moving to a competitor or adopting a true multi-cloud strategy.
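To get a feel for the scale of that barrier, here is a back-of-the-envelope sketch. The flat rate of $0.09 per GB is an assumption chosen for illustration, not a quoted price; real egress pricing is tiered and varies by provider, region, and destination. The function name `egress_cost_usd` is likewise hypothetical.

```python
# ASSUMPTION: a flat, illustrative rate. Real providers publish tiered,
# region-specific egress prices that change over time.
ASSUMED_EGRESS_USD_PER_GB = 0.09


def egress_cost_usd(data_tb: float, usd_per_gb: float = ASSUMED_EGRESS_USD_PER_GB) -> float:
    """Estimate the cost of moving `data_tb` terabytes out at a flat per-GB rate."""
    if data_tb < 0 or usd_per_gb < 0:
        raise ValueError("data size and rate must be non-negative")
    gigabytes = data_tb * 1000  # decimal terabytes to gigabytes
    return gigabytes * usd_per_gb


if __name__ == "__main__":
    # Print a rough migration bill for three dataset sizes.
    for tb in (10, 100, 1000):
        print(f"{tb:>5} TB -> ${egress_cost_usd(tb):,.0f}")
```

Under this assumed rate the bill grows linearly with the size of the dataset, so the customers with the most data face the largest cost of leaving.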

A Tax on Freedom

Restrictive Licensing

Microsoft, in particular, is criticized for using its dominance in enterprise software (like Windows Server) to push customers to Azure. Licensing terms can make it up to 5 times more expensive to run the same software on a rival cloud like AWS, creating a powerful, anti-competitive incentive.

An Unfair Advantage

The Regulators Awaken

Governments are finally taking notice. Antitrust authorities from the US to the EU and UK are launching investigations into the cloud market's lack of competition, targeting the very lock-in mechanisms that sustain the oligopoly.

USA

The FTC has launched a formal inquiry into anti-competitive practices, with a focus on egress fees and software licensing.

United Kingdom

Ofcom's market study cited "significant concerns" about the dominance of AWS and Microsoft and referred the market to the CMA for a deeper investigation.

European Union

The groundbreaking Digital Markets Act (DMA) and Data Act directly target vendor lock-in, aiming to make switching providers "fast, free and technologically fluid."

The Typo That Broke The Internet

AWS S3 Outage | February 28, 2017

It began with a routine debugging task. An AWS engineer needed to remove a small number of servers from the S3 billing system. They used a standard script.

But one input to the command was entered incorrectly, and that changed everything. Instead of targeting a handful of servers, the command hit a much larger set of servers supporting critical S3 subsystems in US-EAST-1, the oldest and most heavily used AWS region.

The command executed. The servers vanished from the network.

The two subsystems that were accidentally crippled were essential for all S3 operations. One managed metadata and location information for all S3 objects. The other managed storage allocation. Without them, the entire S3 service in the region was blind and paralyzed.
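One way to limit the blast radius of a mistyped input is to have the removal tool refuse any request that would take a subsystem below a safe floor, and to cap how much capacity a single step may remove. The sketch below illustrates that idea only. It is not AWS's tooling, and every name in it (`Subsystem`, `plan_removal`, `min_required`, `max_percent_per_step`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subsystem:
    name: str
    active_servers: int
    min_required: int  # ASSUMED floor below which the subsystem cannot serve requests


class UnsafeRemovalError(Exception):
    """Raised when a removal request would breach the capacity floor."""


def plan_removal(subsystem: Subsystem, requested: int, max_percent_per_step: int = 5) -> int:
    """Return how many servers may be removed in this step, or raise if unsafe."""
    if requested <= 0:
        raise ValueError("requested must be a positive number of servers")
    remaining = subsystem.active_servers - requested
    if remaining < subsystem.min_required:
        raise UnsafeRemovalError(
            f"{subsystem.name}: removing {requested} leaves {remaining}, "
            f"below the floor of {subsystem.min_required}"
        )
    # Even a valid request is throttled: at most a small share of capacity per step.
    step_cap = max(1, subsystem.active_servers * max_percent_per_step // 100)
    return min(requested, step_cap)


if __name__ == "__main__":
    index = Subsystem(name="index", active_servers=1000, min_required=800)
    print(plan_removal(index, 5))    # small request passes through unchanged
    print(plan_removal(index, 150))  # valid, but throttled to the per-step cap
    try:
        plan_removal(index, 500)     # an oversized input is rejected outright
    except UnsafeRemovalError as err:
        print("refused:", err)
```

The design choice is that the tool, not the operator, carries the knowledge of what is safe, so a wrong number produces a refusal rather than an outage.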

The cascading failure was immediate. Thousands of websites, applications, and services that rely on S3 for storing images, files, and data went down. The list included Adobe, Atlassian (Jira, Confluence), Docker, Expedia, Slack, and even Apple's iCloud. 54 of the top 100 online retailers were hit.

The ultimate irony? The AWS Service Health Dashboard, the very tool customers use to check for outages, also relied on S3. With S3 down, AWS could not update it, so for a time it kept reporting services as healthy, leaving everyone in the dark.

The Lost Key

Azure AD Outage | March 15, 2021

Azure Active Directory (AD) is the digital gatekeeper for Microsoft's cloud. It handles logins and permissions for almost everything: Office 365, Teams, Dynamics, Xbox, and countless non-Microsoft applications that use it for authentication.

To keep this system secure, Microsoft uses cryptographic keys that are regularly "rotated" or updated. On March 15, 2021, an automated process to perform this rotation contained a fatal flaw.

During a complex key migration, the automation was supposed to retain an older, still-active key while a new one was deployed. Instead, a bug caused the automation to incorrectly delete the old key.

The moment the key was removed, the system that validates user login tokens broke. Anyone trying to log in or renew an existing session was rejected. The digital gatekeeper had slammed shut and lost its own key.
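The failure mode is easier to see in miniature. The sketch below is a toy key ring, not Microsoft's implementation: it uses HMAC from the Python standard library as a stand-in for the signing scheme, and all names (`KeyRing`, `SigningKey`, `retain`, `prune`, `release`) are hypothetical. The point it illustrates is that cleanup must honor a "retain" marker on any key that issued tokens still in circulation.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass


@dataclass
class SigningKey:
    key_id: str
    secret: bytes
    retain: bool = False  # True while tokens signed with this key may still be presented


class KeyRing:
    def __init__(self) -> None:
        self._keys: dict[str, SigningKey] = {}
        self._counter = 0
        self.current_id: str | None = None

    def rotate(self) -> SigningKey:
        """Add a new current key and mark the previous one as retained."""
        if self.current_id is not None:
            self._keys[self.current_id].retain = True
        self._counter += 1
        key = SigningKey(key_id=f"k{self._counter}", secret=secrets.token_bytes(32))
        self._keys[key.key_id] = key
        self.current_id = key.key_id
        return key

    def release(self, key_id: str) -> None:
        """Clear the retain marker once no valid token can depend on the key."""
        self._keys[key_id].retain = False

    def prune(self) -> list[str]:
        """Delete only keys that are neither current nor retained."""
        removable = [
            kid for kid, key in self._keys.items()
            if kid != self.current_id and not key.retain
        ]
        for kid in removable:
            del self._keys[kid]
        return removable

    def sign(self, payload: bytes) -> tuple[str, bytes]:
        if self.current_id is None:
            raise RuntimeError("no signing key yet; call rotate() first")
        key = self._keys[self.current_id]
        return key.key_id, hmac.new(key.secret, payload, hashlib.sha256).digest()

    def verify(self, key_id: str, payload: bytes, signature: bytes) -> bool:
        key = self._keys.get(key_id)
        if key is None:
            return False  # the key is gone, so every token it signed is rejected
        expected = hmac.new(key.secret, payload, hashlib.sha256).digest()
        return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    ring = KeyRing()
    ring.rotate()
    old_id, sig = ring.sign(b"session-token")
    ring.rotate()                                       # new key deployed, old one retained
    print(ring.prune())                                 # nothing is removed
    print(ring.verify(old_id, b"session-token", sig))   # old sessions still validate
    ring.release(old_id)
    ring.prune()                                        # now the old key really is deleted
    print(ring.verify(old_id, b"session-token", sig))   # and old sessions fail
```

A `prune` that ignored the `retain` flag would reproduce the outage in this toy: the token is perfectly valid, but nothing is left to check it against.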

For 14 hours, a significant portion of the world's businesses and users were locked out of their most critical tools. The outage highlighted a terrifying reality: centralizing authentication into a single service creates a catastrophic single point of failure. When Azure AD goes down, the Microsoft ecosystem goes down with it.
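A rough calculation shows why a shared dependency matters so much. The sketch below rests on loud assumptions: the 14-hour outage is treated as the authentication service's only downtime in a 365-day year, the application's own availability of 99.99% is a made-up figure, and the application is taken to fail whenever authentication fails, with the two failing independently.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours


def availability(downtime_hours: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Fraction of the period during which the service was up."""
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime must be between 0 and the period length")
    return 1 - downtime_hours / period_hours


def serial_availability(*components: float) -> float:
    """Availability of a chain in which every component must be up (independence assumed)."""
    result = 1.0
    for a in components:
        result *= a
    return result


if __name__ == "__main__":
    auth = availability(14)   # the 14-hour outage, assumed to be the year's only downtime
    app = 0.9999              # HYPOTHETICAL availability of the application on its own
    print(f"auth service: {auth:.4%}")
    print(f"app + auth:   {serial_availability(app, auth):.4%}")
```

Under these assumptions the application can never be more available than the login service in front of it, however well it is engineered on its own.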

When The Cloud Overheats

GCP Cooling Failure | July 19, 2022

The term "cloud" evokes an image of something ethereal and intangible. This incident was a harsh reminder that the cloud is intensely physical, housed in massive buildings packed with heat-generating servers that require constant, industrial-scale cooling.

In July 2022, London experienced an unprecedented heatwave, with temperatures soaring above 40°C (104°F). One of the data centers hosting Google's `europe-west2` region couldn't cope.

The building's cooling systems failed. As temperatures inside the server rooms climbed to dangerous levels, Google's automated systems made a drastic decision to prevent a "permanent loss of machines": they began shutting down hardware.

This wasn't a software bug or a human error, but a physical failure. A portion of the region went offline, causing errors and outages for customers relying on services like Google Compute Engine and Cloud Storage. The recovery was slow and painful, lasting for days.

The event was a stark demonstration that, for all their digital sophistication, data centers are bound by the laws of physics. They are vulnerable to the same environmental threats as any other piece of critical infrastructure: heatwaves, floods, and power grid failures. The cloud's invincibility is an illusion.