Amazon Says One Engineer's Simple Mistake Brought the Internet Down

Roughly 48 hours after its major service outage, Amazon is admitting what caused the problem. Apparently, some poor engineer at Amazon Web Services (AWS) did an oopsie and brought the internet to its knees. Oopsies are the worst!

In all seriousness, it’s a sobering story. Here’s how Amazon described it in a recent blog post:

“At 9:37AM PST, an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process. Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended.”

We’ve all been there. You push the wrong button and end up getting Sprite instead of Coke. But this poor guy or gal probably made an errant keystroke that crippled AWS for at least four hours. Since about a third of all internet traffic reportedly flows through AWS servers, taking a whole bunch of those servers offline screwed up a few people’s days.

In theory, a series of failsafes should keep the fallout from such errors localized, but Amazon says that some of the key systems involved hadn’t been fully restarted in many years and “took longer than expected” to come back online.

The company now claims it’s “making several changes as a result of this operational event.” One of these changes will involve modifying a tool so that a large number of servers can’t be removed at once. That makes total sense, but it still doesn’t solve the problem of unknown unknowns (like, say, a slower than expected restart) on an internet that relies so heavily on a single service.
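
For the curious, here’s roughly what that kind of guardrail could look like. This is a minimal sketch in Python, not Amazon’s actual tooling: the function name `plan_removal`, the `max_batch` and `min_capacity` limits, and the server names are all made up for illustration. It refuses any request that would pull too many servers in one go or leave the fleet below a minimum size.

```python
class RemovalRefused(Exception):
    """Raised when a removal request breaks a safety limit."""


def plan_removal(active, requested, max_batch=2, min_capacity=3):
    """Return the servers that are safe to remove, or raise RemovalRefused.

    Hypothetical helper: the limits are illustrative, not Amazon's values.
    """
    # Ignore names that are not in the active fleet, and drop duplicates.
    targets = [s for s in dict.fromkeys(requested) if s in active]

    # Guard 1: cap how many servers a single command may remove.
    if len(targets) > max_batch:
        raise RemovalRefused(
            f"asked to remove {len(targets)} servers, limit is {max_batch}"
        )

    # Guard 2: never let the fleet drop below its minimum capacity.
    remaining = len(active) - len(targets)
    if remaining < min_capacity:
        raise RemovalRefused(
            f"removal would leave {remaining} servers, minimum is {min_capacity}"
        )

    return targets


fleet = ["s1", "s2", "s3", "s4", "s5", "s6"]
print(plan_removal(fleet, ["s1", "s2"]))  # small request is allowed

try:
    plan_removal(fleet, ["s1", "s2", "s3", "s4"])  # the fat-finger case
except RemovalRefused as err:
    print("refused:", err)
```

A check like this only catches the mistake it was written for. It does nothing about a subsystem that takes hours to restart.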

In the meantime, let this serve as a shoutout to that poor AWS engineer who made a tiny mistake that led to major consequences. We’re having a rough year, too.

We’ve reached out to Amazon to find out more details about the incident, specifically the fate of the poor engineer who caused the problem. We’ll update this post when we hear back.

[Amazon]

About the author

Adam Clark Estes

Senior editor at Gizmodo.