Why the Internet Disappeared Last Week... Again


A week ago, the internet fell victim to a thunderstorm outside of Washington, D.C., which took out a chunk of Amazon's servers that play host to sites and services such as Instagram and Netflix. Then, on Friday, the internet died again, taking down sites such as Yelp, Reddit, and even our very own Gawker network.


But this time it wasn't because of the elements. This time it was because of a leap second. Huh?

Long story short: at 12:00 am Greenwich Mean Time, all of the atomic clocks across the world inserted a leap second (or in simpler terms, paused for a second) so that they could remain in unison with the rotation of the planet. It's happened 24 times since 1972, but we're at a point now where many pieces of technology, ranging from servers to networks to laptops, sync up their clocks with the atomic clocks. When a leap second gets thrown into the mix, meaning they see the same time two seconds in a row, it does not compute. And we've all seen enough dystopian robot movies to know what that means: time to freak the frak out.
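To make the "same time two seconds in a row" problem concrete, here is a minimal Python sketch. It assumes the common Unix convention of counting seconds since 1970 with no slot reserved for leap seconds; the variable names are only illustrative, and real systems differ in exactly which second they repeat.

```python
import calendar

# Illustrative only: Unix-style timestamps count seconds since 1970 and
# have no room for a 61st second in a minute.
before_leap = calendar.timegm((2012, 6, 30, 23, 59, 59, 0, 0, 0))
leap_second = calendar.timegm((2012, 6, 30, 23, 59, 60, 0, 0, 0))  # the inserted second
next_midnight = calendar.timegm((2012, 7, 1, 0, 0, 0, 0, 0, 0))

# Two different wall-clock labels collapse onto one timestamp.
collides = leap_second == next_midnight

print(before_leap, leap_second, next_midnight, collides)
```

Software that expects every second to have its own number can get confused when two of them share one.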

The end result was pretty ugly. Sites were down for hours as administrators worked to clean up the havoc wreaked by one little stray second. Gizmodo crashed around 8:00 pm EDT and was down for roughly 45 minutes before regaining functionality. But not everyone was affected. In fact, one site was well prepared: Google. As Wired points out, Google had anticipated this eventuality months ago, outlining its strategy for handling the leap second threat last September with the unappealingly titled Leap Smear.

The solution we came up with came to be known as the "leap smear." We modified our internal NTP servers to gradually add a couple of milliseconds to every update, varying over a time window before the moment when the leap second actually happens. This meant that when it became time to add an extra second at midnight, our clocks had already taken this into account, by skewing the time over the course of the day. All of our servers were then able to continue as normal with the new year, blissfully unaware that a leap second had just occurred. We plan to use this "leap smear" technique again in the future, when new leap seconds are announced by the IERS.
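The smear idea described above can be sketched in a few lines of Python. This is a rough illustration, not Google's actual code: the linear ramp, the 24-hour window, and the function names `smear_offset` and `smeared_time` are all assumptions made for the example.

```python
def smear_offset(seconds_until_leap: float, window: float = 86_400.0) -> float:
    """Fraction of the extra second (0.0 to 1.0) already absorbed.

    Assumes a simple linear ramp over `window` seconds; the real
    servers may have used a different curve or window length.
    """
    if seconds_until_leap >= window:
        return 0.0  # smear has not started yet
    if seconds_until_leap <= 0:
        return 1.0  # the whole leap second has been absorbed
    return 1.0 - seconds_until_leap / window


def smeared_time(true_elapsed: float, leap_at: float, window: float = 86_400.0) -> float:
    """Clock reading handed out to clients.

    `true_elapsed` is an uninterrupted count of seconds and `leap_at` is the
    moment (in that same count) when the leap second ends. Both are
    hypothetical inputs for this sketch.
    """
    return true_elapsed - smear_offset(leap_at - true_elapsed, window)


# Halfway through the window, half of the extra second has been absorbed.
halfway = smear_offset(43_200.0)
```

Because the reported clock only ever runs slightly slow, it never repeats a value or jumps backwards, which is the whole point of smearing.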

So if you were wondering why the internet was a barren wasteland all weekend, now you know.

Image via Shutterstock/iofoto



Broken Machine

I was only slightly inconvenienced, but I'll just continue to blame any network outages on Verizon, and they'll just continue to blame any issues on my equipment. It's worked this way for years.