The Large Hadron Collider Throws Away More Data Than It Stores

You'd think that with a price tag of billions of dollars the LHC would have more storage capacity than it could ever use. But with the machine producing a petabyte of data every second, the researchers simply can't store it all.

To put that into perspective, a petabyte is a million gigabytes, or enough capacity to store 13.3 years of HDTV content. And that's how much data the LHC's sensors produce every second during an experiment. That's insane, and understandably, the facility simply doesn't have the capacity to save it all.

When collisions occur, the LHC's detectors capture somewhere in the neighborhood of 40 million snapshots in a single second. That petabyte of data is then processed by a sophisticated array of electronics that decides which snapshots might actually contain useful data, paring them down to about 100,000. That smaller group is sent to a large farm of in-house computers, which further narrows the results to anywhere from 100 to 300 snapshots; those are then shipped to a network of computers around the world for detailed analysis. And you complain because your DVR had to delete an episode of Mad Men to make room for the latest episode of Game of Thrones. Tsk tsk. [YouTube]
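That cascade is basically a multi-stage filter: a fast first pass throws away almost everything, then a slower, smarter pass tightens the cut. Here's a toy sketch of the idea in Python. To be clear, this is not how CERN's electronics actually work; the `interest` score, the thresholds, and the function names are all made up for illustration, and the snapshot count is scaled down so it runs quickly.

```python
import random

# Toy model of a multi-stage trigger cascade (illustrative only --
# the real LHC triggers use custom hardware and far richer physics
# criteria than a single random "interest" score).

def generate_snapshots(n, seed=42):
    """Simulate n detector snapshots, each reduced to a single
    hypothetical 'interest' score between 0 and 1."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

def hardware_trigger(snapshots, threshold=0.9975):
    """Stage 1: fast electronics keep only the most promising snapshots.
    A 0.9975 cut keeps roughly 1 in 400, mimicking 40M -> ~100k."""
    return [s for s in snapshots if s > threshold]

def software_filter(snapshots, threshold=0.9999):
    """Stage 2: the in-house computer farm applies a much tighter cut
    before anything is shipped out for detailed analysis."""
    return [s for s in snapshots if s > threshold]

if __name__ == "__main__":
    raw = generate_snapshots(100_000)   # scaled-down "one second" of data
    stage1 = hardware_trigger(raw)
    stage2 = software_filter(stage1)
    print(f"raw: {len(raw)}, after hardware: {len(stage1)}, "
          f"after farm: {len(stage2)}")
```

The point of the structure is that each stage only has to be as clever as its time budget allows: the first cut must be nearly instantaneous, while the second can afford real computation because it sees far fewer snapshots.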

I knew the LHC produced a lot of data, but I didn't stop to consider that it probably throws away a large chunk of it. I learned something today!

I also find it interesting that they grab a bunch of random snapshots just to make sure their algorithm for guessing which snapshots are worth keeping isn't borked.
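That audit trick is easy to sketch: alongside whatever the trigger flags as interesting, keep a small random fraction of everything unconditionally, so you have an unbiased sample to check the trigger against. The sketch below is a guess at the general idea, not CERN's actual setup; `is_interesting` and `prescale` are hypothetical stand-ins.

```python
import random

def triggered_stream(snapshots, is_interesting, prescale=1000, seed=7):
    """Keep every snapshot the trigger flags as interesting, plus an
    unconditional 1-in-`prescale` random sample. The random sample lets
    you later measure what the trigger is wrongly throwing away.
    (`is_interesting` is a stand-in for real trigger logic.)"""
    rng = random.Random(seed)
    kept = []
    for s in snapshots:
        if is_interesting(s) or rng.randrange(prescale) == 0:
            kept.append(s)
    return kept
```

Because the random keeps don't depend on the trigger's opinion at all, comparing them against the triggered sample reveals any systematic bias in the selection.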