The Library of Congress Can't Quite Handle That Massive Tweet Archive It Was Trying to Build

A few years ago, the Library of Congress announced its plans to create an archive of every public tweet ever. If you thought that sounded a little bit optimistic, you'd be right; the Library of Congress released a white paper today explaining why they can't quite pull it off.

It's not all bad. The Library of Congress has an archive, and you can search it. The situation just isn't optimal, so they've had to turn away over 400 researchers who've requested access. Due to the Library's agreement with Twitter, the archive could never have been fully public anyway, but the collection as-is chugs under even the lightest stress. A single query can take about 24 hours, and that's only for the tweets from 2006-2010.

In order to provide a search that's a little more useful, the Library of Congress says it'd need a lot more resources.

"To achieve a significant reduction of search time, however, would require an extensive infrastructure of hundreds if not thousands of servers. This is cost prohibitive and impractical for a public institution."

The upside to all this is that it seems the Library of Congress is succeeding in its most basic goal of actually archiving tweets. But that's of questionable worth if no one can really find what they're looking for. Fortunately you can archive your own tweets now, but if you're looking for a searchable database of the whole tweetverse, it seems you'll have to keep on waiting. [The Library of Congress via Buzzfeed]

Image by Biehler Michael/Shutterstock

DISCUSSION

Doesn't Twitter have such an archive? There's a "search" tool in Twitter. Is it severely limited or what?

I'm not up on my Twitter knowledge, so please pardon me if this is a silly-sounding question.