The Library of Congress Can't Quite Handle That Massive Tweet Archive It Was Trying to Build

A few years ago, the Library of Congress announced its plans to create an archive of every public tweet ever. If you thought that sounded a little bit optimistic, you'd be right; the Library of Congress released a white paper today explaining why they can't quite pull it off.

It's not all bad. The Library of Congress has an archive, and you can search it. The situation just isn't optimal, so they've had to turn away over 400 researchers who've requested access. Due to the Library's agreement with Twitter, the archive could never have been fully public anyway, but the collection as-is chugs under even the lightest stress. A single query can take about 24 hours, and that's only for the tweets from 2006-2010.

In order to provide a search that's a little more useful, the Library of Congress says it'd need a lot more resources:

"To achieve a significant reduction of search time, however, would require an extensive infrastructure of hundreds if not thousands of servers. This is cost prohibitive and impractical for a public institution."

The upside to all this is that the Library of Congress seems to be succeeding in its most basic goal of actually archiving tweets. But that's of questionable worth if no one can really find what they're looking for. Fortunately, you can archive your own tweets now, but if you're looking for a searchable database of the whole tweetverse, it seems you'll have to keep on waiting. [The Library of Congress via Buzzfeed]

Image by Biehler Michael/Shutterstock