The Internet Archive Fights Wiki Citation Wars With Books

Image: Getty

Thanks to the laborious stewardship of the blessed Internet Archive and the unyielding armies of Wikipedia’s citizen scholars, we may, within our lifetimes, reach a consensus on basic historical information.

Last week, the Internet Archive announced that it’s been filling out Wikipedia’s book citations with links to two-page previews of scanned books, so that the cited passage can be viewed with a bit of surrounding context. So long as no one else has borrowed it, you can check out the book for 14 days, similar to a lending library; if the book predates 1923, and is therefore in the public domain, you can likely see the whole thing. So far, the IA claims to have turned 130,000 references into live links from 50,000 books in English, Greek, and Arabic. They hope, in the words of Wayback Machine director Mark Graham, to “achieve Universal Access to All Knowledge.”

The Internet Archive has long been wending its way through vast tracts of Wikipedia entries; scroll down to the bottom of virtually any Wiki page, and you’ll probably find the tiny bootprints of the InternetArchiveBot, which has weeded out about 13 million rotten links and replaced them with Wayback Machine-archived pages. According to Internet Archive founder Brewster Kahle, Wayback Machine links are now the top-clicked citation links “by a factor of three.”

I just pulled up Genghis Khan’s Wikipedia entry and found the InternetArchiveBot’s trail throughout the citations; every fourth link sends me to the Wayback Machine, and one sends me to a two-page preview of a 1998 history of Mongols in the Internet Archive. In a perfect world, I’ll eventually be able to visit page 313 of The New Encyclopedia of Islam without hitting a Google Books paywall.

IA’s bot draws on a collection of 3.8 million books which, according to Graham, is currently being expanded by 100 paid workers scanning at 22 worldwide locations at a rate of 1,000 books per day, with millions waiting in storage centers in California, in addition to operations out of the Getty, the Boston Public Library, the Library of Congress, and Princeton University. The Internet Archive is also reviving tens of thousands of books from deep storage. For example, when Phillips Academy at Andover planned to stock away pallets of books in preparation for its library renovation, the Internet Archive swooped in and digitized the 70,000-book collection. Now Phillips can offer an exclusive Internet Archive link to its own members to borrow against a hard copy.

“Librarians have been able to confidently weed excess, outdated materials from our collection,” San Francisco Public Library librarian Michael Lambert has said, “secure in knowledge that the books will not disappear, but rather have a new life where people around the world can read and research the materials that SFPL has meticulously collected over the decades.”

About 1 million of the Internet Archive’s books are modern, i.e., post-1923, and subject to copyright law. The IA’s Open Library has coined the term “controlled digital lending” (CDL), an “own-to-loan” program dictating that the number of copies the library physically owns will be proportional to the number it digitally loans. (CDL has drawn the ire of publishers and of author groups like the Authors Guild, which object to CDL as a way to circumvent e-lending licenses, which libraries purchase from authors and publishers; many have issued DMCA takedown notices. “We’re really not focused on the most recent books,” the Internet Archive’s Open Libraries director Chris Freeland told Gizmodo in a call. “Our goal is to un-blank the 20th century: books from the 1920s to the late 1990s, for which there is often no digital equivalent.” The Internet Archive is recognized by the state of California as a library and uses digital rights management software to limit sharing, downloads, and print-outs.)

Wikipedia book citations may add an extra patina of authority to term papers and blog posts, but the Internet Archive-Wikipedia partnership envisions grander plans. Last week at the Internet Archive’s annual bash, founder Brewster Kahle spoke of the misinformation crisis during the 2016 election and segued into Wikipedia Executive Director Katherine Maher’s prophetic warning that “the truth might fracture.” Maher had been referring to citation wars, in which fact-checkers bitterly debate that which should be self-evident (see concentration camps). “Wikipedia is built on the idea that on any particular subject, a consensus will arise,” Kahle said. “That we’ll be pushing and shoving, but we will arrive at a consensus.” Where consensus fails to emerge, the solution is to point to print, which, hopefully, America hasn’t totally written off along with TV and digital media. “We know that a lot of the best, most vetted information that we have are in things like books,” he said.

Each of those books costs the Internet Archive about $20 to acquire, digitize, and store in both physical and digital forms, so you can help them out by sponsoring a book. If you want to see a book that’s not on the list, you can click on the “Want to Read” button and put that book up for sponsorship.

About the author

Whitney Kimball

Staff reporter, Gizmodo. wkimball @ gizmodo