We’re at a critical inflection point for the World Wide Web. Everything is changing, disappearing, splintering, expanding, and being remade. It’s time we provide legal protection to what’s left of this moment in history. And one way to do that is to make it a national monument.
It’s difficult to overstate what a unique moment this is in terms of historical progress and opportunity. The world has bifurcated into online and IRL experience, and the two are intrinsically feeding off of each other. But unlike any other turning point in human history, the technology that’s upending how civilization functions is also memorializing its own impact on the world through the input of the 3.2 billion people who currently use it. We are constantly creating records and telling each other stories about what life was like around the turn of the millennium, and we’re doing most of it online.
But it’s becoming clear that the web is rapidly changing. Video and audio have become more prominent, which makes it more taxing for archiving efforts like the non-profit Internet Archive to afford to store increasingly large files. We’re also seeing efforts to create a new form of the internet with technologies like the blockchain. And political differences threaten to remake the global web. Google’s former CEO Eric Schmidt recently predicted that there will be two distinct versions of the internet by 2028—one led by the Chinese, the other by the rest of the world.
Politics and a general cultural concern about the effects of internet platforms have also led to increased efforts to delete online content entirely. Whatever your feelings about, say, Infowars conspiracy rants, the YouTube shooter’s borderline avant-garde manifestos, and deranged algorithmically-generated kids shows, this controversial content has a historical and cultural value that should be available in the future.
Beyond intentional removals, websites simply die as people stop maintaining them and servers get cleared out to make room for more information. Internet Live Stats estimates that there are about 1.193 billion websites online at the moment, and that number rises every second. The site estimates that just over 13 percent of those sites are active today, indicating that the bulk of the internet is on death watch. It’s time to act before it’s too late.
Is it possible?
To figure out if turning the web into a national monument is even doable, we first have to set some parameters, given that the web is constantly evolving. So let’s say we aim to archive all of the publicly-facing URLs that still exist from the birth of the web up to a cut-off date, like January 1, 2019. We’d exclude sites on the so-called dark web and password-protected content, and we’d set a firm end date for the project so it wouldn’t just go on endlessly and cost unfathomable amounts of money.
This whole task would be complicated, of course. And the first question we’d have to ask is whether it’s even legally possible to do this. The short answer is: Probably.
The Antiquities Act, which gives presidents the power to create national monuments, was enacted in 1906 under pressure from scholars and scientists who didn’t want prospectors and settlers in the southwest destroying archeological artifacts. Under the law, a president has the right “to declare by public proclamation historic landmarks, historic and prehistoric structures, and other objects of historic or scientific interest that are situated upon the lands owned or controlled by the Government of the United States to be national monuments.” For the purposes of turning the web into a national monument, the line about “objects of historic or scientific interest” would appear to apply.
The first real test of the Antiquities Act came to the Supreme Court in 1920, in the case of Cameron v. United States. At that time, the court ruled that President Theodore Roosevelt indeed had the power to declare the Grand Canyon a monument that would be off-limits to mining because it falls under the category of “an object of unusual scientific interest.” That precedent, along with the court’s choice not to weigh the size of the monument in its ruling, left the possibilities wide open. Certainly, we wouldn’t need server space that’s larger than the Grand Canyon to store the web. So, it would seem that a project of this scale would fall under the scope of powers given to the president to designate national monuments.
Jonathan Jarvis, who served as director of the U.S. National Park Service for eight years under President Obama, told Gizmodo that making the web a national monument wasn’t out of the question. “There have been national park units established by Congress around an idea and or a historical event without a physical footprint,” he said. Jarvis pointed to the Rosie the Riveter/World War II Home Front National Historical Park as an example and explained that it was founded to celebrate the women who kept American industry running during the war. He also mentioned the many monuments that have been created to honor the civil rights movement.
Still, the web is a tricky thing. Going back to the text of the Antiquities Act, Jarvis said that the line regarding “objects of historic or scientific interest” would need to be read literally. “So I think there would have to be an ‘object’ to represent the internet in some way to make it a national monument,” he said. (Side note: When he was in office, Jarvis added, there actually was some discussion of making the garage where Steve Jobs and Steve Wozniak founded Apple into a national monument honoring the personal computer.)
As for the internet, the object could be a giant data center. If we really wanted to keep things simple, we could even just turn a copy of the Internet Archive’s Wayback Machine into a monument. That organization could continue its work of running a living archive, and we’d have a legally protected clone for the future.
LOCKSS
Mark Graham, director of the Internet Archive’s Wayback Machine, told Gizmodo he was certainly open to the idea of making more copies of his baby. Graham points to the archiving principle known as LOCKSS, an acronym that stands for “lots of copies keep stuff safe.” Stanford University oversees a LOCKSS program that works with institutions to implement a whole set of preservation principles, but the most important thing is to make as many copies as possible.
Graham hasn’t given the national monument approach a lot of thought, but he did say that he’s been privately exploring the idea of making a living web archive a United Nations World Heritage Site. Specifically, a web archive could fall under UNESCO’s Intangible Heritage designation that covers ephemeral cultural artifacts like Turkey’s whistled language and Mongolia’s coaxing ritual for camels.
The folks at the Internet Archive welcome competition. Graham emphasized that many countries (for example, Portugal) already have their own efforts to archive the web. But they each set their own criteria for what to archive, and they often focus on their nations’ specific cultural concerns. This is fine—even a little bit of archiving goes a long way. Lots of copies keep stuff safe. And the U.S. should ensure it has its own copy.
How big is the web?
One difficulty is that it’s hard to figure out exactly how large the web is. Graham thinks about this subject for a living, and even he’s not sure about a ballpark figure for its size. He’s only able to say it would take a large-scale research project to get a satisfying answer—something the federal government is good at.
The best he can tell us is that the Wayback Machine has been archiving a lot of the web for 22 years. It stores about 22 petabytes right now and is growing at the rate of about one petabyte every three months. These days, we tend to think of large storage in terabytes—a petabyte is a thousand terabytes, and a terabyte is a million megabytes. Let’s say a megabyte is represented by a book that’s one-inch-thick. If we stacked those books, the Wayback Machine’s archive would be 347,222 miles high—a distance that would stretch to the moon and half-way back.
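For anyone who wants to check the book-stack math, here is a quick back-of-the-envelope sketch. It uses the figures above (22 petabytes, one inch per megabyte) plus two assumptions that aren’t in the reporting: 63,360 inches to a mile, and an average Earth-to-moon distance of roughly 238,855 miles.

```python
# Back-of-the-envelope check of the "stack of books" comparison.
ARCHIVE_PETABYTES = 22
MEGABYTES_PER_PETABYTE = 1_000 * 1_000_000  # 1 PB = 1,000 TB; 1 TB = 1,000,000 MB
INCHES_PER_MILE = 63_360
EARTH_MOON_MILES = 238_855  # assumption: approximate average distance


def stack_height_miles(petabytes, inches_per_megabyte=1.0):
    """Height of a stack where each megabyte is one book of the given thickness."""
    inches = petabytes * MEGABYTES_PER_PETABYTE * inches_per_megabyte
    return inches / INCHES_PER_MILE


height = stack_height_miles(ARCHIVE_PETABYTES)
moon_trips = height / EARTH_MOON_MILES

print(f"{height:,.0f} miles, or {moon_trips:.2f} Earth-moon distances")
# prints: 347,222 miles, or 1.45 Earth-moon distances
```

A result of about 1.45 lunar distances squares with “to the moon and half-way back.”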
Of course, we could create an analog archive rather than a digital one. In 2015, a Washington Post investigation found that if we printed the internet, it would take around 305.5 billion sheets of paper. Estimating what kind of deal the U.S. government could get on the largest single print job in the history of mankind is difficult. Let’s go for a high estimate: A single black and white page printed at FedEx will cost you $0.69. (nice) At that rate, just printing it out would cost about $211 billion. And that wouldn’t allow us to capture functionalities, connections, audio, and video, so this solution is far from ideal.
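The print-job estimate is a single multiplication, so it’s easy to verify. This sketch takes the two numbers quoted above (305.5 billion sheets, $0.69 per sheet) and works in whole cents to sidestep floating-point rounding. It assumes no bulk discount, which is the point of a high estimate.

```python
# Rough cost of printing the internet at a retail per-sheet price.
SHEETS = 305_500_000_000   # Washington Post estimate quoted above
PRICE_CENTS_PER_SHEET = 69  # $0.69 per black-and-white sheet, no bulk discount

total_cents = SHEETS * PRICE_CENTS_PER_SHEET
total_dollars = total_cents / 100

print(f"${total_dollars / 1e9:,.1f} billion")
# prints: $210.8 billion
```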
What is the web, anyway?
When I asked Graham about the ideal process for the government to follow when starting its own web archive, he answered with a question: What is the web? He explained that the criteria I’d laid out wouldn’t totally cover the challenges of defining the web.
Because of the nature of the web, it would be impossible to conclude that we’ve accomplished our goal of archiving it. Take Facebook, for example. As Graham explained, “every URL one enters into Facebook returns its own unique content, every time you enter it into Facebook.” In other words, entering the URL for a single profile page two times in a row will return different surrounding content. Of course, you could just set the rule that we take one capture of that page and be done with it, but his point was that the web is a living thing and it only becomes more slippery over time.
The answer to defining what should be archived, in Graham’s eyes, is archivists. “When archivists do what they do, they select… an archivist’s job isn’t to keep everything,” he said while admitting that his job has been unique in that he’s tried to keep everything that he can. The Internet Archive regularly curates collections and highlights selections from the Wayback Machine, but the role of selection will have to become more prominent as more and more information flows online. And it will likely have to evolve beyond the traditional scholarly approach to preserving historically important examples based on a canon to using approaches that can be expressed algorithmically.
Graham also emphasized that it’s important for any monument of the web to be readily accessible to the public. “We have a philosophy here with regard to web archiving which is why we think that ... a healthy archive is an archive that’s used,” he said. “So if you take an archive, you store a lot of stuff in the closet and you never look at it and never use it, the probability that it’s going to be useful to people in the future is a lot less than if it’s used along the way.” He said, for example, that use is key to quality assurance. This is a practical concern as well as a philosophical one.
The necessary tools
There are different approaches that the U.S. government could take to execute such an enormous project, but Graham thinks it would go about the task in much the same way that the Internet Archive already does it. He said that the project would likely use web crawlers that scan the web and take snapshots of individual pages. But this would have to be augmented with other technologies. He points to the problem of complex pages that are essentially applications that require a whole stack of data to function, in which case all the functional data has to be scooped up and held together over time. Oftentimes, the related files will be part of a back-end that can’t just be scraped by a web crawler. In situations like that, the Internet Archive is using a platform called Docker to make the backups.
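To make the crawler idea concrete, here is a minimal sketch of the basic loop: fetch a page, save a snapshot, pull out its links, and queue any that haven’t been seen. It is an illustration only, not the Internet Archive’s actual tooling. The `crawl` function, the `fetch` callable, and the `TOY_WEB` dictionary standing in for the real network are all hypothetical names invented for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_url, fetch, max_pages=100):
    """Breadth-first crawl from seed_url; returns {url: html} snapshots.

    `fetch` is a hypothetical callable that returns a page's HTML as a
    string, or None if the page cannot be retrieved.
    """
    snapshots = {}
    queue = deque([seed_url])
    seen = {seed_url}
    while queue and len(snapshots) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # dead link: nothing to snapshot
        snapshots[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so each page is queued once.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return snapshots


# A toy in-memory "web" so the sketch runs without touching the network.
TOY_WEB = {
    "http://example.test/": '<a href="/about">About</a> <a href="/blog#top">Blog</a>',
    "http://example.test/about": '<a href="/">Home</a>',
    "http://example.test/blog": '<a href="/missing">Dead link</a>',
}

snapshots = crawl("http://example.test/", TOY_WEB.get)
print(sorted(snapshots))
```

A loop like this only captures whatever a server hands back for a given URL. That is exactly why application-style pages with heavy back-ends need something more than a crawler.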
The federal government would also likely be able to corral tools it already has and leverage its influence to directly clone servers at certain companies. That scenario would raise questions about how much of this information we want the government to have in one place. But if that idea bothers you, you probably don’t want to contemplate the likelihood that a lot of the archiving project is already complete at the NSA’s secretive data center in Utah, where communications and web traffic are reportedly scooped up, analyzed, and stored. Why don’t we just make a copy of that archive, you ask? Well, it would probably be easier to start fresh than it would be to try to disentangle private and classified data from public and innocuous information.
The political will
The practicalities of this kind of project come with a lot of uncertainty, and that’s equally true of the politics that would surround it. It would be expensive, we know that. Budget hawks wouldn’t be pleased. And like the space program, there will always be critics who believe that the money could be better spent elsewhere.
The good news is that making a national monument of the web wouldn’t have to fight through our gridlocked political system because the president can unilaterally decide to do it. The bad news is that Donald Trump is the president, and it’s still unclear if he even knows how to use a computer.
Still, having spent the last few years getting to know Trump, I think there are a few ways he could be convinced to do it. First of all, remind him that his tweets as president are automatically archived by the government, but his personal tweets before that time could still be under threat of deletion if Twitter ever decides to truly enforce or change its terms of service.
Next, argue that the internet is an American achievement. Though the U.S. played an enormous role in funding the research and infrastructure of the internet, it’s an achievement that belongs to the world. But as we saw with the recent Neil Armstrong biopic’s decision to not show the American flag being planted on the moon, the right wing really likes to claim mankind’s achievements as uniquely American, so we could just give ‘em that one as a compromise for getting this thing done.
Another upside for Trump would be that it would give him something positive to show with regard to the protection of national monuments. The Trump administration has been fighting to shrink the size of monuments that are already protected so that private industry can come in and harvest those precious natural resources. By establishing the web as a monument, at least the president could point to a new preservation initiative that not even oil lobbyists could object to. If he really insists, he can put his name on the data center in big gold letters.
Ultimately, surpassing the technical and political challenges in establishing the web as a monument is a long shot. But that doesn’t mean we shouldn’t try. The best reason to create this unprecedented kind of monument is that it’s the right thing to do. As Graham told us: “A society has a right to remember its own history.”