Patricia Kim, a History of Art PhD candidate studying Hellenistic Greece at the University of Pennsylvania, is the first to admit she’s probably not what comes to mind when you imagine an academic researcher scrambling to archive federal climate data that might be erased by the Trump administration. But for Kim, information transparency is an issue that transcends disciplinary boundaries.
“As an archaeologist, I rely on data falling under the umbrella of environmental data,” Kim told Gizmodo in a phone interview. “But more to the point, I’m an academic. I believe in fact-based arguments, and [open] data, and sharing information. I don’t understand how that is ever political, but here we are.”
Kim was among dozens of academics, scientists and teachers who gathered in a stuffy conference room at Penn’s Van Pelt Library last Friday and Saturday for the first Philadelphia-based meeting of DataRefuge, a key player in the grassroots, nationally distributed movement resisting the Trump administration’s anticipated war on climate science. Their task? To scour the National Oceanic and Atmospheric Administration’s website for precious datasets on ocean temperatures, greenhouse gas concentrations and the like, and to place those datasets out of harm’s way.
Similar data rescue events are taking place across the country this week—in Chicago on Tuesday, in Indianapolis on Thursday, and in Los Angeles on Friday. During the first such archive-athon, held at the University of Toronto on December 17th, volunteers managed to scrape “pretty much the entire EPA website,” according to Kim, extracting troves of information on everything from water quality to air pollution. The various efforts are coordinating closely, both to avoid repetition, and to share standard protocols for how to accurately download and reproduce data.
Within the next few months, DataRefuge, in partnership with the like-minded Environmental Data Governance Initiative (EDGI), plans to launch a searchable, citeable repository of hard-to-access federal climate and environmental data, in addition to developing custom scripts for grabbing data that is difficult to find, or exists in unwieldy formats. DataRefuge and EDGI are also partnering with End of Term Harvest, an effort to back up information from federal sites during presidential transition periods, to seed the Internet Archive with datasets that are easily crawlable using off-the-shelf browser extensions.
Ultimately, the task at hand is monumental—archiving potentially petabytes of data used not just in scientific research, but in areas as diverse as urban planning, agriculture, weather forecasting, and real estate. Who will pay for all of the decentralized and distributed server space to store it is still being worked out, although EDGI is currently raising money to that end.
“We are light years ahead of where we were ten years ago in terms of public access to information, and it would be a huge disservice for that access to go away,” Mike Halpern of the Union of Concerned Scientists, a nonprofit science advocacy group which is helping to amplify DataRefuge’s message, told Gizmodo. “This is a better safe than sorry approach.”
“The scientific community has never been more ready to defend the role of science,” added the UCS’s Gretchen Goldman, speaking to reporters on a press call this week.
The idea to build a refuge for climate data, which many scientists fear is at risk under a President-elect who trades in conspiracy theories and has tapped climate deniers for key cabinet positions, was conceived piecemeal by concerned academics and policy-makers around the country. For instance, shortly after the election, a group of students in Penn’s Program in the Environmental Humanities (PPEH) began meeting to discuss the precarity of climate data and strategies for backing it up off government servers, eventually giving themselves the moniker DataRefuge. Meanwhile, scientists at Harvard and elsewhere, motivated by the same concerns, formed the Environmental Data Governance Initiative (EDGI) to archive data and monitor changes in government websites.
It only took a few weeks for these homegrown resistance efforts to snowball into something much larger. Help in escalating them came directly from Trump’s transition team, which sparked an uproar in early December when it sent the Department of Energy a 74-item questionnaire asking for the names of all federal employees who had worked on some aspect of Obama’s climate agenda.
“It was incredibly important,” the UCS’s Andy Rosenberg told reporters this week, when asked about the effect that Trump’s Inquisition-like memo had on the scientific community. (The memo ultimately led to a brief standoff, which ended when the DOE basically told the transition team to go screw itself. The transition team later backed down, saying the questionnaire was “not authorized.”) “People are nervous and uncomfortable, because the signals have been very negative,” Rosenberg said.
“I think the main motivation for me is knowing how hostile the incoming administration is to climate science and science in general,” meteorologist and climate journalist Eric Holthaus told Gizmodo.
Shortly after the DOE debacle, Holthaus went on a weekend-long tear crowdsourcing information from scientists on Twitter, and building a catalog of the most important public climate datasets. After his Washington Post article on the archiving binge went viral, Holthaus merged his work with the complementary efforts at DataRefuge and EDGI. Almost overnight, a national movement blossomed.
Most of the people I spoke with for this article do not expect the feds to start throwing filing cabinets full of climate data in the nearest river next week. Instead—considering what’s happened under past conservative administrations, like that of George W. Bush—they are anticipating that datasets will become more difficult to access, or will degrade in quality, as funding is stripped from the agencies charged with maintaining them.
“I think apocalyptic imaginations of a total destruction of all data are a red herring,” science historian Etienne Benson said at a Penn DataRefuge panel discussion on Friday. “What’s much more likely is an increase in friction,” he added, using the term to describe how differences in standards and formatting can interfere with the ability of scientists to integrate datasets, into, say, a climate model. “An increase in friction could have very serious effects, even if bit by bit it looks like minor interventions.”
In Halpern’s view, higher-profile federal climate and environmental datasets, such as NASA and NOAA’s global temperature records, will be relatively safe. “It’s the critical information that’s under the radar that’s most vulnerable,” he said. “Because nobody has a list of all the data the feds own.”
As an example of just how fine-grained the issues can get, consider the wildlife tracking database MoveBank, which relies on snow cover data collected by NASA’s MODIS satellite instruments. According to MoveBank data curator Sarah Davidson, minor versioning changes in the way MODIS snow cover data is formatted can lead to many days of work ensuring that wildlife tracking tools used by thousands of researchers worldwide don’t break. “This is a super nerdy, technical thing in terms of vulnerability, but our tools take a lot of work and effort to put together,” she said.
Ultimately, it may be impossible to back up all of the valuable records scattered across numerous federal websites. Even getting the big stuff is going to take time: according to PPEH director Bethany Wiggin, 1.7 terabytes of data was backed up at Penn’s archive-athon this weekend. “There is a lot more work to be done,” she said.
What’s more, as DataRefuge volunteers have pointed out, there’s loads of state and local climate data that doesn’t have a digital record at all. Still, Halpern hopes that by building the infrastructure for a government data archive now, the scientific community will be better positioned to monitor the Trump administration’s actions down the line.
“You’re not going to be able to download everything or anticipate every action, because the government scientific enterprise is so vast and complex,” he said. “Creating a system that allows for accountability, and communicating out to the public why this kind of activity benefits them, are the main goals at this point.”
As the scientific resistance presses on toward Trump’s inauguration, the fears that motivated its inception are already being realized. On Tuesday night, InsideEPA.com reported that the Trump administration’s EPA transition team “intends to remove non-regulatory climate data from the agency’s website, including references to President Barack Obama’s June 2013 Climate Action Plan, the strategies for 2014 and 2015 to cut methane and other data.”
“I think we’re stepping into a phase of history that’s not quite like anything we’ve seen before,” Roland Wall, Senior Director for Environmental Initiatives at Philadelphia’s Academy of Natural Sciences, said during a discussion following a DataRefuge panel on Friday. “Certainly nothing I’ve seen in the last sixty days indicates that the administration isn’t going to be exactly what it says it is, which is in many cases openly contemptuous of science and scientific knowledge, particularly on subjects like climate change.”