Scientific Data Is Disappearing All the Time

When a study gets published and its results enter our collective body of scientific knowledge, it feels like it's there to stay. But without the raw data behind the study, it's hard to revisit the research and build new ideas on it. That's why it's such a problem that old data is disappearing.

A new study in Current Biology looked at 516 biological studies published between 1991 and 2011 and found that the underlying raw data was still available for only 23 percent of them. For papers written more than 20 years ago, there was a 90 percent chance that no data was available.

It may sound meta to do a study of studies, but it matters because the scientific method is supposed to revolve around reproducibility. Timothy Vines, a zoologist at the University of British Columbia who oversaw the research, told Smithsonian:

Everybody kind of knows that if you ask a researcher for data from old studies, they'll hem and haw, because they don't know where it is. But there really hadn't ever been systematic estimates of how quickly the data held by authors actually disappears.

The group looked at papers reporting anatomical measurements of plants and animals, sampling 25 to 40 papers for every other year between 1991 and 2011. When they went searching for the data behind each paper, they often hit obstacles: abandoned email addresses got in the way for 25 percent of the investigated papers, and unresponsive researchers for 38 percent.

Vines points out that data stored on outmoded technology like floppy disks is also an issue. And beyond its value to the scientific process, the data should in many cases be more available anyway, because it was paid for with public funding that stipulated general availability.

Smithsonian adds that some journals, like Molecular Ecology, where Vines is managing editor, are now requiring that authors submit raw data with their papers. But journal archives, while perhaps more stable than those of individuals, can still disappear over time. It's time for a digital pit where everyone can dump their data for long-term storage. [Smithsonian]

Photo courtesy of Picasa