The latest effort to reproduce scientific findings, this time in the social sciences, has produced lukewarm results.
If the results in a published study can’t be replicated in subsequent experiments, how can you trust what you read in scientific journals? One international group of researchers is well aware of this reproducibility crisis and has been striving to hold scientists accountable. For their most recent test, they attempted to reproduce 21 studies published between 2010 and 2015 in two of the top scientific journals, Science and Nature. Only 13 of the reproductions produced the same results as the original studies.
But efforts to reproduce older studies do seem to be changing science for the better. The challenges faced in these 21 papers, and those elucidated by other reproducibility studies, may lead to changing “policies and practices in social-behavioral sciences,” according to an email sent to journalists by Brian Nosek, a psychology professor at the University of Virginia and executive director of the Center for Open Science.
The goal of the scientific method is to learn something new about the world by testing a hypothesis through an experiment. But if the experiment can’t be repeated, if the hypothesis is biased by the data, or if the results are different when someone else conducts the test, then we haven’t really learned anything. Scientists have realized this, and there are efforts underway to recreate past experiments to see whether the results are truly reproducible.
While there are no standard criteria to determine whether a study has been successfully replicated, the researchers were able to repeat seven of the 21 studies with the exact same methods; 12 of the studies with small deviations in the methods; one study with unintended differences in the methods; and one study where the reproducers made a mistake in data collection. All 21 reproductions had larger sample sizes than the originals. The researchers then performed several statistical tests to see whether the original studies’ conclusions held.
The researchers successfully replicated and demonstrated similar conclusions for 13 of the studies, which is better than previous reproduction attempts in psychology and on par with another recent effort in experimental economics. Even for the 13 successful replications, the results of the initial study supported the hypothesis more strongly than the results of the reproduction did.
Among the failed studies, “there was essentially no evidence for the original findings,” according to the paper published today in Nature Human Behaviour.
These were mainly human behavioral studies. One study that failed to be reproduced, for example, found that looking at pictures of the Rodin sculpture “The Thinker” lowered “self-reported religious beliefs.” Another “found that participants assigned to read literary short stories performed better than those assigned to read non-fiction on a test of theory of mind.”
The researchers also took a survey of nearly 400 researchers in the social sciences, asking whether they thought the included papers would be reproducible. The survey results generally predicted which studies would be reproducible and which would not be, suggesting that “the research community could predict which results would replicate and that failures to replicate were not the result of chance alone,” according to the paper.
These results are important. “It suggests that findings of studies published in high-impact journals are just as unlikely to be replicated successfully; so the impact factor of the journal in which work was published is no guarantee that the findings are true,” neurologist Malcolm R. Macleod, from the University of Edinburgh in the UK, wrote in a Nature Human Behaviour commentary.
Each of the teams whose results weren’t reproduced offered responses. One team stood by its initial results, pointing to other successful replications. Another pointed to potential factors that may have skewed the initial results. One team went so far as to replicate several of its own studies on human financial behavior, and agreed with the replication study that the study in question was a false positive. They noted that studies should be replicated even before they appear in a research journal, something they now wish they had done.
There were some limitations to this replication study: it’s a small sample from the top journals, and the top journals might not be typical of the whole field. Additionally, each replication tested only one hypothesis, so papers with multiple hypotheses weren’t fully tested.
But it falls in line with a larger pattern of replication efforts, and demonstrates that there really is a reproducibility problem in science.
Things do seem to be changing, Nosek wrote. He pointed out that, of 33 psychology journals that had no transparency policies in 2013, 24 have now adopted some, while 19 have adopted “pretty assertive policies.” More scientists now share their data and preregister, meaning they commit to an experimental design prior to collecting any data.
Nosek wrote: “This cultural change anticipates that reproducibility rates will increase over time.” Hopefully things continue to change, because we need to bolster confidence in science now more than ever.