Humanity would understand very little about cancer, and be hard-pressed to find cures, without scientific research. But what if, when teams recreated each other’s research, they didn’t arrive at the same result?
That’s the question the Reproducibility Project: Cancer Biology of the Center for Open Science is trying to answer by redoing parts of 50 important cancer studies and comparing the results. The project released its first five replications today, and it turns out that not all of the data matches up. At least once in every paper, a result reported as statistically significant (meaning the observed effect is unlikely to be explained by chance alone) was not statistically significant in the replicated study. In two of the cases, the differences between the initial and replicated studies were even more striking, giving the Center for Open Science researchers cause for concern.
“I was surprised by the results because of all that homework that we did” to make sure the studies were being reproduced accurately, Tim Errington, Project Manager at the Center for Open Science, told Gizmodo. “We thought we were crossing every T and dotting every I... Seeing some of these experimental systems not behave the same was something I was not expecting to happen.”
The impetus for the project came around 2011 and 2012, when biotechnology companies like Amgen and Bayer were having trouble recreating important cancer studies. The Center for Open Science worked alongside Science Exchange, a “marketplace for outsourcing science research,” to catalogue and replicate some of the most-cited cancer studies. They whittled 400 papers down to 16 or 17 each from the years 2010, 2011 and 2012. Then, the groups painstakingly recreated the methods of each study by speaking with the authors and other researchers. They even published a peer-reviewed paper, called a registered report, detailing how they planned to replicate the initial studies.
Today, the Center for Open Science published the first five replication studies, which redo parts of these five linked papers, along with peer reviewers’ comments in the journal eLife. While some minor differences popped up in each replication, a few major discrepancies between the original and redone studies stood out. In part of one paper, scientists were trying to test how long it took for tumors to grow in mice after mutating a gene. The original paper found that the tumors grew after nine weeks, while the replicated study saw tumors growing after just one week. In another, after following the first experiment’s protocol as best they could, the scientists behind the replicated study noticed that untreated tumors, made from the same cells that the first scientists used, shrank on their own. If untreated tumors shrink spontaneously, one can’t really say whether a treatment worked or not.
None of the folks I spoke to at Science Exchange or the Center for Open Science blamed malice. Instead, they thought that our scientific culture incentivizes clean, sexy results in top peer-reviewed journals. “The reward system is ‘give me a paper that’s clear and understandable with an exciting headline and results that advocate for this message, theory or drug,’” said Errington. This causes scientists to leave out some of the intricacies of their methods. “The truth is that we all know that science doesn’t happen that way,” with one pristine result following another. “It’s messy and there are fits and starts.”
Scientists shouldn’t be surprised that having their data replicated leads to differing results, Elizabeth Iorns, CEO of Science Exchange, told Gizmodo. “People assume that there’s 100 percent certainty in a 1-off study that’s published. That’s obviously not correct, because... there’s always uncertainty in the results,” she said. “We can reduce the uncertainty by doing independent replications.”
Scientists involved with the original papers have begun reacting to the replications’ outcomes. “Our original research has been reproduced in at least 13 peer reviewed articles by independent research groups,” Erkki Ruoslahti of the Sanford Burnham Prebys Medical Research Institute in California, behind this replicated study, told Gizmodo in an email. Ruoslahti’s study found that a certain molecule could increase the efficacy of certain cancer drugs and help shrink tumors, but the replicated study didn’t notice a statistically significant effect on tumor weights.
“We are aware of two additional groups in the US that are in the process of publishing results that confirm and extend [our] results,” Ruoslahti continued. “Science is self-correcting: if a finding can’t be repeated it will vanish, and that hasn’t happened to our technology. Our current focus is on moving this promising technology forward to the clinic.” Ruoslahti hopes efforts like that of the Center for Open Science won’t dissuade scientists from “pursuing innovative research that has the potential to benefit patients.”
Iorns was surprised by the personal reactions from some of the scientists on seeing their results called into question. In the comments section of some of the replicated papers, scientists have criticized the Reproducibility Project’s methods. “We’re not attacking the authors,” she said. “We’re just trying to put our results out there and let the community draw their own conclusions.” She felt that researchers should not get too attached to their results, and should instead foster a culture that encourages studies to be reproduced.
Issues of reproducibility are not unique to cancer biology. In 2015, the Reproducibility Project: Psychology also had difficulty replicating the results of 100 psychology studies. “The incentives driving research behaviors are the same across all disciplines in science,” Brian Nosek, Executive Director of the Center for Open Science, told Gizmodo. He offered a few solutions. He thought more scientists should publish registered reports so their methods are reviewed before they carry out an experiment, and should publish their data openly so their experiments are easier to reproduce.
Ultimately, the Reproducibility Project: Cancer Biology has reproduced only five of the fifty planned papers. But if the first five are any indication, a clear cultural shift needs to occur in order to ensure that scientists can replicate each other’s experiments and that results can be easily verified.
I’ve reached out to the corresponding authors of all five of the reproduced studies and will update the post when I hear back.
Update 1:11PM: Marina Sirota, first author of this study, now at the University of California, San Francisco, sent us this email.
Thanks so much for reaching out. In October 2013, we were pleased and honored to see that our original 2011 publication was chosen as one of the top 50 influential cancer studies for the reproducibility study and is one of the first 5 to be completed. The goal of the original study was to develop and evaluate a novel computational technique that used publicly available gene-expression data to identify potential off-indication therapeutic effects of a large set of FDA approved drugs, where we chose one example prediction to validate. We are truly impressed with how much Figure 1 in the reproducibility paper matches Figure 4c in our original paper, demonstrating the key finding that cimetidine has a biological effect between PBS/saline (the negative control) and doxorubicin (the positive control), and the raw p-value from that experiment is indeed statistically significant. Due to difference in initial independence assumptions and the study purpose, however, our team applied a different mathematical criterion for judging statistical significance of the findings, leading to differing interpretation of the results. I believe that reproducibility efforts such as this one are extremely important for both research and the public communities, however it is also crucial to understand the goals and assumptions of the original work. Please see our full response, as well as the support of Robert Tibshirani, a well-known biostatistician, at the bottom of the page here.