Image: Daniel Dionne/Flickr

"Science" might conjure something dramatic: groundbreaking new treatments, strange new animals, explosions in space, or wild chemistry. But at its core, science is nothing more than ruling out hypotheses based on evidence. A new debate is flaring about one of science's important concepts: how we decide what constitutes a positive result.

At the center of the debate is the concept of "statistical significance." Much of science involves testing a control against an experiment, like a fair die versus a weighted die. The "null hypothesis" is the assumption that the experimental treatment had no real effect, so any difference between experiment and control is due to chance alone. "Statistically significant," on the other hand, means that after collecting all of the data, the experiment and control differed enough, and the sample was large enough, that the null hypothesis can reasonably be ruled out. In other words, the experimental treatment had a real, measurable effect.


Currently, scientists gauge statistical significance using a number called the p-value: if the p-value is less than .05, there's less than a 5 percent chance that the control alone would have produced results at least as extreme as the ones the experiment produced. But a growing number of researchers aren't comfortable with that .05 cutoff, and one team is now proposing redefining statistical significance as a p-value of .005, only a .5 percent chance of the control producing the results observed in the experiment. In short, these researchers are calling for scientists to adopt much higher standards for what they deem to be 'real' results.
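To make the die example concrete, here's a minimal sketch, in Python using only the standard library, of an exact one-sided binomial test: it asks how often a fair die would produce at least as many sixes as observed, then compares that probability to the old and proposed thresholds. The roll counts are made up purely for illustration.

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance a fair process
    produces a result at least as extreme as the one observed."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical experiment: 100 rolls of a suspect die turn up 25 sixes,
# where a fair die would average about 16.7.
p_value = binomial_tail(25, 100, 1/6)

print(f"p-value: {p_value:.4f}")
print("significant at .05? ", p_value < 0.05)   # old threshold
print("significant at .005?", p_value < 0.005)  # proposed threshold
```

With these made-up numbers the result clears the traditional .05 bar but not the proposed .005 one, landing in exactly the gray zone the debate is about.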

This could have implications for experiments in fields like biology and medicine, and could require scientists to work much harder to prove their hypotheses.

"The lack of reproducibility of scientific studies has caused growing concern over the credibility of claims of new discoveries based on 'statistically significant' findings," a group of 72 scientists writes in a paper that will be published in the journal Nature Human Behaviour. "...We believe that a leading cause of non-reproducibility has not yet been adequately addressed: Statistical standards of evidence for claiming new discoveries in many fields of science are simply too low. Associating 'statistically significant' findings with P < 0.05 results in a high rate of false positives."


The researchers admit that defining statistical significance as .005 is about as arbitrary as using .05: it's just a threshold chosen to reduce the likelihood of false positives in an experiment. For comparison, particle physics uses a p-value of 0.0000003, according to a Scientific American blog post. This means that, in a particle physics experiment, when scientists compare their control (the laws of physics without the new particle) to the experiment (the laws of physics including the new particle), there's only a 0.00003 percent chance that the laws of physics without the new particle would produce the results they see. Particle physics does not let new particles in easily.
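That particle-physics threshold is the famous "five sigma" rule: a one-sided p-value of about 0.0000003 sits roughly five standard deviations out on a normal curve. A quick illustrative check with Python's standard library:

```python
from statistics import NormalDist

p = 0.0000003  # the particle-physics discovery threshold

# How many standard deviations out on a normal curve
# a one-sided p-value this small corresponds to.
sigma = NormalDist().inv_cdf(1 - p)
print(f"{sigma:.2f} sigma")  # roughly 5 sigma
```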

The researchers acknowledge that adopting a stricter p-value as the standard for statistical significance would put a lot more work on scientists' plates: they'd need to collect about 70 percent more data, according to the new paper, since larger samples make a real effect easier to distinguish from the control. Nor would changing the threshold combat "p-hacking," a controversial practice in which a scientist tests multiple hypotheses at the same time in the hope that one of them ends up with a p-value below .05 through luck alone, or other biases. They also propose that results with p-values between .005 and .05 be labeled "suggestive evidence" rather than statistically significant.
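The roughly-70-percent figure can be sanity-checked with a standard power calculation. For a two-sided z-test at fixed power, the required sample size scales with the square of (z for the significance level plus z for the power) divided by the effect size, so the ratio of sample sizes at two thresholds doesn't depend on the effect size at all. A sketch using Python's standard library, assuming 80 percent power (a common convention; the paper's exact assumptions may differ):

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # quantile function of the standard normal

def n_ratio(alpha_new, alpha_old, power=0.80):
    """Ratio of sample sizes needed for the same power at two
    two-sided significance levels; the effect size cancels out."""
    z_beta = z(power)
    new = (z(1 - alpha_new / 2) + z_beta) ** 2
    old = (z(1 - alpha_old / 2) + z_beta) ** 2
    return new / old

ratio = n_ratio(0.005, 0.05)
print(f"sample size must grow by about {100 * (ratio - 1):.0f}%")
```

The ratio comes out to roughly 1.7, matching the paper's estimate that sample sizes would need to grow by about 70 percent.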

Obviously, there's a lot to discuss. Microbiologist Jonathan Eisen of the University of California, Davis wrote in a blog post that he wasn't "100% certain" whether he supported the revised p-value. After all, collecting more data costs more money and takes more time. Some worry about how the change might affect the cost of drug trials, as Science reports, or argue that it's the "least of our problems" in science right now, as psychologist Timothy Bates of the University of Edinburgh wrote in a blog post.


At this point, we know there's a reproducibility crisis in science. Researchers trying to replicate past cancer and psychology studies keep coming up without the reported effects. So for now, just know that there's a conversation brewing to address this, and folks want to see change.

[PsyArXiv via Science]