These Scientists Are Debating How We Decide What Results Count

“Science” might bring to mind something dramatic, like groundbreaking new treatments, strange new animals, explosions in space, or wild chemistry. But at its core, science is a process of ruling out hypotheses based on evidence. A new debate is flaring over one of science’s most important concepts: how we decide what constitutes a positive result.

At the center of the debate is the concept of “statistical significance.” Much of science involves comparing a control with an experimental condition, like a fair die versus a weighted die. The “null hypothesis” is the assumption that the experimental treatment makes no difference, so any gap between the experiment and the control is down to chance. “Statistically significant,” on the other hand, means that after collecting all of the data, the experiment and control were different enough and the sample was large enough that the null hypothesis can reasonably be ruled out. In other words, the experimental treatment appears to have had a real, measurable effect.

Currently, scientists gauge statistical significance using a number called the p-value: the probability that the control alone would have produced results at least as extreme as those the experiment produced. If the p-value is less than .05, that means there’s less than a 5 percent chance of that happening. But a growing number of researchers aren’t comfortable with that .05 value, and one team is now proposing redefining statistical significance as a p-value below .005, or less than a 0.5 percent chance of the control producing the results observed in the experiment. In short, these researchers are calling for scientists to adopt much stricter standards for what they deem to be ‘real’ results.
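To make the die example concrete, here is a minimal Python sketch of how a p-value could be computed for a suspect die. The numbers (60 rolls, 17 sixes) and the function name `binomial_tail` are made up for illustration and do not come from the paper; the sketch simply asks how often a fair die would produce at least that many sixes.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Probability of k or more successes in n trials, each succeeding with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical experiment: 60 rolls of the suspect die, 17 of them sixes.
# Under the null hypothesis (a fair die) we would expect about 10 sixes.
rolls, sixes = 60, 17
p_value = binomial_tail(rolls, sixes, 1 / 6)

print(f"p-value: {p_value:.4f}")
print("significant at .05: ", p_value < 0.05)
print("significant at .005:", p_value < 0.005)
```

With these invented numbers, the result clears the current .05 bar but not the proposed .005 one, which is exactly the kind of finding the proposal would reclassify.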

This could have implications for experiments in many fields like biology and medicine and could require scientists to work much harder to prove their hypotheses.

“The lack of reproducibility of scientific studies has caused growing concern over the credibility of claims of new discoveries based on ‘statistically significant’ findings,” a group of 72 scientists writes in a paper that will be published in the journal Nature Human Behaviour. “…We believe that a leading cause of non-reproducibility has not yet been adequately addressed: Statistical standards of evidence for claiming new discoveries in many fields of science are simply too low. Associating ‘statistically significant’ findings with P < 0.05 results in a high rate of false positives.”

The researchers admit that defining statistical significance as .005 is about as arbitrary as using .05. It’s just a threshold used to reduce the likelihood of false positives in an experiment. But just think: particle physics uses a threshold of p = 0.0000003, according to a Scientific American blog post. This means that, in a particle physics experiment, when scientists compare their control (the laws of physics without the new particle) to the experiment (the laws of physics including the new particle), there’s only a 0.00003 percent chance the laws of physics without the new particle would produce the results they see. Particle physics does not let new particles in easily.
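The particle physics figure can be checked with a few lines of Python. This sketch assumes the number comes from the “five sigma” convention often cited in particle physics, which the article itself does not mention; `one_sided_tail` is a name chosen here for illustration.

```python
from math import erfc, sqrt


def one_sided_tail(sigmas: float) -> float:
    """Chance that a normally distributed result lands at least `sigmas`
    standard deviations above its mean purely by luck."""
    return 0.5 * erfc(sigmas / sqrt(2))


# Assumption: the particle physics threshold corresponds to five sigma.
p = one_sided_tail(5.0)

print(f"p-value at five sigma: {p:.2e}")
print(f"as a percentage:       {p * 100:.5f}%")
```

Under that assumption, the tail probability works out to roughly 3 in 10 million, which matches the 0.0000003 (or 0.00003 percent) quoted above.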

The researchers acknowledge that adopting a stricter p-value as the standard for statistical significance would put a lot more work onto scientists’ plates: they’d need to take about 70 percent more data, according to the new paper, since taking more data is one way to make a real effect stand out from the control. Nor would changing the threshold for statistical significance combat other biases or “p-hacking,” a controversial practice where a scientist tests multiple hypotheses at the same time in the hope that one of them ends up with a p-value less than .05 through luck alone. They also propose that results with p-values between .005 and .05 should be labeled “suggestive evidence.”
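Both claims can be roughed out in Python. The sketch below is not the paper’s own calculation: it assumes a simple two-sided z-test with 80 percent power for the sample-size estimate, and 20 independent tests of effects that don’t exist for the p-hacking estimate. The function names are hypothetical.

```python
from statistics import NormalDist


def sample_size_ratio(alpha_old: float, alpha_new: float, power: float = 0.80) -> float:
    """How many times more data a two-sided z-test needs at alpha_new than at
    alpha_old to detect the same effect with the same power (an assumption)."""
    z = NormalDist().inv_cdf
    old = z(1 - alpha_old / 2) + z(power)
    new = z(1 - alpha_new / 2) + z(power)
    return (new / old) ** 2


def chance_of_false_positive(alpha: float, tests: int) -> float:
    """Chance that at least one of `tests` independent tests of a nonexistent
    effect dips below the threshold alpha by luck alone."""
    return 1 - (1 - alpha) ** tests


ratio = sample_size_ratio(0.05, 0.005)
print(f"extra data needed: {100 * (ratio - 1):.0f}%")

for alpha in (0.05, 0.005):
    chance = chance_of_false_positive(alpha, tests=20)
    print(f"threshold {alpha}: {100 * chance:.1f}% chance of a lucky hit in 20 tests")
```

Under these assumptions, the extra data requirement lands close to the paper’s 70 percent figure. The second calculation suggests a stricter threshold makes a lucky hit much rarer but does not rule it out, so a determined p-hacker could still get there by running more tests.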

Obviously, there’s a lot to discuss. Microbiologist Jonathan Eisen from the University of California, Davis, said in a blog post that he wasn’t “100% certain” whether he supported the revised p-value. After all, taking more data costs more money and takes more time. Some have worried about how this might affect the costs of drug trials, as Science reports, or argued that it is the “least of our problems” in science in our current era, as psychologist Timothy Bates from the University of Edinburgh wrote in a blog post.

At this point, we know there’s a reproducibility crisis in science. Researchers trying to replicate past cancer and psychology studies are often failing to find the reported effects. So for now, just know that there’s a conversation brewing to address this, and folks want to see change.

[PsyArXiv via Science]
