If you’ve seen the words “statistically significant” to describe scientific results, you may have thought that meant the results were important, or at the very least correct. But really, statistical significance doesn’t mean either of those things, and scientists’ reliance on the concept is indirectly eroding public trust at a time when science has never been more important, a new paper argues.
“Science is severely hindered when there is over-reliance on statistical significance to sort out what is worth knowing and publishing,” says Ronald Wasserstein, executive director of the American Statistical Association and co-author of a new ASA statement, published in the American Statistician, that highlights the sometimes-pernicious role statistical significance has played, especially in fields like epidemiology, political science, and health.
Originally, statistical significance was conceived as an additional check on the validity of scientific results, but over time significance and its antecedent, a statistic known as the p-value, became a gatekeeper for publishing results in medicine, social science, and so on. As such, it’s been blamed for fostering a reproducibility crisis in psychology and other fields (though, in keeping with scientific tradition, exactly how much of a crisis actually exists remains a matter of debate).
So what are these things, and why are they so controversial? Roughly speaking, a p-value is the probability of observing data at least as extreme as what a study actually found, assuming a particular scientific theory is true. Importantly, it is not the probability that the theory is correct. Further complicating matters, p-values are usually derived from theories researchers are looking to disprove (the so-called null hypothesis, such as the theory that a drug has no effect). P-values, in other words, have essentially nothing to do with proving theories correct. Statistical significance is even more dubious: all it means is that the p-value falls below an arbitrary threshold, typically 0.05. But as a practical matter, that designation is what many journals require before they’ll publish a scientific result.
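To make that concrete, here is a minimal sketch in Python. The coin-flipping scenario and the function name `binomial_p_value` are illustrative assumptions, not anything taken from the ASA statement. The theory to be disproved is that a coin is fair, and the p-value is the chance of seeing at least as many heads as were observed if that theory were true.

```python
from math import comb

ALPHA = 0.05  # the conventional, and arbitrary, significance threshold


def binomial_p_value(heads: int, flips: int, p_null: float = 0.5) -> float:
    """One-sided p-value: the probability of seeing `heads` or more heads
    in `flips` tosses, assuming the coin lands heads with probability
    `p_null` (the theory the researcher is trying to disprove)."""
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )


# Two hypothetical experiments that differ by a single coin flip.
for heads in (58, 59):
    p = binomial_p_value(heads, 100)
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"{heads} heads out of 100: p = {p:.4f} ({verdict})")
```

Under these assumptions, 58 heads in 100 flips lands just above the 0.05 line and 59 heads lands just below it, so one extra flip moves the result from "not significant" to "significant" even though the evidence has barely changed. Neither number says how likely it is that the coin is actually fair.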
Because of this, statistical significance encourages scientists, along with science reporters and, through them, the general public, to think about results as simply right or wrong, important or unimportant. And that’s not the right way to think about science. “Science is messy and incremental,” Wasserstein says, and statistical significance gives everyone the false sense that results are clean and final.
The reality is more graded, and pretending otherwise could erode the public’s trust in science. If researchers report coffee is good for you one day and that it’s bad for you the next—or report that global warming will raise temperatures by three degrees one day and two the next—scientists start to look like flip-floppers or even liars.
“If p-values … with an arbitrary fixed threshold like 0.05 are the arbiters of truth, many [supposed] discoveries are going to be incorrect,” Wasserstein says. “What I can see is people stop believing in science” if researchers stay on their current path.
Quick Studies is an award-winning series that sheds light on new research and discoveries that change the way we look at the world.