It’s the result of an academic culture that rewards scientists for publishing too often. And fixing the problem requires institutional change.
By Nathan Collins
If you read science news, you know there’s been a crisis brewing in medicine, psychology, and other sciences—actually, a full-blown disaster, according to some observers: Evidence is mounting that many well-regarded scientific results are poorly supported by the data, if not flat out wrong. What happened? According to a new study, it’s the result of a culture that promotes scientists who publish frequent, high-impact papers—and fixing the problem requires institutional change, not just individual action.
“This is not a new argument,” Paul Smaldino and Richard McElreath write in the journal Royal Society Open Science. “But we attempt to substantially strengthen it.”
The problem is subtle, as Smaldino and McElreath point out. It’s not just shady dealings with sugar lobbyists or the irredeemably bad methods and stonewalling behind an influential study of chronic fatigue syndrome. It’s also that the basic research methods most scientists learned are flawed, whether they knew it or not. But what’s truly odd is that some scientists have been saying this for decades: Psychologists Paul Meehl and Jacob Cohen noted fatal flaws in psychological research in the 1960s, yet little seems to have changed since then.
The reason, Smaldino and McElreath argue, is that there’s a kind of cultural natural selection that actually favors bad research methods. Chief among the selection pressures is that academic hiring and promotion committees tend to focus on the number of articles a candidate has published and the journals they’re published in, rather than actually reading the papers to evaluate their quality and creativity. Meanwhile, journals tend to favor exciting, counterintuitive results, and some even have explicit policies against publishing papers that attempt to replicate previous experiments.
That approach favors poor research methods, Smaldino and McElreath explain, largely because of its indirect effects on experimenters’ sample sizes. Large sample sizes produce more reliable results—the more people participating in a psychology experiment, the greater the likelihood that any observed effects are, in fact, real, rather than statistical flukes.
But, from the point of view of publishing papers, large sample sizes have two drawbacks. First, studying more people takes more time and effort. Second, smaller experiments are actually more likely to produce results—albeit incorrect ones. Imagine, when flipping an ordinary coin, you’re testing the hypothesis that heads is much more likely than tails. If you test this hypothesis by flipping thousands of coins, you’re unlikely to find any evidence for the idea. Flip just a dozen or so, and you might get lucky. The consequence is that it’s actually easier to produce—and publish—a wrong-but-exciting claim (heads is more likely!) than it is to produce a correct-but-boring claim (heads and tails are equally likely, like we always thought).
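The arithmetic behind that coin-flip intuition is easy to check. Below is a minimal simulation, a sketch of my own rather than anything from the study: it counts how often a perfectly fair coin looks strongly biased toward heads purely by chance, in small and large experiments. The 75 percent cutoff for calling a result “exciting” is an arbitrary choice for illustration.

```python
import random

def spurious_result_rate(n_flips, n_experiments=10_000, threshold=0.75):
    """Fraction of experiments in which a fair coin comes up heads at least
    `threshold` of the time -- i.e., a wrong-but-exciting result."""
    hits = 0
    for _ in range(n_experiments):
        heads = sum(random.random() < 0.5 for _ in range(n_flips))
        if heads / n_flips >= threshold:
            hits += 1
    return hits / n_experiments

for n in (12, 100, 1000):
    rate = spurious_result_rate(n)
    print(f"{n:>5} flips: a strong bias toward heads appears by chance "
          f"in {rate:.1%} of experiments")
```

In a typical run, roughly seven percent of the 12-flip experiments look strongly biased toward heads, while essentially none of the 100- or 1,000-flip experiments do.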
Smaldino and McElreath probe this dynamic further using a computational model of the evolution of scientific research. As they expected, they find that academic publishing and promotion incentives typically result in studies with small sample sizes and frequently incorrect results. Importantly, that’s not because the system explicitly favors sloppy researchers; it’s just that sloppy researchers are more likely to produce exciting results that at first glance seem valid.
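To make that selection dynamic concrete, here is a deliberately crude toy version of the idea; it is my own sketch, not Smaldino and McElreath’s actual model, and every lab, parameter, and number in it is invented for illustration. Labs that cut methodological corners publish more positive results, struggling labs copy the methods of the most-published lab, and average rigor declines even though nothing in the rules explicitly rewards sloppiness.

```python
import random

N_LABS, GENERATIONS, HYPOTHESES_PER_ROUND = 50, 300, 20
BASE_RATE = 0.1    # fraction of tested hypotheses that are actually true
MUTATION = 0.02    # small random drift when a lab's methods are copied

def run():
    # Each lab has an "effort" level: more effort means fewer, more reliable results.
    efforts = [random.uniform(0.1, 0.9) for _ in range(N_LABS)]
    for _ in range(GENERATIONS):
        papers = []
        for effort in efforts:
            n_tested = int(HYPOTHESES_PER_ROUND * (1 - effort)) + 1
            published = 0
            for _ in range(n_tested):
                is_true = random.random() < BASE_RATE
                false_positive = (not is_true) and random.random() < (1 - effort)
                if is_true or false_positive:
                    published += 1   # only "positive" results get published
            papers.append(published)
        # Selection: the least-published lab is replaced by a copy of the
        # most-published lab, inheriting its methods (with a little drift).
        best, worst = papers.index(max(papers)), papers.index(min(papers))
        efforts[worst] = min(0.9, max(0.1, efforts[best] + random.gauss(0, MUTATION)))
    return sum(efforts) / N_LABS

print(f"average methodological effort after selection: {run():.2f}")
# Effort starts near 0.5 on average and drifts downward: careless labs publish
# more, so their methods are the ones that get copied.
```

The mechanism is the same one the article describes: selection acts on what gets counted, publications, not on whether the findings hold up.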
Real change won’t come from individual scientists simply doing better work; under the current incentives, careful, solid research could actually end up hurting young scientists’ careers. “If we are serious about ensuring that our science is both meaningful and reproducible, we must ensure that our institutions incentivize that kind of science,” Smaldino and McElreath write.