Scientific literature is in a sorry state of affairs. More and more research is being published every year, while widespread concerns about its credibility remain. According to a 2016 survey conducted by Nature, more than 70 percent of scientists have tried and failed to reproduce another researcher’s work, and more than 50 percent have failed to reproduce their own work at some point in their careers. But a new study takes aim at science’s reproducibility problem, suggesting a straightforward intervention: audits.
Randomly auditing under 2 percent of all scientific studies could significantly improve science’s credibility, predicts the study, which was published in PLOS ONE last month. The new work was conducted in a simulated world of 100 research laboratories, built to mimic real scientific practice as closely as possible.
The simulated laboratories were originally created by Paul Smaldino, a cognitive scientist at the University of California, Merced, and Richard McElreath, an evolutionary anthropologist at the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany. In the simulations, each lab chooses areas of study, runs experiments, and publishes papers. The labs differ in how much effort they spend testing their ideas: in the model, spending more effort yields fewer, but more reliable, papers, and positive findings are easier to publish than negative ones, a phenomenon known in academia as publication bias. Each lab also produces “child labs,” representing a successful lab member starting up their own lab, that mimic the practices of the parent lab. Smaldino and McElreath found that the virtual labs shifted toward low-quality methods, widespread unreliable results, and little effort.
The new study adds random audits to Smaldino and McElreath’s virtual labs and finds that auditing about 2 percent of all research is enough to maintain quality, both by removing labs that report high levels of false-positive findings and by spreading the fear of being audited among the rest. “These labs were then removed and, importantly, the surviving parent and child labs increased their quality because of the fear of being audited,” says study co-author Adrian Barnett, a statistician at Queensland University of Technology in Australia. “The power of the audits spread beyond the audited lab.”
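To make the mechanism concrete, here is a rough Python sketch of this style of simulation. It is not the authors’ code: the payoff rules, the audit trigger, the effort nudge, and the turnover and mutation settings are all illustrative assumptions rather than parameters from the published model. It only shows how random audits can pull a population of competing labs back toward higher-effort methods.

```python
import random

# A rough sketch of this style of simulation, not the authors' code. Every
# number and update rule below is an illustrative assumption, chosen to show
# the mechanism rather than to reproduce the paper's results.
N_LABS = 100       # labs in the simulated community
GENERATIONS = 200  # rounds of publishing, retirement, and new "child labs"
TURNOVER = 5       # labs that retire and are replaced each round
AUDIT_RATE = 0.02  # chance that any given published paper is audited

class Lab:
    def __init__(self, effort):
        self.effort = effort  # 0..1; more effort means fewer, but more reliable, papers

    def publish(self):
        papers = 1 + int(10 * (1 - self.effort))  # effort cuts productivity...
        fp_rate = 0.4 * (1 - self.effort)         # ...but also cuts false positives
        false_pos = sum(random.random() < fp_rate for _ in range(papers))
        return papers, false_pos

def spawn_child(parents, payoffs):
    # Child labs copy a successful parent's practices, with a little noise.
    parent = random.choices(parents, weights=payoffs)[0]
    return Lab(min(1.0, max(0.0, parent.effort + random.gauss(0, 0.05))))

def generation(labs):
    pool = []  # surviving (lab, payoff) pairs
    for lab in labs:
        papers, false_pos = lab.publish()
        # Random audit: a lab caught with an audited false positive is removed.
        if any(random.random() < AUDIT_RATE for _ in range(false_pos)):
            continue
        if AUDIT_RATE > 0:  # fear of being audited nudges survivors' quality up
            lab.effort = min(1.0, lab.effort + 0.01)
        pool.append((lab, papers))  # payoff is raw output, echoing publication bias
    if not pool:  # safeguard; essentially never triggers at a 2 percent audit rate
        pool = [(Lab(random.random()), 1)]
    # Ordinary turnover: a few labs retire each round, so child labs of prolific
    # (often low-effort) labs spread even when no one is audited.
    random.shuffle(pool)
    pool = pool[:max(1, len(pool) - TURNOVER)]
    parents = [lab for lab, _ in pool]
    payoffs = [papers for _, papers in pool]
    next_gen = list(parents)
    while len(next_gen) < len(labs):
        next_gen.append(spawn_child(parents, payoffs))
    return next_gen

labs = [Lab(random.random()) for _ in range(N_LABS)]
for _ in range(GENERATIONS):
    labs = generation(labs)
print("mean effort with 2 percent audits:", sum(lab.effort for lab in labs) / N_LABS)
```

In this toy version, setting AUDIT_RATE to zero lets mean effort drift steadily downward, because the most prolific, lowest-effort labs leave the most descendants; switching a small audit rate back on reverses the drift, which is the qualitative pattern the new study reports.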
The study predicts that audits would reduce the number of false positive results from 30.2 per 100 papers to 12.3 per 100—far from perfect, but a significant improvement.
The paper also finds that the auditing process would not be overly burdensome for academics: Each audit would require one extra researcher to review every 10 papers, Barnett says. He says the process might also be useful in weeding out systematic errors, and that successful audits could eventually become an indicator of a lab’s prestige. The auditors could also be paid for their work.
So what’s the holdup?
One problem is that auditing scientific research in the real world won’t be cheap. Barnett and colleagues predict that auditing 1.94 percent of papers funded by the National Institutes of Health (1,824 of the 94,000 papers published with the agency’s funding in 2013) would cost $15.9 million every year.
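For scale, the arithmetic behind those figures is simple to check; the inputs below are the numbers quoted above, while the cost per audited paper is a derived back-of-the-envelope figure rather than one reported by the study.

```python
# Figures quoted above; the per-paper cost is derived, not taken from the study.
nih_papers_2013 = 94_000    # NIH-funded papers published in 2013
audit_fraction = 0.0194     # share of papers to audit
annual_cost = 15_900_000    # estimated yearly cost in dollars

audited_papers = round(nih_papers_2013 * audit_fraction)    # ~1,824 papers
cost_per_audit = annual_cost / audited_papers               # ~$8,700 per audited paper
print(audited_papers, round(cost_per_audit))
```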
Others argue that auditing publications doesn’t address the underlying culture that incentivizes bad science in the first place. “I do think we need to spend more time on post-publication evaluation of findings,” says Christopher Chartier, a psychologist at Ashland University in Ohio who runs projects that make it easier to carry out replication studies. But he believes that, in this case, honey might be a better approach than vinegar. “I think that providing incentives for practices that improve the reproducibility of science will be more effective than a small percent chance of receiving punishment for practices that reduce reproducibility,” he says.
Brian Nosek, the executive director of the Center for Open Science in Charlottesville, Virginia, is slightly more optimistic about audits in science despite being skeptical about some aspects of the new study. “It is a proven method in other professional practices when it is done well,” says Nosek, who’s played a key role in tackling irreproducibility. But “if the audits are weak or not focused on the right constellation of behaviors, then they could have no impact or even further skew dysfunctional incentives.”
For audits to be rolled out on a large scale, the system will likely need backing from governments as well as researchers, Barnett says. “The reason we are discussing this kind of change to the research world is because we have a very big problem.”