Reproducibility is a bedrock principle of science. The results in a scientific paper are not supposed to reflect an individual scientist’s unique, personal views; they are supposed to be a general claim about how the world works, one that can be verified by others. But lately, social and biomedical scientists have grown worried that much of the research published in their fields fails to meet this standard. There are signs that a lot of published research may not be reproducible at all.
Last January, the two top-ranking officials at the National Institutes of Health wrote that “the checks and balances that once ensured scientific fidelity have been hobbled” by a growing tendency to cut corners. They announced that the NIH is planning “significant interventions” to ensure that we can trust the results that are published.
The NIH isn’t alone in its concern. A privately organized consortium called the “Reproducibility Initiative” hopes to get scientists to prioritize reproducibility in their work by offering what is essentially a seal of quality assurance. The idea is that researchers submit their experiments to the consortium, which, for a fee, has them replicated by an independent lab. Results that go through this quality assurance process can then be published with a badge certifying that the work has been “independently validated.” A related effort is the “Reproducibility Project,” a large collaborative project which aims to replicate key results from a set of highly cited recent papers in cancer biology and psychology.
These public and private efforts to make published research more reproducible are going to require a lot of resources from many scientists. Is the problem really that bad?
It’s not completely clear. Much of the recent discussion was sparked in 2011, when researchers at the biopharmaceutical companies Bayer and Amgen reported that they could not reproduce key results from dozens of highly cited papers. However, these claims themselves weren’t verifiable because the companies didn’t publish their data or methods. There is some evidence that the retraction rate of scientific papers is on the rise, but retracted papers are only a minuscule fraction of all published research. In fact, despite the hand-wringing in the scientific community, the direct evidence for non-reproducible research is largely anecdotal and, with some exceptions, not particularly convincing. This hasn’t stopped people from speculating about how things managed to get so bad. But the truth is, we don’t know how serious the problem is.
The most compelling lines of evidence for a reproducibility problem are indirect. For example, much published research in the social and biomedical sciences looks too good to be true. Considering the limited statistical power of most studies to detect a positive result, we’re seeing too many positive results and not enough negative ones. Poor statistical practices and experimental design are common in some fields, which means that many of the published positive results may in fact be non-reproducible flukes. Sometimes, as in the field of genomics, the sets of data produced are large and complex, leaving a lot of room for researchers to find spurious patterns in the statistical noise.
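The "too good to be true" argument comes down to simple arithmetic, which can be sketched in a few lines of Python. The numbers below (35 percent power, a 5 percent significance threshold, and the assumed share of tested hypotheses that are actually true) are illustrative assumptions, not figures from any study, and the function names are made up for this example.

```python
def expected_positive_rate(power, alpha, prior_true):
    """Share of studies expected to report a statistically significant result.

    power      -- chance a study detects an effect that really exists
    alpha      -- chance a study reports an effect that does not exist
    prior_true -- assumed share of tested hypotheses that are actually true
    """
    true_positives = power * prior_true
    false_positives = alpha * (1.0 - prior_true)
    return true_positives + false_positives


def positive_predictive_value(power, alpha, prior_true):
    """Share of significant results that reflect a real effect."""
    true_positives = power * prior_true
    false_positives = alpha * (1.0 - prior_true)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Hypothetical inputs: modest power and the conventional 5% threshold.
    power, alpha = 0.35, 0.05
    for prior_true in (1.0, 0.5, 0.1):
        rate = expected_positive_rate(power, alpha, prior_true)
        ppv = positive_predictive_value(power, alpha, prior_true)
        print(f"true hypotheses: {prior_true:.0%}  "
              f"expected positives: {rate:.1%}  "
              f"positives that are real: {ppv:.1%}")
```

Under these assumptions, even if every hypothesis tested were true, only about a third of studies should come back positive. A literature that reports positive results far more often than that suggests that negative results are going unpublished, that some of the positives are flukes, or both.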
But because researchers rarely publish an attempt to replicate someone else’s experiment, we don’t really know what fraction of published studies aren’t reproducible. This is where the Reproducibility Project comes in—it is the first systematic attempt to reproduce experiments from some of the most influential papers in the literature. As the project’s leaders recently wrote in the journal eLife: “Science can only self-correct if there is awareness of what needs correcting.” The outcome of this project may show that much of the scientific literature is not to be trusted. Or it could show that researchers’ fears are overblown.
By taking a first crack at rigorously attempting to replicate important published results, the Reproducibility Project will bring some much needed hard data to the discussion. But when the results come in, researchers will still have to decide what to make of them. There is a good reason why scientists rarely bother to reproduce another lab’s results: It’s not always worthwhile. Chris Drummond, a research officer at Canada’s National Research Council, argues that it is important not to conflate reproducibility, which is essential in science, with the more narrow idea of “replicability,” which Drummond says “is not worth having.” “One should replicate the result not the experiment,” Drummond writes. A focus on exact replication of experiments “would cause a great deal of wasted effort,” because researchers would spend time and money recreating specific experimental details that are not broadly relevant. A reproducible result, on the other hand, is one that holds up even when you change the details.
Precisely replicating another lab’s experiment can be incredibly difficult, even with that lab’s full cooperation. Earlier this year, two teams of researchers, one from Lawrence Berkeley National Laboratory and the other from the Dana-Farber Cancer Institute, discussed their extensive attempts to replicate each other’s data. “A set of data that was supposed to be completed in a few months took two years to understand and sort out,” they wrote. In the end, their troubles came down to a small but crucial difference in how each team stirred its samples during a routine preparatory step. Experiences like this one are familiar to nearly all scientists, which is why, in their actual day-to-day practice, most of them implicitly accept Drummond’s idea that reproducibility is more important than replicability. There are, however, clear cases where exact replicability is important, such as clinical trials, drug development programs, and large genomics studies, and in these cases, the scientific community should develop the funding, procedures, and journal space to publish replication studies.
There is one question that the Reproducibility Project can’t answer: Has published research become less reproducible than it was in the past? We’ll likely never know, but the history of science is filled with examples of researchers arguing over the reproducibility of a published result—and then stumbling onto a completely new discovery. So by all means, scientists should push each other to do better and to avoid sloppy statistics and flawed experimental designs. But we should be careful not to take the drive for reproducibility too far. Not all published research will be reproducible, especially on the frontier of a new field, and we shouldn’t always expect it to be.