Science is supposed to be self-correcting. Unlike auto manufacturing or food production, science is expected to produce trustworthy results without explicit standards of quality that are imposed from the top down. Quality control is supposed to be baked into the scientific process itself. Before it’s published, research is peer-reviewed to root out flaws; and even after that, a finding is not considered truly trustworthy until it is replicated by other experts in the field. In other words, there is no official seal of approval—quality assurance comes from the expert judgment of communities of scientists, who are supposed to be able to filter out the good from the bad on their own.
But are science’s built-in quality controls good enough? There is growing concern that science is failing to self-correct, and that much of the research that’s published, especially in the biomedical sciences, doesn’t stand up to scrutiny. While there is no definitive study showing that this is true, evidence of the problem includes claims by drug company scientists that they are often unable to replicate the relevant published findings when they begin to develop a new drug, and surveys reporting that a shockingly large fraction of studies use less-than-optimal study designs and poor statistical practices.
In response to the evidence that science’s inherent quality-control processes are failing, some researchers have called for formal standards in order to ensure that studies don’t fall prey to common and preventable mistakes. The leadership of the National Institutes of Health agreed, acknowledging that “the recent evidence showing the irreproducibility of significant numbers of biomedical-research publications demands immediate and substantive action.”
Last summer, the National Institutes of Health brought together its staff and the editors of many major science journals in order to develop a set of explicit quality standards. The standards they developed are targeted at “pre-clinical” research, meaning studies done using laboratory animals rather than humans. One reason for the focus on pre-clinical research is that it currently operates under relatively few formal standards, unlike human clinical trials, which are governed by an extensive set of regulations. Shoddy pre-clinical research is a serious problem because pre-clinical studies are the foundation on which costly drug development programs and human clinical trials are built. A promising finding from an ultimately flawed animal study will needlessly risk patient safety while also squandering time and money as medical researchers and pharmaceutical companies chase false leads.
The NIH released its new guidelines last fall and asked scientific journals to sign on to them. Many did. But not all: in April, several journals politely but pointedly refused. The editors of one of them, Biophysical Journal, explained their refusal, noting that while they agreed with the intent of the guidelines, the guidelines themselves “are not pertinent or applicable to the types of science published by Biophysical Journal.” The president of the Federation of American Societies for Experimental Biology, an umbrella organization whose member societies publish dozens of scientific journals, was more blunt in his open letter to the NIH: “FASEB finds these guidelines to be premature, to lack the appropriate flexibility to be applicable across disciplines, and to likely produce significant and in some cases unjustified burden associated with the preparation and review of scientific manuscripts.” Other major societies and journals, including the United Kingdom’s Royal Society and the American Chemical Society, have also refused to endorse the guidelines, according to Nature News.
Why are the NIH’s new quality-control guidelines so objectionable to some? On a first read, the guidelines seem hard to argue with. They specify that journals should check the statistical accuracy of submitted manuscripts, ensure that experimental methods are described completely, require that published data be freely shared, and introduce best-practice guidelines for especially problematic issues, like evaluating data that comes from image processing. The new policies also require journals to take responsibility for the accuracy of each paper they publish by being willing to issue subsequent refutations of that paper.
In general, these are all good ideas, and, in fact, most journals already have policies in place that support many of them. The real problem lies in their implementation, and it’s here that the debate over the NIH guidelines reveals why formal quality-control standards are so hard to establish in science.
The editors of Biophysical Journal and the president of FASEB both noted that the guidelines don’t apply well across the many different types of research that get lumped under the broad heading of “biomedical sciences.” As the Biophysical Journal editors wrote, the guidelines “are primarily directed at large correlative statistical preclinical and clinical studies,” which diverge from the types of basic research published in their journal. The key point of contention here is the distinction between basic, exploratory research, which is aimed at making new discoveries, and pre-clinical research, which is carried out with the specific intent to test new drugs or treatments. It sounds simple, but in practice, the dividing line between the two modes of science is fuzzy. The president of FASEB noted this fuzziness and argued that “there remains a great deal of confusion regarding what research is covered under these new guidelines.”
Many of the requirements laid out in the NIH guidelines are important for effective pre-clinical studies, but impractical or downright irrelevant for exploratory studies. For example, cancer researchers studying the effect of some new drug in mice should follow the guidelines and randomly assign animals to receive either the drug or the placebo in order to avoid unconsciously biasing the results. But this procedure makes no sense for cancer researchers conducting a study to understand the function of a new mutation—you can’t randomly assign a mouse to acquire a mutation. Asking journals to adopt a set of quality-control standards that don’t accommodate these different modes of research is, at best, a pointless administrative burden, and, at worst, counter-productive.
So how should we go about enforcing high scientific standards? By simply raising the issue, the NIH may already be doing some good: even scientific journals that declined to accept the new guidelines have re-evaluated and updated their standards. But ultimately, quality control in research really does come down to science’s intrinsic ability to self-correct. Individual journals and scientific communities must set their own quality standards to fit the work that they cover.
Despite the claims that there are widespread problems in biomedical research, the overall quality of such a widely varying set of research practices is almost impossible to measure; you can’t reliably make sweeping claims about how much published research is “false.” The methods and types of data that count as convincing evidence in pre-clinical cancer research or psychology are very different from the types of evidence used to reach conclusions in biophysics or cell biology. Most importantly, scientific methods are constantly changing. That means, in the words of the Biophysical Journal editors about their own journal’s policy, “as research and techniques evolve, so too will the guidelines.”
Inside the Lab explores the promise and hype of genetics research and advancements in medicine.