
Even Systematic Reviews of Medical Science Are Mostly Bunk

Even meta-analyses are often unreliable.

By Nathan Collins



There are a fair number of reasons to doubt any one particular scientific study. Perhaps, for example, it was funded by a drug company looking to get approval for a new drug. Or maybe the original researchers fished around for some interesting result without taking appropriate precautions against false positives. Still, systematic reviews and meta-analyses ought to guard against such problems, precisely because they draw on dozens of studies for their conclusions—right?

Wrong, according to Stanford University’s John Ioannidis. “Possibly, the large majority of produced systematic reviews and meta-analyses are unnecessary, misleading, and/or conflicted,” he writes in a paper published in the Milbank Quarterly.

Ioannidis is most famous for his bluntly titled 2005 paper, “Why Most Published Research Findings Are False,” in which he argued that even large, randomized, double-blind medical studies would likely be wrong around 15 percent of the time. Meanwhile, small studies—early-phase clinical trials, for example—would be wrong roughly three-quarters of the time, even if their authors were impervious to various sources of bias.
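Those percentages come from the positive-predictive-value (PPV) framework in the 2005 paper: the chance that a claimed finding is true, given the study's power, the pre-study odds that the tested relationship is real, and a bias term. A minimal sketch of that formula, with illustrative parameter values in the spirit of the paper's examples (the function name and exact numbers here are ours, not the paper's text):

```python
def ppv(power, alpha, prior_odds, bias):
    """Probability that a claimed positive finding is true.

    power: 1 - beta, the chance of detecting a true effect
    alpha: false-positive rate (conventionally 0.05)
    prior_odds: pre-study odds R that the tested relationship is real
    bias: fraction u of would-be null results reported as positive anyway
    """
    beta = 1 - power
    r = prior_odds
    # True relationships correctly (or through bias) reported as positive
    true_positives = (1 - beta) * r + bias * beta * r
    # All reported positives: true ones plus false alarms plus bias
    all_positives = r + alpha - beta * r + bias * (1 - alpha + beta * r)
    return true_positives / all_positives

# Well-powered RCT: power 0.80, 1:1 prior odds, modest bias
print(round(ppv(0.80, 0.05, 1.0, 0.10), 2))  # 0.85 -> wrong ~15% of the time

# Small early-phase trial: power 0.20, 1:5 prior odds, more bias
print(round(ppv(0.20, 0.05, 0.2, 0.20), 2))  # 0.23 -> wrong ~3/4 of the time
```

With plausible inputs, the arithmetic reproduces both of the article's figures: a well-run trial is still wrong about 15 percent of the time, and a small exploratory study is wrong closer to three-quarters of the time.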

Those results leave some hope for meta-analyses, which collect many individual studies together to see what can be gleaned from the literature as a whole. In theory, a well-conducted meta-analysis would take account of each study’s quality, statistical power, and so on.

In practice, Ioannidis writes, there are several problems. First, the results often depend on which studies meta-analysts choose to review, and the criteria researchers use to make those decisions could be biased. Second, the vast majority of studies are never replicated exactly and instead focus on different outcomes or use slightly different procedures, making a quantitative comparison of their results difficult at best.

Ioannidis cites the research on anti-depressants such as Zoloft and Paxil as a particularly bad example. Psychiatric drug research already has something of a bad name, in part because of the now infamous Study 329, which purported to show that Paxil was an effective drug for treating depression in adolescents, when, in fact, it had no effect and could actually increase the risk of suicide.

More generally, anti-depressant studies are often small, get most of their support from companies with obvious conflicts of interest, and use dubious research methods, Ioannidis writes, making the individual studies frequently questionable. But even if individual studies are solid, different teams reviewing a collection of research papers can reach different conclusions. Meta-analysts have, for example, rated Paxil anywhere between best and 10th best among 12 anti-depressants, depending on who did the meta-analysis and exactly how they performed it—and those were the conclusions of respected experts in meta-analysis, Ioannidis points out.

That’s not to say that meta-analysis is unnecessary—it is necessary, as a way of figuring out where a field of research stands, Ioannidis argues—but right now a “major overhaul is needed.” One potential fix: conducting coordinated sets of studies designed from the outset to support both replication and meta-analysis (although that won’t always be possible). In the meantime, many of the same practices that produce good primary research—transparency, data sharing, and pre-registering studies’ aims—may lead to better meta-analysis as well, Ioannidis writes.