Why the National Institutes of Health Should Replace Peer Review With a Lottery

A new study shows that peer-review scores for grant proposals are random anyway.

By Michael White


A lottery would be more effective. (Photo: jeremybrooks/Flickr)

The federal government spends $60 billion each year on scientific research. To ensure that this money is spent effectively, federal science agencies rely on an extensive system of peer review to select the research projects that are supposedly important, innovative, and well-conceived — the best possible science out there.

But how well does peer review actually work in practice? This is a critical question because reviewing grant proposals absorbs a tremendous amount of time, effort, and money on the part of federal science agencies and the scientific community. The National Institutes of Health, which funds more science than any other agency, reviews more than 60,000 grant proposals each year. To evaluate them, the NIH relies on hundreds of specialized peer-review panels. These review panels typically consist of about 20 scientists and usually meet three times per year. In 2014, peer review at the NIH involved 2,500 review panel meetings and 24,000 scientists. According to a 2006 estimate, the annual cost of NIH peer review is $50 million — a small fraction of the agency’s $30 billion budget to be sure, but still a major investment. And that estimate doesn’t include the productivity lost by having thousands of scientists spend their time doing something other than research.

The resources invested in peer review are surely worthwhile if they ensure that the NIH funds the best science. But multiple studies have found that the current peer-review system is unable to reliably identify the most promising research proposals. These findings add weight to the widespread sense among scientists, even successful ones, that the peer-review system is badly flawed.

But what is the problem, exactly, and why can’t we fix it? It’s not that the NIH is unconcerned — NIH officials work hard to make peer review both efficient and effective, and in recent years have implemented multiple reforms to improve the quality of the review process. Yet the problem may be that peer review, no matter how well conceived and executed, can’t actually do what we want it to achieve. If that’s true, then the solution isn’t to reform peer review, but to replace it with something else altogether.

Two studies, one published in February and the other last year, clarify what peer review can and can’t do. In order to help the NIH select the best science, peer review must accomplish two things. First, it needs to weed out proposals that are seriously flawed or unimportant. And second, it needs to sort out the best research proposals from the merely good ones, because the budget isn’t large enough to fund all good proposals. The two recent studies of peer review show that peer review succeeds at the first task but not at the second.

In the earlier study, a pair of researchers at Harvard University and Boston University looked at peer-review scores and the subsequent publications of 130,000 research projects that were funded by the NIH between 1980 and 2008. Their results seemed to vindicate peer review: They found that “better peer-review scores are consistently associated with better research outcomes.” This was surprising, because it contradicted the experience of many biomedical scientists, and the findings of earlier studies with smaller samples.

The second study resolved the contradiction. Re-analyzing the same set of data, Ferric Fang at the University of Washington, together with Anthony Bowen at the Albert Einstein College of Medicine and Arturo Casadevall at Johns Hopkins University, found that many projects funded in earlier decades had scores below the top 20 percent, which would never lead to funding under today's more competitive standards. When the researchers focused only on projects with competitive scores in the top 20th percentile or better, there was virtually no relationship between the peer-review score and the subsequent productivity of the research project. The only exceptions were projects with the very best scores, in the top 2nd percentile or above. These projects tended to produce more papers and receive more citations than projects with lower scores.

These results show that the NIH’s peer-review panels are, in fact, good at weeding out weaker proposals, and at picking a few obvious winners. But the problem is that most funding decisions happen between these two categories, where, the results show, peer-review scores are essentially useless. Today, the NIH generally only funds grant proposals with scores in the top 15th or even 10th percentile. But the results by Fang and his colleagues show that a funded project at the 8th percentile is not likely to be any better than an unfunded (and possibly never completed) project at the 18th percentile. What this means is that, after weaker proposals are eliminated, peer review is actually worse than useless: It hides a fundamentally capricious system under a meritocratic veneer.

So how do we fix the problem? One solution is to spend more money on science. After more than a decade of stagnating budgets at federal science agencies, success rates for grant proposals are at historic lows, which has created a corrosive, hyper-competitive struggle among researchers for funding. The results of Fang and his colleagues suggest that much good science is currently left unsupported. More money to fund more of the top-scoring proposals would increase the efficiency of peer review by reducing the effort researchers and reviewers spend on revising and re-reviewing rejected but perfectly good proposals that are submitted two or even three times.

Budgets at federal science agencies, however, are unlikely to increase much in the foreseeable future, meaning that these agencies won’t be able to fund all good research proposals. Rather than trying, yet again, to reform peer review, Fang and his colleagues suggested elsewhere, we should just own up to the fact that “winning a government grant is already a crapshoot,” and implement a lottery. Peer review could sort out the best applications from the weaker ones, and then funding would be awarded at random among the top proposals. The idea isn’t as outlandish as it may sound — New Zealand is currently using a lottery system to award some grants.
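The two-stage mechanism described above — peer review as a coarse filter, then random selection among everything that clears the bar — is simple enough to sketch in a few lines of code. The following is a minimal illustration, not a description of any actual NIH or New Zealand procedure; the function name, the cutoff, and the sample scores are all hypothetical, using NIH-style percentile scores where lower is better.

```python
import random

def lottery_fund(proposals, percentile_cutoff=15, n_awards=3, seed=None):
    """Two-stage selection: peer review screens out weak proposals,
    then awards are drawn at random from the qualifying pool.

    `proposals` maps proposal IDs to percentile scores (lower is
    better). All names and numbers here are purely illustrative.
    """
    rng = random.Random(seed)
    # Stage 1: use peer review as a filter, not a fine-grained ranking.
    pool = [p for p, pct in proposals.items() if pct <= percentile_cutoff]
    # Stage 2: fund a random subset of the qualifying pool.
    return rng.sample(pool, min(n_awards, len(pool)))

# Hypothetical scores: D and E never qualify; which two of A, B, C
# win is left entirely to chance.
scores = {"A": 3, "B": 9, "C": 14, "D": 22, "E": 41}
winners = lottery_fund(scores, percentile_cutoff=15, n_awards=2, seed=0)
```

The design choice matters: because ranks within the pool are ignored, a proposal at the 14th percentile has exactly the same chance as one at the 3rd, which is the honest reflection of the finding that scores in that range carry no predictive signal.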

A major advantage of using a funding lottery would be that, by reducing its reliance on peer-review rankings, the NIH would have more room to address the urgent problem of bias. A suitably designed lottery system could help eliminate the small but persistent gender gap, and the larger racial gap in the funding it awards, by giving NIH officers more leeway to include a representative set of proposals. And importantly, a lottery would be an honest acknowledgement of what most scientists already sense: that, despite its reputation for basing decisions on merit, peer review is a much more random process than we would like to admit.