New statistical analyses of the National Institutes of Health's peer review process suggest that the current system may not be funding the right proposals.
Reviews of as many as 25% of all proposals are biased, according to a study led by Valen Johnson of MD Anderson Cancer Center, to be published tomorrow (July 29) in Proceedings of the National Academy of Sciences.
Johnson collected scoring data from about 14,000 reviewers on some 18,000 proposals from two reviewing sessions in 2005. He developed a statistical tool that analyzed how reviewers changed their scores for each proposal once a study group of five or six reviewers had discussed the application. Johnson found that certain reviewers consistently judged more harshly, for example, and may have influenced how the rest of the study section rated a proposal.
Johnson also demonstrated that, based on reviewers' assessments, there isn't much difference in quality between proposals that scored in the low range. If that's the case, Johnson told The Scientist, the cheapest one should be funded. By factoring in cost this way, the NIH could fund more proposals, he added.
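The arithmetic behind that suggestion can be illustrated with a minimal Python sketch. This is not Johnson's statistical model: the proposal names, scores, costs, and budget below are all hypothetical, and the sketch simply assumes that every proposal in the pool has already been judged too close in quality to tell apart.

```python
from typing import NamedTuple

class Proposal(NamedTuple):
    name: str
    score: float  # hypothetical review score
    cost: int     # hypothetical requested budget, in dollars

def fund_in_order(proposals, budget):
    """Fund proposals in the given order, skipping any that no longer fit."""
    funded = []
    remaining = budget
    for p in proposals:
        if p.cost <= remaining:
            funded.append(p.name)
            remaining -= p.cost
    return funded

# Hypothetical pool whose scores are assumed too close to separate on quality.
pool = [
    Proposal("A", 1.5, 900_000),
    Proposal("B", 1.6, 300_000),
    Proposal("C", 1.6, 400_000),
    Proposal("D", 1.7, 250_000),
]
budget = 1_000_000

# Funding strictly by score versus funding the cheapest proposals first.
by_score = fund_in_order(sorted(pool, key=lambda p: p.score), budget)
by_cost = fund_in_order(sorted(pool, key=lambda p: p.cost), budget)

print(by_score)  # ['A']
print(by_cost)   # ['D', 'B', 'C']
```

With these made-up numbers, the same budget covers one proposal when funding follows score order and three when cost decides among proposals of similar quality.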
Antonio Scarpa, director of the NIH Center for Scientific Review (CSR), told The Scientist that the peer review process is more complicated than one paper can take into account, and that judging proposals is like critiquing a movie after reading only a one-paragraph description.
Last month the NIH announced upcoming changes to the peer review process after last year's review of peer review. Scarpa said that the CSR is working to implement a ranking system (as opposed to an individual scoring system) and to have each reviewer state their scoring criteria -- whether the reviewer values an investigator's achievements over the proposal itself, for example. This takes into account some of the reviewer bias that Johnson identified, Scarpa said.
"[Johnson] does a good job of identifying the weaknesses in his own model," Andrea Kopstein, director of planning, analysis and evaluation at the CSR, told
The Scientist. For example, "we really don't know the true proposal merits. And so many of the issues raised in this paper are already under study and changes are being implemented."
In a paper published last week in the journal PLoS ONE, David Kaplan of Case Western Reserve University showed that it would take more than 30,000 reviewers to make a good, unbiased assessment of a single proposal. Instead of hiring tens of thousands of reviewers, an obviously impossible solution, Kaplan told The Scientist, he proposes changing the proposal grading system. The current system has 41 grades that a reviewer can assign. Giving reviewers only five grading options, for example, and shortening proposals to just a few pages would let reviewers quickly assess the value of a given application.
Both analyses suggest that the NIH could save money -- and may improve proposal scoring accuracy -- by having reviewers evaluate proposals on their own rather than in study groups, thus eliminating the need for meeting and travel expenses.