Francis Collins and Walter Willett are debating how best to build a huge cohort study resource for investigating the influence of genes and environment. In this week's Nature, Collins outlines his support for a cohort of half a million people built from scratch, while Willett backs a cheaper, faster model that combines existing cohort studies.
"If it only cost a few hundred million dollars to start a new cohort, there wouldn't be much of an issue at all," Willett, Chairman of the Department of Nutrition at the Harvard School of Public Health (HSPH), told The Scientist. But with a projected cost of $3 billion or more for a study that includes subjects representative of the country's ethnicities and walks of life, "then it's going to displace a lot of other research at a time when research is being hugely squeezed," he said.
The National Institutes of Health, where Collins directs the National Human Genome Research Institute (NHGRI), originally envisioned the massive cohort project as a vast database of genetic and biomarker data linked to information about participants' diet, geography, smoking habits, and many other factors.
Collins and his co-author, Teri Manolio, were not available for comment before deadline.
According to Collins' commentary, a new cohort study holds several advantages over a cohort that combines existing studies. For instance, researchers could apply exactly the same methods to gather data from each subject, enroll enough members of minority and other groups that no statistical adjustments are needed, collect data on entire lifespans, and employ the latest technology, such as mass spectrometers.
One advantage of larger cohorts that was not explored in the commentaries is that their standardized approach makes it easier to combine data on subjects, while some detail gathered by small cohorts could be lost when they are pooled, said Willett. When researchers examine pooled data, they must use a "common denominator" of data, potentially ignoring variables that had been gathered by some studies, he said.
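The "common denominator" effect Willett describes can be illustrated with a minimal sketch. The cohort names and variable lists below are hypothetical, chosen only for illustration, and are not taken from any of the studies mentioned in the article. The sketch assumes that pooling keeps only the variables recorded by every cohort.

```python
def pool_variables(cohorts):
    """Return the variables shared by every cohort and those each cohort loses."""
    variable_sets = {name: set(variables) for name, variables in cohorts.items()}
    # The "common denominator" is the intersection of all variable sets.
    shared = set.intersection(*variable_sets.values())
    # Anything a cohort recorded that is not shared is ignored after pooling.
    dropped = {name: sorted(vs - shared) for name, vs in variable_sets.items()}
    return sorted(shared), dropped


# Hypothetical cohorts and variables, for illustration only.
cohorts = {
    "cohort_a": ["age", "smoking", "diet", "physical_activity"],
    "cohort_b": ["age", "smoking", "diet", "occupation"],
    "cohort_c": ["age", "smoking", "geography"],
}

shared, dropped = pool_variables(cohorts)
print(shared)               # ['age', 'smoking']
print(dropped["cohort_a"])  # ['diet', 'physical_activity']
```

In this toy case, diet data gathered by two of the three cohorts would go unused in a pooled analysis because the third cohort did not collect it.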
But this drawback is partly countered by another advantage of pooled studies: time. In addition to its high cost, a new, large-scale cohort study will take as long as 10 years to produce any results, due to time-consuming planning and enrollment, as well as the time needed to research common diseases, argues Willett.
In contrast, combining existing cohorts would take much less time, given that most of the data already exist, Willett says. He lists 13 studies -- such as the Nurses' Health Study -- that he feels are suited for pooling, which together include about 1.4 million subjects and 845,000 biological samples. Four of these are associated with HSPH and account for more than a quarter of the total subjects and biological samples, which would put Willett in significant control of any combined cohort project that moves forward.
In a ballpark estimate, Willett told The Scientist that the kind of cohort study model he envisions might require a budget of $20 million to $50 million per year, with perhaps 50-75 percent coming from existing funding from smaller constituent cohort studies.
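As a rough worked example of what that estimate implies, the sketch below computes the range of new money needed per year. It assumes the quoted percentages apply to the whole annual budget and that the remainder must come from new funding; the function name is made up for illustration.

```python
def new_funding_range(budget_low, budget_high, existing_low, existing_high):
    """Return the (lowest, highest) new funding needed per year, in dollars."""
    # Least new money: smallest budget with the largest share already funded.
    lowest = budget_low * (1 - existing_high)
    # Most new money: largest budget with the smallest share already funded.
    highest = budget_high * (1 - existing_low)
    return lowest, highest


# Figures from Willett's ballpark estimate: $20M-$50M per year,
# with 50-75 percent covered by existing cohort funding.
low, high = new_funding_range(20e6, 50e6, 0.50, 0.75)
print(f"${low / 1e6:.0f} million to ${high / 1e6:.0f} million per year")
# prints "$5 million to $25 million per year"
```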
A pooled cohort project is "certainly a much more cost-efficient strategy," said Roberta Ness, chair of the University of Pittsburgh department of epidemiology. "It won't get you everything -- there will be real limitations with respect to the age composition of the cohorts, the baseline data collection, [and standardization of] the specimen-collection techniques," she said.
A pooled cohort's shortcomings will hinder the study of diseases that develop early in life, such as asthma and major psychoses, because it would lack adequate data from children, argues Collins. He is also concerned that research into changing environmental factors, such as emerging infections, will suffer, because researchers using a pooled cohort might not concentrate on updating their information past a number of basic variables.
This isn't an insurmountable problem, insisted Carolyn Williams, chief of epidemiology in the division of AIDS at the National Institute of Allergy and Infectious Diseases. She told The Scientist that gaps like this are routinely filled in existing pooled cohorts: once the cohorts have been pooled, researchers check study subjects for a new set of variables.
The NHGRI has already taken expensive steps toward a large cohort study, including its $40 million Genes and Environment Initiative, which is developing genotyping facilities, data analysis infrastructure, disease biomarkers, and better tools for measuring environmental exposures.
Chris Womack
mail@the-scientist.com
Links within this article:
F. Collins, "Necessary but Not Sufficient," Nature, January 18, 2007.
http://www.Nature.com
W. Willett, "Merging and Emerging Cohorts," Nature, January 18, 2007.
http://www.Nature.com
Walter Willett
http://www.hsph.harvard.edu/facres/wlltt.html
A. Spivey, "Gene-Environment studies: What, How, When, Where?" Environ. Health Perspect. 114, A466-A467 (2006).
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1551984
Francis Collins
http://www.genome.gov/10000779
F. Collins, "Delivering on the Dream: Biomedical Research in the Genomic Era," The Scientist, v20: 2, p 46.
http://www.the-scientist.com/article/display/23067/
M. Anderson, "A 500,000-person study?" The Scientist, May 26, 2004
http://www.the-scientist.com/article/display/22202/
Genes and Environment Initiative
http://www.gei.nih.gov

[Comment posted 2007-01-23 05:51:08]
It won't be quite as big a deal for the people that run the study but it will still be big enough
Gordon Couger
[Comment posted 2007-01-18 13:25:52]
As with all new research undertakings, it is valuable to learn what you need to learn, so to speak, before you jump in with both feet.
Start the project with the less expensive option of pooling existing cohorts. Use it not only to generate some data, but to solidify what questions you NEED to ask in a more detailed prospective cohort.
Once you have some experience working with real data, then you can more accurately design the prospective trial. This is a particularly appropriate approach given the cost of a large prospective trial in an environment of scarce resources.