Approximately half of the functional regulatory sequences in the human genome appear to lack conserved sequences, according to an analysis of functional elements in 1% of the genome. The finding comes from the four-year pilot phase of the ENCODE (Encyclopedia of DNA Elements) project, whose results are published in this week's Nature.
This lack of evolutionary constraint is "clearly one of the most interesting findings in the paper," said Eric Schadt of Rosetta Inpharmatics in Seattle, Wash., who was not involved in the work. It's possible that variations in regulatory sequences between people could help explain individual differences in disease susceptibility, giving these findings "huge implications," Schadt told The Scientist.
The ENCODE project also analyzed many other aspects of functional non-coding regions of the human genome. "Finally, we're going to be able to have some type of a map that will allow us to interpret the significance of any kind of human genetic variation, not just what is occurring in genes," said ENCODE co-chair John Stamatoyannopoulos of the University of Washington in Seattle. A series of papers on ENCODE data is also appearing this week in Genome Research.
The pilot phase of the project combined more than 200 new experimental and computational data sets from a consortium of 35 research groups to identify functional elements encoded in about 30 megabases of the human genome. The consortium analyzed DNA from diverse regions throughout the genome, Stamatoyannopoulos said, so that researchers can confidently extrapolate the results to the entire genome. "This is the first time that so many different data types have been analyzed over the exact same regions in the exact same cell types," he told The Scientist.
The consortium's analyses revealed that most of the human genome is transcribed, even though just a small fraction of these transcripts are translated into protein. "The ENCODE work is really confirming that there is a significant amount of intergenic and intronic transcription," said Douglas Mortlock of Vanderbilt University in Nashville, Tenn., who was not involved in the work. "A lot of this has been suggested to exist, but it's sort of nice to see confirmation on a larger scale." However, the biological role of many of these transcripts is not yet clear, he added. "A lot of that transcription may not be functional."
The data also showed that the same regions of DNA are often transcribed multiple times in an overlapping fashion, confirming a previous hypothesis. "What we think of as genes are actually overlapping each other to a much greater extent than was previously thought," Stamatoyannopoulos said. "The whole genome appears to be connected together in some kind of a transcriptional network."
The researchers found consistent differences between histone modifications of promoters and non-promoter functional elements like enhancers, and they also discovered that transcription factors are equally likely to bind downstream as upstream of a target gene.
In general, many of the findings regarding chromatin structure, histone modifications, and transcription have been suggested by previous studies, Schadt said, but the size and depth of the ENCODE analyses provide "a tour-de-force, integrated analysis of all that information and [show] how an extensive annotation of the human genome might work," he told The Scientist.
The consortium next combined new data from various types of analyses -- including transcription, DNA replication timing, chromatin accessibility, and histone modifications -- and used computational algorithms to look for patterns of organization across the entire genome, Stamatoyannopoulos said. All of these data sets confirm a patchwork pattern of transcriptionally active and inactive domains, he said, and this pattern does not differ much between cell types. "This is a very satisfying finding, because it indicates that there's some global structure of the genome," he added.
One of the project's most novel findings, Schadt and Mortlock agreed, is that about half of non-coding functional elements do not appear to be under evolutionary constraint, not only in humans, but across multiple mammalian species. (The researchers compared these regions with orthologous regions of 14 mammalian genomes.) Some researchers have suggested novel regulatory elements can be discovered by simply looking for conserved non-genic sequences between or within species, Stamatoyannopoulos said, but this result suggests that discovering them "really requires an experimental approach."
If these regulatory regions are variable between individual humans, they could be responsible for "common variations in disease and drug responses," Schadt told The Scientist. "I think it will be one of the more interesting things to explore" from the project's findings, he said.
Melissa Lee Phillips
mail@the-scientist.com
Links within this article
C. Choi, "Regulatory DNAs may be missed," The Scientist, March 24, 2006.
http://www.the-scientist.com/news/display/23246/
L. Pray, "Post-genome project launches," The Scientist, March 5, 2003.
http://www.the-scientist.com/article/display/21158
The ENCODE Project Consortium, "Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project," Nature, June 14, 2007.
http://www.nature.com/nature
Eric Schadt
http://www.rii.com/about/executives.html
M.L. Phillips, "Non-coding DNA adapts," The Scientist, October 20, 2005.
http://www.the-scientist.com/article/display/22805/
John Stamatoyannopoulos
http://www.gs.washington.edu/faculty/stamj.htm
Genome Research
http://www.genome.org/
Douglas Mortlock
http://phg.mc.vanderbilt.edu/content/mortlock
J. Cheng et al., "Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution," Science, May 20, 2005.
http://www.the-scientist.com/pubmed/15790807

[Comment posted 2007-06-15 16:57:29]
"Junk" DNA as a scientific term is dead.
Important questions arise:
(1) after close to $50 M spent on the project, how much taxpayer money should be allocated to beat a dead horse even more to death? (Spending is not necessarily linear, but if 1% cost $50 M, $5 Billion would be needed, and since the ENCODE pilot took 4 years, linear extrapolation yields a run-time of close to a millennium...)
(2) rather than pretending "Junk DNA is dead, so let's continue Genetics as usual", what is the new paradigm-shift science and technology that re-thinks, in a disruptive manner, "Genomics beyond Genes"? (see LINK)
(3) science is the synthesis of gathering data according to theoretical predictions, and if data do not support the prevailing theory, disruptive theoretical advances are needed ("atom is the smallest integer of element - but when the atom splits, quantum mechanics and nuclear industry is born"). ENCODE effectively dispelled the dogma of "Genes versus Junk". With $50 M spent on data discarding a dogma, the next $50 M is to be spent on re-thinking (mostly, in algorithmic approaches and in silico modeling) "PostGenetics" (Genomics beyond Genes).
(4) While ENCODE stuck to data analysis on a grand scale, finally deciding in 4 years that "Junk" DNA was anything but junk, FractoGene (LINK), in a massively underfunded manner, used the same 4 years to develop an algorithmic approach to what it actually may be. FractoGene built a new synthesis, to be ready when the "genes vs. junk" dogma melted down (see quantitative prediction and its experimental support at LINK).
P.S. It is essential to be focused on the Agenda, under whatever nomenclature.
pellionisz_at_junkdna.com