ALLENTOWN, PA — The rapid discovery of new disease-causing microbes may be on the horizon, following successful results from a study of a new computerized technique, computational subtraction, that uses DNA matching to isolate and identify microbial gene sequences.
Computational subtraction is an in silico approach that takes advantage of the nearly completed DNA sequence of the human genome, made available through the Human Genome Project. By subtracting out the full complement of human DNA sequences from DNA libraries derived from human tissues, researchers at the Dana-Farber Cancer Institute and Harvard Medical School found they were left with a small number of DNA sequences, presumably of nonhuman origin. They could then search these sequences for evidence of microbial genes, and establish possible links between previously unknown organisms and human disease.
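To make the idea of "subtracting out" human sequences concrete, here is a minimal Python sketch. It is not the authors' pipeline. It assumes exact k-mer matching as a crude stand-in for sequence alignment: every length-k substring of the reference (human) sequences goes into an index, and a read is treated as human if all of its k-mers appear there. The function names, the toy sequences and k = 8 are illustrative assumptions.

```python
from __future__ import annotations


def kmers(seq: str, k: int) -> set[str]:
    """Return every length-k substring of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def subtract_reference(reads: list[str], reference_seqs: list[str], k: int = 8) -> list[str]:
    """Hypothetical helper: keep reads containing at least one k-mer absent
    from the reference, i.e. reads the reference cannot fully explain.
    Reads shorter than k carry no k-mers and are dropped as uninformative."""
    index: set[str] = set()
    for ref in reference_seqs:
        index |= kmers(ref, k)
    return [r for r in reads if len(r) >= k and not kmers(r, k) <= index]


# Toy example: the first read is a substring of the "human" reference.
human_reference = ["ACGTACGTTGCAACGTAGGCTA"]
reads = ["ACGTACGTTGCA", "TTTTGGGGCCCCAAAA"]
print(subtract_reference(reads, human_reference))  # ['TTTTGGGGCCCCAAAA']
```

Exact containment would miss human reads carrying sequencing errors or polymorphisms, which is one reason a real pipeline relies on alignment scores rather than perfect matches.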
Senior author Matthew Meyerson and colleagues described the computational subtraction method, and how they used it to identify the sequences of known pathogens in human DNA libraries, in the February issue of Nature Genetics. An online preprint of the report appeared on January 14, 2002. The idea was to look "for one piece of foreign DNA that is consistently associated with a given disease," explained Meyerson.
Existing alternatives for identifying infectious agents and isolating foreign DNA species rely on microbial culture to grow the organisms, or are based on molecular techniques. These include polymerase chain reaction (PCR) to amplify conserved microbial sequences, or subtractive hybridization, which involves physically removing the human DNA from a sample using hybridization techniques and isolating the leftover sequences for further analysis.
The beauty of computational subtraction is that the computer does the work. It filters out human genomic sequences, leaving behind the genetic material of nonhuman origin. "The potential power of sequence-based computational subtraction lies in its ability to identify new nonhuman sequences in a comprehensive and unbiased manner," argued the authors.
This technique might hold the key to unlocking the suspected relationships between infectious agents and a variety of chronic diseases, including type 1 diabetes, multiple sclerosis, rheumatoid arthritis, lupus erythematosus and atherosclerosis. Although no definitive links have been demonstrated, the discovery in recent years that stomach and duodenal ulcers are caused by Helicobacter pylori infection, and not by stress and diet as had previously been assumed, lends credence to the belief that many more diseases may have an infectious etiology. This could include some cancers, with current circumstantial evidence pointing most strongly toward lymphomas, bladder cancer, and some forms of lung cancer.
Identifying such links will require screening a large sample of disease-specific tissues and looking for particular foreign DNA sequences that repeatedly associate with a specific disease state. These foreign gene sequences might or might not have become incorporated into the human genetic material. It could then be possible to express proteins from the microbial sequences and develop antibodies to these proteins.
In the report, the authors described one approach for applying computational subtraction. They chose as their starting material more than 3.2 million sequences from the GenBank human EST database. This collection contains expressed sequence tags (ESTs), gene fragments collected from many different people, including individuals with a variety of diseases. Using the MEGABLAST algorithm, the researchers then compared these ESTs with sequences in existing human and mouse databases and filtered out those that matched with a sufficiently high alignment score. This computational process left 65,839 unmatched ESTs.
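MEGABLAST is a BLAST variant optimized for aligning highly similar nucleotide sequences, and its internals are not reproduced here. As a hedged illustration of the "sufficiently high alignment score" criterion, the sketch below swaps in plain Smith-Waterman local alignment (score only, linear gap penalty). The match, mismatch and gap values, the threshold and the helper names are hypothetical, not the parameters used in the study.

```python
from __future__ import annotations


def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score between a and b
    (score only, linear gap penalty, O(len(a) * len(b)) time)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def unmatched_ests(ests: list[str], references: list[str], threshold: int) -> list[str]:
    """Hypothetical filter: drop any EST whose best score against any
    reference reaches the threshold; return the survivors."""
    return [
        est for est in ests
        if max((local_alignment_score(est, ref) for ref in references), default=0) < threshold
    ]


references = ["GATTACAGATTACA"]  # stand-in for human and mouse sequences
ests = ["GATTACA", "CCCCCCC"]
print(unmatched_ests(ests, references, threshold=10))  # ['CCCCCCC']
```

Lowering the threshold removes more borderline ESTs, at the risk of discarding genuine foreign sequences that happen to resemble human DNA over a short stretch.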
Comparing this set of ESTs with sequences in the GenBank nucleotide database, the team concluded that the unmatched sequences could have come from several sources. Some could derive from known pathogenic organisms, including bacteria, viruses and fungi; hepatitis B and C viruses, human papillomaviruses, cytomegalovirus, Kaposi sarcoma herpesvirus and Epstein-Barr virus were among those identified. Others could come from microbial contaminants of cell cultures and other research materials, such as Escherichia coli, Pseudomonas aeruginosa and Mycoplasma species. Still others could represent unsequenced regions of the human genome (the entire human genome sequence has not yet been completed) or simply poor-quality sequences. Finally, the remaining sequences, which share no homology with anything in existing sequence databases, could include gene fragments from novel microbes.
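The sorting of unmatched ESTs into likely sources can be pictured as a simple triage step. The sketch below assumes each EST already has a best database hit, given as an organism name or None when nothing matched. The category names and organism lists are hypothetical, seeded with examples named in the article, and the sketch makes no attempt to detect unsequenced human regions or poor-quality reads.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical lookup lists seeded with organisms named in the article.
KNOWN_PATHOGENS = ("hepatitis B virus", "hepatitis C virus", "human papillomavirus",
                   "cytomegalovirus", "Epstein-Barr virus")
LAB_CONTAMINANTS = ("Escherichia coli", "Pseudomonas aeruginosa", "Mycoplasma")


def triage(best_hits: dict[str, Optional[str]]) -> dict[str, list[str]]:
    """Group EST ids by the kind of organism their best database hit came from."""
    groups: dict[str, list[str]] = {}
    for est_id, organism in best_hits.items():
        if organism is None:
            category = "no database hit"  # candidates for novel sequences
        elif organism.startswith(KNOWN_PATHOGENS):  # prefix match on any listed name
            category = "known pathogen"
        elif organism.startswith(LAB_CONTAMINANTS):
            category = "likely contaminant"
        else:
            category = "other database hit"
        groups.setdefault(category, []).append(est_id)
    return groups


hits = {"est1": "Epstein-Barr virus", "est2": "Mycoplasma hominis",
        "est3": None, "est4": "Homo sapiens"}
print(triage(hits))
```

Only the "no database hit" group would feed the search for new microbes; the prefix match lets a species name such as "Mycoplasma hominis" fall under its genus entry.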
"I think we will find targets for lots of different diseases with this approach [for pathogen discovery]," suggested Meyerson. The group will now begin to make their own EST and genomic libraries from disease-specific tissues, sequence the clones, apply the computational subtraction technique to filter out human genomic sequences, and search for new pathogens from among the residual foreign gene sequences. They would then have to determine whether those pathogens played a causative role in the particular disease processes being considered.
The team pointed out that "the presence of a sequence in multiple independent samples from a given disease-tissue type, followed by experimental investigations of candidate sequences, will be essential" for distinguishing pathogenic microbes from contaminating organisms and for verifying the proposed relationship between a disease and a pathogen.
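The recurrence criterion quoted above can be sketched as a count of how many independent samples contain each candidate sequence. This assumes candidates have already been given shared identifiers across samples (for example by clustering similar sequences), which in practice is the harder step; the sample names, candidate ids and the min_samples cutoff are illustrative.

```python
from __future__ import annotations

from collections import Counter


def recurrent_candidates(samples: dict[str, set[str]], min_samples: int) -> list[tuple[str, int]]:
    """Return (candidate, sample_count) pairs for candidates found in at least
    min_samples distinct samples, most frequent first, ties broken by name."""
    counts: Counter[str] = Counter()
    for residual in samples.values():
        counts.update(set(residual))  # count each candidate once per sample
    hits = [(cand, n) for cand, n in counts.items() if n >= min_samples]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


samples = {
    "tumor_1": {"seqA", "seqB"},
    "tumor_2": {"seqA", "seqC"},
    "tumor_3": {"seqA", "seqB"},
}
print(recurrent_candidates(samples, min_samples=2))  # [('seqA', 3), ('seqB', 2)]
```

Recurrence alone cannot separate a pathogen from a contaminant common to one laboratory's materials, which is why the authors pair it with experimental investigation of each candidate.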
In the future, when a new, acute disease of unknown etiology appears (such as occurred with HIV infection and AIDS), this computational filtering technique could prove useful for rapidly determining whether a novel or re-emergent microbe was involved and for identifying at least portions of its genome.
Finally, the paper presented a proof-of-principle experiment, in which the authors evaluated an EST library from HeLa cervical carcinoma cells, which are known to carry human papillomavirus-18 (HPV-18). Using a variety of filters — including reference human RNA sequences, mitochondrial, vector and repeat element sequences, and human and mouse genomic sequences — and repeating these filters at increasingly higher stringencies, the team ultimately reduced the 7,073 sequences in the HeLa EST library to 22 sequences. Two of those remaining sequences matched sequences from HPV-18. The authors argued: "If HPV-18 were a new candidate infectious agent, we would need to verify its association with cervical carcinoma further. But, HPV-18 is already a known cervical carcinoma agent, present in 20% of cases."
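The multi-stage filtering applied to the HeLa library can be sketched as a cascade that records how many sequences survive each pass. This is a hypothetical reconstruction, not the authors' code: matching is approximated by the fraction of a read's k-mers found in each filter's reference set, and, as an assumption, a more stringent round is modelled by a lower cutoff, so reads sharing even a small fraction of k-mers with a reference are removed. Filter names, sequences, k and the cutoffs are all illustrative.

```python
from __future__ import annotations


def kmer_set(seq: str, k: int) -> set[str]:
    """Return every length-k substring of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_fraction(read: str, index: set[str], k: int) -> float:
    """Fraction of the read's distinct k-mers present in a reference index.
    Reads shorter than k get 1.0, so they are always filtered as uninformative."""
    ks = kmer_set(read, k)
    return len(ks & index) / len(ks) if ks else 1.0


def run_cascade(reads: list[str], filters: list[tuple[str, list[str]]],
                cutoffs: list[float], k: int = 6) -> tuple[list[str], list[tuple[str, float, int]]]:
    """Apply every named filter once per round; a read is removed when its
    shared fraction with a filter reaches the round's cutoff.
    Returns the surviving reads and a log of (filter, cutoff, survivors)."""
    indexes = [(name, set().union(*(kmer_set(s, k) for s in refs)))
               for name, refs in filters]
    survivors = list(reads)
    log = []
    for cutoff in cutoffs:
        for name, index in indexes:
            survivors = [r for r in survivors if shared_fraction(r, index, k) < cutoff]
            log.append((name, cutoff, len(survivors)))
    return survivors, log


filters = [("vector", ["GGGGGGGGGGGG"]), ("genomic", ["ACGTACGTACGTTTGACC"])]
reads = ["GGGGGGGGCCAA", "ACGTACGTAAAA", "TTTTTTTTTTTT"]
survivors, log = run_cascade(reads, filters, cutoffs=[0.5, 0.1])
print(log)        # [('vector', 0.5, 3), ('genomic', 0.5, 2), ('vector', 0.1, 1), ('genomic', 0.1, 1)]
print(survivors)  # ['TTTTTTTTTTTT']
```

Because each pass can only remove reads, the survivor count never rises from one pass to the next, mirroring the steady reduction from 7,073 sequences to 22 reported for the HeLa library.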
The results of this exercise demonstrated the feasibility of using computational subtraction to identify microbial gene sequences in genetic material derived from human tissues. The authors concluded: "Further experiments could then distinguish sequences of benign commensal organisms from pathogen sequences by assessing the strength of their association with disease."