It's time for a genomics reality check.
Despite the constant, glowing coverage of speedy, low-cost next-generation
DNA sequencing, whole-genome analysis, and consumer genomics, researchers still have
no idea what the vast majority of human genomic DNA does, nor the functional
consequence of variations in those sequences. Thus, few researchers actually need to
sequence entire genomes—yet.
For the moment, most next-gen projects have more limited aims, such as
"exome" sequencing (targeting that 1% of the genome, 250,000 exons or so, that
actually encodes protein), immunogenomics (profiling individuals' antibody gene
complement), or identifying variants in mere handfuls of genes.
Even if researchers actually desire whole-genome analyses, there's a
financial angle to consider: No matter how cheap gene sequencing gets, it's still
cheaper to sequence a fraction of the genome than to do the whole
thing—especially when studying large populations.
"Hypothetically, if you could enrich for 1% of the genome which captures 90%
of the most causative alleles, conceivably you could apply that to 100 times more
people," says George Church of Harvard University.
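Church's arithmetic can be checked with a back-of-the-envelope sketch. The version below holds total sequencing output fixed and compares whole genomes against an enriched 1% target. The genome size, read depth, and budget are illustrative assumptions, not figures from the article, and the calculation ignores enrichment costs and off-target reads.

```python
# Rough sketch: people sequenceable for a fixed amount of sequencing output.
# All numbers below are illustrative assumptions, not figures from the article.

GENOME_BASES = 3_000_000_000   # approximate human genome size (assumption)
COVERAGE = 30                  # hypothetical read depth per targeted base


def people_covered(budget_bases: int, target_percent: int) -> int:
    """Number of people whose target region fits in the sequencing budget."""
    # target_percent is an integer percent of the genome (100 = whole genome)
    bases_per_person = GENOME_BASES * target_percent // 100 * COVERAGE
    return budget_bases // bases_per_person


budget = 10 * GENOME_BASES * COVERAGE      # enough output for 10 whole genomes
whole = people_covered(budget, 100)        # whole-genome sequencing
enriched = people_covered(budget, 1)       # 1% enriched target
print(whole, enriched, enriched // whole)
```

Under these assumptions the same output covers 100 times as many people, which is the ratio Church describes.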
Until recently there was no easy way to do that enrichment; PCR is not easily
scaled. In the past few years, however, new techniques have emerged to make this
process, variously called "targeted resequencing" or "genomic partitioning,"
tractable.
The Scientist asked five researchers to give the pros and cons
of their chosen methods for fractionating genomes to feed the sequencing pipelines.
Here's what they said.
Raindance Technologies' RDT-1000 system merges droplets of PCR
primers (yellow) with your template genomic DNA (red) to create a pool of
aqueous micelles in oil, each micelle essentially its own PCR reaction
chamber.
Researcher: Elaine Mardis, Associate Professor of Genetics and Codirector, the Genome
Center, Washington University, St. Louis, Mo.
Project: Sequencing cancer-associated genes in matched tumor and normal samples
from one or two hundred cancer patients. The goal: to identify rare variants
that cannot be found via traditional clonal sequencing approaches.
Technique: Mardis used Raindance Technologies' high-tech system, the RDT-1000. The
system makes a PCR vinaigrette: millions of individual reaction chambers, each
an aqueous bubble in an oil emulsion containing genomic DNA, reagents, and any
of 4,000 separate primer pairs, all in a single tube.
First, the team designs primer pairs for each desired genomic segment.
These are shipped to Raindance, which encapsulates each pair into 8-pL
microdroplets to produce a pool of aqueous micelles in oil. That emulsion is
then returned to the lab, where it is plugged into the instrument, along with
genomic DNA and other PCR reagents. Finally, the RDT-1000 merges the ingredients
into millions of discrete 22-pL droplets, each an individual reaction vessel,
which can be amplified in any thermal cycler.
"Our experience so far has been that it's well designed," she says. "The
interface is quite simple."
Considerations: Ideal for core facilities or small groups of labs, the method, says
Mardis, is "plug-and-play" and doesn't require a lot of optimization.
The technique requires a considerable upfront investment, both in terms
of hardware ($225,000) and oligos. Also, because each instrument run is limited
to 4,000 reactions, full exome amplification requires multiple runs.
Importantly, the process retains all the shortcomings of PCR, especially uneven
target amplification.
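The multiple-run constraint comes down to ceiling division. The sketch below assumes, purely for illustration, one primer pair per exon and uses the rough figure of 250,000 exons quoted earlier; a real design may need more or fewer amplicons per exon.

```python
# Sketch: instrument runs needed when each run holds at most 4,000 primer pairs.
# Assumes one primer pair (amplicon) per exon, which is a simplification.

PRIMER_PAIRS_PER_RUN = 4_000   # per-run limit stated in the article


def runs_needed(amplicons: int, per_run: int = PRIMER_PAIRS_PER_RUN) -> int:
    """Ceiling division without floating point."""
    return -(-amplicons // per_run)


exome_runs = runs_needed(250_000)      # roughly 250,000 exons
small_panel_runs = runs_needed(3_500)  # hypothetical small gene panel
print(exome_runs, small_panel_runs)
```

A panel that fits within a single run avoids the extra runs entirely, which is why the system suits projects targeting a defined set of genes.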
Finally, says Mardis, success (as with all genome-partitioning methods)
depends entirely on the uniqueness of the sequence being targeted. "Being able
to design a genome-unique set of capture probes or PCR products is not always as
easy as it sounds."
The targeting arms of the oligonucleotide strand (orange) identify
the sequence of interest on the genomic DNA (red) and facilitate the copying
and integration of the sequence (light red) into the circle. The
circularized target sequence can then be amplified.
Researcher: George Church, Professor of Genetics, Harvard Medical School
Project: The Personal Genome Project, a massive effort to read the genomes of up
to 100,000 individuals.
Technique: Though he hopes to be sequencing complete genomes by the end of the
year, for now Church uses "padlock probes" to focus exclusively on exonic
sequences (Nature Methods, 4:931-936, 2007).
Padlocks are 70-base oligos containing two 20-base targeting arms flanking
a generic 30-base linker. When padlocks are incubated with single-stranded,
sheared genomic DNA, the targeting arms bind to the desired sequences, producing
a bimolecular circle in which the two arms flank the region to be sequenced. One
of the two arms then acts as a primer for DNA polymerase to copy the intervening
sequence, and ligase seals the circle. Finally, noncircularized DNA is digested
away, leaving only the circularized targets of interest, which are then
amplified with universal primers and sequenced.
"Basically, when the circle is formed and ligated, you can think of it
as locked" to exonucleases, says Church—hence the name.
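The probe layout described above (two 20-base arms around a 30-base linker) can be sketched in a few lines. This is only an illustration: the linker sequence is a made-up placeholder, and the arm orientation (3' arm on the downstream flank, priming gap-filling back across the target) is an assumption rather than a detail given in the article.

```python
# Sketch: assemble a 70-base padlock probe around a target region.
# LINKER is a made-up placeholder; arm orientation is an assumption.

ARM_LEN = 20
LINKER = "ACGT" * 7 + "AC"     # hypothetical 30-base generic linker

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def padlock_probe(template: str, start: int, end: int) -> str:
    """Probe (5'->3') whose arms flank template[start:end]."""
    if start < ARM_LEN or end + ARM_LEN > len(template) or start >= end:
        raise ValueError("target needs 20 flanking bases on each side")
    upstream = template[start - ARM_LEN:start]
    downstream = template[end:end + ARM_LEN]
    # The 3' arm pairs with the downstream flank and primes gap-filling
    # across the target; the 5' arm pairs with the upstream flank, where
    # ligase closes the circle.
    return revcomp(upstream) + LINKER + revcomp(downstream)


template = "A" * 20 + "GGGCCC" + "C" * 20   # toy template with a 6-base target
probe = padlock_probe(template, 20, 26)
print(len(probe), probe)
```

Because the arms sit immediately next to the target, the captured sequence begins and ends exactly at the chosen coordinates, which is the base-pair precision Church describes.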
Considerations: Church says the padlock method is both highly scalable (he uses it to
select all 258,000 human exons) and precise. Hybridization approaches always
pull down whatever off-target sequences (such as introns) the selected sequences
may carry with them, but "the padlock probe approach is precise to one base
pair," he says. "You get exactly what you want."
It also maximizes sequencing dollars by minimizing off-target
sequencing, Church says, though sequence bias (the "preferential capture of
certain sequences, hence requiring more sequencing to get the under-represented
sequences up to an adequate level") remains a persistent, if diminishing,
problem.
Biotinylated oligonucleotides act as "bait" to catch the target
DNA sequences via hybridization. The entire complex is then captured on
streptavidin beads and amplified.
Researcher: Andreas "Andy" Gnirke, Research Scientist, Broad Institute of
Massachusetts Institute of Technology and Harvard University
Project: Finding an inexpensive way to read either exomes or specific
susceptibility loci in large numbers of affected and normal individuals.
Technique: Gnirke developed a method he calls "hybrid selection" (or solution
hybrid selection). He uses a custom 22,000-oligonucleotide microarray from
Agilent, in which each oligo contains a 170-base targeting sequence flanked by
15-base universal primer sites. First, the oligos are cleaved from the array and
PCR-amplified to introduce a T7 polymerase promoter at one end. The amplified
oligos are then transcribed in the presence of biotin-UTP to create a
biotinylated capture pool, or "bait." Next, the genomic DNA to be sequenced is
sheared and coupled to adaptors to create a "pond" of prey molecules, which
hybridize to the bait.
The resulting RNA:DNA hybrids are then captured on streptavidin beads,
eluted, and amplified to yield the sequencing template (Nature Biotech,
27:182-189, 2009). "We believe [this technique] currently gives the best balance
of specificity, uniformity, recovery of both alleles, and cost," he says.
Agilent has commercialized the method as the SureSelect Target Enrichment System
($600/reaction if you buy the 100-reaction kit).
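To make the bait layout concrete, the sketch below tiles a target region with 170-base windows and wraps each in two 15-base primer sites, giving 200-base oligos. The end-to-end tiling scheme, the coordinates, and the placeholder primer-site sequences are assumptions for illustration; the article does not say how baits are positioned.

```python
# Sketch: lay out 200-base bait oligos (15 + 170 + 15) across a target region.
# End-to-end tiling and the primer-site sequences are assumptions.

BAIT_LEN = 170                      # targeting sequence length from the article
PRIMER_5 = "N" * 15                 # placeholder universal primer site
PRIMER_3 = "N" * 15                 # placeholder universal primer site


def tile_baits(region_start: int, region_end: int, bait_len: int = BAIT_LEN):
    """Return (start, end) windows that cover [region_start, region_end)."""
    windows = []
    pos = region_start
    while pos < region_end:
        # The last window may run past region_end into flanking sequence.
        windows.append((pos, pos + bait_len))
        pos += bait_len
    return windows


def bait_oligo(reference: str, start: int, end: int) -> str:
    """Full oligo as synthesized: primer site + target + primer site."""
    return PRIMER_5 + reference[start:end] + PRIMER_3


reference = "ACGT" * 500                     # toy 2,000-base reference
windows = tile_baits(1_000, 1_400)           # hypothetical 400-base target
oligos = [bait_oligo(reference, s, e) for s, e in windows]
print(windows, len(oligos[0]))
```

With this non-overlapping layout, 22,000 baits of 170 bases could span at most about 3.7 million bases of target, so overlapping designs would cover less.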
Considerations: Because the method requires on-array synthesis, it works best for a
defined set of sequencing targets to be scanned over and over again. Adding new
probes can be expensive.
On the other hand, being able to create, test, aliquot, and store large
batches of oligos ups the technique's reproducibility, Gnirke says. With arrays,
by contrast, "there's no way you can really test the thing before you use it."
The technique also uses less DNA than arrays—about 500 ng instead of 10 μg.
The method is less precise than padlock probes, however. "We capture
fragments that have exon sequence and some flanking sequence," he says. "We
always get a certain amount of flanking bycatch that was specifically captured
but still not what you want."
Double-stranded oligonucleotide strands act as selector probes
which, when incubated with DNA, find the target DNA sequence. The probe's
vector sequence circularizes the DNA sequence of interest. The circles can
then be amplified.
Researcher: Hanlee Ji, Assistant Professor of Medicine, Stanford University School of
Medicine, and Senior Associate Director, Stanford Genome Technology Center
(SGTC)
Project: Optimizing genomic biomarker discovery with a cost-effective, scalable,
and generic sequencing pipeline that can process hundreds of patient samples.
Technique: Ji and SGTC director Ron Davis have devised several approaches to the
genome-partitioning problem; one uses what Ji calls "targeted genomic
circularization" or alternatively, "selector probes."
Selector probes are 80-nucleotide-long double-stranded oligos, with a
central 40-base generic "vector" sequence and two target-specific overhanging
termini. Genomic DNA is digested with a restriction enzyme, denatured, and
incubated with the selector probe. When the ends of the probe find their target
sequences, the result is a partially double-stranded circle in which the two
ends of the genomic fragment are bridged by the probe. These molecules can then
be amplified and sequenced (PNAS, 104:9387-9392, 2007).
"The reaction is very simple," says Ji. "It's multistep, it can be done
in any molecular biology lab, and it is very easy to integrate with any
next-generation sequencing system."
Considerations: The selector probe strategy fills a different niche than do other,
exome-scale methods, says Ji. "We see this as something any group could use to
target anywhere from 10 to, say, 1,000 genes of interest."
That's because oligos are both inexpensive and stable, and because
normalization (the tricky process of ensuring that different sequences amplify
to the same degree) is easily accomplished by adjusting oligo sequence or
concentration.
The strategy does impose several design constraints. To be efficient,
the two probe ends must target sequences that are between 150 and 1,000 bases
apart. Also, one of the ends must correspond to a restriction site. Still, even
if some probes work better than others, you can simply use redundant oligos, Ji
says. "It's just a simple matter of overengineering to compensate for potential
failures of a given oligonucleotide."
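The two design constraints lend themselves to a simple filter. The sketch below checks a candidate pair of probe ends against them; the coordinates and the restriction-site positions are hypothetical inputs, and a real design tool would also need to weigh sequence uniqueness.

```python
# Sketch: check the two design constraints described for selector probes.
# Coordinates and the restriction-site list are hypothetical inputs.

MIN_SPAN = 150
MAX_SPAN = 1_000


def selector_ok(end_a: int, end_b: int, restriction_sites: set) -> bool:
    """True if the ends are 150-1,000 bases apart and one is a cut site."""
    span = abs(end_b - end_a)
    in_range = MIN_SPAN <= span <= MAX_SPAN
    on_site = end_a in restriction_sites or end_b in restriction_sites
    return in_range and on_site


sites = {1_200, 5_040}                  # made-up restriction-site positions
checks = [
    selector_ok(1_200, 1_650, sites),   # 450 apart, one end on a site
    selector_ok(1_200, 1_300, sites),   # only 100 apart
    selector_ok(2_000, 2_500, sites),   # neither end on a site
]
print(checks)
```

Running several candidate pairs per target through a check like this is one way to build in the redundancy Ji recommends.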
Array-based capture, either made from a nitrocellulose filter or
obtained commercially, provides a simple and flexible
approach.
Researcher: Jonathan and Christine "Kricket" Seidman, Department of Genetics, Harvard
Medical School
Project: Searching for genetic variants of cardiac disease by performing deep
sequencing of cardiac disease–associated genes in large populations.
Technique: The Seidmans opted for surface capture-based hybridization, a solid-state
complement to Gnirke's approach.
They separately PCR amplify each desired exonic sequence—about 45,000
bases representing 11 genes in the case of hypertrophic
cardiomyopathy—concatenate them into a single linear molecule, and bind that to
a nitrocellulose filter. They then shear the genomic DNA to be sequenced, attach
flanking PCR primers, hybridize those molecules to the filter, and sequence what
bound. The technique, says Jonathan, "is absolutely based on Southern blotting
technology," and is just as simple.
Considerations: One primary strength of the approach is flexibility. PCR products are
easier to generate than long oligos, and anyone with even a modicum of technical
know-how can generate a filter and modify it at will, encompassing new genes as
they come up. Up to 30 Mb can be thus targeted, says Jonathan.
The method is also inexpensive, requiring only PCR primers and
nitrocellulose, and capable of detecting copy number variants PCR-based methods
overlook. "We think these capture methods are going to be a useful tool for
measuring a two-fold change in the amount of sequence," he says.
If you prefer commercial methods, Roche/NimbleGen recently released an
array-based genome-partitioning product. The NimbleGen Sequence Capture 2.1M
Human Exome array uses oligonucleotide probes to capture more than 180,000 exons
and microRNA sequences.