All Systems Go
Some peculiar microorganisms are showing that systems biology can color in what's missing from models of biochemical and cellular networks.
On April 22, 2006, Nitin Baliga, a microbiologist at the Institute for Systems Biology in Seattle, was spending a lazy Saturday afternoon at home, when he noticed an enticing email in his inbox from his ISB collaborator Richard Bonneau. The subject line: "woooooohoooooo!"
Baliga's team had just constructed a new model that could predict the molecular-level responses of a free-living cell to genetic and environmental changes. That cell, however, was not Escherichia coli or yeast. It was the little-known archaeon Halobacterium salinarum, a tiny extremophile that thrives in highly saline lakes such as the Great Salt Lake and the Dead Sea. The model was accurately predicting Halobacterium's dynamics at the genome scale. But could it predict new molecular-level responses to changes in environmental conditions not tested in the initial data used to construct the model? Yes, Bonneau had just found out, and he was so thrilled that he couldn't wait to share his findings, or finish his sentences. "Good news:" Bonneau wrote. "Over 130 new conditions the inf net trained on ~150 conds (depending on # of cols in bicluster) is predictive to the same degree we ...! It was not a fluke ...! ..."
The email was almost unintelligible, even to Baliga. Still, the frequent ellipses and exclamation points all indicated one thing: The model was predictive of novel conditions. "That was extremely exciting but at the same time very puzzling," Baliga says. "I couldn't explain why that would be possible." It seemed almost magical, as if the model "knew" about experiments and environmental stimuli it had never encountered before.
Baliga's model was not the first predictive model in systems biology, mind you; others had devised models to discover unidentified gene networks. But those were mostly based on compendiums of disparate datasets, and could only implicate novel genes that were linked to genes with known functions, so-called "guilt by association" predictions.
The Halobacterium model was different; it could discover complex biological interactions that no one knew anything about, and on unprecedented temporal and spatial scales. "If you want to get into the nitty-gritty of the time dependence of how [genes] work," says Edward Marcotte, a computational biologist at the University of Texas in Austin, "you need a dynamic model." That's exactly what Baliga did. He studied everything simultaneously—gene expression levels, protein interactions and dynamics, and phenotypic characteristics—all at different time points under a range of changing but controlled environmental conditions. And all performed in-house: A lofty and expensive venture to be sure, requiring many years of research and several million dollars in funding from the National Science Foundation and the US Department of Energy; but one that eventually paid off.
In 2000, after Baliga finished his PhD at the University of Massachusetts, Amherst, where he focused almost exclusively on a single Halobacterium gene, the light-driven proton pump bacterio-opsin, he decided that he didn't want to continue dissecting his pet organism gene by gene. Instead, he was determined to understand all of the genes, and how they all fit together.
"In Halobacterium, anything we find is generally new." —Nitin Baliga
Baliga initially turned to studying Halobacterium because of the vast unexplored biology of archaea. "In Halobacterium, anything we find is generally new," he says. The same mindset applied when formulating his ambitious model. Since relatively little genetic and molecular information existed for Halobacterium, it served as the optimal test for the systems biology approach. The field certainly has something to prove. For example, last December, systems biology's most famous skeptic, Nobel laureate Sidney Brenner, described systems biology as a "substitute for thinking" in an address to the Indian Institute of Science in Bangalore. Part of the field's predicament is no doubt an identity crisis—no one can quite define what exactly it is. Some say it's a field of study, focused on the interactions between various biological components. Others describe it as a paradigm, one of integration instead of reductionism. Still others adopt a much more mechanistic definition, viewing systems biology as an iterative cycle of theorize, analyze, model, and repeat.
In an abstract sense, all three operational definitions are correct, because they all aspire toward the same goal of discovering emergent biological properties. In systems biology's early days, however, no one had convincingly shown that the approach could live up to its promise. Most researchers had turned to well-trodden research organisms, such as E. coli, for which there was already a wealth of data. This handed critics a good argument for castigating systems biology as little more than glorified molecular biology. What was new about this approach, people asked.
In response to the naysayers, Baliga turned to Halobacterium. "You had to demonstrate that this [systems-wide] approach is successful by using something that is way away from what is considered normal," he says from his Seattle office, a short walk away from the celebrated Fremont Troll. "I think people would have been less excited if it were E. coli we had solved. They would have looked at all our discoveries and said that this is something that we knew. These relationships were previously figured out using molecular biology approaches, so why is this exciting?"
Most microbes don't live in a relatively uniform environment like the human gut, which is a black mark against E. coli. In the real world, organisms must respond to the internal and external challenges of change and flux. E. coli is "not really set up to change its phenology and physiology in response to environmental change," says Jim Fredrickson, of the Pacific Northwest National Laboratory (PNNL) in Richland, Wash. Indeed, it's become increasingly clear that the pioneering molecular biologist Jacques Monod was wrong when he said, "What is true for E. coli is true for the elephant." The diversity of life argues against this. So, many researchers are now turning to out-of-the-mainstream microbes to help systems biology get off the ground. Starting with a clean slate lets scientists prove that the approach can, in fact, work.
Halfway across Washington State from the ISB, Fredrickson leads a group of researchers that affectionately calls itself the Shewanella Federation, named after their bug of choice, the metabolically versatile Shewanella oneidensis. This bacterium grows naturally almost everywhere, in both oxygen-rich and oxygen-deprived surroundings, using a diverse array of electron acceptors, and on a broad range of carbon sources. "Shewanella found a way to live quite successfully at that interface where there are changing nutrients and physiological conditions," says Fredrickson.
Two time zones away, Himadri Pakrasi, a biochemist at Washington University in St. Louis, is advancing another microbial system, the cyanobacterium Cyanothece. This marine microbe is one of the simplest organisms with a rhythmic circadian clock, capable of photosynthesizing in the daylight and fixing nitrogen in the dark. "These organisms are very good at doing something that very few organisms know how to do," says Pakrasi. Cyanothece is being developed as "a template for how to do systems biology," adds Jason McDermott, a PNNL computational biologist. "The techniques that you develop from a modeling standpoint in these simpler systems will extrapolate to more complex systems."
At the tail end of his PhD, Baliga was part of a large team that sequenced Halobacterium's small 2.6 Mb genome. The analysis of the first-draft sequence revealed that more than half of Halobacterium's 2,400-odd protein-coding genes had no known function, including close to 1,000 genes that were unrelated to any previously reported in other organisms [1]. Tackling such a multitude of unknowns all at once in any organism, especially one from a peculiar domain of life, was sure to be difficult, Baliga admits, but he wasn't deterred. "My goal was to deconstruct the whole bug," he says. "How I was going to do it in detail, I did not know."
Shortly thereafter, Baliga joined the then-fledgling ISB as a postdoc working with Leroy Hood, the so-called godfather of systems biology. He set to work developing the basic lab tools to achieve his then "pie in the sky" idea, as he calls it. The knock-out genetics needed improvement, no suitable protein expression system existed for Halobacterium, and the microbe's high salt environment denatured all the standard enzymes and antibodies. Once he got everything up and running, Baliga continued to amass data, including microarray transcript profiles, gene knockout responses, and vast protein catalogs. "All of these together gelled the whole effort rapidly," Baliga says. Not only did the lab techniques get better, faster, and cheaper, but importantly, analyzing the preliminary results helped identify critical computational problems that needed solutions before genome-scale modeling would be feasible.
Baliga set up his own research group at the ISB in 2002, and together with Bonneau and ISB physicists-turned-computational biologists David Reiss and Vesteinn Thorsson he devised a three-pronged approach for transforming the tangles of data culled from more than 150 different experiments into a cohesive model that could reconstruct global gene networks. They first developed an algorithm called cMonkey that grouped functionally related genes in an environmental condition-dependent manner. This is important because patterns of co-expression often vary significantly across diverse environmental settings, which leads to genetic relationships that are valid under some, but not all, conditions. This first step was critical for later modeling success, says Bonneau, now at New York University. "If you screw this step up, you're hosed." Next, the researchers integrated the predicted gene clusters with mRNA and protein time series data to create a dynamic gene network model with a temporal component. Inferelator—the tool Baliga's team developed, referred to as "inf" in Bonneau's ecstatic email—took the cMonkey gene clusters, incorporated real-time expression data, and created a suite of mathematical equations that could successfully predict many novel gene regulatory relationships. Lastly, to visualize and analyze the copious amounts of data, the researchers developed Gaggle, an open-source software system for integrating different bioinformatics tools and databases. "Then we just sat staring at [the model] for a year or two trying to figure out what the heck we had done and what it all meant," Baliga says. In the meantime, Baliga's team had accumulated additional data from around 130 new experiments, including novel environmental perturbations, unique gene-environment combinations, and different time series measurements. Since these data were completely unlike what was used to create the algorithms, they provided the perfect test of the model's predictive power. 
Bonneau crunched the new data, and it worked. The model's new predictions of Halobacterium's cellular turnover matched the actual experimental results with the same precision as in the data used to fit the model. And it could spit out accurate predictions of the transcriptional responses of more than 1,900 genes (around 80% of the genome) [2].
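The logic of that test, fitting a model on one set of conditions and then checking whether its error stays about the same on conditions it has never seen, can be illustrated with a toy example. The sketch below is not the Inferelator and uses no Halobacterium data: it assumes a single hypothetical regulator, a made-up linear relationship, and synthetic noise, and the function names are invented for illustration. Only the 150/130 split echoes the numbers reported above.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def rmse(xs, ys, a, b):
    """Root-mean-square error of the fitted line on a set of conditions."""
    squared = [(y - (a * x + b)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared) / len(squared))


def simulate(n_conditions, rng):
    """Synthetic data (assumption): expression = 2 * regulator + 1 + noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_conditions)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


rng = random.Random(0)                    # fixed seed so the run is repeatable
train_x, train_y = simulate(150, rng)     # conditions used to fit the model
new_x, new_y = simulate(130, rng)         # conditions held back for testing

a, b = fit_line(train_x, train_y)         # the model sees only the training set
train_error = rmse(train_x, train_y, a, b)
new_error = rmse(new_x, new_y, a, b)

print(f"fitted slope={a:.3f}, intercept={b:.3f}")
print(f"error on training conditions: {train_error:.3f}")
print(f"error on new conditions:      {new_error:.3f}")
```

If the two error values come out close, the fit has captured something that carries over to new conditions rather than memorizing the training set, which is the sense in which Bonneau's result "was not a fluke."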
"The techniques that you develop from a modeling standpoint in these simpler
systems will extrapolate to more complex systems." -Jason McDermott
"Their network is predictive in addition to providing [gene] topology," says Tim Gardner, associate director of computational biology at Amyris Biotechnologies, a bioenergy company in Emeryville, Calif. "If you stimulate gene A, it says what will be the likely responses in the rest of the cell." The Halobacterium gene network is probably the most powerful predictive regulatory model to date, adds Michael Laub, a microbiologist at the Massachusetts Institute of Technology. "It really demonstrates where the field is going and the sorts of things that are going to be possible." To Baliga, the reason the model can predict responses to so many novel stimuli is clear: Although there are near-infinite numbers of environmental factors, many of those different stimuli are linked. For example, radiation is connected to temperature, which in turn affects gas solubility, pressure, and salinity, to name a few. Thus, if the radiation changes, the bug's built-in wiring which has been molded by billions of years of evolution, anticipates future environmental changes in linked factors and adjusts its gene expression accordingly. So even though Baliga never primed the model with salinity, for example, he could accurately predict the relevant salinity-related genes as they are often the same ones that change with temperature or radiation. This interconnectedness "is the fundamental property of biological systems," Baliga says.
A fully predictive systems biology model for Shewanella is "not quite there yet," says PNNL microbiologist Alexander Beliaev. But the Shewanella researchers are closing in on one, although they're taking a different approach from Baliga's group. In addition to tackling a model focused on gene transcript levels (a so-called "gene regulatory network," for which the researchers have modeled more than 1,000 gene interactions), the group is pursuing a metabolic model that can predict growth and metabolism under various environmental conditions. Shewanella is ideal for such a model because it flourishes nearly everywhere.
Of Baliga's Halobacterium model, Beliaev says, "This is a good approach. But what you have is a regulatory network. To have a [fully] predictive model, you also need to have information about the metabolic network, because in the very end … the metabolic model is what's going to predict your cell's behavior." Currently, the Shewanella metabolic model contains 774 reactions, 634 metabolites, 783 genes, and counting. This is some way behind E. coli's best metabolic model, with its more than 2,000 reactions, 1,000 metabolites, and 1,200 genes [3], but ahead of other microbes such as Clostridium acetobutylicum. Eventually, Beliaev hopes that by understanding metabolic responses to the environment, his team will be able to use Shewanella in toxic metal bioremediation and in biofuel production.
Pakrasi and his colleagues are also hot on the heels of genome-scale models in Cyanothece. At the last count, their metabolic model was on par with Shewanella's, with 719 reactions, 749 metabolites, and 574 genes. "We're still definitely in the mid-stages of the [metabolic] model," says Jennie Reed, a bioengineer at the University of Wisconsin-Madison, who works on the project. Meanwhile, the gene regulatory model is progressing quite rapidly. Last year, Pakrasi's team published Cyanothece's genome and transcriptome, and now they're working on the proteome.
"To suddenly have an order of magnitude increase in data acquistion is a challenge. But it's a good challenge." -Dick Smith
The initial proteomic results have important implications for all biological research, says Dick Smith, a chief scientist at the PNNL who collaborates with Pakrasi. "In going from the transcriptome to the proteome there's not a one-to-one correlation," Smith says. Rather, there's only about a 50% overlap. "Where we find concordance [between the proteome and transcriptome], our models that are based on time series data work very well," adds Pakrasi. "But 50% of the time they're not working simply because we're not getting into more detailed [proteome] analysis." Smith is starting to make more detailed analyses possible by vastly improving the mass spectrometry techniques used for high-throughput proteomics. This has created the same obstacle encountered by Baliga's group before him, though—namely, a glut of data to sort out. "To suddenly have an order of magnitude increase in data acquisition is a challenge," he says. "But it's a good challenge." In response, McDermott developed a bioinformatics tool to graphically explore gene networks in Cyanothece, whose gene activity waxes and wanes in a predictable fashion throughout the day. Yet that's not always intuitively obvious when you stare at the heat maps or time plots that researchers have traditionally relied upon, McDermott says. So when he first created his illustration, he figured Christmas had come early: Not only was the software working, but through his graphical representation, which looked like a circular wreath, he could finally make sense of the immense Cyanothece dataset. "It portrays that temporal and cyclical nature of this data in a more intuitive fashion," he says.
Jason McDermott's 'wreath' of gene activation in Cyanothece
The image depicted a ring of interconnected genes, all mapped neatly onto time. If you look at the top of the ring, McDermott explains, you can find all the genes that are active at dawn. And at the bottom sit all the genes that peak at mid-day. "In this way the wreath mimics a clock," he says (see figure).
Currently, the Cyanothece model can predict some aspects of the regulatory network, such as which genes are essential for survival, but the scientists can't yet model time-dependence or responses to novel stimuli. "The models that we've generated are predictive in a certain way," McDermott says, "but we'd need a few more datasets to get a good predictive model." At the moment, they only have overlapping transcriptomic, proteomic and metabolomic data from the same sample for one environmental condition: the standard 12-hour light, 12-hour dark cycle. Still, considering that Cyanothece's genome and first-pass transcriptome were only published last year, Pakrasi suspects that more predictive models won't be far off. "What we've been doing for the last three to four years is we were generating just the baseline data compared to other well studied organisms," he says. "We've made significant progress… I hope 2009 is the year for a [fully predictive] model."
1. W.V. Ng et al., "Genome sequence of Halobacterium species NRC-1," Proc Natl Acad Sci, 97:12176–81, 2000.
2. R. Bonneau et al., "A predictive model for transcriptional control of physiology in a free living cell," Cell, 131:1354–65, 2007.
3. A.M. Feist et al., "A genome-scale metabolic reconstruction for Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information," Mol Syst Biol, 3:121, 2007.
Why E.coli is just a well-studied benchmark ! by Vinay Rale [Comment posted 2009-03-30 04:25:22]
Comparison to E.coli is to have a very well known benchmark. We know much more about E.coli. For that matter every bacterium is versatile, till we find out its versatility in different niches and its own. Thanks. Vinay Rale
Good article but by Matthew Grossman [Comment posted 2009-03-03 15:28:34]
Good article but the comments regarding E. coli, other than it is very very well studied, are sheer nonsense. E. coli is one of the most metabolically versatile microbes and certainly experiences a vast range of environmental changes in its daily life.
Response by Nitin Baliga [Comment posted 2009-03-03 14:29:18]
I agree with the comment that E. coli (for that matter any organism) is capable of adapting to environmental changes. I also do not subscribe to the idea that there are housekeeping genes that are constitutively expressed and not regulated. There is ample evidence that every gene in any organism is under some type of regulation; for example, in our study of Halobacterium over 80% of genes were included in the model, suggesting they were differentially regulated in some or all of the environments we tested.
With regard to the nature of the model: we have shown that disregarding the time component can reduce the predictive power of the Inferelator model, suggesting that regulatory influences in the inferred network are causal. Furthermore, the model recapitulates and extends a lot of known biology and has also provided experimentally testable hypotheses that have led to novel biological insights. These issues have been addressed very carefully in the Cell paper as we think the model should eventually represent true operational relationships within a cell.
Echo E. coli comment and ... by Ellen Hunt [Comment posted 2009-03-03 12:11:24]
E. coli doesn't navigate the extremes of halobacteria, but it definitely changes expression. I would be more curious about the system developed because I worked with AI and pseudo-AI approaches to other problems at one time. This sounds like some type of training dataset into an "inference engine". But how was that inference system made? A problem with such systems is that they can present you with results without being able to enlighten you as to what exactly is going on. In other words, predictive is not necessarily intelligible.
E. coli by Joan Slonczewski [Comment posted 2009-03-03 08:47:19]
This article about systems biology of haloarchaea is very interesting, but it incorrectly states that E. coli lives in a uniform environment. In fact, E. coli navigates extremes of pH, oxygen, nutrient levels, and in some cases salinity. The global stress response studies of E. coli pioneered by Fred Neidhardt and colleagues laid the groundwork for studies such as those reported here.