MetaNetX - Automated model construction and genome annotation for large-scale metabolic networks
Metabolic networks are now characterized well enough that mathematical models of their behavior can be constructed and analyzed at the whole-genome level. This is not due to rich, extensive datasets: the existing data are far from comprehensive, and even in the best-understood organisms most kinetic parameters remain undetermined. Instead, whole-network metabolic modeling has largely been enabled by new computational methods that identify and mathematically define constraints arising from physical laws, environmental conditions, and cellular regulation. However, even for the extremely well-studied yeast Saccharomyces cerevisiae, the most advanced network model contains only ~1,200 of the ~2,200 predicted metabolic genes, and only ~70% of its modeled reactions are functional. The situation is even worse for higher organisms; for plants, for instance, no genome-scale model exists to date.
The formalization of constraints distinguishes network knowledge bases from formal network reconstructions (models) that consistently represent genomic information, reaction biochemistry, and thermodynamics, and link this information to cell physiology. This integration enables quantitative and qualitative, experimentally testable predictions that close the systems biology cycle. Over the past decade, genome-scale metabolic network models, primarily for microorganisms, have been highly successful in many application areas. By characterizing and predicting network structures and behavior, constraint-based models lead the way toward a fundamental goal in systems biology: a computational model of an entire cell. The reconstructions neglect dynamics, but they are an essential first step for any large-scale systems modeling effort. Their availability, quality, and coverage are therefore critical for the entire field of (metabolism-related) systems biology.
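As an illustration of what such a formalization of constraints can look like, the following minimal Python sketch checks whether a candidate flux vector satisfies steady-state mass balance (a physical constraint) and flux bounds (standing in for environmental conditions such as limited uptake). The toy three-reaction network, the bounds, and all function names are illustrative assumptions, not part of MetaNetX or of any published model.

```python
# Toy network (hypothetical): R1: -> A (uptake), R2: A -> B, R3: B -> (export)
# Rows are metabolites (A, B); columns are reactions (R1, R2, R3).
S = [
    [1, -1, 0],   # A: produced by R1, consumed by R2
    [0, 1, -1],   # B: produced by R2, consumed by R3
]

# Assumed lower and upper flux bounds for each reaction.
BOUNDS = [(0, 10), (0, 10), (0, 10)]


def mass_balance_residuals(S, v):
    """Return the net production rate of each metabolite, i.e. the product S * v."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_steady_state_feasible(S, v, bounds, tol=1e-9):
    """Check that v satisfies S * v = 0 and lies within the flux bounds."""
    balanced = all(abs(r) <= tol for r in mass_balance_residuals(S, v))
    within_bounds = all(
        lo - tol <= v_j <= hi + tol for v_j, (lo, hi) in zip(v, bounds)
    )
    return balanced and within_bounds


# A balanced flux distribution, one where A accumulates, and one over the bounds.
print(is_steady_state_feasible(S, [5, 5, 5], BOUNDS))
print(is_steady_state_feasible(S, [5, 3, 3], BOUNDS))
print(is_steady_state_feasible(S, [12, 12, 12], BOUNDS))
```

Because only stoichiometry and bounds enter the check, no kinetic parameters are needed; this is also why such a model says nothing about dynamics.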
The MetaNetX Technology Development project aims to develop integrated computational methods and tools for the automated reconstruction of genome-scale metabolic networks, including the prediction of novel reactions and pathways and the use of this information for refined genome annotation. Current reconstruction technologies have severe limitations in network coverage and accuracy, and they leave unexploited the potential for furthering the annotation of genomic information and for elucidating novel metabolic pathways and functions. For instance, there is a clear need for systematic methods for model assessment, construction, and validation. Unknown reactions and metabolites, together with the necessary validation of database entries, pose the most important challenges for model development.
Our key, novel argument is that overcoming the current technological limitations requires a tight integration of three efforts: development of genome annotation methods, systematic characterization of potential biochemical reactions, and development of computational methods for network reconstruction and validation. Exploiting the potential for genome annotation and function prediction, in particular, requires an unprecedented integration of computational and experimental approaches. An evaluation of (structurally and thermodynamically) possible biochemical pathways in the network context can then lead to precise and experimentally testable predictions of novel reactions and metabolites.
Two application areas, both involving iterations with experimental analysis and hypothesis testing, aim at a high-quality genome-scale reconstruction of budding yeast metabolism with a focus on the network periphery, and at a first plant reconstruction for the more complex metabolism of Arabidopsis thaliana.

