Bioinformatic resources for investigating clostridial metabolism

Solventogenic clostridia offer a promising and sustainable alternative to petroleum-based production of butanol, an important industrial chemical feedstock and a fuel additive or replacement1. They have also attracted attention for their potential to reduce greenhouse gas emissions and thereby help mitigate global warming. Elucidating clostridial metabolism is therefore of paramount importance for metabolic engineering and for industrial applications such as gas fermentation. Alongside standard experimental approaches, bioinformatics offers an efficient way to identify genetic or biochemical targets that can be manipulated to improve product formation. In this post, I briefly summarise the bioinformatic resources that are publicly available to date for interrogating clostridial metabolism.

1. Annotation and model databases

  • BiGG Models (Biochemical, Genetic and Genomic knowledge base) is one of the largest curated repositories of genome-scale metabolic (GSM) models1. It stores COBRA (Constraint-Based Reconstruction and Analysis)3 models in SBML, JSON and MAT formats. (Systems Biology Research Group, University of California, San Diego, USA)
  • KEGG (Kyoto Encyclopedia of Genes and Genomes) currently comprises 18 databases that store information at the systems, genomic, chemical and health levels. (Kanehisa Laboratories, Kyoto University, Japan)
  • SEED is a database of genome annotations and metabolic models, and its contents can be retrieved using the SEED Viewer. However, the SEED website is poorly maintained, which makes information hard to access.
  • BioCyc is a database of genomic and pathway information. It also provides several tools for browsing the database (see “BioCyc Tools” on the homepage of BioCyc). (SRI International, USA)
  • MetaCyc is a comprehensive, curated reference database of metabolic pathways and enzymes from all domains of life, described in a 2017 article by Caspi et al.9 (SRI International, USA)
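Models downloaded from BiGG in JSON format can be inspected with nothing more than a JSON parser. The following sketch assumes the COBRApy-style JSON layout used by BiGG: top-level `genes`, `metabolites` and `reactions` lists, where each reaction has a `metabolites` dictionary of stoichiometric coefficients. The helpers `summarise_model` and `reactions_involving` are hypothetical and written only for illustration, and the toy model is made up. For real work, such files are normally loaded with COBRApy (`cobra.io.load_json_model`) or the COBRA Toolbox.

```python
import json


def summarise_model(model: dict) -> dict:
    """Count genes, reactions and metabolites in a COBRApy/BiGG-style JSON model."""
    return {
        "id": model.get("id"),
        "genes": len(model.get("genes", [])),
        "reactions": len(model.get("reactions", [])),
        "metabolites": len(model.get("metabolites", [])),
    }


def reactions_involving(model: dict, met_id: str) -> list:
    """IDs of reactions whose stoichiometry includes `met_id` (as substrate or product)."""
    return [r["id"] for r in model.get("reactions", []) if met_id in r.get("metabolites", {})]


# A made-up, one-reaction model in the same layout. A real BiGG file would instead be
# read with: with open(path) as f: model = json.load(f)
toy_json = """
{"id": "toy_model",
 "genes": [{"id": "g1"}, {"id": "g2"}],
 "metabolites": [{"id": "glc__D_c"}, {"id": "atp_c"}, {"id": "g6p_c"},
                 {"id": "adp_c"}, {"id": "h_c"}],
 "reactions": [{"id": "HEX1",
                "metabolites": {"glc__D_c": -1, "atp_c": -1, "g6p_c": 1, "adp_c": 1, "h_c": 1},
                "lower_bound": 0, "upper_bound": 1000,
                "gene_reaction_rule": "g1 or g2"}]}
"""
model = json.loads(toy_json)
print(summarise_model(model))  # {'id': 'toy_model', 'genes': 2, 'reactions': 1, 'metabolites': 5}
print(reactions_involving(model, "atp_c"))  # ['HEX1']
```

A quick sanity check after loading a real file is to compare the `genes` count with the number in the model's name (e.g. 637 for iHN637). The two should usually agree, although later revisions of a model may differ.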

2. Published clostridial genome-scale metabolic models

Solventogenic clostridia offer two advantages over engineered Escherichia coli and Saccharomyces cerevisiae: tolerance to high concentrations of butanol, and the ability to co-ferment pentose and hexose sugars (the main sugars in lignocellulosic hydrolysates)1. GSM models are instrumental in guiding metabolic engineering, for example in the prospective design and modification of metabolic networks1, 2. Published complete genome-scale metabolic models of clostridia include the following (draft models also exist for some clostridia):

  • iCM925: built for C. beijerinckii strain NCIMB 8052 (KEGG genome accession: T00547), the parent strain of the mutant BA101, which has the highest known n-butanol tolerance of any microorganism (17–21 g/L)1. The model is distributed in SBML format with the article by Milne et al.
  • iHN637: built by Nagarajan et al.2 for C. ljungdahlii strain DSM 13528. C. ljungdahlii has been shown to be capable of microbial electrosynthesis2. The model is available from BiGG in SBML, JSON and MAT formats.
  • iCac802: built by Dash et al.7 for C. acetobutylicum strain ATCC 824. The model (SBML format) is available only as a supplementary file of the authors’ research article.
  • iSR432: built by Roberts et al.10 for C. thermocellum strain ATCC 27405, a thermophilic, cellulolytic and ethanologenic anaerobe. The model is released in SBML format as a supplementary file of the authors’ research article.
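GSM models such as these are commonly analysed with flux balance analysis (FBA), the core constraint-based method provided by the COBRA and RAVEN toolboxes. The block below gives the standard textbook formulation as a sketch; it is not taken from the articles above. Here S is the m × n stoichiometric matrix (m metabolites, n reactions), v is the vector of reaction fluxes, c is a weight vector that selects the objective (for example biomass formation or butanol secretion), and lb and ub are the lower and upper flux bounds:

```latex
\begin{aligned}
\max_{\mathbf{v}} \quad & \mathbf{c}^{\top}\mathbf{v} \\
\text{subject to} \quad & \mathbf{S}\,\mathbf{v} = \mathbf{0}, \\
& \mathbf{lb} \le \mathbf{v} \le \mathbf{ub}
\end{aligned}
```

The constraint Sv = 0 expresses a steady state in which every internal metabolite is produced as fast as it is consumed. Candidate engineering targets can then be explored by changing bounds and re-solving; for instance, a gene knockout can be simulated by setting the bounds of the affected reactions to zero.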

Note the naming convention for models: i[initials of the first author (given name, then surname) or an abbreviation of the species name][number of genes in the model]. For instance, the model created by C. B. Milne1 and comprising 925 genes is accordingly named iCM925, whereas iCac802 is labelled after the species C. acetobutylicum.
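The convention is regular enough to check programmatically. The sketch below defines a hypothetical helper, `parse_model_id`, which is not part of any library. It splits an ID into the leading "i", the letter label and the gene count, assuming only the simple pattern described above. It cannot tell whether the letters are author initials or a species abbreviation. IDs outside the convention, such as BiGG's `e_coli_core`, return `None`.

```python
import re
from typing import NamedTuple, Optional


class ModelName(NamedTuple):
    prefix: str      # the leading "i"
    label: str       # author initials (e.g. "CM") or species abbreviation (e.g. "Cac")
    gene_count: int  # number of genes in the model


# "i", then one or more letters, then the gene count, and nothing else.
MODEL_ID = re.compile(r"^(i)([A-Za-z]+)(\d+)$")


def parse_model_id(model_id: str) -> Optional[ModelName]:
    """Split an ID following the i<label><gene count> convention; return None otherwise."""
    match = MODEL_ID.match(model_id)
    if match is None:
        return None
    prefix, label, genes = match.groups()
    return ModelName(prefix, label, int(genes))


for model_id in ["iCM925", "iHN637", "iCac802", "iSR432", "e_coli_core"]:
    print(model_id, parse_model_id(model_id))
```

Because the convention is a community habit rather than an enforced standard, a `None` result means only that the ID does not follow the pattern. It does not mean the model is invalid.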

3. Computational tools for metabolic modelling

  • ModelSEED: a web server for rapid reconstruction, comparison and analysis of metabolic models4. Nagarajan et al. used ModelSEED to generate a draft metabolic model of C. ljungdahlii before manual curation2.
  • Simpheny Perl Application: a program for generating SBML files with simulation-specific information. Nagarajan et al. used its AutoModel functionality to create a draft metabolic model of C. ljungdahlii from genome annotations2.
  • SMILEY: a gap-filling algorithm for metabolic networks5, implemented in the COBRA Toolbox v2.0. Solutions proposed by SMILEY are hypotheses and should be validated, for example experimentally6.
  • Several MATLAB toolboxes are available for bottom-up GSM reconstruction or inspection. They can be accessed on GitHub.
    • COBRA Toolbox v3.0 (Constraint-Based Reconstruction and Analysis)11: published in February 2019 as an update of the widely used COBRA Toolbox v2.0.
    • RAVEN v2.0: performs de novo GSM reconstruction based either on a combination of the MetaCyc pathway database and the KEGG database, or on homology with an existing GSM model8. RAVEN and the COBRA Toolbox are the two major MATLAB-based packages for constraint-based metabolic modelling8.
    • AutoKEGGRec: a fast, easy-to-use function for model reconstruction from genome annotations and functional information in the KEGG database12.
  • Agren et al.13 and Wang et al.8 list and compare other software (AutoGraph, IdentiCS, GEM System, MEMOSys, FAME, MicrobesFlux, CoReCo, Pathway Tools and merlin) in their articles on RAVEN v1.0 and v2.0, respectively. See these articles for details.
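SMILEY itself formulates gap-filling as a mixed-integer linear program. It adds the fewest reactions from a universal reaction database that the model needs in order to grow. As a much cruder stand-in that needs no solver, the sketch below swaps in network expansion (reachability). It saturates the existing model, borrows one database reaction at a time until a target metabolite becomes producible, then traces back to report only the borrowed reactions that were actually used. The function names, reaction IDs and lumped toy pathway are all hypothetical. The method ignores stoichiometry and reversibility, and its answer is not guaranteed to be minimal. For real models, a solver-based tool such as the COBRA Toolbox (or COBRApy's `gapfill`) would be used instead.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Toy representation: (substrates, products). Stoichiometry and reversibility are
# ignored, so this is a purely topological check, not a flux-based one.
Reaction = Tuple[FrozenSet[str], FrozenSet[str]]


def _try_fire(rid: str, rxn: Reaction, available: Set[str], producer: Dict[str, str]) -> bool:
    """Fire a reaction if all substrates are available and it makes something new."""
    substrates, products = rxn
    new = products - available
    if substrates <= available and new:
        for met in new:
            producer[met] = rid  # remember which reaction first produced each metabolite
        available |= new
        return True
    return False


def topological_gapfill(model: Dict[str, Reaction], universal: Dict[str, Reaction],
                        seeds: Set[str], target: str) -> Optional[List[str]]:
    """Greedy reachability-based gap-filling (a stand-in, NOT the SMILEY MILP).

    Returns the IDs of universal reactions to add so that `target` becomes reachable
    from `seeds`, or None if it is unreachable even with the whole universal set.
    """
    available: Set[str] = set(seeds)
    producer: Dict[str, str] = {}
    while target not in available:
        # Saturate the existing model first...
        if any(_try_fire(r, model[r], available, producer) for r in sorted(model)):
            continue
        # ...and only then borrow a single database reaction.
        if not any(_try_fire(r, universal[r], available, producer) for r in sorted(universal)):
            return None
    # Walk back from the target to the seeds, collecting the reactions used.
    stack, seen, used = [target], set(), set()
    while stack:
        met = stack.pop()
        if met in seen or met in seeds:
            continue
        seen.add(met)
        rid = producer[met]
        if rid not in used:
            used.add(rid)
            stack.extend((model[rid] if rid in model else universal[rid])[0])
    return sorted(r for r in used if r in universal)


def rxn(subs, prods) -> Reaction:
    return frozenset(subs), frozenset(prods)


# Made-up, heavily lumped pathway fragments with hypothetical IDs.
model = {"R1": rxn({"glc"}, {"g6p"}), "R2": rxn({"pyr"}, {"accoa"}),
         "R3": rxn({"btcoa"}, {"butanol"})}
universal = {"U0": rxn({"g6p"}, {"xyl"}), "U1": rxn({"g6p"}, {"pyr"}),
             "U2": rxn({"accoa"}, {"btcoa"})}
print(topological_gapfill(model, universal, {"glc"}, "butanol"))  # ['U1', 'U2']
```

In this toy case, U0 also fires during expansion, but it is not on the path to butanol and so is not proposed. As with SMILEY, any reaction suggested this way is only a hypothesis to be checked against the literature or experiments.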

4. Protocols for genome-scale metabolic modelling

Because reconstructing a GSM model is labour- and time-intensive (Thiele & Palsson, 2010), it is vital to follow a sound protocol to ensure the model’s quality.

  • Thiele, I. & Palsson, B. Ø. A protocol for generating a high-quality genome-scale metabolic reconstruction. Nat. Protoc. 5, 93 (2010). A thorough and comprehensive review and protocol for reconstructing GSM models, which also lists widely used data sources for metabolic reconstruction.
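One routine quality check during reconstruction is to verify that every reaction is elementally balanced; protocols such as Thiele and Palsson's also check charge balance. The sketch below uses hypothetical helpers, `parse_formula` and `elemental_imbalance`, to sum atoms over a reaction's stoichiometry. It assumes plain molecular formulas without brackets or charges. It therefore cannot detect proton or charge imbalances, which matter in GSM models that store formulas for charged species at physiological pH. The example reaction is the lumped overall stoichiometry of glucose to butanol.

```python
import re
from collections import Counter
from typing import Dict

FORMULA_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Count atoms in a plain formula such as 'C6H12O6' (no brackets, hydrates or charges)."""
    atoms: Counter = Counter()
    for element, number in FORMULA_TOKEN.findall(formula):
        atoms[element] += int(number) if number else 1
    return atoms


def elemental_imbalance(stoich: Dict[str, float], formulas: Dict[str, str]) -> Dict[str, float]:
    """Net atoms per element (products minus substrates); an empty dict means balanced.

    `stoich` maps metabolite ID -> coefficient (negative = consumed, positive = produced).
    """
    net: Counter = Counter()
    for met, coeff in stoich.items():
        for element, n in parse_formula(formulas[met]).items():
            net[element] += coeff * n
    # Small tolerance so that fractional coefficients do not flag rounding noise.
    return {element: v for element, v in net.items() if abs(v) > 1e-9}


formulas = {"glc": "C6H12O6", "butanol": "C4H10O", "co2": "CO2", "h2o": "H2O"}
# Lumped overall stoichiometry: glucose -> butanol + 2 CO2 + H2O
balanced = {"glc": -1, "butanol": 1, "co2": 2, "h2o": 1}
missing_water = {"glc": -1, "butanol": 1, "co2": 2}
print(elemental_imbalance(balanced, formulas))       # {}
print(elemental_imbalance(missing_water, formulas))  # {'H': -2, 'O': -1}
```

Running such a check over every reaction in a draft model is a cheap way to flag entries whose formulas or coefficients need manual curation.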


  1. Milne, C. B. et al. Metabolic network reconstruction and genome-scale model of butanol-producing strain Clostridium beijerinckii NCIMB 8052. BMC Syst. Biol. 5, 130 (2011). A well-written, comprehensive and insightful article, and a good starting point for beginners learning about microbial butanol production and about how to build a metabolic network, and subsequently a genome-scale metabolic model, from the annotation of a complete bacterial genome.
  2. Nagarajan, H. et al. Characterizing acetogenic metabolism using a genome-scale metabolic reconstruction of Clostridium ljungdahlii. Microb. Cell Fact. 12, 118 (2013).
  3. Schellenberger, J. et al. Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0. Nat. Protoc. 6, 1290 (2011).
  4. Henry, C. S. et al. High-throughput generation, optimization and analysis of genome-scale metabolic models. Nat. Biotechnol. 28, 977 (2010).
  5. Reed, J. L. et al. Systems approach to refining genome annotation. Proc. Natl. Acad. Sci. USA 103, 17480–17484 (2006).
  6. Rolfsson, O., Palsson, B. Ø. & Thiele, I. The human metabolic reconstruction Recon 1 directs hypotheses of novel human metabolic functions. BMC Syst. Biol. 5, 155 (2011).
  7. Dash, S., Mueller, T. J., Venkataramanan, K. P., Papoutsakis, E. T. & Maranas, C. D. Capturing the response of Clostridium acetobutylicum to chemical stressors using a regulated genome-scale metabolic model. Biotechnol. Biofuels 7, 144 (2014).
  8. Wang, H. et al. RAVEN 2.0: A versatile toolbox for metabolic network reconstruction and a case study on Streptomyces coelicolor. PLOS Comput. Biol. 14, e1006541 (2018).
  9. Caspi, R. et al. The MetaCyc database of metabolic pathways and enzymes. Nucleic Acids Res. 46, D633–D639 (2017).
  10. Roberts, S. B., Gowen, C. M., Brooks, J. P. & Fong, S. S. Genome-scale metabolic analysis of Clostridium thermocellum for bioethanol production. BMC Syst. Biol. 4, 31 (2010).
  11. Heirendt, L. et al. Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v.3.0. Nat. Protoc. 14, 639–702 (2019).
  12. Karlsen, E., Schulz, C. & Almaas, E. Automated generation of genome-scale metabolic draft reconstructions based on KEGG. BMC Bioinformatics 19, 467 (2018).
  13. Agren, R. et al. The RAVEN Toolbox and Its Use for Generating a Genome-scale Metabolic Model for Penicillium chrysogenum. PLOS Comput. Biol. 9, e1002980 (2013).