gbk2tsv.py: tabulating genomic features in GenBank files

I finally found some time this morning to write a Python script, gbk2tsv.py, which converts several GenBank files into tab-delimited feature tables (plain text files with the extension “.tsv”). It is a useful tool when we need to summarise genome annotations or extract nucleotide and protein sequences of certain genomic features. Although the Holt Lab, where I did my PhD, has an in-house script that does a similar job, it would be inappropriate for me to use or share that intellectual property for projects outside the Holt Lab without specific permission. Therefore, I decided to write a script from scratch after a discussion on genome annotation with Hao Luo, a PhD student at the Chalmers University of Technology, Sweden, during a lunch break of the course MESB19.
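To make the idea concrete, here is a minimal sketch of this kind of conversion. It is not the gbk2tsv.py script itself: the function names `parse_features` and `features_to_tsv`, the column selection and the example record are all hypothetical, and the parser only handles the simple layout of a GenBank feature table (feature keys on shallowly indented lines, qualifiers starting with “/” on deeply indented lines). For real data, an established parser such as Biopython's `Bio.SeqIO` is the safer choice.

```python
import csv
import io

# Hypothetical toy record; real GenBank files carry many more header fields.
EXAMPLE_GBK = """\
LOCUS       demo                      60 bp    DNA     linear   BCT 01-JAN-2000
FEATURES             Location/Qualifiers
     source          1..60
                     /organism="Example bacterium"
     CDS             complement(10..30)
                     /locus_tag="DEMO_0001"
                     /gene="abc"
                     /product="hypothetical protein with a long
                     wrapped description"
ORIGIN
        1 atgc
//
"""


def parse_features(gbk_text):
    """Return a list of dicts, one per feature in the FEATURES tables."""
    features = []
    in_table = False
    current = None   # feature currently being filled in
    last_key = None  # field that a continuation line belongs to
    for line in gbk_text.splitlines():
        if line.startswith("FEATURES"):
            in_table = True
            continue
        if not in_table or not line.strip():
            continue
        if not line.startswith(" "):
            # An unindented keyword (ORIGIN, CONTIG, //) ends the table.
            in_table, current = False, None
            continue
        indent = len(line) - len(line.lstrip())
        body = line.strip()
        if indent < 10:
            # Shallow indent: a new feature key followed by its location.
            key, _, location = body.partition(" ")
            current = {"type": key, "location": location.strip()}
            features.append(current)
            last_key = "location"
        elif current is not None and body.startswith("/"):
            # Qualifier line of the form /name="value" (or a bare /name).
            name, _, value = body[1:].partition("=")
            current[name] = value.strip('"')
            last_key = name
        elif current is not None:
            # Continuation of a wrapped location or qualifier value.
            sep = "" if last_key in ("location", "translation") else " "
            current[last_key] = (current[last_key] + sep + body).strip('"')
    return features


def features_to_tsv(features,
                    columns=("type", "location", "locus_tag", "gene", "product")):
    """Render selected feature fields as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for feature in features:
        writer.writerow([feature.get(column, "") for column in columns])
    return buffer.getvalue()


if __name__ == "__main__":
    print(features_to_tsv(parse_features(EXAMPLE_GBK)), end="")
```

Keeping parsing and tabulation in separate functions means the column list can be changed without touching the parser; features that lack a requested qualifier simply get an empty cell.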

Notes of an online metagenomics course

In this post, I compile my notes on the course Metagenomics applied to surveillance of pathogens and antimicrobial resistance. This three-week course is offered by the Technical University of Denmark and is freely accessible on Coursera. As a graduate researcher working on antimicrobial resistance (AMR) in bacterial populations, I have read countless papers on bacterial population genomics, surveillance and metagenomics over the past four years, so I ought to be familiar with the content of this course. Nevertheless, the course is still quite helpful to me because it leads me to build a comprehensive knowledge framework of metagenomics out of individual concepts. Here, I focus on knowledge that was once unfamiliar or ambiguous to me and that may be new to some readers as well. More information can be found in the course materials on Coursera.

To tree or not to tree: an introduction of phylogenetic networks

Phylogenetic reconstruction is crucial for elucidating bacterial population structure, epidemiology and evolutionary histories. Phylogenetic trees and networks are by far the most common approaches to studying the evolutionary history of a bacterial population. However, the concepts and methodology underlying phylogenetic reconstruction can be challenging for beginners. To address these obstacles, I share my notes on the relevant literature in this post. In particular, I compare different kinds of phylogenetic networks to show their pros and cons under various conditions.

A tutorial for microbial genome-scale metabolic modelling: [3] understanding the SBML format

The systems biology markup language (SBML) is a community-driven, software- and platform-independent standard for expressing systems models and exchanging them between different simulation and analysis software. It is defined using the unified modelling language (UML) and represented using the extensible markup language (XML)¹. SBML does not aim to produce model files that can be readily read by humans, but to provide different software with a unified medium for exchanging models. Each piece of software can then translate imported models into its own internal format¹. It is important to understand SBML for metabolic modelling and engineering because it has quickly become the most popular standard for model files since its first publication in 2003¹.
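Because SBML is plain XML, a small file can be inspected with nothing more than Python's standard library. The sketch below is only an illustration: the toy model, the identifiers in it and the function name `summarise_sbml` are hypothetical, and the namespace shown is the one for SBML Level 3 Version 1 core, so files written at other levels or versions would need a different namespace string. Dedicated libraries such as libSBML or COBRApy are the usual choice for real models.

```python
import xml.etree.ElementTree as ET

# Namespace of SBML Level 3 Version 1 core (an assumption about the input file).
SBML_NS = {"sbml": "http://www.sbml.org/sbml/level3/version1/core"}

# Hypothetical toy model with a single reaction: A_c -> 2 B_c
EXAMPLE_SBML = """\
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy_model">
    <listOfCompartments>
      <compartment id="c" constant="true"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="A_c" compartment="c" hasOnlySubstanceUnits="false"
               boundaryCondition="false" constant="false"/>
      <species id="B_c" compartment="c" hasOnlySubstanceUnits="false"
               boundaryCondition="false" constant="false"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="R1" reversible="false">
        <listOfReactants>
          <speciesReference species="A_c" stoichiometry="1" constant="true"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="B_c" stoichiometry="2" constant="true"/>
        </listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""


def _stoichiometry(reaction, list_tag):
    """Map species ID to stoichiometric coefficient for one side of a reaction."""
    path = f"sbml:{list_tag}/sbml:speciesReference"
    return {
        ref.get("species"): float(ref.get("stoichiometry", "1"))
        for ref in reaction.iterfind(path, SBML_NS)
    }


def summarise_sbml(sbml_text):
    """Return (model ID, species IDs, {reaction ID: (reactants, products)})."""
    root = ET.fromstring(sbml_text)
    model = root.find("sbml:model", SBML_NS)
    species = [
        s.get("id") for s in model.iterfind("sbml:listOfSpecies/sbml:species", SBML_NS)
    ]
    reactions = {
        rxn.get("id"): (
            _stoichiometry(rxn, "listOfReactants"),
            _stoichiometry(rxn, "listOfProducts"),
        )
        for rxn in model.iterfind("sbml:listOfReactions/sbml:reaction", SBML_NS)
    }
    return model.get("id"), species, reactions


if __name__ == "__main__":
    model_id, species, reactions = summarise_sbml(EXAMPLE_SBML)
    print(model_id, species, reactions)
```

The nested `listOf...` containers are what make SBML verbose for human readers, but they also make it straightforward for software to walk the model element by element.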

A tutorial for microbial genome-scale metabolic modelling: [2] building a draft metabolic network from genome annotations of a single bacterial isolate

In the first post of this tutorial, I demonstrated basic ways to inspect a genome-scale metabolic model (GEM). Now, let’s get our hands dirty and reconstruct a draft metabolic network from genome annotations, which is the starting point of the protocol proposed by Thiele and Palsson for bottom-up GEM construction¹,². In this post, we will use several bioinformatic tools to reconstruct draft metabolic networks from annotations of the fully resolved chromosomal genome of Clostridium beijerinckii str. NCIMB 8052 (KEGG organism code: cbe; PATRIC genome ID: 290402.41), a well-known butanol-producing microorganism. A GEM of this strain, iCM925, has been published by Milne et al.³

Bioinformatic resources for investigating clostridial metabolism

Solventogenic clostridia offer a promising and sustainable alternative to petroleum-based production of butanol, an important industrial chemical feedstock and fuel additive or replacement¹. They also draw our attention for their potential to reduce greenhouse gas emissions and mitigate global warming. Elucidating the metabolism of clostridia is of paramount importance for metabolic engineering and industrial applications of gas fermentation. In addition to standard experimental approaches, bioinformatics provides an efficient way to identify genetic or biochemical targets that can be controlled to improve product formation. In this post, I briefly summarise the bioinformatic resources that are currently publicly accessible for interrogating clostridial metabolism.

A technical overview of C1 gas fermentation

Carbon dioxide (CO2) and methane (CH4) are abundant one-carbon (C1) components of greenhouse gases, and their atmospheric concentrations have increased drastically since the industrial revolution. Besides industrial activities, agricultural practice also causes substantial emissions of these two gases. Reducing greenhouse gas emissions is now urgently needed to control global warming. Biological conversion of C1 gases to high-value industrial hydrocarbon-based chemicals (such as ABE: acetone, butanol and ethanol¹) via fermentation has been shown to be an effective way to meet this need without competing for photosynthetic resources (e.g., food) or land². By converting waste C1 gases into biofuels, we can reduce our reliance on fossil fuels, which in turn reduces our total carbon emissions. Because of this great environmental benefit, here I outline technologies that recycle waste C1 gases for industrial and environmental purposes.

Understanding SRST2 outputs

SRST2 is a widely used tool that screens Illumina reads of bacterial genomes for known genes (that is, targeted gene detection). Its capabilities include MLST profiling and detection of known antimicrobial resistance genes (ARGs), virulence genes, plasmids, etc. I have been using SRST2 throughout my PhD project and built my package GeneMates on SRST2’s outputs. Here, I explain the output formats of SRST2 to help users gain a better understanding of this versatile tool. Comments and corrections from readers are welcome since this post is based on my own understanding and experience.

Population structure, phylogenetic signal, recombination vs. mutations

Recently, I reviewed several concepts about recombination and mutation in bacterial genomes while revising my GeneMates manuscript. In this post, I summarise my understanding of two groups of terms and two measures (r/m and ρ/θ) that are relevant to these biological events, and tabulate values of these measures in six bacterial species.