This guidebook, which is under active development, covers bioinformatics methods for processing sequencing data from MinION flow cells.
Disaster struck just before the New Year: my pre-departure SARS-CoV-2 RT-PCR test at a commercial test centre (Site 1) came back positive, and I had to cancel my international flights for the reunion with my wife. The result shocked me, as I had been self-isolating for more than 10 days before the test with limited outdoor activities (such as shopping for groceries) and had no COVID-19 symptoms at all. After a moment of confusion and disappointment, I did a lateral-flow-device (LFD) antigen test at home and got a negative result. A second LFD test the next day also returned a negative result. In the afternoon of the same day, I had my second PCR test at a different site (Site 2), where a professional swabbed my tonsils and nasal cavity so thoroughly that I even smelled a hint of blood. The result came quickly the next morning, and I immediately booked my third PCR test at another site (Site 3) for the same morning. All three sites are accredited by the UK Health Security Agency (UKHSA). All my results were reported to the NHS for testing and tracing.
Despite the popularity of the GFF3 format for genome annotations, to my knowledge there are no published tools for extracting the DNA sequences of contigs from GFF3 files and storing them in a multi-FASTA file. EMBOSS seqret is only able to pull out the last contig from a GFF3 file, whereas other tools aim to extract the DNA sequence of each feature. Therefore, in this post I develop two Linux one-liners for extracting contig sequences from a GFF3 file and writing them to a FASTA file.
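The same idea can be sketched in Python. This is not one of the one-liners from the post, only a minimal illustration. It assumes the GFF3 file embeds its contig sequences after a `##FASTA` directive, as the GFF3 specification allows. The function name `extract_contig_fasta` and the demo records are hypothetical.

```python
from typing import Iterable, Iterator


def extract_contig_fasta(gff3_lines: Iterable[str]) -> Iterator[str]:
    """Yield the FASTA lines that follow the ##FASTA directive of a GFF3 file."""
    in_fasta = False
    for raw in gff3_lines:
        line = raw.rstrip("\r\n")
        if in_fasta:
            if line:  # skip blank lines inside the sequence section
                yield line
        elif line.startswith("##FASTA"):
            in_fasta = True  # everything after this directive is sequence data


# Made-up GFF3 content with two embedded contigs (assumption for illustration).
demo = [
    "##gff-version 3\n",
    "contig1\texample\tCDS\t1\t9\t.\t+\t0\tID=gene1\n",
    "##FASTA\n",
    ">contig1\n",
    "ATGAAATAA\n",
    ">contig2\n",
    "GGCC\n",
]
fasta_lines = list(extract_contig_fasta(demo))
```

For a real file, the function can be given an open file handle instead of a list, and the yielded lines written to a new multi-FASTA file. A GFF3 file without a `##FASTA` section yields nothing, which is the intended behaviour under the stated assumption.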
Here are my notes on the article describing ClonalFrameML, a program that detects recombined regions in a multiple sequence alignment, infers phylogenetic relationships while correcting for recombination, reconstructs ancestral states, and imputes SNPs under a maximum-likelihood (ML) framework.
SRST2 [1], ARIBA [2], and KmerResistance (web service, code) [3] are three widely used pieces of standalone software for read-based detection of target genes in bacterial genomes. Published in 2014, SRST2 is recognised as the pioneer amongst these three tools [2, 3]. In this post, I concisely compare the methodologies underlying these tools to help with choosing appropriate software for gene detection. In particular, I assume that detecting antimicrobial resistance determinants is the only use case. Unless otherwise specified, the software versions referred to in this post are SRST2 v0.2.0, ARIBA v2.14.4, and KmerResistance v2.2.
Linux is a popular family of operating systems (OS) used in bioinformatics. Amongst its numerous distributions, Xubuntu is a lightweight derivative of Ubuntu that aims to run on machines with low system requirements. As a Windows user, I often need to switch to a Linux environment for program development and testing. To this end, VirtualBox offers an easy-to-use and resource-light alternative to a dedicated physical machine or a separate disk partition (dual boot). This post records my key steps for setting up Xubuntu in VirtualBox for basic bioinformatics work.
The Python script gbk2tbl.py in my GitHub repository BINF_toolkit is a popular tool for preparing NCBI Sequin input files from GenBank files. Nonetheless, this script has only supported Python 2 since its first release in 2015, which has inconvenienced some users. Today, I found some time to make the script compatible with Python 3 using the tool 2to3 and some manual adjustments. The new script has been tested under Python 3.5.2 and pushed to my GitHub.
Reliable and up-to-date databases play a pivotal role in reference-based detection of antimicrobial resistance genes (ARGs) in bacteria. Nonetheless, these databases differ in content and quality, making it challenging to choose an appropriate reference database for a particular research project. To address this challenge, this post reviews several ARG databases that are publicly available and widely used in bacterial genomics. In particular, I focus on databases that are still regularly maintained, and I do not discuss any programs released with these databases for sequence search or statistical analysis.
I finally found some time this morning to write a Python script, gbk2tsv.py, which converts GenBank files into tab-delimited feature tables (plain text files with the extension “.tsv”). It can be a useful tool when we need to summarise genome annotations or acquire the nucleotide and protein sequences of certain genomic features. Although the Holt Lab, where I did my PhD, has an in-house script that does a similar job, it would be inappropriate for me to use or share that intellectual property for projects outside the Holt Lab without specific permission. Therefore, I decided to create a script from scratch after a discussion on genome annotation with Hao Luo, a PhD student at Chalmers University of Technology, Sweden, during a lunch break of the course MESB19.