A generalised Bayesian model for the probability of getting a false-positive PCR result

Just before the New Year, disaster struck: my pre-departure SARS-CoV-2 RT-PCR test at a commercial test centre (Site 1) came back positive, and I had to cancel the international flights I had booked to reunite with my wife. The result shocked me, as I had been self-isolating for more than 10 days before the test, with only limited outdoor activities (such as shopping for groceries), and I had no COVID-19 symptoms at all. After a moment of confusion and disappointment, I took a lateral-flow-device (LFD) antigen test at home and got a negative result. A second LFD test the following day was also negative. In the afternoon of the same day, I had my second PCR test at a different site (Site 2), where a professional swabbed my tonsils and nasal cavity so thoroughly that I even smelled a hint of blood. The result came back quickly the next morning, and I immediately booked my third PCR test at another site (Site 3) for the same morning. All three sites are accredited by the UK Health Security Agency (UKHSA). All my results were reported to the NHS for testing and contact tracing.

Linux one-liners converting GFF3 to FASTA files of contigs

Despite the popularity of the GFF3 format for genome annotations, to my knowledge there are no published tools for extracting the DNA sequences of contigs from GFF3 files and storing them in a multi-FASTA file. EMBOSS seqret is only able to pull out the last contig from a GFF3 file, whereas other tools aim to extract the DNA sequence of each feature. Therefore, in this post I develop two Linux one-liners for extracting contig sequences from a GFF3 file and writing them to a FASTA file.
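To illustrate the idea, here is a minimal Python sketch. It is not the Linux one-liners from the post. It assumes that the GFF3 file embeds its contig sequences after a `##FASTA` directive, as the GFF3 specification allows. The function name `gff3_to_fasta` and the toy input are hypothetical.

```python
def gff3_to_fasta(gff3_text):
    """Return the FASTA records embedded in a GFF3 document as one string.

    Assumption: contig sequences follow a '##FASTA' directive line.
    An empty string is returned when no such section exists.
    """
    lines = gff3_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "##FASTA":
            # Everything after the directive is plain multi-FASTA text;
            # blank lines are dropped.
            records = [l for l in lines[i + 1:] if l.strip()]
            return ("\n".join(records) + "\n") if records else ""
    return ""


# Toy GFF3 document with two embedded contigs (made-up data).
example = (
    "##gff-version 3\n"
    "##sequence-region contig_1 1 8\n"
    "contig_1\texample\tgene\t1\t6\t.\t+\t.\tID=gene_1\n"
    "##FASTA\n"
    ">contig_1\n"
    "ACGTACGT\n"
    ">contig_2\n"
    "TTGACA\n"
)

print(gff3_to_fasta(example), end="")
```

For a real file, the text could be read with `open(path).read()` and the returned string written to a `.fasta` file. Note that a GFF3 file without an embedded `##FASTA` section carries no contig sequences to extract.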

Notes about ClonalFrameML

Here are my notes on the article about ClonalFrameML, a program that detects recombined regions in a multi-sequence alignment, infers phylogenetic relationships while correcting for recombination, reconstructs ancestral states, and imputes SNPs under a maximum-likelihood (ML) framework.

Comparisons between SRST2, ARIBA, and KmerResistance

SRST2 [1], ARIBA [2], and KmerResistance (web service, code) [3] are three widely used pieces of standalone software for read-based detection of target genes in bacterial genomes. Published in 2014, SRST2 is recognised as the pioneer amongst these three tools [2, 3]. In this post, I concisely compare the methodologies underlying these tools to shed light on the selection of appropriate software for gene detection. In particular, I assume that detecting antimicrobial resistance determinants is the only use case. Unless otherwise specified, the software versions referred to in this post are SRST2 v0.2.0, ARIBA v2.14.4, and KmerResistance v2.2.

Installing environment modules on Xubuntu

In addition to Conda, [environment modules](https://en.wikipedia.org/wiki/Environment_Modules_(software)) provide users with a convenient way of switching between software environments on Linux machines. This approach is widely used on computer clusters that offer computational services to a large number of users, where the environment modules are shared by authorised users. These modules, however, are not Linux kernel modules, which the OS launches automatically at start-up; environment modules must be loaded manually by users. I learnt how to use module commands for bioinformatic analysis when I was studying at the University of Melbourne. Loading a module essentially modifies your environment variable `$PATH`. In this post, I set up a module manager for users of my Xubuntu system.
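The effect of loading a module on `$PATH` can be mimicked in a few lines of Python. This is only a sketch of the idea, not how Environment Modules is implemented: the function names and the directory `/opt/apps/samtools/bin` are hypothetical, and a real modulefile may change other environment variables as well.

```python
SEP = ":"  # separator between PATH entries on Linux


def load_module(path_value, bin_dir):
    """Mimic 'module load': put bin_dir at the front of a PATH string."""
    # Drop empty entries and any earlier copy of bin_dir to avoid duplicates.
    entries = [e for e in path_value.split(SEP) if e and e != bin_dir]
    return SEP.join([bin_dir] + entries)


def unload_module(path_value, bin_dir):
    """Mimic 'module unload': remove bin_dir from a PATH string."""
    return SEP.join(e for e in path_value.split(SEP) if e and e != bin_dir)


path = "/usr/local/bin:/usr/bin"
tool_dir = "/opt/apps/samtools/bin"  # hypothetical install location

loaded = load_module(path, tool_dir)
print(loaded)                           # tool_dir now comes first
print(unload_module(loaded, tool_dir))  # back to the original PATH
```

Because the shell searches `$PATH` from left to right, placing the tool's directory first makes that version of the program the one that runs, and removing the directory restores the previous behaviour.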

Setting up Xubuntu in VirtualBox for bioinformatic work

Linux is a popular family of operating systems (OS) used in bioinformatics. Amongst its numerous distributions, Xubuntu is a lightweight derivative of Ubuntu Linux that aims to run on machines with low system specifications. As a Windows user, I often need to switch to a Linux environment for program development and testing. To this end, VirtualBox offers an easy-to-use, resource-light alternative to a dedicated physical machine or a dual-boot disk partition. This post records my key steps for setting up Xubuntu in VirtualBox for basic bioinformatic work.

Script gbk2tbl.py now supports Python 3

The Python script gbk2tbl.py in my GitHub repository BINF_toolkit is a popular tool for preparing input files for NCBI Sequin from GenBank files. Nonetheless, this script has supported only Python 2 since its first release in 2015, causing inconvenience to some users. Today, I found some time to make the script compatible with Python 3 using the tool 2to3 and some manual adjustments. The new script has been tested under Python 3.5.2 and pushed to my GitHub repository.

Popular reference databases of antimicrobial resistance genes

Reliable and up-to-date databases play a pivotal role in reference-based detection of antimicrobial resistance genes (ARGs) in bacteria. Nonetheless, these databases differ in content and quality, making it challenging to choose an appropriate reference database for a particular research project. To address this challenge, this post reviews several ARG databases that are publicly available and widely used in bacterial genomics. In particular, I focus on databases that are still regularly maintained, and I do not discuss any programs released with these databases for sequence search or statistical analysis.