Notes about ClonalFrameML
Here are my notes on the article about ClonalFrameML, a program that detects recombined regions in a multiple sequence alignment, infers phylogenetic relationships while correcting for recombination, reconstructs ancestral states, and imputes missing base calls, all under a maximum-likelihood (ML) framework.
Reference: Didelot, X., & Wilson, D. J. (2015). ClonalFrameML: Efficient Inference of Recombination in Whole Bacterial Genomes. PLoS Computational Biology, 11(2), e1004041. https://doi.org/10.1371/journal.pcbi.1004041.
Two sources of recombination, relative to the population under study
- Internal (intra-population recombination): does not introduce new polymorphisms, but produces homoplasies (identical states shared by lineages because of homologous recombination rather than recurrent de novo mutation) and phylogenetic incompatibilities between sites.
- External (inter-population recombination): introduces new polymorphisms; this effect is particularly prominent when the genomes under study all come from a single lineage or even a single clone.
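Incompatibility between two sites, the kind of signal internal recombination leaves behind, can be illustrated with the classic four-gamete test. If two biallelic sites show all four allele combinations among the sequences, no tree with a single change per site can explain both. The sketch below is a generic illustration of that test, not part of ClonalFrameML; the function name `four_gamete_incompatible` is hypothetical.

```python
def four_gamete_incompatible(site_a, site_b):
    """Return True if two biallelic alignment columns show all four allele pairs.

    site_a and site_b are equal-length strings with one character per sequence.
    Positions where either column has a gap or missing call are ignored.
    """
    if len(site_a) != len(site_b):
        raise ValueError("both columns must cover the same sequences")
    missing = "-N?"
    pairs = {(a, b) for a, b in zip(site_a, site_b)
             if a not in missing and b not in missing}
    alleles_a = {a for a, _ in pairs}
    alleles_b = {b for _, b in pairs}
    if len(alleles_a) > 2 or len(alleles_b) > 2:
        raise ValueError("the four-gamete test applies to biallelic sites")
    # With two alleles per site there are at most four combinations.
    return len(pairs) == 4


# Four sequences, two alignment columns each.
print(four_gamete_incompatible("AAGG", "CTCT"))  # True: AC, AT, GC and GT all occur
print(four_gamete_incompatible("AAGG", "CCTT"))  # False: only AC and GT occur
```

Under a model without recurrent mutation, a True result points to recombination (or some other source of homoplasy) between the two sites.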
Quantities
- Rate of point mutation: θ/2 per site per coalescent unit of time t0
- Recombination rate: R/2 per coalescent unit of time t0
- Coalescent unit of time: t0 = Ne × g (the effective population size times the duration of a generation)
- Ratio of recombination rate to mutation rate: R/θ
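θ/2 and R/2 are expressed per coalescent unit t0 = Ne × g. Converting them to rates per unit of calendar time therefore means dividing by Ne × g. The sketch below does that arithmetic; the function name `per_unit_time` and all numeric inputs are made-up placeholders, not estimates from the paper.

```python
def per_unit_time(rate_per_t0, ne, generation_time):
    """Convert a rate per coalescent unit t0 = Ne * g into a rate per unit of g's time scale."""
    t0 = ne * generation_time  # length of one coalescent unit, in the same units as g
    return rate_per_t0 / t0


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
theta = 0.02  # mutation rate theta/2 = 0.01 per site per t0
R = 0.004     # recombination rate R/2 = 0.002 per t0
ne = 1e5      # effective population size
g = 1e-3      # generation time in years, so t0 = 100 years

mu_per_year = per_unit_time(theta / 2, ne, g)
rho_per_year = per_unit_time(R / 2, ne, g)
print(mu_per_year, rho_per_year, R / theta)
```

The ratio R/θ comes out the same before and after conversion. Both rates share the same time unit, so Ne and g cancel. This is one reason the ratio is a convenient parameter to estimate.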
Assumptions
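- The length of a recombined (imported) region follows an exponential distribution with probability density function (PDF) f(x) = λe^(−λx), so the mean length is δ = 1/λ. Within an imported region, each site is substituted with probability v (the per-site substitution probability of imported DNA).
- The parameters R/θ, δ and v are assumed constant across all branches of the tree.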
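The exponential assumption can be checked numerically: sampling many import lengths with rate λ = 1/δ should give a sample mean close to δ. The same three parameters also combine into a commonly used summary, r/m = (R/θ) × δ × v. Each import covers δ sites on average and substitutes a fraction v of them, so it brings about δ × v substitutions, and imports occur R/θ times as often as mutations. The result is the number of substitutions introduced by recombination per substitution introduced by mutation. The function names and parameter values below are hypothetical.

```python
import random
import statistics


def sample_import_lengths(delta, n, seed=1):
    """Draw n import lengths from an exponential distribution with mean delta."""
    rng = random.Random(seed)
    lam = 1.0 / delta  # rate lambda of the PDF lambda * exp(-lambda * x)
    return [rng.expovariate(lam) for _ in range(n)]


def r_over_m(r_over_theta, delta, v):
    """Substitutions from recombination per substitution from mutation."""
    # Each import brings about delta * v substitutions; imports are
    # R/theta times as frequent as point mutations.
    return r_over_theta * delta * v


# Hypothetical parameter values (not estimates from the paper).
delta, v, r_theta = 500.0, 0.02, 0.1
lengths = sample_import_lengths(delta, 100_000)
print(round(statistics.fmean(lengths)))  # close to delta = 500
print(r_over_m(r_theta, delta, v))       # 0.1 * 500 * 0.02
```

A value of r/m above 1 means recombination has contributed more substitutions than point mutation, even when R/θ itself is well below 1.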
Overall steps of the ClonalFrameML algorithm
- An initial ML tree is taken as input.
- Ancestral sequences are reconstructed for internal nodes, and missing base calls are imputed for the input sequences. The next three steps can be skipped if the option -imputation_only is turned on.
- Recombination and tree parameters are estimated using an ML approach.
- Importation status is inferred for each site using an ML approach.
- Uncertainty in the parameter estimates is assessed with a parametric bootstrap.
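The per-site importation step can be pictured as a hidden Markov model walking along the genome. It switches between sites inherited through the clonal frame and sites imported by recombination. The sketch below is a deliberately simplified, hypothetical two-state version for a single branch, not ClonalFrameML's actual model. Its assumptions are:

- Each site is reduced to 0 (no substitution on the branch) or 1 (substitution).
- The clonal state emits a substitution with an assumed probability `p_clonal`.
- The imported state emits one with probability v.
- An import ends with probability 1/δ per site, giving geometric lengths with mean δ, the discrete analogue of the exponential assumption.
- `p_enter` is an assumed per-site probability of starting an import.

It decodes the most probable state path with the Viterbi algorithm.

```python
import math


def viterbi_imports(obs, p_clonal, v, delta, p_enter):
    """Decode each site as clonal frame (0) or imported (1) on a single branch.

    obs      -- sequence of 0/1 values; 1 marks a substitution at that site
    p_clonal -- assumed substitution probability at a clonal-frame site
    v        -- substitution probability at an imported site
    delta    -- mean import length in sites (must be > 1)
    p_enter  -- assumed per-site probability of starting an import
    All probabilities must lie strictly between 0 and 1.
    """
    if not obs:
        return []
    p_leave = 1.0 / delta  # geometric import lengths with mean delta
    # trans[a][b] = log probability of moving from state a to state b
    trans = [
        [math.log(1 - p_enter), math.log(p_enter)],
        [math.log(p_leave), math.log(1 - p_leave)],
    ]
    p_sub = (p_clonal, v)  # substitution probability in each state

    def log_emit(state, x):
        p = p_sub[state]
        return math.log(p if x == 1 else 1 - p)

    # Start from the stationary distribution of the two-state chain.
    pi_imp = p_enter / (p_enter + p_leave)
    score = [math.log(1 - pi_imp) + log_emit(0, obs[0]),
             math.log(pi_imp) + log_emit(1, obs[0])]
    back = []  # back[i][s]: best predecessor of state s at site i + 1
    for x in obs[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            best = max((0, 1), key=lambda r: score[r] + trans[r][s])
            new_score.append(score[best] + trans[best][s] + log_emit(s, x))
            ptr.append(best)
        score = new_score
        back.append(ptr)

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical branch: a dense run of substitutions flanked by quiet sequence.
obs = [0] * 20 + [1, 1, 0, 1, 1, 1, 0, 1] + [0] * 20
path = viterbi_imports(obs, p_clonal=0.01, v=0.3, delta=10, p_enter=0.01)
print("".join(map(str, path)))  # the dense middle run is decoded as 1s (imported)
```

Working in log space avoids numerical underflow on genome-length inputs. A more faithful model would presumably tie `p_clonal` and `p_enter` to branch lengths, θ and R rather than leaving them as free inputs.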