Speaker
Description
A near-ubiquitous approach to phylogenetic reconstruction has been to model variation in nucleotide substitution rates amongst genomic sites using a discretised Gamma distribution. We have recently demonstrated that this model introduces a bias in reconstructed branch lengths, such that their magnitude is largely driven by the number of sequences in the dataset. The alternative “FreeRate” model, implemented in state-of-the-art maximum likelihood phylogenetic packages, is not subject to this problem. What has remained unknown is the full extent of this bias's influence on time tree inference. It is intuitively unclear whether bias in branch length estimation would affect the dating of internal nodes, molecular clock rate estimates, or both.
Having newly implemented FreeRate in the BEAST package for Bayesian time tree inference, we explore the effect of the branch length expansion phenomenon on the outputs for a wide range of viruses, including HIV, SARS-CoV-2, hepatitis B virus, hepatitis C virus, influenza A virus, and measles virus. We show that its effect is largely confined to the molecular clock rate. Analyses that differ only in the choice of rate heterogeneity model generally do not significantly disagree in their estimates of lineage divergence times. This is an encouraging finding for the robustness of past phylogenetic dating studies, but it does mean that viral molecular clock rates have been consistently mis-estimated for many years. Dating that has relied upon strong priors on clock rates derived from a separate analysis done with the Gamma model may have been unreliable.
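
For readers unfamiliar with the two rate heterogeneity models contrasted above, the following is a minimal illustrative sketch, not the BEAST implementation or that of any other package. It assumes the common mean-based discretisation of a Gamma distribution with shape alpha and mean 1 into k equal-probability categories, where a single parameter fixes every category rate. It then shows the FreeRate-style alternative, in which rates and weights are free and are only rescaled so that the weights sum to 1 and the weighted mean rate is 1. All function names are hypothetical, and only the Python standard library is used.

```python
import math

def reg_lower_gamma(a, x):
    """Regularised lower incomplete gamma P(a, x), computed from its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def gamma_quantile(p, alpha):
    """Quantile of Gamma(shape=alpha, rate=alpha), found by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(alpha, alpha * hi) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(alpha, alpha * mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def discrete_gamma_rates(alpha, k=4):
    """Mean rate within each of k equal-probability Gamma categories."""
    bounds = [gamma_quantile(i / k, alpha) for i in range(1, k)]
    # Share of the overall mean (which is 1) accumulated below each boundary.
    cum = [0.0] + [reg_lower_gamma(alpha + 1.0, alpha * b) for b in bounds] + [1.0]
    return [k * (cum[i + 1] - cum[i]) for i in range(k)]

def normalise_free_rates(rates, weights):
    """FreeRate-style constraint: weights sum to 1 and the weighted mean rate is 1."""
    w_sum = sum(weights)
    weights = [w / w_sum for w in weights]
    mean = sum(w * r for w, r in zip(weights, rates))
    return [r / mean for r in rates], weights

# Discretised Gamma: one parameter (alpha) determines all four rates.
gamma_rates = discrete_gamma_rates(alpha=0.5, k=4)

# FreeRate: rates and weights are arbitrary inputs (values here are made up).
free_rates, free_weights = normalise_free_rates([0.1, 0.5, 1.5, 6.0], [4, 3, 2, 1])
```

The design point is the difference in flexibility: under the discretised Gamma model the category rates are tied together by alpha and the category weights are fixed at 1/k, whereas FreeRate estimates each rate and weight separately, subject only to the two normalisation constraints.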