Speaker
Description
Bayesian phylodynamic analysis of genomic datasets has been key for elucidating the evolutionary and transmission dynamics of pathogens. However, topological convergence of these analyses on large viral datasets has not been comprehensively assessed. By carefully re-running and analyzing 15 classic large phylodynamic analyses, we show that: 1) convergence and mixing issues are widespread in tree topology sampling, which makes phylodynamic inference more computationally challenging; 2) despite apparent failure in topological convergence, for most datasets only a small fraction of the clades in the tree and a small fraction of the sites in the genome alignment exhibit poor mixing, suggesting that the inefficient exploration of tree space may frequently stem from a small number of viral sequences that have possibly undergone recombination or convergent evolution; 3) conflicting information in the substitution and branching processes may further exacerbate the difficulty in sampling viral time trees, as reflected by the negative correlation between phylogenetic and phylodynamic likelihoods observed in most datasets; and 4) the inferred tree-wise molecular and demographic processes appear to be minimally affected by poor exploration of tree space, whereas impacts on the estimated origin time and introduction history of particular clades are more pronounced. We identify specific biological properties of the viral datasets that may impede tree exploration and suggest new sampling mechanisms targeting these properties to improve the computational performance of Bayesian phylodynamic inference. Outputs from our long analyses (over one trillion generations across all datasets) will facilitate phylodynamic inference of large viral datasets.
These outputs may stimulate the design of new tree proposals, serve as a comprehensive training dataset for machine-learning-based phylogenetic methods, and provide valuable benchmarks for assessing the performance of phylogenetic inference mechanisms.
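As an illustration of the negative correlation between phylogenetic and phylodynamic likelihoods mentioned in point 3, the following minimal sketch computes a Pearson correlation between two per-sample log-likelihood traces after discarding burn-in. It is not the analysis pipeline used in this work: the function names, the 10% burn-in default, and the assumption that the two traces are already available as equal-length lists of numbers are all hypothetical choices made for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant trace")
    return sxy / math.sqrt(sxx * syy)


def likelihood_correlation(phylo_loglik, demo_loglik, burnin_fraction=0.1):
    """Correlate two MCMC log-likelihood traces after dropping burn-in.

    phylo_loglik: per-sample log-likelihood of the alignment given the tree
                  (hypothetical input, e.g. one column of a trace log).
    demo_loglik:  per-sample log-density of the tree under the branching
                  (coalescent or birth-death) model, same sampling points.
    burnin_fraction: leading share of samples to discard (assumed default).
    """
    if not 0 <= burnin_fraction < 1:
        raise ValueError("burnin_fraction must be in [0, 1)")
    start = int(len(phylo_loglik) * burnin_fraction)
    return pearson(phylo_loglik[start:], demo_loglik[start:])
```

A value near -1 would indicate that samples favored by the substitution process tend to be disfavored by the branching process, which is the kind of conflict described above. A single coefficient says nothing about convergence on its own and would need to be read alongside topological diagnostics.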