Jun 19 – 22, 2024
Squamish, BC, Canada
Canada/Pacific timezone

CAUSES AND IMPACTS OF THE WIDESPREAD LACK OF TOPOLOGICAL CONVERGENCE IN BAYESIAN PHYLODYNAMIC INFERENCE ON LARGE VIRAL DATASETS

Not scheduled (20 min)

Oral: Phylodynamics & phylogeography

Speaker

Jiansi Gao (Fred Hutchinson Cancer Center)

Description

Bayesian phylodynamic analysis of genomic datasets has been key to elucidating the evolutionary and transmission dynamics of pathogens. However, the topological convergence of these analyses on large viral datasets has not been comprehensively assessed. By carefully re-running and analyzing 15 classic large phylodynamic analyses, we show that: 1) convergence and mixing issues are widespread in tree topology sampling, which makes phylodynamic inference more computationally challenging; 2) despite the apparent failure of topological convergence, for most datasets only a small fraction of the clades in the tree and a small fraction of the sites in the genome alignment exhibit poor mixing, suggesting that the inefficient exploration of tree space may frequently stem from a small number of viral sequences that have possibly undergone recombination or convergent evolution; 3) conflicting information in the substitution and branching processes may further exacerbate the difficulty of sampling viral time trees, as reflected by the negative correlation between phylogenetic and phylodynamic likelihoods observed in most datasets; and 4) the inferred tree-wise molecular and demographic processes appear to be minimally affected by poor exploration of tree space, whereas the impacts on the estimated origin time and introduction history of particular clades are more pronounced. We identify specific biological properties of the viral datasets that may impede tree exploration and suggest new sampling mechanisms targeting these properties to improve the computational performance of Bayesian phylodynamic inference. Outputs from our long analyses (over one trillion MCMC generations across all datasets) will facilitate phylodynamic inference of large viral datasets.
These outputs may stimulate the design of new tree proposals, serve as a comprehensive training dataset for machine-learning-based phylogenetic methods, and provide valuable benchmarks for assessing the performance of phylogenetic inference methods.

Primary authors

Jiansi Gao (Fred Hutchinson Cancer Center); Andrew Magee; Luiz Max Carvalho (School of Applied Mathematics, Getulio Vargas Foundation); Marius Brusselmans (Rega Institute, KU Leuven); Marc Suchard (UC Los Angeles); Guy Baele; Erick Matsen (Fred Hutchinson Cancer Center)