Speaker
Description
Genomic data is frequently used to understand the transmission and evolutionary dynamics of viral populations. One popular approach is to use phylogenetic trees to reconstruct the past dynamics of viral populations (i.e. ‘phylodynamics’). The availability of tens of millions of SARS-CoV-2 genomic sequences poses unique challenges to this approach because of the high computational cost of constructing phylogenetic trees from a large number of sequences. Here we present a less computationally expensive, tree-free approach based on graph theory. Under this approach, we analyzed the pairwise distance matrices derived from sets of genomic sequences from each SARS-CoV-2 lineage, and show that the eigenspectra of the graph Laplacians of these matrices are associated with the rates of recent population change of these lineages. This means that a computationally inexpensive metric, i.e. the eigenspectrum derived from a set of sequences from a lineage, can be used to estimate a key epidemiological parameter, i.e. the rate of population expansion or decline, for this lineage. We further demonstrate that this approach can identify newly emerged and rapidly expanding lineages in the UK and the US. We applied it to the several waves of emergence of SARS-CoV-2 variants of concern, and retrospectively estimated the rate of spread of each newly emerged lineage using the sequences available early in its emergence. The model is able to identify the fast-growing lineages that later became dominant while their frequencies were still low (e.g. <5%). Overall, this computationally inexpensive method can be used for estimating key epidemiological parameters as well as for SARS-CoV-2 lineage monitoring and risk assessment.
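To make the core computation concrete, the following is a minimal sketch of how an eigenspectrum might be obtained from a set of aligned sequences. It is not the speaker's implementation: the abstract does not say how distances are measured or how the graph is built from them, so the use of Hamming distance, the choice to use the distance matrix directly as the edge-weight matrix, the unnormalized Laplacian L = D - W, and the names `pairwise_hamming`, `laplacian_eigenspectrum` and `toy_seqs` are all assumptions made for illustration. The step that relates the spectrum to a growth rate is not described in the abstract and is not shown.

```python
import numpy as np


def pairwise_hamming(seqs):
    """Pairwise Hamming distances between equal-length, aligned sequences."""
    arr = np.array([list(s) for s in seqs])
    # Compare every sequence with every other one, site by site.
    return (arr[:, None, :] != arr[None, :, :]).sum(axis=2).astype(float)


def laplacian_eigenspectrum(dist):
    """Eigenvalues of L = D - W in ascending order.

    Assumption: the distance matrix itself is used as the weight matrix W.
    """
    w = np.asarray(dist, dtype=float)
    degree = np.diag(w.sum(axis=1))  # D: row sums of W on the diagonal
    laplacian = degree - w
    return np.linalg.eigvalsh(laplacian)  # symmetric input, real eigenvalues


# Hypothetical toy alignment standing in for sequences from one lineage.
toy_seqs = ["ACGTACGT", "ACGTACGA", "ACGAACGA", "TCGAACGA"]
dist = pairwise_hamming(toy_seqs)
spectrum = laplacian_eigenspectrum(dist)
print(spectrum)
```

Computing the full spectrum of an n-by-n matrix costs on the order of n^3 operations, so for large lineages one would presumably work with a subsample of sequences; the abstract does not state how this is handled.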