Description
Background: Understanding the determinants of SARS-CoV-2 transmission is pivotal to inform control efforts. As transmission events aren’t observed, this remains challenging. Pathogen sequences can provide insights into the proximity of infections within a transmission chain. Phylogeographic approaches have helped characterize epidemic spread. However, they can produce biased results when sampling is uneven, and they don’t scale past a few hundred or thousand sequences. We thus critically need new tools to characterize fine-scale transmission dynamics from large genome datasets.
Methods: We analyze 116,788 SARS-CoV-2 sequences from Washington state (WA) genomic sentinel surveillance, sampled between March 2021 and December 2022 from patients with known age and geographical unit of home location, to unravel the spatial and social determinants of spread in WA. We develop a modelling framework describing the relative risk that pairs of identical sequences fall in specific subgroups of the population (age groups, county of residence).
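To make the idea of a pairwise relative risk concrete, here is a minimal sketch, not the framework itself. It assumes one plausible definition: the number of identical-sequence pairs observed between groups A and B, divided by the number expected if pairs fell independently according to each group's share of all pairs. The function names (`pair_counts`, `relative_risk`) and the toy labels are hypothetical and used only for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_counts(groups, sequences):
    """Count ordered pairs of distinct samples that share an identical
    sequence, keyed by (group of first sample, group of second sample)."""
    by_sequence = {}
    for group, seq in zip(groups, sequences):
        by_sequence.setdefault(seq, []).append(group)

    counts = Counter()
    for members in by_sequence.values():
        for a, b in combinations(members, 2):
            # Record both orderings so the count table is symmetric.
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def relative_risk(counts, a, b):
    """Assumed definition: RR(a, b) = n_ab * n_total / (n_a * n_b), where
    n_a and n_b are the numbers of ordered pairs whose first member is in
    a and whose second member is in b, respectively."""
    n_ab = counts[(a, b)]
    n_a = sum(v for (x, _), v in counts.items() if x == a)
    n_b = sum(v for (_, y), v in counts.items() if y == b)
    n_total = sum(counts.values())
    if n_a == 0 or n_b == 0:
        return float("nan")  # undefined when a group has no pairs
    return n_ab * n_total / (n_a * n_b)


# Toy input (not real data): four samples, two groups, two sequences.
toy_groups = ["A", "A", "B", "B"]
toy_sequences = ["x", "x", "y", "y"]
toy_counts = pair_counts(toy_groups, toy_sequences)
```

Under this assumed definition, a value above 1 means identical sequences co-occur in the two groups more often than the overall distribution of pairs would suggest, and a value below 1 means less often. Enumerating pairs is quadratic in the size of each cluster of identical sequences, so for large clusters one would count pairs from per-group tallies instead.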
Results: We find that identical sequences have a higher risk of being observed in the same county than expected from the spatial distribution of sequences. This signal decays as genetic distance increases. We find a strong signal for local spread, with identical sequences more likely to be observed in geographically close counties. We find that cellphone mobility predicts the location of pairs of identical sequences (70% of variance explained). Outliers in the relationship between mobility and genetic data can be explained by large clusters of identical sequences observed in ZIP codes with penitentiary facilities. We find that the risk of observing identical sequences in two age groups is explained by contacts between these groups (89% of variance explained).
Implications: We develop a novel tree-free framework to characterize pathogen spread from large-scale sequence datasets. This provides valuable new tools to characterize epidemic dynamics from sequencing data.