
The 33rd International Dynamics & Evolution of Human Viruses conference will be held May 19th-22nd, 2026 at the University of British Columbia Okanagan Campus, in British Columbia, Canada. This will be a hybrid conference, with a live in-person meeting and a virtual option. Scientific sessions will be May 20th-22nd, 2026. The conference gala dinner will be held on the evening of May 20th at Grey Monk Winery's "The Lookout Restaurant", which overlooks Okanagan Lake.
This meeting series was designed to promote discussion between specialists in quantitative and computational approaches in two areas in the field of virology where these are particularly important:
Many of these approaches were originally developed for HIV but are now applied to many viruses where extensive data are available. We encourage the submission of abstracts relating to work on HIV, HCV, mpox, Influenza, SARS-CoV-2 and any other human viruses. We consider topics on statistical, mathematical, computational, and integrative approaches to analyzing the dynamics and evolution of human viruses within the scope of this meeting.
We actively encourage participation of researchers from around the globe, including junior scientists and members of minority groups. A limited number of travel grants may be available for attendees from underserved populations - see Registration for details.

The COVID-19 pandemic witnessed a massive surge in data made available for genomic surveillance. However, variations in data sampling can generate biases when making epidemiological inferences from these data. Here, we show how data collected under the various nationwide sampling strategies conducted within the United Kingdom result in differing estimates of early prevalence and relative early growth rate advantages for different SARS-CoV-2 variants. We used data from three UK surveillance strategies: the Office for National Statistics Coronavirus Infection Survey (ONS-CIS), the COVID-19 Genomics UK Consortium (COG-UK) and the Real-Time Assessment of Community Transmission (REACT-1) study. These strategies differed greatly in their sampling: the ONS-CIS was a community representative study with samples collected regularly throughout the pandemic; REACT was also a community representative study but with samples collected in distinct rounds; and COG-UK generated sequences from samples collected from hospitals (Pillar 1) and the community (Pillar 2). We observed higher early prevalence and an earlier signal of rapid growth of Alpha when using the ONS-CIS compared to COG Pillars 1 and 2, whilst Delta was detected earlier and with a higher growth rate advantage in COG Pillar 2. We observed slightly higher proportions for all Omicron variants in COG Pillar 2, while all other strategies performed similarly for Omicron. The estimates obtained from REACT aligned well with the other surveillance strategies later in the wave. For COG Pillar 1, we observed a lag for Alpha and Delta but no lag for Omicron. Thus, the differences in early detection of different lineages across these studies are fundamentally due to differences in the populations sampled, where a variant first emerged, and the severity of infection, which give each study different power to detect early growth under different scenarios.
These insights can help guide the design of surveillance strategies for future pathogens, including Pathogen X.
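As a rough illustration of how a relative growth rate advantage can be estimated from surveillance counts, the sketch below fits a straight line to the log-odds of a variant's sample proportion over time; the slope is the per-day advantage of the variant over everything else in the sample. This is a generic textbook approach, not necessarily the one used in the study, and the function name and counts are hypothetical.

```python
import numpy as np

def growth_advantage(days, variant_counts, total_counts):
    """Least-squares slope and intercept of logit(variant proportion) against time."""
    days = np.asarray(days, dtype=float)
    v = np.asarray(variant_counts, dtype=float)
    n = np.asarray(total_counts, dtype=float)
    # Add 0.5 pseudo-counts so proportions of 0 or 1 stay finite on the logit scale.
    p = (v + 0.5) / (n + 1.0)
    logit = np.log(p / (1.0 - p))
    slope, intercept = np.polyfit(days, logit, 1)
    return slope, intercept

# Hypothetical weekly counts from a single surveillance stream (illustration only).
days = [0, 7, 14, 21, 28]
variant = [2, 6, 18, 45, 80]
total = [100, 100, 100, 100, 100]
slope, intercept = growth_advantage(days, variant, total)
print(f"estimated growth advantage per day: {slope:.3f}")
```

Running the same fit separately on counts from each surveillance stream would show how sampling alone can shift the estimated slope and the date at which growth first becomes detectable.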
G1P[8] rotavirus strains circulated continuously in Malawi for over two decades (1998–2019), persisting despite Rotarix vaccine introduction in 2012. Whole-genome analysis revealed pre-vaccine dominance of two Wa-like lineages, L1 (1998–2004) and L2 (2005–2012), followed by a transient DS-1-like lineage, L3 (2013–2014). To investigate post-vaccine persistence and clinical impact of the G1P[8] strains, we analysed 78 G1P[8] strains collected between 2015 and 2019. We identified two novel Wa-like lineages, L4 and L5, extending our three previously described lineages, with L4 rapidly predominating until a bottleneck in 2019. L4 strains carried specific lineage-defining VP7 and VP4 substitutions, including VP7 N147D within a known antigenic epitope, and were associated with higher Vesikari severity scores compared with DS-1-like L3 strains. Phylogenies indicated that L4 strains were closely related to contemporary G1P[8] strains from Mozambique, India, and South Africa, consistent with external importation followed by local diversification into Malawian subclades. These findings reveal a dynamic post-vaccine evolutionary trajectory of G1P[8], marked by reversion to Wa-like backbones, intra-genotype reassortment, and emergence of antigenic variants. Continued genomic surveillance will be critical to monitor viral adaptation, assess vaccine performance, and inform development of next-generation vaccines for high-burden settings.
Poliomyelitis remains a major global health challenge despite decades of vaccination-driven progress. While wild polioviruses (WPVs) have been nearly eradicated, outbreaks caused by vaccine-derived polioviruses (VDPVs) now dominate global polio incidence. VDPVs arise when the live-attenuated oral poliovirus vaccine (OPV) genetically reverts through substitutions at vaccine attenuation sites, generating virulent strains phenotypically indistinguishable from WPVs. Enteroviruses are known to recombine, and prior research indicates that recombination with co-circulating enterovirus C (EV-C) strains may play an important role in the evolution of OPV towards virulence. However, previous analyses of VDPV recombination have largely relied on limited genomic datasets, often focused on individual outbreaks, and have not explicitly linked recombination to reversion at vaccine attenuation sites.
Here, we systematically characterize VDPV recombination dynamics using all publicly available EV-C whole-genome sequences, providing the most comprehensive assessment to date of recombination processes shaping VDPV emergence and evolution. First, we investigate EV-C genetic diversity using Nextstrain to reconstruct phylogenetic relationships across full genomes and individual genomic regions. Secondly, we identify and characterize recombinant VDPV sequences using a novel hidden Markov model-based method. Our analysis reveals characteristic VDPV recombination patterns, including common breakpoint hotspots and recombination partners across all three poliovirus serotypes. Finally, to assess the impact of recombination on the reversion of vaccine strains to virulence, we examine key substitutions at attenuation sites and evaluate how frequently these substitutions are associated with recombination compared to point mutations.
Unlike the historical Sabin vaccines, which rely on only a few attenuation mutations, the novel oral polio vaccines feature numerous attenuation changes to prevent reversion by single-point mutations. However, these sites remain vulnerable to simultaneous reversion via recombination, underscoring the need for comprehensive genomic surveillance and recombination analysis. Studies like this are therefore essential for advancing our understanding of recombination as an evolutionary mechanism shaping phenotypic adaptation.
Human Respiratory Syncytial Virus (HRSV) is a leading cause of respiratory infections in young children. HRSV cases declined dramatically in the initial stages of the COVID-19 pandemic, followed by a resurgence in 2022/2023. To understand the patterns of viral spread, we characterized the genetic diversity and evolutionary dynamics of HRSV sequences at national and global levels, including newly sequenced samples from pediatric patients in a major Florida medical center. A total of 48 symptomatic pediatric patients (0-18 years) with HRSV-positive RT-qPCR were included, and 32 samples were sequenced. We observed increased numbers of pediatric HRSV cases at a major medical center in Florida in the 2022/2023 “post-COVID-19” period. Our sequences clustered within multiple clades, with no correlation between specific clades and clinical or epidemiologic characteristics. Phylodynamic analysis demonstrated that, after the COVID-19 pandemic, both HRSV-A and HRSV-B, following multiple introductions from European countries, underwent a change in the frequency of bottleneck events. We also detected increased viral diversity in known G-protein epitopes. The results suggest that in the aftermath of the COVID-19 pandemic the virus has undergone genetic changes that may enhance its ability to evade immune responses and may affect transmission.
Naming matters, especially when viruses evolve faster than our labels. Non-polio enteroviruses (NP-EVs) cause diseases ranging from mild infections to severe neurological illnesses in children. Yet their clade nomenclature often reflects chance sampling and outdated circulation patterns rather than current evolutionary understanding. Subjective naming systems complicate communication between researchers, clinicians, and public health authorities, especially when clades are associated with increased virulence or recombination.
To address the issue of inconsistent clade assignments, we developed an algorithmic nomenclature framework for NP-EVs. This framework replaces manual, non-scalable clade designation with reproducible, parameter-based phylogenetic criteria. The framework is designed to be transferable across enterovirus species, providing consistent clade definitions while remaining flexible to virus-specific evolutionary dynamics.
To ensure that this work goes beyond a purely conceptual nomenclature proposal, we embedded the framework directly into Nextclade, a practical, real-time viral surveillance tool. Originally developed for SARS-CoV-2, Nextclade offers quality control, phylogenetic placement, and clade assignment, all in one tool. We created the first Nextclade datasets for NP-EVs, focusing on four epidemiologically important viruses: Enterovirus D68, Enterovirus A71, Coxsackievirus A16, and Coxsackievirus A10. The new algorithmic nomenclature is provided alongside existing naming systems to enable direct comparison and gradual community adoption.
Building these datasets also revealed a lesser-discussed challenge. Many NP-EV reference sequences date back to the 1950s and are poorly suited for modern phylogenetic inference, alignment, and mutation calling. We therefore explored using inferred ancestral sequences as functional references to improve consistency and interpretability across analyses.
Our work emphasizes the importance of reproducible nomenclature, integrated analytical tools, and updated reference strategies. These elements are essential for conducting scalable evolutionary analyses and translating genomic data into real-time NP-EV surveillance.
HIV-1 persistence during antiretroviral therapy (ART) remains the main barrier to a cure. Eliminating these reservoirs is extremely challenging, in part because HIV-1 exhibits extraordinary genetic variability, generating heterogeneous populations capable of drug resistance and immune evasion. Although ART dramatically reduces viral population size, reservoirs retain sufficient genetic diversity to enable rapid rebound after therapy interruption, even after years of effective viral suppression. This residual diversity supports viral fitness recovery following population contraction, facilitating successful rebound. However, the genetic and evolutionary features permitting long-term persistence and rebound remain poorly defined.
HIV-1 populations are best described as viral quasispecies – a swarm of closely related variants or haplotypes. Long-read deep-sequencing of host-integrated proviral DNA and viral RNA allows quasispecies reconstruction by capturing longer stretches of viral genomes and preserving per-haplotype information inaccessible with short-read sequencing. However, long-read sequencing is prone to higher error rates, complicating haplotype reconstruction by confounding technical errors with true biological variation. Existing computational tools are optimized for short-read sequencing, assume short-read-specific error profiles, are computationally expensive, and/or underutilize the haplotype resolution provided by long reads.
To address these limitations, we developed OPOSSUM (Optimized Polishing of Observed reads for Strain Separation with Unsupervised Methods), a method for viral quasispecies reconstruction from long-read sequencing. OPOSSUM integrates robust quality control, dimensionality reduction, and unsupervised clustering to generate refined haplotype sequences while maximizing long-read resolution. We validated OPOSSUM using long-read sequencing of defined mixtures of HIV-1 plasmids and single-genome HIV-1 from clinical samples. OPOSSUM accurately reconstructed haplotypes, correctly estimated their abundance, and identified true mutations from noise. We then applied OPOSSUM to proviral DNA from SIV-infected macaques, enabling reconstruction of intra-host viral dynamics during ART and post-ART rebound. OPOSSUM provides a framework to infer reservoir dynamics and guide targeted cure strategies toward the cell populations and tissue reservoirs responsible for rebound.
The increasing frequency of infectious disease outbreaks, driven by changing Earth systems and globalization, underscores the need for reliable short-term forecasting methods to inform public health decision-making. However, most short-term case count forecasting approaches depend on extensive historical data for model training and are therefore poorly suited for emerging pathogens, early epidemic phases, or situations where episodic selection drives the emergence of new genetic variants, resulting in unexpected case surges.
To address this limitation, we developed a novel forecasting framework that leverages synthetic data from fast-evolving pathogens. Using the phylodynamic simulator MutAntiGen, we generated a comprehensive synthetic library of coupled viral evolution and epidemiological dynamics by systematically varying evolutionary, epidemiological, and immunological parameters. These simulations produce phylodynamic time series of overall and variant-specific cases across outbreak scenarios.
We used these synthetic time series, together with historical non-COVID-19 respiratory surveillance data, to pre-train a transformer-based forecasting model. We then showed that integrating synthetic phylodynamic time series with observed respiratory data substantially improved short-term forecast accuracy of U.S. state-level COVID-19 cases relative to models trained on observed data alone, and outperformed real-time COVID-19 forecasting efforts. We further showed that total case count forecasts were more accurate when composed of individual forecasts for variant-specific cases. Improvements were greatest near epidemic peaks and during epidemic decline.
By using synthetic outbreaks for model training, our framework enables forecasting for fast-evolving pathogens while mitigating data limitations, offering value for low-data and early epidemic settings. The framework can also be readily extended to incorporate genomic leading indicators of epidemic growth, such as nucleotide-based population genetic summaries and pathogen-specific functional features, which have become relevant with the growing availability of pathogen genomic data in near real time.
Direct-from-sample sequencing is an essential tool in modern clinical and public health microbiology, particularly for sequencing viruses or unculturable pathogens, and for diagnostic metagenomics. Targeted metagenomics uses sets of oligonucleotide probes (up to hundreds of thousands) to selectively sequence targeted loci or genomes from hundreds of pathogen species simultaneously, removing the need to identify or isolate the pathogen first. Targeted metagenomics is being rapidly adopted globally, especially for applications such as syndromic surveillance for respiratory or vector-borne infections. However, uptake is currently limited by a lack of efficient tools for assay design.
In this work we present DAMPA (Diversity Aware Metagenomic Panel Assignment), a user-friendly probeset design pipeline that uses evolutionary relationships between targeted loci, represented as a pangenome graph, to design an optimal probeset. Conserved regions require only a few graph nodes, while rapidly evolving regions are represented by complex graph structures with more nodes. In this way, DAMPA ensures that the diversity of all regions is represented in the final probe set while minimising probe redundancy, and maintains probe performance as evolutionary distance increases.
We evaluated the performance of DAMPA on a range of datasets covering viruses with diverse genome structures and sizes, including Dengue, HIV, Enterovirus, Epstein-Barr virus (EBV) and Japanese Encephalitis Virus (JEV), as well as a bacterium (Bordetella pertussis). DAMPA generated probe sets with complete coverage for all targeted organisms. DAMPA ran on datasets containing large genomes (B. pertussis, 4Mb) or large sample numbers (Dengue, 16,000) and was 6x faster than existing tools, generating probesets in under 30 seconds on a standard laptop.
Experimental validation in clinical samples positive for Enterovirus, which is notoriously prone to coverage dropouts due to high genotype diversity, showed that DAMPA-designed probes were highly sensitive, quantitative and generated more even read coverage across genomes than the commercially available Twist Comprehensive Viral Research panel.
Reconstructing the phylogenetic trees of pathogens responsible for disease outbreaks has become a standard tool among public health agencies. But particularly when sampling is non-constant and non-uniform, it can be challenging to link sequences and corresponding phylogenetic trees to epidemiological quantities of interest such as rates of infection, reproduction numbers, or serial intervals. We present ECHO (Estimating Cryptic Hosts from Outbreak trees), a simple model that relates the total branch length of an observed timed phylogenetic tree to the total number of cases related to that tree. Using commonly known or estimated information about the epidemiology of the disease, ECHO uses the total phylogenetic time to infer the number of infections that were not sampled. This could allow public health practitioners to estimate the number of unobserved cases in an ongoing outbreak, using a timed phylogeny together with information about the duration of the latent and infectious periods and the effective reproduction number. ECHO is designed to be robust to the sampling fraction, which is often unknown and can be difficult to estimate, as well as to sampling regimes that vary through time or by clade. As a demonstration, we apply ECHO to phylogenetic trees constructed from a measles outbreak that occurred in the United States in 2021 during Operation Allies Welcome. We find that ECHO is able to accurately identify the number of missing cases and is able to combine information from multiple lineages within one outbreak into a single analysis.
Genomic language models (gLMs) have emerged as powerful tools for learning numerical representations of DNA sequences. Most existing models, however, are either not trained on viral genomes or trained on only limited viral references, and lack systematic evaluation frameworks tailored to virology. Here, we introduce vir2vec, a 422-million-parameter decoder-only genomic language model obtained through continual pretraining of Mistral-DNA on a curated pan-viral corpus comprising 565,747 complete genomes from 295 viral species. vir2vec operates on byte-pair–encoded DNA subwords and produces fixed-length, genome-level embeddings via pooling over contextualized token representations, enabling reuse across heterogeneous downstream tasks without task-specific fine-tuning. We evaluate vir2vec on a prediction task benchmark we created, Viral Genomic Understanding Evaluation (vGUE), spanning multiple levels of viral organization: organism-level discrimination (virus vs non-virus genomes and reads), genome-wide evolutionary signatures (DNA vs RNA viruses and host-range prediction), intra-genus species separation (HIV-1 vs HIV-2), fine-grained variant and subtype typing (SARS-CoV-2 lineages), and phenotypic context signal detection (HIV-1 brain vs plasma tropism). vir2vec outperforms competing approaches, namely a genomic foundation model and a viral-specific embedder based on ModernBERT. Performance is particularly strong in genome-wide and evolutionary tasks, with balanced accuracies of 0.96 for DNA vs RNA virus discrimination; 0.84 for host prediction; 1.00 for HIV-1 vs HIV-2 classification; and 0.99 for SARS-CoV-2 lineage identification. By coupling a domain-specialized genomic language model with a standardized viral benchmark, vir2vec and vGUE provide an open foundation for future viral genomic models, surveillance applications, and discovery pipelines.
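To make two of the terms above concrete, the following sketch shows one common way to pool per-token vectors into a single fixed-length embedding (masked mean pooling) and how balanced accuracy is computed as the mean of per-class recall. The abstract does not state which pooling operator vir2vec uses, so mean pooling here is an assumption, and the function names and toy inputs are hypothetical.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors over real (non-padding) positions only."""
    emb = np.asarray(token_embeddings, dtype=float)          # shape (tokens, dim)
    mask = np.asarray(attention_mask, dtype=float)[:, None]  # shape (tokens, 1)
    return (emb * mask).sum(axis=0) / mask.sum()

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, which is robust to class imbalance."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

# Toy example: three token vectors, the last one is padding and is ignored.
genome_vector = mean_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0])

# Toy labels: class 0 recall is 2/3, class 1 recall is 1.
score = balanced_accuracy([0, 0, 0, 1], [0, 0, 1, 1])
```

Because the pooled vector has the same length regardless of genome size, it can be passed to any downstream classifier without retraining the language model itself.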
It is 20 years since the perplexing observation that HIV evolves around five times faster when measured within, compared to between, hosts. Emergence of CTL- and antibody-escape mutations within individuals, followed by their reversion after transmission, has been proposed to explain the mismatch in evolutionary rates at nonsynonymous sites, but a compelling explanation for the mismatch at synonymous genomic positions has remained elusive. Using data from longitudinally sampled transmission pairs, we provide evidence that toggling of synonymous mutations during the course of infection can explain the mismatch at synonymous positions. Specifically, we observed slightly deleterious synonymous mutations hitchhiking with immune-escape mutations to high frequency, before linkage is lost due to back-mutation or recombination, after which the synonymous mutations decline in frequency. If sampling is sufficiently frequent, these toggles are captured and contribute to high within-host evolutionary rates, whereas the transmitted virus is more likely to consist of founder-like alleles. To support this conclusion, we nested within-host agent-based simulations into between-host transmission networks, which were parameterised using data from the transmission pairs. These simulations confirm that toggling of nonsynonymous mutations between hosts, combined with the hitchhiking of synonymous mutations with these nonsynonymous mutations within-host, is sufficient to explain observed mismatches in evolutionary rates.
Between 2010 and 2019, new HIV diagnoses in the United States declined 14.8%, from 42,665 to 36,349 diagnoses. The federal Ending the HIV Epidemic (EHE) Initiative was developed in 2019 to accelerate this decline and reduce annual HIV diagnoses in the United States by 90% by 2030. In 2020, the number of HIV diagnoses fell by 16.4%, coinciding with the first waves of the COVID-19 pandemic. However, HIV diagnoses in the United States in 2024 reflected a 6% increase since 2019.
We investigated whether this recent increase in HIV diagnoses was reflective of (i) delayed diagnosis of people who acquired HIV during or before 2020 but did not seek HIV testing or (ii) an increase in HIV transmission since 2020. We did this by analyzing public health surveillance data for 23,619 people diagnosed in New York City (NYC) between 2009 and 2024 with a reported HIV-1 subtype B sequence (prot/RT). We inferred a maximum likelihood tree in FastTree2 and inferred the age of internal nodes using TreeTime. To characterize the relationship between HIV transmission and diagnosis, we identified virus from pairs of individuals who were diagnosed within a year of each other and shared a common ancestor within a year prior to the later diagnosis date; this shared recent ancestor serves as a proxy for a recent transmission event followed by an HIV diagnosis. We then performed multivariate regression to determine whether the relationship between the inferred transmission events and annual diagnoses had changed since 2020; a change in this relationship post-2020 would indicate a shift in the number of delayed diagnoses.
Based on the projected linear trend of decreasing HIV diagnoses in NYC between 2010-2019, there was an excess of 1,631 (95% CI: 973–2289) subtype B diagnoses between 2020-2024, roughly equivalent to the annual number of new diagnoses in NYC in recent years. We were unable to detect a change in the relationship between annual diagnoses and transmission events post-2020, suggesting that the recent rise in diagnoses is indicative of an increase in HIV transmission and not a product of increased delayed diagnoses. This finding was consistent after controlling for race/ethnicity, transmission category, country of birth, and age at diagnosis and in down-sampled replicates (n=10; adjusting for the variation in HIV-1 sequence reporting completeness over the analytic period), and after exploring transmission in a 2-year window prior to the diagnoses. We validated our approach for identifying recent transmission events, demonstrating that people diagnosed following these events were more likely to be diagnosed with an acute HIV infection (Odds Ratio: 1.48; p<0.001).
This analysis suggests the post-2020 increase in HIV diagnoses may reflect an increase in new HIV infections and not delayed diagnosis of previously acquired infections, a situation that requires continued surveillance and investigation to ensure NYC’s progress toward EHE goals.
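The excess-diagnosis figure reported above comes from comparing observed counts with a projected linear trend. A minimal sketch of that kind of calculation follows; it fits a line to a baseline period, extrapolates it, and sums observed minus expected. The counts are invented for illustration and are not NYC surveillance data, the function name is hypothetical, and no confidence interval is computed here.

```python
import numpy as np

def excess_over_trend(base_years, base_counts, later_years, later_counts):
    """Sum of (observed - expected), with expected from a linear fit to the baseline."""
    base_years = np.asarray(base_years, dtype=float)
    later_years = np.asarray(later_years, dtype=float)
    origin = base_years[0]  # centre the years to keep the fit well conditioned
    slope, intercept = np.polyfit(base_years - origin, base_counts, 1)
    expected = slope * (later_years - origin) + intercept
    return float(np.sum(np.asarray(later_counts, dtype=float) - expected))

# Hypothetical annual diagnosis counts (illustration only).
base_years = list(range(2010, 2020))
base_counts = [3000 - 150 * (y - 2010) for y in base_years]  # steady decline
later_years = [2020, 2021, 2022]
later_counts = [1400, 1500, 1600]                            # decline stalls

excess = excess_over_trend(base_years, base_counts, later_years, later_counts)
print(f"excess diagnoses relative to the projected trend: {excess:.0f}")
```

An excess on its own cannot distinguish delayed diagnoses from new transmission, which is why the analysis described above pairs it with phylogenetically inferred recent transmission events.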
Flavivirus infection causes >100,000 annual cases of encephalitis and meningitis worldwide, with case fatality rates of 20-50%. Encephalitic flaviviruses include West Nile virus, which caused widespread outbreaks in Europe and the US in 2025, and Japanese Encephalitis virus, which entered temperate Australia in 2022 and became endemic within a single season. These highly pathogenic flaviviruses infect native mosquito species and their animal hosts, utilising pre-existing transmission pathways for rapid spread. Such transmission pathways may be exposed by studying the evolution and vector associations of related endemic viruses.
Edge Hill virus (EHV) is a less-pathogenic flavivirus of the Yellow Fever Group, endemic to Australia and frequently detected in the most populous state, New South Wales (NSW). Genomic data on EHV are scarce, with only 2 complete and 16 partial genomes published, and none since 2010. To understand flavivirus transmission pathways, we investigated the spatiotemporal genetic diversity of EHV and its associated mosquito vectors. Over 120,000 mosquitoes collected in 2021-2024 by the NSW Arbovirus Surveillance Program were pooled by date and location and sequenced using COSMOS, a targeted metagenomics method that combines virus sequencing with vector species identification. EHV was detected in 36 pools, of which 32 produced complete genomes. Phylogenetic reconstruction based on whole genomes and the NS5 region, which enabled inclusion of historic sequences, revealed two contemporary EHV genetic lineages, with evidence of lineage replacement since 2010. Seven traps contained both lineages, indicating ongoing co-circulation. EHV was significantly associated with coastal Aedes vigilax mosquitoes (OR=7.8). However, inland detections in traps without Ae. vigilax strongly suggest additional involvement of Culex annulirostris, the Australian vector of encephalitic flaviviruses. EHV genomes collected west of the Great Dividing Range were non-monophyletic and interspersed with genomes from coastal locations, indicating rapid dispersion across >1200 km during breeding seasons, and exposing existing routes of flavivirus transmission.
Spillover of emerging pathogens increasingly challenges human health and societal stability. Lassa mammarenavirus (LASV), the cause of one of the deadliest endemic diseases in West Africa, shows increasing potential for international outbreaks. Despite this, LASV remains undersampled and lacks a standardized approach to classifying the currently sampled diversity. To address this gap, we assembled and analyzed the most complete whole-genome phylogeny of LASV. We collected 2,811 publicly available sequences under the organism name “Lassa mammarenavirus” from the National Center for Biotechnology Information (NCBI) GenBank database and curated full-genome and segment-specific alignments. Corresponding phylogenetic trees were inferred using a maximum likelihood framework. In agreement with previous studies, we find that significantly greater diversity exists among LASV lineages than within them. These observations highlight the need for a lineage-specific reference panel. To select representative reference sequences, we use closeness centrality, a network-based metric that quantifies each node's closeness to all others based on its shortest-path lengths to them. This statistic is well suited to identifying phylogenetically central sequences, as it reflects relative proximity to all other sampled genomes in a given lineage and thus identifies the sequence most representative of the lineage in tree space. We show that our lineage-specific reference panel accurately classifies LASV sequences in BLAST and short-read mapping applications relative to the previously used Pinneo reference strain. Our open-source reference panel accurately characterizes the known diversity of LASV, providing a framework for molecular epidemiological analyses of future outbreaks. We further analyze our alignments to show, consistent with previous work, that LASV diversity is strongly partitioned by geography rather than by host type.
Then, in a Bayesian framework, we analyze the origins and spread of LASV in West Africa. Taken together, our analyses both provide novel epidemiological tools and advance our understanding of LASV, an emerging pathogen of concern.
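The reference-selection step described above can be sketched in a few lines. The example assumes that a matrix of pairwise tree (patristic) distances between the sampled sequences of one lineage is already available and that centrality is computed over tips only; both are simplifying assumptions, and the sequence names and distances are hypothetical.

```python
import numpy as np

def closeness_centrality(dist):
    """Closeness of node i = (n - 1) / sum of shortest-path distances from i."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    return (n - 1) / d.sum(axis=1)

def pick_reference(names, dist):
    """Return the name of the most central sequence (highest closeness)."""
    scores = closeness_centrality(dist)
    return names[int(np.argmax(scores))]

# Hypothetical pairwise patristic distances within one lineage (subs/site).
names = ["seqA", "seqB", "seqC", "seqD"]
dist = [
    [0.00, 0.02, 0.03, 0.09],
    [0.02, 0.00, 0.03, 0.08],
    [0.03, 0.03, 0.00, 0.10],
    [0.09, 0.08, 0.10, 0.00],
]
reference = pick_reference(names, dist)
print(reference)
```

In this toy matrix seqB has the smallest total distance to the other sequences, so it is chosen as the lineage representative; a divergent outlier such as seqD scores lowest.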
Shuttles depart from outside the hotel at 6:10 pm for Grey Monk Winery
Dinner: 6:30 pm-10:00 pm
The mpox epidemic in 2022 resulted in more than 100,000 cases across 122 countries. Low levels of case detection continued in 2023, during which New York City (NYC) experienced the largest number of infections per capita among all jurisdictions in the USA. By September 2023, an average of only 2.8 cases were detected per week. In January 2024, the rate increased 4.7-fold to 13.2 cases per week, suggesting a shift in transmission dynamics.
An examination of resurgence cases in NYC revealed no association with vaccination status. To understand the evolutionary dynamics of the resurgence, we performed Bayesian phylogeographic analysis in BEASTX on 9,659 MPXV genomes, including 878 from the NYC Health Department, sampled between 2022 and 2024. We find little evidence for prolonged persistence of MPXV in any single jurisdiction within the United States since 2022. Rather, MPXV is repeatedly reintroduced from elsewhere in the USA and around the world. We also identified a large cluster (F.1; n=244), including samples collected between October 2023 and December 2024. This cluster originated in NYC (P=0.9957) in September 2023 and became the dominant genotype identified across the USA in 2024, reaching a peak estimated monthly proportion of 72.4% of all cases in the country by August 2024. We performed Bayesian phylogeographic inference on F.1 and Episodic Birth-Death-Sampling (EBDS) on NYC F.1 genomes. This cluster experienced limited exportation from NYC before March 2024, when Re in NYC was >1; after March 2024, it resulted in 13 exportations to jurisdictions and 4 countries, seeding subsequent outbreaks.
Although the late-2023 resurgence of MPXV in NYC in the United States can be predominantly attributed to a single cluster, the continued persistence of this virus since 2022 is the result of repeated introduction and local extinction events. Understanding drivers of sustained transmission can help us understand MPXV persistence and illuminate strategies to control further spread.
Quantifying how fast pathogens spread across space is key to understanding epidemic dynamics and informing control strategies. Traditional approaches often rely on full phylogenetic reconstruction or spatially explicit models, which can be computationally demanding and sensitive to sampling biases between locations.
Here, we present a method to estimate the rate of geographical spread of pathogens between locations directly from pairwise genetic distance distributions, without requiring an explicit tree. The approach uses a continuous-time Markov chain model to link genetic divergence to spatial separation, enabling inference of the rate of spread from large genomic datasets. By leveraging summary statistics of pairwise distances, the method remains robust to geographically biased sampling and scales efficiently to thousands of sequences. It also provides a natural way to assess model fit.
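The link between genetic divergence and spatial separation can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a symmetric continuous-time Markov chain over `k` equally connected locations, a strict molecular clock that converts a pairwise genetic distance directly into separation time (ignoring stochastic noise in substitutions), and a simple grid search for the maximum-likelihood rate. All function names and parameters (`prob_same_location`, `estimate_rate`, `subs_per_year`) are hypothetical.

```python
import math
import random


def prob_same_location(tau, rate, k):
    """Probability that two sequences separated by total time tau share a location.

    Assumes a symmetric CTMC on k locations in which a lineage moves to each
    specific other location at `rate`; tau is the summed branch time to the MRCA.
    """
    return 1.0 / k + (k - 1.0) / k * math.exp(-k * rate * tau)


def log_likelihood(rate, pairs, k, subs_per_year):
    """Bernoulli log-likelihood of observed same/different-location pairs."""
    total = 0.0
    for distance, same in pairs:
        tau = distance / subs_per_year  # strict-clock assumption
        p = prob_same_location(tau, rate, k)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += math.log(p) if same else math.log(1.0 - p)
    return total


def estimate_rate(pairs, k, subs_per_year, grid):
    """Return the grid value of the spread rate with the highest likelihood."""
    return max(grid, key=lambda r: log_likelihood(r, pairs, k, subs_per_year))


def simulate_pairs(n_pairs, rate, k, subs_per_year, seed=1):
    """Draw synthetic (genetic_distance, same_location) pairs for testing."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        tau = rng.uniform(0.0, 2.0)  # separation time in years
        same = rng.random() < prob_same_location(tau, rate, k)
        pairs.append((tau * subs_per_year, same))
    return pairs
```

Because only pairwise summaries enter the likelihood, no tree is reconstructed. Treating pairs as independent, as this sketch does, is a further simplification.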
Through a large simulation study, we demonstrate that the method accurately recovers known rates of pathogen spread across diverse epidemiological dynamics and sampling scenarios. Applying the method to >300,000 SARS-CoV-2 sequences sampled across Europe in 2020 shows its ability to capture spatial structure and temporal variation in the rate of pathogen spread, which are associated with air travel data.
This framework provides an exciting avenue to estimate rates of spread between locations in a robust and computationally efficient way, and explore drivers of spread at the population level.
West Nile Virus (WNV) is a mosquito-borne virus which primarily infects birds. Infections in humans (and horses) are unusual but serious, and can cause encephalitis and death. In Europe, WNV was first detected in Southern Portugal in the 1960s, with limited circulation in the southernmost areas of the continent. Since the 1990s, circulation has increased, pushing further north, with human cases reported in the Netherlands, France and Poland and the virus detected through mosquito surveillance in the UK. There has also been a corresponding increase in genomic surveillance efforts, which we can now leverage to understand WNV spread across Europe.
Previous work has used genomic data to examine the spread of WNV lineage 2 across Europe, finding that factors related to land use were associated with the velocity of spread. In this study, we expand on this by including three additional years of sequencing data, partial genomes and other European WNV lineages in a continuous phylogeographic analysis. We compared sources and sinks of transmission in the genomic model to risk maps generated using ecological niche modelling to integrate a novel data source into phylodynamic models. Finally, using the combination of ecological niche modelling and phylodynamic analysis, we generated a small predictive model to explore which areas we expect to be most at risk in the coming years.
The emergence of SARS-CoV-2 has renewed interest in how coronaviruses evolve and transition to endemic circulation. The four seasonal human coronaviruses (hCoVs), 229E, NL63, OC43 and HKU1, differ in terms of their estimated times of emergence, receptor usage patterns and genome content. While they are frequently grouped clinically, these differences raise the question of whether the evolution and epidemiology of these viruses are similarly distinct. Using a publicly available background dataset, as well as 1033 genomes sequenced from 2012 to 2022 in Slovenia, we sought to compare the rates and patterns of substitution and recombination across the hCoVs. We describe the distribution of recombination breakpoints across hCoVs, and the variation in persistence of recombinant lineages across the viruses. While some breakpoints are shared across hCoVs, others are specific to a given virus. We also characterise the extent of selection present within the spike gene, with increased substitutions in the receptor-binding domain typical of antigenic drift observed for 229E and OC43, but not NL63. Our results illustrate the complexity and variety among the four seasonal coronaviruses, with implications for our understanding of coronavirus evolution and the potential routes to endemicity.
Most pathogen phylodynamics analyses are done in the context of unclear sampling denominators, low sampling density, and limited participant meta-data. In contrast, the Rakai Health Sciences Program has embedded deep-sequencing of all HIV-viremic individuals into population-based surveillance, enabling linkage of genomic data with a large range of sociodemographic and behavioural covariates.
Our inferential targets are fractions of transmission attributable to specific subpopulations, and relative transmission rates (%sources/%infected in subpopulation). Alternative to birth-death or structured coalescent approaches, we constructed a panel of phylogenetically likely transmission pairs. We interpreted these pairs as realisations of a multi-type point process on a compact age-age domain, with the type encoding an unknown latent state: either truly unlinked, linked from i to j, or linked from j to i. We used Bayesian post-stratification to allocate age-age specific transmission probabilities by many covariates (age, gender, lifetime partnership history, primary occupation, community setting and sexual behaviour), providing a scalable approach for high-dimensional inference.
From 4,260 HIV-positive, successfully deep-sequenced participants, we compiled a list of 625 potential transmission pairs. Of these, the multi-type process model estimated 495 actual transmission events (posterior median). Transmission rates varied substantially within age groups by partnership status (never married, married, separated), typically ≥2-fold. Across inland, fishing, and trading communities, key transmission flows (≥10%) originated from married men, both to partners within and outside households. Key transmission flows also included never married women aged 15-29 in trading communities, and married and separated women aged 15-29 in fishing communities. Partnership-specific underreporting of sexual behaviour data posed challenges in quantifying flows by self-reported sexual behaviour. By occupation, flows tended to mirror underlying occupational population structure.
Fitting well-developed statistical models to transmission pair data sets enables easy investigation of population-level transmission flows and rates across many individual-level covariates, and, unlike other approaches, remains scalable to data sets comprising >100k genomes.
Cluster and source attribution methodologies in genomic epidemiology are primarily used to identify pairs or groups of infected individuals sharing close proximity in the chain of transmission. It is generally overlooked that these methodologies also identify some number of individuals who are not linked to anyone else, and that the number of these is informative. If one takes a random sample of infected individuals and is able to identify who infected whom within the sample, the number of singletons is positively correlated with the complete size of the infected population. This is similar to the principle underlying the mark-recapture methodology, but where that requires multiple sampling rounds, this needs only one round and the ability to link individuals by transmission.
Here, I show how to properly formalise this insight in order to estimate the complete size of an infected population and/or the number of distinct lineage introductions to that population by using the observed number of pairs or clusters and singletons. This is implemented in a Bayesian importance sampling framework. I outline three potential uses of the procedure, firstly in estimation of those quantities, secondly in determination of the sample size needed to obtain a given number of transmission pairs for a separate analysis, and thirdly to test whether assumptions of random sampling have been violated.
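The relationship between singleton counts and population size can be checked with a small simulation. This is a sketch under stated assumptions rather than the procedure described above: the outbreak is modelled as a random recursive tree (each case infected by a uniformly chosen earlier case), sampling is uniform at random, and two sampled cases are linked only when one directly infected the other. The function names are hypothetical.

```python
import random


def simulate_singletons(pop_size, sample_size, seed=0):
    """Count sampled cases with no direct transmission link to another sampled case."""
    rng = random.Random(seed)
    # infector[i] is the case that infected case i; case 0 is the index case.
    infector = [None] + [rng.randrange(i) for i in range(1, pop_size)]
    sampled = set(rng.sample(range(pop_size), sample_size))
    linked = set()
    for case in sampled:
        source = infector[case]
        if source is not None and source in sampled:
            linked.add(case)
            linked.add(source)
    return sample_size - len(linked)


def mean_singletons(pop_size, sample_size, n_reps=50):
    """Average singleton count over independent simulated outbreaks."""
    counts = [simulate_singletons(pop_size, sample_size, seed=s) for s in range(n_reps)]
    return sum(counts) / n_reps
```

For a fixed sample size, `mean_singletons` rises with `pop_size`, because the chance that a sampled case's infector or infectee is also sampled shrinks as the sampling fraction falls. Inverting this relationship is what makes the singleton count informative about the total number infected.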
BACKGROUND:
Transgender (trans) people are frequently marginalized in HIV surveillance. Trans women are often grouped with gay and bisexual men (GBM), while trans men and non-binary people are commonly excluded. This can perpetuate inequity in prevention and care.
METHODS:
We analyzed clinical data from 15,299 individuals living with HIV in British Columbia (BC), Canada (1999-2022). Gender was available for 15,122 individuals (including 91 trans women and 13 trans men; non-binary identity was not collected). HIV pol sequences, available for 10,724 individuals (not available for ART-experienced clients), were used to infer phylogenetic trees under a maximum likelihood framework using IQ-TREE. Transmission clusters were defined as groups of >5 individuals with a pairwise patristic distance <0.02 substitutions per site and >90% bootstrap support. Using Mann-Whitney tests, 1000 permutations and Benjamini-Hochberg correction, trans-inclusive clusters were compared with all other clusters with regard to their size and rates of reported transmission risk (mainly GBM, heterosexual (HET) or people who inject drugs (PWID)). Trans people’s sexual exposures with men were reported as GBM or HET, suggesting semantic ambiguity and misgendering.
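The permutation and multiple-testing steps named above can be sketched as follows. This is an illustration only: the test statistic here is a difference in means rather than the Mann-Whitney statistic used in the study, and the function names and inputs are hypothetical.

```python
import random


def permutation_p_value(group_a, group_b, n_perm=1000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    mean = lambda xs: sum(xs) / len(xs)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (extreme + 1) / (n_perm + 1)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

In use, one p-value would be computed per comparison (for example cluster size and each risk-category rate), and the full list passed to `benjamini_hochberg` before applying a significance threshold.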
RESULTS:
Clustered cases include 4,053 cisgender, 28 transgender, and 53 people of unknown gender. Among 209 clusters, 23 were trans-inclusive. Trans people were distributed across clusters dominated by different risk exposure categories. After accounting for size distortions through permutations, trans-inclusive clusters showed no statistically significant differences in size or composition, except for a modest enrichment for GBM members.
CONCLUSIONS:
The small number of HIV+ trans people in BC are embedded in diverse transmission networks, highlighting limitations of collapsing trans women into GBM and the near invisibility of trans men and non-binary individuals in surveillance systems. Our findings support the need for trans-inclusivity in existing harm reduction services, trans-specific HIV prevention and care strategies (such as gender affirming PrEP systems), and surveillance practices accurately reflecting diverse gender identities and risk contexts.
Mpox is a viral zoonosis caused by monkeypox virus (MPXV), an orthopoxvirus that is endemic in West and Central Africa. In spring 2022, notable outbreaks of MPXV clade IIb were recorded in several high-income countries, predominantly affecting men who have sex with men (MSM). At the peak of the outbreak, over 200 new Mpox cases per week were reported in Berlin, which is home to one of the largest MSM communities in Europe. Within the same year, the outbreak declined significantly, and it is unclear which factors contributed to these dynamics.
To investigate the concomitant effects of sexual contact networks, transient contact reductions and infection- vs. vaccine-derived immunity on the 2022 Mpox outbreak, we calibrated an agent-based model with epidemic, vaccination, contact and behavioural data.
Our results indicated that the vaccination campaign had a marginal effect on the epidemic decline. Rather, a combination of infection-induced immunity of high-contact individuals and transient behaviour changes reduced the number of susceptible individuals below the epidemic threshold. This emphasizes that, in addition to vaccination, timely and clear communication of transmission routes may trigger spontaneous protective behaviour within key populations, highlighting the importance of targeted sexual health education as a component of outbreak response.
While there were no reported Mpox cases in Berlin during the first half of 2023, clade IIb cases resurged and stabilized in 2024 and 2025, in addition to a surge of clade I cases in 2026. By expanding the model with data up to the end of 2025, as well as genomic surveillance data, we intend to quantify the impact of return to pre-pandemic contact behaviour ("pandemic deficit"), waning vaccine protection, and extensive cryptic circulation on the persistence of clade II.
All animal models of HIV-1 are either immunocompromised or rely on the primate virus, SIV. We recently employed tools from population genetics and viral evolution to engineer a host switch of HIV-1 into fully immunocompetent Nancy Ma’s owl monkeys (Aotus nancymaae). These animals can be infected with a minimally modified form of HIV-1, where infection recapitulates key hallmarks of infection in humans. Here, we assess whether the owl monkey model can be used to evaluate prophylactic protection by performing a passive protection study. Twelve owl monkeys received a single intravenous dose of the broadly neutralizing antibody 3BNC117 or an isotype control antibody. Twenty-four hours later, all animals were challenged intravenously with HIV-1. Plasma viremia was monitored weekly. For each animal, we assessed the time to peak viremia, peak viral load during the acute phase, viral setpoint, time to seroconversion, and acute-phase depletion of CD4⁺ T cells in the blood and in gut-associated lymphoid tissue (GALT). Infected animals exhibited key hallmarks of HIV-1 infection and pathogenesis as observed in humans. Animals treated with 3BNC117 exhibited significantly delayed or diminished infection parameters compared to controls, with one animal never becoming infected. Further, 4 out of 6 animals that were treated with 3BNC117 before infection became long-term elite controllers, even after the antibody had fallen to sub-clinical levels. We employed short-term treatment with the steroid dexamethasone to demonstrate the presence of a reactivatable viral reservoir in these elite controller animals. Finally, we report on the pharmacokinetics of 3BNC117 in owl monkeys, including its presence at mucosal surfaces of the vagina and rectum. These findings underscore the possible future utility of owl monkeys as a model for evaluating prophylactic interventions, including the preclinical optimization of HIV-1 vaccines.
Broadly neutralizing antibodies (bNAbs) are an emerging HIV therapeutic approach undermined by frequent intra-host viral escape. Predicting escape pathways for different bNAbs is a crucial step in combining them into effective multi-component therapies, but identifying escape mutations from in vitro screens may not yield results that are portable across the diverse genetic backgrounds found in different hosts. We investigated the predictability of realized escape pathways in vivo during two monotherapy trials of bNAbs 10-1074 and 3BNC117, which respectively target the V3 glycan and CD4 binding site. We performed longitudinal, long-read deep sequencing (6729 sequences from 20 participants; median 42 sequences per sampling timepoint) using a SMRT-UMI method designed to preserve genetic linkage, providing a dataset well-suited for discovering escape mutations in the context of their intra-host genetic backgrounds. We found that escape pathways from 10-1074 exhibited almost no background specific effects: different viruses both across and within participants escaped by repeatedly producing identical mutations to eliminate the N332 V3 glycan to which 10-1074 binds. Notably, the pervasiveness of these escape mutations further allowed intra-host HIV populations to select those escape mutations that best mitigated fitness costs. In contrast, 3BNC117 escape was highly background-dependent, with heterogeneity in putative escape loci across participants. Despite this background specificity, we observe some shared escape pathways at shorter genetic distances, such as recurrent mutations at the intra-host level and shared escape loci across two study participants who constitute a transmission pair. Together, these results indicate that the predictability of bNAb escape pathways is shaped by their target region and inform when we can – and cannot – predict escape phenotypes from genetic data alone.
Reinfections with respiratory viruses are thought to be driven by ongoing antigenic immune escape in the viral population. However, this does not explain why antigenic variation is frequently observed in respiratory viruses and not systemically replicating viruses. Here, we argue that the rapid rate of waning immunity in the respiratory tract is a key driver of antigenic evolution in respiratory viruses. Waning immunity results in hosts with immunity levels that protect against homologous reinfection but are insufficient to protect against infection with an antigenically different (heterologous) strain. Thus, when partially immune hosts are present at a high enough density, an immune escape variant can invade the viral population even though that variant cannot infect solidly immune hosts.
We then examine the consequences of antigenic change driven by waning immunity for the extent of pathology at the population level. Our models incorporate how susceptibility, infectivity and pathology depend on the level of immunity of the individual. We use the model to explore how the total disease burden (number of severe cases) depends on the extent of transmission of the pathogen and the rate of waning of immunity. We explore scenarios where reducing transmission can increase disease burden. We then consider the implications for vaccination and specifically when targeted vaccination could be more effective than mass vaccination.
Molecular mimicry, where pathogen proteins structurally resemble host antigens, is a central hypothesis in autoimmune pathology. However, current detection methods are largely limited to sequence homology or domain-level alignment, failing to identify the subtle, discontinuous structural "patches" that often drive antibody cross-reactivity. To resolve this computational bottleneck, we present JumpMASTER, a heavily optimized evolution of the state-of-the-art MASTER structural motif-matching algorithm. JumpMASTER achieves more than two orders of magnitude acceleration over existing standards, enabling the detection of 20 angstrom radii structural mimics for hundreds of pathogen queries across the entire human proteome in seconds.
We integrated this high-speed structural scanning with AlphaFold3 to computationally model antibody cross-reactivity from pathogen epitopes to predicted human mimics. Applying this pipeline to major viral proteomes, including SARS-CoV-2, HIV, RSV, and Epstein-Barr Virus (EBV), we uncovered a broad landscape of both previously characterized and novel host-pathogen structural homologies, some under various levels of evolutionary selection. As a prominent example, we identified high-confidence mimicry motifs within both surface-exposed and intracellular targets, such as vesicular trafficking proteins. We propose that these intracellular mimics become accessible to circulating antibodies via viral-induced cellular stress or ectopic surface expression, leading to more inflammation.
These findings demonstrate JumpMASTER’s utility in unmasking non-obvious, structurally conserved epitopes that evade sequence-based detection. By providing a scalable framework to map the 'structural mimicry interactome,' this work generates new hypotheses regarding the etiology of post-viral syndromes and highlights new classes of potential therapeutic targets.
Over the course of the SARS-CoV-2 pandemic, several divergent variants emerged which carried insertion-deletion mutations (indels), suggesting indels may be crucial in enabling the virus to adapt to changing host environments. However, indels are challenging to study, especially at the within-host level, as the process of sequencing and alignment generation can introduce artefacts. We developed a realignment method that accurately determines the genomic location and within-host frequency of indels, which we applied to the 120,000 high-quality sequenced samples collected as part of the UK Office for National Statistics’ Covid Infection Survey (ONS-CIS). We found that a high proportion of samples had within-host polymorphisms for deletions in the N-terminal domain of non-structural protein 1 (NSP1), which is the host translation shut-off factor. Recurrent deletions in this region were also observed during persistent infections captured in the ONS-CIS. Deletions in this region have been previously linked to reduced interferon response and lower viral load. The deletions observed within-host in this region were almost always in-frame, suggesting they are genuine and not artefacts. At the consensus level, these deletions occur sporadically across many different lineages, are rarely clustered on the phylogeny, and appear at markedly different frequencies among samples depending on variant. Using the linked data of the ONS-CIS including household outbreaks, we will present evidence that the deletions generated at a high rate in this region in SARS-CoV-2 are transmitted, and that this region is subject to conflicting within-host and between-host evolutionary pressures shaped by the role of NSP1 in suppressing the interferon response. The role of indels in driving genetic diversity in RNA viruses is often overlooked, and this analysis provides new insight into this important but underappreciated evolutionary dynamic.
Despite effective antiretroviral therapy (ART), HIV-1 persistence remains the principal barrier to a functional cure, driven by long-lived viral reservoirs whose defining features—particularly within tissues—remain poorly understood. Here, we introduce the Virus Microenvironment (VME) as a conceptual framework to explain how HIV-1 establishes, diversifies, and maintains tissue reservoirs that fuel viral rebound.
Using an integrated experimental and computational framework combining viral deep sequencing, immunofluorescence-based viral detection, spatial transcriptomics, cell-type inference, cell–cell interaction analyses, and machine learning, we systematically identified viral, molecular, cellular, and spatial determinants of the VME associated with persistent reservoirs in the SIV/rhesus macaque model.
Following analytical treatment interruption (ATI), viral sequences derived from tissues displayed substantially greater genetic diversity, compartmentalization, lineage turnover, and convergent evolution than matched PBMC-derived sequences. Our analyses demonstrated declining proviral diversity in blood during ART contrasted with preserved and dynamically shifting viral lineages in gut-associated tissues, indicating that non-blood reservoirs maintain complex viral populations under ART and disproportionately seed early viral rebound. Spatially resolved analyses revealed that regions harboring viral foci during early rebound post-ATI were marked by activation of stress-response pathways and translational reprogramming, including disruption of cap-dependent translation, mitochondrial dysfunction, and extracellular matrix remodeling. Spatial cell-type inference linked viral localization to epithelial luminal niches and indicated that persistent reservoirs were uniquely embedded within immune landscapes resembling immunosuppressed germinal center–like structures. By contrast, transient reservoirs were associated with adaptive effector populations consistent with immune-active environments capable of rapid antiviral responses.
Together, these findings support a new tissue reservoir paradigm in which SIV/HIV actively promotes formation of a specialized virus microenvironment that enables local viral replication, diversification, and long-term persistence. The VME represents a physiological sanctuary that sustains rebounding virus while evading immune clearance, highlighting tissue microenvironments as critical targets for curative HIV strategies.
Background
Growing evidence suggests social interactions within viral populations may influence adaptation and persistence. For example, hepatitis C virus (HCV) is hypothesized to exhibit antigenic cooperation (AC), a kind of altruistic behavior enabling immune escape of specific variant populations. The AC model raises the question of whether other chronic viral infections can be better understood through a social framework. For example, cooperative behavioral dynamics within measurably evolving organisms are often disrupted by mutation, requiring compensatory mutations (i.e., co-evolution) to restore homeostasis. Because HIV is one of the most rapidly mutating infectious viruses, compensatory mutations within the host may occur with sufficient frequency to detect the presence of cooperative behavior and inform the interactions responsible. Yet, no computational tool exists that can distinguish population-level from individual genome co-evolution to investigate the putative role of viral cooperative interactions in disease.
Methods
We developed the Graphical mOdel of Social Interactions using Phylogenies (GOSIP), capable of capturing significantly co-evolving sites across branches of simulated viral populations with 98% accuracy. We applied this tool to simian immunodeficiency virus (SIV) envelope sequences sampled longitudinally from multiple tissues within two cohorts of untreated macaques: one undergoing transient CD8+ cell depletion and rapid onset of simian AIDS (SAIDS), and the other allowed to progress naturally to SAIDS. The resulting co-evolutionary data were evaluated for tissue involvement, location within targeted immune epitopes, and potential role in disease progression.
Results
The major finding of this study was the significant inverse correlation among all animals of the time to SAIDS onset with the rate (per day) of amino acid co-evolution (R²=0.83, p<0.001).
Conclusion
Results point to viral population-level co-evolution as a more prominent biomarker of disease progression than prior markers, such as CD4+ T cell nadir or viral load set point. Determining whether the site dependencies described can be explained by AC is a critical next step, as an effective therapeutic regime in this context would require a multi-faceted immune attack that may not be achievable through a single broadly neutralizing antibody response.
Noroviruses are the leading cause of viral gastroenteritis, with most cases arising from genotype GII.4 infection. New GII.4 variants often emerge from long internal branches, a pattern similar to that observed for certain SARS-CoV-2 variants of concern, such as Omicron and Alpha. For SARS-CoV-2, diverse phylogenetic evidence suggests that prolonged infections are the source of these variants. Here, we evaluate whether prolonged infections may similarly give rise to new GII.4 variants by applying several analyses to population-level and within-host GII.4 sequence data, including phylogenetic analyses, root-to-tip divergence calculations, substitution rate estimation, and detailed sequence analyses. We first find that overall substitution rates for the GII.4 capsid gene exceed within-variant substitution rates. Analysis of prolonged GII.4 infections further shows that within-host evolutionary rates are often higher than those observed at the population level for matched variants. Many recurrent substitutions across prolonged infections occur in the P2 region of GII.4’s capsid gene, the dominant target of our immune response. This suggests that substantial antigenic change can arise within hosts experiencing prolonged infection. The majority of the identified recurrent substitutions in P2 are also ‘lineage-defining’ in that new GII.4 variants tend to carry substitutions in this region. Notably, and in contrast to SARS-CoV-2, most genetic differences between newly emerging GII.4 variants and earlier strains are synonymous rather than nonsynonymous. This pattern indicates strong purifying selection along these long branches, alongside positive selection acting on key antigenic sites. Consistent with this, capsid evolution within prolonged GII.4 infections also shows higher synonymous than nonsynonymous substitution rates. Together, these results are consistent with the hypothesis that prolonged infections may contribute to the emergence of new GII.4 variants. 
However, increased sampling at the population level will be necessary to more fully evaluate this hypothesis.
HIV-1 has a high rate of recombination that has significantly shaped its evolutionary history. Identifying these past recombination events will require accurate and scalable computational methods. We previously adapted the dynamic stochastic block model (DynSBM) from social network analysis to recombination detection, demonstrating greater accuracy on simulated data. We describe further improvements on DynSBM by applying an expanded dataset of HIV-1 genomes, including early sequences. We present a novel Bayesian implementation of DynSBM tailored for recombination analysis.
We partitioned an alignment of $n=718$ HIV-1 sequences into 73 windows, and inferred a graph for each window from the TN93 distance matrix. DynSBM identified $K=30$ communities from the resulting series of graphs. We assume that a transition between communities as we traverse a sequence represents a putative recombination breakpoint. Transitions were removed if they reverted to the previous community after one window, retaining 65% of inferred breakpoints. Based on the inferred transition rate matrix $Q$, we applied an elbow method to $Q^{-1}$ to cluster the communities into nine proposed subtypes, which were highly concordant with known subtypes (normalized mutual information = 0.90). Because DynSBM requires complete data, we incorporated partial genomes, including sequences from historical samples (1959-1978, $n=9$), by assigning communities from the $k$-nearest neighbours (TN93) for each window. We extracted putative non-recombinant intervals for sets of three windows from the alignment by removing sequences with any transitions. Intervals retained about 39%-62% of sequences, with substantial turnover among intervals.
Key limitations of DynSBM are that it is not designed for recombination detection and cannot quantify uncertainty. To address these, we have implemented a custom Bayesian version of DynSBM in R (bayblocks). With data represented as a sequence of adjacency matrices, we use Gibbs sampling to sample community memberships and transition rates, incorporating biologically realistic constraints.
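The breakpoint-filtering rule described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each sequence is represented as a list of community labels (one per window), and it treats a window as spurious when it differs from two matching neighbours, processing windows from left to right. The function names are hypothetical.

```python
def remove_single_window_reversions(labels):
    """Relabel any window whose community differs from two matching neighbours."""
    smoothed = list(labels)
    for i in range(1, len(smoothed) - 1):
        if smoothed[i - 1] == smoothed[i + 1] and smoothed[i] != smoothed[i - 1]:
            # The path leaves a community and returns after one window.
            smoothed[i] = smoothed[i - 1]
    return smoothed


def breakpoints(labels):
    """Window indices at which the community label changes."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


def filtered_breakpoints(labels):
    """Putative recombination breakpoints after removing one-window reversions."""
    return breakpoints(remove_single_window_reversions(labels))
```

For the label path `[1, 1, 2, 1, 1, 3, 3]`, the unfiltered breakpoints are windows 2, 3 and 5; after filtering only window 5 remains, since the excursion into community 2 lasts a single window.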
The development of durable antibody-based HIV treatments requires understanding not only which mutations confer resistance, but also how frequently those mutations emerge. Prior work has characterized the effects of individual mutations on viral fitness and antigenic escape; however, these evolutionary outcomes are also constrained by the mutational supply of new variants. How these mutational biases shape the evolution of HIV escape pathways in vivo remains largely unquantified. Here, we present a site-specific mutational supply framework and use it to explain viral escape from broadly neutralizing antibodies (bNAbs).
We measured the in vivo mutation rate across the HIV NL4-3 genome using our Error Rate of Replication Sequencing (ERR-Seq) platform, revealing extensive site-specific heterogeneity, with pronounced mutational hot and cold spots. Modeling analyses revealed that this variation was explained by a combination of local nucleotide context, RNA secondary structure, and mutation-type biases. Notably, our data support a model in which APOBEC3 proteins frequently perform sub-lethal mutagenesis, increasing the effective mutation rate of HIV.
By integrating these measurements into a generative mutation supply model, we can predict the rate of every nucleotide change across the genome. We applied this framework to a prior study of AAV-delivered bNAbs, focusing on two competing bNAb escape pathways. In that study, the higher fitness cost escape was twice as likely to emerge as the lower fitness cost escape. We found that this outcome is explained by mutational supply, as the higher fitness cost pathway is favored by its higher mutation and recombination rates.
These data show strong heterogeneity in mutational supply across the HIV genome and demonstrate that mutational biases can shape escape trajectories. Incorporating the mutational supply framework into models of viral evolution should improve quantitative predictions of escape pathways and assist in the design of more durable antibody-based interventions.
HIV integration into the human genome establishes long‑lived proviruses that persist despite antiretroviral therapy (ART) and remain a central barrier to cure. Comparative HIV integration site (IS) analyses across individuals have revealed persistent features of proviral landscapes and differences associated with age at exposure, timing of ART initiation and duration on ART. However, IS datasets are extremely sparse, each person typically contributes only tens to thousands of unique IS among billions of potential genomic loci and statistical approaches must therefore support inference under limited person level observations, small sample sizes and high dimensional categorical annotations.
Traditional IS analyses generally either pool sites across individuals or model person-specific integration rates using regression-type frameworks, but both approaches face limitations when most categories have no observations (high zero counts). Here we present three new statistical methodologies that address these challenges through Bayesian and fiducial principles, each with an equivalent method that requires no prior specification.
First, we introduce a simple method to compare how often integration sites occur in different genomic categories across groups, even when sample sizes are very small. We apply this to pathways and to well-known integration-enriched genes such as BACH2 and STAT5B. Second, we present a pattern-finding approach that identifies the smallest set of genomic categories that best distinguishes one group from another, functioning like an easy-to-interpret classification tool without requiring modelling assumptions. Third, we develop a flexible modeling framework that can handle complex datasets where many categories contain no detected integration sites, along with a simplified version for analyses focused only on whether a site appears at least once. We demonstrate these methods using simulations and apply them to an unpublished dataset.
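To give a feel for the first kind of comparison, the sketch below estimates how likely it is that one group has a higher proportion of integration sites in a given category than another, using very small counts. It is a generic Beta-Binomial illustration and not the authors' method: the uniform Beta(1, 1) prior, the function name `prob_a_exceeds_b` and the counts are all assumptions made for the example, and it shows only the Bayesian flavor, not the prior-free equivalent mentioned above.

```python
import random

def prob_a_exceeds_b(hits_a, total_a, hits_b, total_b, draws=20000, seed=1):
    """Monte Carlo estimate of P(p_A > p_B) under independent Beta posteriors.

    Assumes a uniform Beta(1, 1) prior on each group's proportion; this prior
    is an illustrative choice, not one taken from the abstract.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for a binomial proportion with a Beta(1, 1) prior.
        p_a = rng.betavariate(1 + hits_a, 1 + total_a - hits_a)
        p_b = rng.betavariate(1 + hits_b, 1 + total_b - hits_b)
        wins += p_a > p_b
    return wins / draws

# Hypothetical counts: integration sites falling in one gene category.
prob = prob_a_exceeds_b(hits_a=8, total_a=20, hits_b=1, total_b=20)
print(f"P(group A proportion > group B proportion) ~ {prob:.3f}")
```

Because the posterior is available in closed form for each group, the comparison stays well defined even when a group contributes zero sites to the category.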
The short period before the establishment of a large, systemically replicating HIV-1 population provides a narrow window in which the virus is most vulnerable to eradication. However, our understanding of these earliest stages of infection remains incomplete. To resolve these early dynamics, we used a non-human primate model to track the replication of clonal but genetically discriminable viral lineages across tissues and plasma.
Six animals were infected intravenously and necropsied at 4-5 days post-infection, with ~200 anatomical sites collected from each animal, spanning the gastrointestinal (GI) tract, GI-associated lymph nodes, peripheral lymph nodes, spleen, and additional organs. Viral DNA and RNA were quantified per site using qPCR, and next-generation sequencing was used to determine the proportions of individual barcoded lineages in each DNA-positive sample.
We developed a statistical framework to determine the sites of initial infection for individual lineages based on their replicative differences across tissues, identifying the putative tissue founder site for over 250 lineages. Replication at the initial site of infection varied widely, even among lineages originating within the same tissue sample, but did not differ significantly between tissue groups, indicating strong microenvironmental effects. Once a lineage expanded sufficiently at its origin site, dissemination was rapid and widespread, although early spread frequently followed local anatomical relationships.
Replicative success at the founder site was predictive of a lineage’s ultimate contribution to plasma viremia, with predominant plasma lineages originating from all tissue groups. Interestingly, in one animal, infection was overwhelmingly driven by a single GI-origin lineage that expanded far beyond all others, highlighting how local inflammatory or target-cell conditions can amplify early stochastic differences. Together, our findings underline the importance of the earliest events in setting the course of the infection.
Numerous experimental evolution studies have suggested that the adaptation rate of microbial populations evolving in stable environments declines over time. To investigate the characteristics of adaptation deceleration in a fast-evolving virus, we propagated HIV-1 in two human T-cell lines (MT-2 and MT-4) for approximately 4.8 years and tracked its genome evolution through next-generation sequencing. The curated whole-genome sequencing data can be accessed and explored via LTEEviz, an interactive web application. Time-resolved sequencing data indicated that, despite constant fixation rates of 0.085 (MT-2) and 0.042 (MT-4) mutations per generation, the fixation kinetics of adaptive mutations changed considerably over time. The rate of fixation of adaptive parallel mutations decreased by 44% per 300 generations, while the fitness gain they conferred diminished by 27% (MT-2) and 18% (MT-4) for each additional adaptive mutation in their genetic background. Furthermore, we identified unique yet consistent patterns of sequence evolution among different regions of the HIV-1 genome: non-coding regions, essential for viral replication and packaging, appeared to be subject to the strongest positive selection, whereas the pol and gag genes were subject to the strongest purifying selection. In contrast, the nef and vpr accessory genes showed patterns of random evolution, as expected in the absence of selection. In line with this expectation, the evolving populations of HIV-1 acquired and fixed multiple loss-of-function mutations in nef and vpr. Additionally, the nef gene repeatedly underwent large deletions. In two populations, several independent deletions of varying lengths together removed approximately 400 bp (equivalent to 4.3% of the HIV-1 genome), exemplifying progressive genome shrinkage. These large nef deletions increased in frequency faster than expected under neutrality and in a length-dependent manner.
Together, our results confirm that HIV-1 genomic evolution is characterized by a swift and substantial deceleration of adaptation, while highlighting genome shrinkage as one of the underlying adaptive mechanisms.
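As a rough worked example of what the reported percentages imply, the sketch below compounds each decline over successive steps. It assumes the declines act multiplicatively (a geometric decay), which is one plausible reading of the figures in this abstract rather than something it states; the helper name `remaining_fraction` is hypothetical.

```python
def remaining_fraction(decline_per_step: float, steps: float) -> float:
    """Fraction of the initial value left after `steps` compounding declines."""
    return (1.0 - decline_per_step) ** steps

# Fitness gain of an adaptive mutation arising on a background that already
# carries k adaptive mutations, relative to the gain with k = 0.
for cell_line, decline in {"MT-2": 0.27, "MT-4": 0.18}.items():
    fractions = [round(remaining_fraction(decline, k), 3) for k in range(4)]
    print(cell_line, fractions)

# Relative rate of fixation of adaptive parallel mutations after
# 900 generations, i.e. three 300-generation intervals at 44% decline each.
rate_after_900 = remaining_fraction(0.44, 900 / 300)
print(round(rate_after_900, 3))
```

Under this reading, three prior adaptive mutations leave roughly 39% (MT-2) or 55% (MT-4) of the initial fitness gain, and the fixation rate of adaptive parallel mutations falls to about 18% of its starting value after 900 generations.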
Background - British Columbia (BC), Canada, is rare in having maintained daily HIV phylogenetic monitoring for over a decade. Long-term monitoring enables evaluation of the impact of large-scale prevention strategies on HIV transmission.
Methods - We analyzed HIV-1 pol sequences derived from 11,000+ individuals collected between 1996 and 2024, representing >80% of the estimated prevalence. Transmission clusters were inferred from a distribution of 100 phylogenetic trees using a patristic distance threshold of 0.02 substitutions/site. We quantified trends in viral population size (effective population size, Nₑτ), cluster birth rates, proportion of new diagnoses linked to clusters, and transmission intensity (lineage-level diversification rates). Changes were evaluated relative to the implementation of province-wide treatment as prevention (TasP; 2009) and pre-exposure prophylaxis (PrEP; 2018).
Findings - Phylodynamic estimates of Nₑτ revealed no decline during the ART era, a significant decline after introduction of TasP (slope change β = -0.051/yr, p < 0.001) and further acceleration in decline after introduction of PrEP (slope change β = -0.052/yr, p < 0.001). Declines in transmission were further evidenced by reductions in cluster births and declining proportions of new diagnoses linked to clusters. Reduction in clustering probability followed introduction of PrEP (odds ratio 0.24, 95% CI: 0.20-0.28). Cluster-level diversification rates, reflecting transmission intensity, differed significantly across prevention eras (Kruskal-Wallis = 26.85, p < 0.001), with substantially lower rates after implementation of TasP compared with the pre-TasP period. Annual analyses demonstrated declines in log-transformed diversification rates subsequent to both generalized TasP and focused PrEP, despite persistence of a few established clusters. Residual transmission persisted within specific subpopulations.
Interpretation - This study provides empirical evidence that combined implementation of TasP and PrEP progressively fragments HIV transmission networks. The increasing fraction of non-clustered cases suggests a contribution from multijurisdictional transmission. Sustained phylogenetic monitoring revealed both the durability of prevention gains and the persistence of residual transmission. Phylogenetic monitoring aided adaptation and focus of public health responses without exacerbating stigma.
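The threshold-based clustering step described in the Methods can be sketched as follows for a single tree. This is a minimal illustration under stated assumptions: the abstract does not say how pairs below the threshold are merged, so single-linkage grouping (connected components) is used here as one common choice, the distances are made up, and the function name `cluster_by_threshold` is hypothetical. The study itself inferred clusters across a distribution of 100 trees, which this sketch does not attempt.

```python
from itertools import combinations

def cluster_by_threshold(names, dist, threshold=0.02):
    """Group sequences into single-linkage clusters: two sequences share a
    cluster if a chain of pairwise distances <= threshold connects them."""
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if dist[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    # Keep only groups with at least two members; singletons are unclustered.
    return sorted(sorted(g) for g in groups.values() if len(g) >= 2)

# Hypothetical patristic distances (substitutions/site) among five sequences.
names = ["s1", "s2", "s3", "s4", "s5"]
dist = {frozenset(p): 0.10 for p in combinations(names, 2)}
dist[frozenset(("s1", "s2"))] = 0.010
dist[frozenset(("s2", "s3"))] = 0.018
dist[frozenset(("s4", "s5"))] = 0.025  # just above the threshold

clusters = cluster_by_threshold(names, dist)
print(clusters)
```

With these made-up distances, s1, s2 and s3 form one cluster even though s1 and s3 are far apart, because s2 links them, while s4 and s5 stay unclustered. That chaining behavior is a known property of single-linkage grouping and one reason other merge rules are sometimes preferred.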