Sequence data, such as nucleotides or amino acids, is essential for understanding biology. However, analyzing sequencing data and genotype-phenotype associations is challenging due to noise, nonlinear relationships, collinearity, and high dimensionality. While machine learning (ML) effectively detects patterns in this data, user-friendly tools remain limited. To address this, we developed...
Heterogenous HIV-like particle (HLP) is a targeted, potent HIV latency reversal agent (LRA) for people living with HIV (PLWH) on ART. In samples derived from over 50 PLWH on treatment for 2 to 22 years, HLPs were at least 100-fold more effective than other latency reversal agents in clinical development. HIV primarily infects activated CD4+ T cells whose T cell receptor (TCR) recognizes HIV...
Every year, nursing home residents pay a devastating toll to respiratory virus outbreaks. While antiviral treatments are effective if administered shortly after symptom onset or, even better, as pre- or post-exposure prophylaxis (PrEP or PEP, respectively), they remain dramatically underused. This is due to the limited number of studies that have evaluated their efficacy in nursing homes,...
The molecular clock is a statistical tool that we use to infer evolutionary rates and timescales from molecular sequence data, with the use of calibrations. These calibrations can include sequences sampled at different points in time for many organisms. Without calibrations, evolutionary rates and times are jointly unidentifiable, so calibrations are required. Before inferring rates and times, it is...
Bayesian phylogenetic inference is commonly used to generate distributions of phylogenies and evaluate evolutionary models from genetic data, concluding with a visual inspection of randomness in parameter sampling. Visual inspection impedes full automation of the algorithm making it inaccessible to non-experts, obstructing cloud-based runs, and impeding the running of multiple replicates. One...
Phylodynamic methods provide a coherent framework for the inference of epidemiological parameters directly from genetic data. In particular, methods based on multi-type birth-death models have been used to infer the movement of pathogens between geographic locations or host types, and the transition of infected individuals between disease stages. In these models, population heterogeneity is...
Background: There is a pressing need to monitor the circulating strains and the emergence of novel HIV-1 variants in the country, especially in the understudied South-south regions. Thus, this study aimed to characterize the genetic diversity of HIV-1, tropisms, and drug-resistant mutations (DRMs) among HIV-infected individuals in the region.
Methods:
One hundred and six HIV-infected...
The latent viral reservoir (LVR) consists of transcriptionally-inactive HIV-1 proviruses within long-lived resting CD4+ T-cells, in which proviruses can persist even under fully-suppressive antiretroviral therapy (ART). The presence of this reservoir is the primary reason for viral resurgence upon ART interruption. Gaining a deeper understanding of proviral dynamics in the LVR is critical for...
Understanding transmission clusters is essential to uncovering the dynamics of viral epidemics, identifying outbreak drivers, and guiding effective public health responses. Cluster analysis combines genomic and epidemiological data to trace transmission pathways and generate actionable insights to curb disease spread.
ClusterFinder, developed within the EU-funded SEQ4EPI project, is a...
Over the 2023/2024 winter season, we employed a targeted enrichment by hybrid capture method to sequence 1,160 nasopharyngeal swabs from patients across Switzerland presenting with flu-like symptoms. We identified, extracted and produced high-quality genomes of over 25 respiratory virus strains, including strains never before sequenced and publicly shared in Switzerland. Strains ranged from...
The increasingly widespread application of next-generation sequencing (NGS) in clinical diagnostics and epidemiological research has generated a demand for robust, fast, automated, and user-friendly bioinformatics workflows. To guide the choice of tools for the assembly of full-length viral genomes from NGS datasets, we assessed the performance and applicability of four open-source...
The infected blood inquiry (IBI) carried out by the UK Government examined the incidence of mass contamination of human plasma-derived commercial clotting factors between 1970 and the early 1990s. The inquiry demonstrated that numerous viruses, including HIV and hepatitis C virus, were transmitted via contaminated blood and blood products. While limited understanding of bloodborne viruses...
HIV-1 is a highly adaptive virus that, due to its high mutation rate and yield, adapts to diverse evolutionary challenges, thereby complicating containment and treatment of the virus. To further understand HIV-1’s genomic evolution, we have utilized the genomic and phenotypic data obtained from our longest HIV-1 evolution experiment, corresponding to 5 years of virus passaging, in two replicates...
Effective viral transmission network analysis is crucial for controlling virus spread. Cluster size is key for prioritizing interventions. Traditional methods often miss network extent due to sampling biases. This study investigates mean pairwise genetic distance (MGEND) as a proxy for estimating "true" cluster size.
Viral samples from an HIV clinic in Mexico City were classified into...
Social microbial behavior has been recognized as a contributor to clinical challenges such as antibiotic resistance and immune evasion. In structured communities, like bacterial biofilms, cooperation among specialized subpopulations ensures survival. Growing evidence suggests social interactions may also influence viral adaptation. For example, hepatitis C virus (HCV) is hypothesized to...
Subclinical or asymptomatic influenza infections are likely to play a major role in ongoing transmission but require longitudinal cohorts to detect. The Household Influenza Vaccine Evaluation (HIVE) cohort is an ongoing study of households where participants respond to weekly symptom surveys and contribute biannual blood samples. From 2011 to 2020, 2039 participants contributed 8511 serum...
The accrual of nucleotide substitutions in virus genomes accompanies transmission of those viruses through their host populations. Because lineages with higher fitness tend to transmit rapidly to new hosts before incurring very many substitutions, large numbers of related sequences are usually interpreted as evidence of transmission success. Quantities like the local branching index (LBI) aim...
Respiratory Syncytial Virus (RSV) is a single-stranded RNA virus that most people encounter as children. RSV infection generally manifests with mild, cold-like symptoms, but can cause severe complications in immunocompromised populations such as infants, the elderly, and immunosuppressed transplant patients. These
Despite its global impact, epidemiological surveillance of RSV in Switzerland...
Timed viral phylogenies are often used to understand geographic movements, past population dynamics, the emergence of new lineages and epidemic dynamics. These inferences are frequently done in a framework using coalescent theory. Exchangeability in coalescent theory refers to the property that each pair of lineages is equally likely to coalesce, moving back in time from the present to the...
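The exchangeability property described above can be illustrated with a minimal sketch of a standard (Kingman) coalescent simulation. This is not the method of the abstract; the function name `simulate_coalescent`, the `pop_size` scaling, and the constant population size are assumptions made for illustration. Going back in time, every pair of the k current lineages is equally likely to be the next to coalesce, so the pair is drawn uniformly at random.

```python
import random

def simulate_coalescent(n, pop_size=1.0, seed=None):
    """Simulate coalescent events for n sampled lineages (hypothetical helper).

    Returns a list of (time, merged_lineage) tuples, with time measured
    backwards from the present. Assumes a constant population size.
    """
    rng = random.Random(seed)
    lineages = [frozenset([i]) for i in range(n)]
    t = 0.0
    events = []
    while len(lineages) > 1:
        k = len(lineages)
        # Total coalescence rate: k choose 2 pairs, each at rate 1 / pop_size.
        rate = k * (k - 1) / 2 / pop_size
        t += rng.expovariate(rate)
        # Exchangeability: every pair is equally likely to coalesce.
        i, j = rng.sample(range(k), 2)
        merged = lineages[i] | lineages[j]
        lineages = [l for idx, l in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
        events.append((t, merged))
    return events

if __name__ == "__main__":
    for time, lineage in simulate_coalescent(5, seed=42):
        print(f"t={time:.3f}  merged={sorted(lineage)}")
```

Structured or non-exchangeable models would replace the uniform pair choice with pair-specific rates.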
Severe Fever with Thrombocytopenia Syndrome Virus (SFTSV) is a tick-borne virus recognized by the World Health Organization as an emerging infectious disease of growing concern. Utilizing phylodynamic and phylogeographic methods, we have reconstructed the origin and transmission patterns of SFTSV lineages and the roles demographic, ecological, and climatic factors have played in shaping its...
HIV remains a major public health challenge, with transmission dynamics and the evolution of drug resistance posing significant barriers to effective control. While traditional sequencing methods, such as Sanger sequencing, have been instrumental in studying specific viral regions, they often fail to capture the full genomic diversity within and between hosts. In this study, we analyze 441...
When constructing phylogenetic trees for phylodynamic and phylogeographic analyses, researchers often rely on tree-building software such as IQ-TREE, BEAST, and BEAST2. The algorithms used by these tools, whether based on maximum likelihood or Bayesian principles, are inherently stochastic. As a result, repeated analyses of the same data can yield trees with unstable structures due to weak or...
Influenza A viruses (IAVs) remain a significant public health threat due to their ability to jump between host species, as demonstrated by the H1N1 pandemic in 2009. Despite increased genomic surveillance, knowledge of the evolutionary dynamics allowing such zoonotic events is still limited, and the genetic markers that facilitate transmission between humans and swine remain unclear. To...
The COVID-19 pandemic saw successive emergence and spread of novel viral variants, exhibiting enhanced transmissibility or evasion from vaccine- and infection-acquired immunity. While the genotypic and phenotypic basis of SARS-CoV-2 variants have been extensively characterized, the evolutionary factors governing their patterns of emergence are less well understood. In this study we...
Clustering infections by genetic similarity is a common method of characterizing the risk structure of a population. A graph is constructed by connecting infections with genetic distances below a threshold, then connected components with two or more nodes are extracted as "clusters". Studies of HIV molecular epidemiology often use logistic regression to find associations between potential risk...
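The graph construction described above can be sketched as follows. This is a minimal illustration, assuming aligned sequences of equal length and a simple proportion-of-differences distance; the function names, the toy sequences, and the threshold value are hypothetical, and real analyses typically use model-based distances.

```python
from itertools import combinations

def hamming_fraction(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def find_clusters(seqs, threshold):
    """Connected components (size >= 2) of the distance-threshold graph.

    seqs maps an infection identifier to its aligned sequence.
    """
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Connect every pair whose distance falls below the threshold.
    for a, b in combinations(seqs, 2):
        if hamming_fraction(seqs[a], seqs[b]) < threshold:
            parent[find(a)] = find(b)

    components = {}
    for name in seqs:
        components.setdefault(find(name), set()).add(name)
    # Keep only components with two or more nodes.
    return sorted(sorted(c) for c in components.values() if len(c) >= 2)

if __name__ == "__main__":
    toy = {
        "s1": "ACGTACGTAC",
        "s2": "ACGTACGTAA",
        "s3": "ACGTACGAAA",
        "s4": "TTTTTTTTTT",
    }
    print(find_clusters(toy, threshold=0.15))
```

Note that s1 and s3 end up in the same cluster even though their direct distance exceeds the threshold, because both are linked to s2; this chaining is a property of connected-component clustering.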
Interactive web dashboards have proven to be effective tools for exploring and communicating epidemiological insights obtained from genomic data. In particular, they are valuable for large and/or regularly updated datasets, facilitating real-time monitoring and the rapid identification of new variants and transmission patterns. During the SARS-CoV-2 pandemic, many new genomic epidemiological...
Despite intensive study, surprising gaps remain in our knowledge of transmission patterns during the COVID-19 pandemic, particularly transmission as new lineages emerge. We analyzed 134,785 SARS-CoV-2 genomes from 7 lineages collected in Massachusetts from November 1, 2021, to January 17, 2023; this includes 85,125 genomes with individualized epidemiological data across 666 testing facilities....
Human metapneumovirus (hMPV) is among the most common causes of respiratory tract infections among humans, especially affecting children and immunocompromised patients. HMPV is closely related to human respiratory syncytial virus (hRSV), these viruses being the only two members of the pneumovirus family known to infect humans. All pneumoviruses express fusion (F) proteins on their surface...
In an HIV-1 molecular epidemiology study in Spain, based on maximum likelihood (ML) phylogenetic analyses of protease-reverse transcriptase (PR-RT) sequences, we identified a cluster of 46 individuals from 8 regions not grouping with references of known subtypes or CRFs, which through analyses with Recombination Identification Program (RIP), bootscanning, and trees of partial fragments, was...
Background: Case reporting in a pandemic or for emerging viral infections depends heavily on testing strategies and as a result the degree of under-reporting of true incidence can vary substantially. We previously developed GInPipe [1], a computational tool that allows estimation of under-reporting levels over time from time-stamped pathogen genome surveillance data, within a few minutes...
Following initial infection, HIV spreads to regional lymph nodes within 3–6 days, and systemic dissemination occurs within 6–25 days. However, these experimental findings do not align with the results of current HIV mathematical models, which show that while viral dissemination in the blood occurs within 6–25 days, the virus appears in the lymph nodes after a delay of 50 days. We hypothesize...
Background:
This study aimed to understand the evolutionary and transmission dynamics of the HIV-1 epidemic in Serbia and to reveal the socio-demographic and clinical factors that shaped the expansion of phylogenetic transmission clusters. The dataset of 720 HIV-1 pol sequences isolated from both newly diagnosed and therapy-experienced healthcare clients, between 1997 and the end of 2023, was...
Viral sequencing data are collected, stored, and processed by a wide range of entities, from individual scientists and laboratories to large consortia and global databases. Yet, there has been a lack of a general, reusable software for managing pathogen sequencing data. During the SARS-CoV-2 pandemic, when many new laboratories and consortia started sequencing or analyzing data, they relied on...
HIV-1 transmission leads to lifelong infection marked by continuous viral evolution and evasion of host immunity. The specifics of this host-pathogen interaction, including the complex dynamics of transmission, quasispecies diversity, and viral recombination rate, remain unclear. To better characterize these dynamics in vivo, we engineered doubly barcoded viral libraries of three HIV-1 strains...
Mathematical modelling allows us to answer “if this then what” questions. For infectious disease epidemiology this has been widely used to guide interventions: if we intervene like this, how much disease will we prevent? Also important, though much less widely practised, is the use of mathematical modelling to guide the design of studies and trials: if we measure like this, how much will we...
SARS-CoV-2 infection is a leading cause of death by infectious disease and continues to be a pressing public health issue due to ongoing emergence of novel variants. One factor that affects variant emergence is the transmission bottleneck size, which refers to the number of unique genetic variants from a donor host that transmit and establish the population in a recipient (defined as a pair)....
Every time the Human Immunodeficiency Virus (HIV) replicates within a cell, errors can creep into its genome. These mutations make it possible to track an epidemic, because the more similar two virus sequences are, the closer the two individuals who bear them are likely to be in the chain of transmission.
For more than twenty years, the field of phylodynamics has been analyzing this type of...
The discovery of very potent, broadly neutralizing antibodies (bnAbs) against HIV-1 has revived the hope for new treatment and prevention methods against the virus. Nevertheless, several clinical trials have shown that the breadth of such bnAbs is still limited by viral resistance. Here, we explore whether currently available neutralization datasets on bnAbs predicted to bind similar epitopes...
A key question in evolutionary epidemiology is to determine differences in the conditions that may allow some mutant strains to spread in a population where a resident strain is already circulating. Evolutionary invasion analyses assume that immunity is long-lasting for previously infected individuals, making it difficult to study traits such as immune escape. We relax this last assumption...
Background:
Persistent SARS-CoV-2 infections in immunocompromised individuals are often associated with key features of viral evolution, including accelerated mutation rates, emergence of novel variants of concern (VOCs), cryptic lineages, reservoirs of antiviral resistance, super-spreading events and interspecies transmission. These patients represent a critical group for the...
Endogenous Viral Elements (EVEs) originate when viruses integrate their genetic material into the host genome, becoming a permanent part of the host’s DNA. As molecular fossils of ancient viral infections, EVEs offer valuable insights into host-virus coevolution and virus diversity over evolutionary timescales.
We aim to detect EVEs across 948 mammalian genomes by performing a nested...
Jingmenviruses are understudied flavi-like viruses with a segmented genome. They have been detected worldwide in a wide range of hosts, such as arthropods including ticks and vertebrates including cattle and humans with febrile illness symptoms. As next-generation sequencing has become more affordable, increasing numbers of large-scale metagenomics studies have been published, alongside raw...
Molecular epidemiology of hepatitis A virus (HAV) plays a critical role in identifying outbreak origin and conducting surveillance. Although it is mostly carried out using short partial VP1/2A genomic sequences, utilizing whole-genome sequences (WGS) provides more accurate and robust information. In Argentina, where HAV vaccination has been mandatory since 2005, the local sequence information is...
Detection of HIV-1-infected cells in single-cell sequencing studies presents a significant challenge partially due to the high variability of viral genomes. Conventional alignment-based tools frequently fail to identify these cells accurately, as they rely on perfect sequence matches. We have developed a bioinformatic pipeline that integrates multiple B-clade viral genome references and...
Five years since the start of the pandemic, SARS-CoV-2 remains a leading cause of acute respiratory infections, despite most people having been infected at least once. Unprecedented levels of global genomic surveillance provide a rich basis for understanding how SARS-CoV-2 variants continue to evolve in the post-Omicron era, a critical topic for vaccine strain updates.
Using insights...
Background
In 2022, an Mpox clade II outbreak affected many countries including Slovenia. To optimize control, it is important to know the extent to which such outbreaks are driven by introductions from abroad or by within-country transmission. We used sequences of all Slovenian cases in a phylodynamic model to address this question, and investigated the potential to assess the number of...
Singapore faces recurring dengue outbreaks that pose substantial public health and economic burdens, with recent years seeing unprecedented case numbers. While current surveillance systems track case counts, the ability to predict outbreaks in real-time remains challenging. We propose combining real-time estimation of the effective reproduction number (Rₑ) with analysis of viral spread...
Human immunodeficiency virus (HIV) causes substantial morbidity and mortality in sub-Saharan Africa. HIV cases were first reported in Madagascar, an island nation off Africa’s southeastern coast, in the 1990s, although current epidemiologic data are limited. However, high prevalence of commercial sex work, recent economic changes driving population mobility, and prevalence of other sexually...
Understanding viral transmission is important for responding to infectious diseases. For example, information about the geographic spread of a virus can inform policy decisions about transportation and borders. Estimates of the host species' movements in zoonotic viruses inform our knowledge of the nature and frequency of host jumps, and hence of zoonotic risk. Geographic and host movements...
In July 2023, the WHO issued guidance indicating that individuals with HIV viral loads between 200-1000 copies/mL have a “negligible” risk of onward transmission to sexual partners, marking a shift from “undetectable = untransmittable” messaging. However, this guidance is based on limited empirical data, particularly from settings in Africa since widespread roll out of antiretroviral (ARV)...
Enterovirus D68 (EV-D68), first discovered in the 1960s, emerged to cause outbreaks of severe respiratory disease and polio-like paralysis in 2014. The transmission dynamics of EV-D68 before its emergence are poorly understood due to its clinical similarities to other respiratory infections. These similarities also complicate detection without specific testing. Despite this, extensive genetic...
The B.1.351/Beta VOC was first identified in South Africa in October 2020. Within the United Kingdom (UK), the first two cases of Beta were detected on 10 December 2020. Thereafter, the increase in the number of cases of Beta within the country raised alarms of its rapid growth which led to the UK government closing its international borders to prevent further introductions. Additionally, in...
The COVID-19 pandemic has caused over 776 million cases and 7 million deaths globally, highlighting the need for predictive tools to anticipate SARS-CoV-2 evolution. The S1 subunit of the Spike glycoprotein, essential for viral entry into human cells, undergoes frequent mutations that influence transmissibility and immune evasion. Predicting these mutations is crucial for developing vaccines...
Background: Ever since Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), which causes COVID-19, began to spread, the world has been dealing with an unparalleled public health calamity. The entire health system is under strain as a result of the pandemic. Above all, microbiologists have experienced significant challenges in terms of diagnosis. The intriguing part that was...
Influenza remains a persistent global health threat due to its high mutability, rapid transmission, and significant socioeconomic impact. This study addresses the critical research question: How can computational frameworks overcome current limitations to accurately and robustly predict human-influenza protein-protein interactions (PPIs)? Understanding these interactions is crucial for...
Background
The COVID-19 pandemic has highlighted the need for real-time infectious disease surveillance and forecasting systems to identify trends in transmission. In this study, we compare short-term forecasting models for COVID-19 hospital admissions that make predictions 1 to 4 weeks ahead based on retrospective electronic health record data from the Bern region of...
High-Throughput Single-Genome Amplification and Sequencing (HT-SGS) enables detailed measurements of intra-host virus genotypes via barcoding of individual virus genomes during reverse transcription (RT) followed by PCR amplification, sequencing, and bioinformatic error correction procedures. However, the absence of “ground-truth” RNA reference samples makes it challenging to evaluate...
During infection, human immunodeficiency virus (HIV) maintains a stably integrated reservoir of proviruses that persist within the host genome despite combined antiretroviral therapy (cART). Characterizing these reservoirs remains challenging due to insufficient phylogenetic resolution, particularly under cART, which limits our ability to assess proviral integration and replication competency...
HIV-1 has nine main subtypes that persist in infected populations; however, the overall diversity of HIV-1 is much larger due to recombination amongst these original subtypes. Recombination has led to many unique recombinant forms and over 100 circulating recombinant forms (CRFs), which may become more prevalent than the two parent strains in a given region. We analysed sequences taken from...
RNA viruses like SARS-CoV-2 have a high mutation rate, which contributes to their rapid evolution. The rate of mutations depends on the mutation type (e.g., A→C, A→G, etc.) and can vary between sites in the viral genome. Understanding this variation can shed light on the mutational processes at play, and is crucial for quantitative modeling of viral evolution. Using the millions of available...
The way a virus spreads leaves footprints in its genome. Phylodynamics leverages these footprints to estimate epidemiological parameters from collected virus genetic data. The estimation is typically done in a likelihood-based framework. The epidemiological process is modeled on a virus transmission tree. This tree is approximated by time-scaled phylogenetic trees reconstructed from virus...
The HIV epidemic in the Former Soviet Union (FSU) is growing rapidly, with 140,000 new infections reported in 2023, marking a 20% increase since 2010. Despite ongoing efforts to expand HIV prevention and treatment programs, AIDS-related deaths are on the rise in this region, underscoring the need for a better understanding of transmission dynamics and drug resistance mutations (DRMs) to inform...
The measles virus, renamed Morbillivirus hominis in 2023 (Paramyxoviridae), has been subject to a global elimination program of the World Health Organization since 2012. However, despite these efforts, two significant epidemic resurgences occurred in France (2008-2012; 2017-2019). Based on a rich collection of samples, this work illuminates the evolutionary dynamics of...
The unprecedented number of PCR tests and viral genomic sequences generated during the COVID-19 pandemic offers a wealth of data to follow the dynamics and evolution of a pathogen. While largely used during the pandemic, in particular to follow potential changes in vaccine efficacy, the full potential of these community data remains to be exploited. For this project, building on a...
Tree metrics are measures to compare the similarity between two tree topologies. Effective sample size (ESS) is a statistic that quantifies the amount of autocorrelation in Markov chain Monte Carlo (MCMC) and is used to assess run convergence. With tree metrics, Lanfear et al showed that we can compute an ESS for tree topologies. From the plethora of tree metrics that have been developed,...
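As a point of reference for the ESS statistic mentioned above, here is a minimal sketch of the usual autocorrelation-based estimate for a scalar MCMC trace, ESS = N / (1 + 2 Σ ρ_k). It is not the tree-topology ESS of the abstract. The truncation rule (stop summing at the first non-positive autocorrelation) is one simple convention chosen here as an assumption, and the function names and the AR(1) test chains are hypothetical.

```python
import random

def effective_sample_size(trace):
    """Estimate ESS of a scalar trace from its sample autocorrelations."""
    n = len(trace)
    mean = sum(trace) / n
    centered = [v - mean for v in trace]
    var = sum(c * c for c in centered) / n
    if var == 0.0:
        raise ValueError("ESS is undefined for a constant trace")
    rho_sum = 0.0
    for lag in range(1, n):
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = cov / var
        if rho <= 0.0:
            # Assumed truncation rule: stop at the first non-positive lag.
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)

def ar1_chain(n, phi, seed):
    """Toy autocorrelated chain: x_t = phi * x_{t-1} + standard normal noise."""
    rng = random.Random(seed)
    x, chain = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        chain.append(x)
    return chain

if __name__ == "__main__":
    print("independent draws:", effective_sample_size(ar1_chain(2000, 0.0, 1)))
    print("sticky chain:     ", effective_sample_size(ar1_chain(2000, 0.9, 1)))
```

A strongly autocorrelated chain yields an ESS far below the number of stored samples, which is why ESS is used as a convergence and mixing diagnostic.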
Epidemiological models are key tools for understanding infectious disease dynamics. Traditional SIR models assume that all individuals in a host population are initially susceptible, limiting their ability to predict complex outbreak patterns observed in real-world epidemics. Here, we introduce the USIR model, which incorporates the concept of evolutionary niche expansion—an adaptation of...
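For context, the traditional SIR baseline referred to above can be sketched as follows. This is the classic model only, not the USIR model of the abstract; the parameter values, the forward-Euler integration, and the function name `simulate_sir` are assumptions made for illustration. Compartments are expressed as population fractions.

```python
def simulate_sir(beta, gamma, s0, i0, r_init=0.0, t_end=200.0, dt=0.01):
    """Integrate dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I, dR/dt = gamma*I.

    Uses a simple forward-Euler scheme and returns a list of (t, S, I, R).
    """
    s, i, r = s0, i0, r_init
    trajectory = [(0.0, s, i, r)]
    steps = int(round(t_end / dt))
    for step in range(1, steps + 1):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((step * dt, s, i, r))
    return trajectory

if __name__ == "__main__":
    # Nearly the whole population starts susceptible, as the classic model assumes.
    traj = simulate_sir(beta=0.5, gamma=0.25, s0=0.999, i0=0.001)
    peak_t, _, peak_i, _ = max(traj, key=lambda row: row[2])
    print(f"peak prevalence {peak_i:.3f} at t={peak_t:.1f}")
    print(f"final epidemic size {traj[-1][3]:.3f}")
```

With a fixed susceptible pool and no replenishment, this baseline produces a single epidemic wave.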
Genomic surveillance of pathogens has become a critical element of modern public health. A key step in most surveillance pipelines is amplification of a target pathogen or locus through PCR, particularly in samples where higher sensitivity is desirable. Despite widespread usage of amplicon sequencing, the potential for errors introduced during amplification to mislead an analysis is...
Phylodynamic methods are widely used to infer the population dynamics of viruses between and within hosts. For HIV-1, these methods have been used to estimate migration rates between different anatomical compartments within a host. The methods typically assume that there is no selective pressure acting on the virus, even though it is known that viruses often experience strong selection...
Respiratory syncytial virus (RSV) is a highly contagious, enveloped, single-stranded RNA respiratory virus that primarily causes mild, cold-like symptoms. However, it can lead to severe illness, hospitalization, or death in infants and immunocompromised individuals. RSV is classified into two subtypes, RSV-A and RSV-B, with one subtype typically dominating a given season that begins in the...
Simulating within-host viral sequence evolution allows for the investigation of factors such as the role of recombination in viral diversification and the impact of selective pressures on virus evolution. Here, we add another model to the toolbox of within-host sequence simulators: wavess (within-host agent-based viral evolution sequence simulator), a discrete-time agent-based model and a...
Human papillomaviruses (HPVs) are common sexually transmitted viruses. Most infections they cause clear naturally within a few months or years, but some become chronic and can cause many ano-genital cancers, especially cervical cancers. Although safe and effective vaccines are available, the genetic bases of persistence and pathogenicity remain poorly understood. Some HPV genotypes, such as...