31st International Dynamics & Evolution of Human Viruses

Name: 31st International Dynamics & Evolution of Human Viruses
Start: 2024-06-19T08:00:00-07:00
End: 2024-06-22T16:00:00-07:00
Location: Squamish, BC, Canada

Jun 19 – 22, 2024

Canada/Pacific timezone

Maureen Helinski

mhelinsk@health.ucsd.edu

Machine learning models for forecasting SARS-CoV-2 lineage frequencies

Not scheduled

20m

Poster Software, tools & methods

Ruian Ke (Los Alamos National Laboratory)

With dozens or hundreds of minor SARS-CoV-2 variants circulating in the global population, there is an urgent need to predict the future frequency of a new variant when it emerges. Such predictions would allow for more focused experimental efforts and timely formulation of new vaccines. To address this need, we constructed machine learning models based on the transformer architecture (used in modern language processing models). These models use Pango lineage frequency time series as input data to predict future lineage frequencies. We trained the models on data collected in the US and the UK before the end of 2022 and tested them against data collected in 2023. The best model predicts the frequency of a newly emerged lineage two months in the future with a high level of accuracy, i.e. a mean absolute error of less than 0.4 on a log10 scale. Surprisingly, the model makes predictions with similar accuracy, without retraining, on data collected from other countries (where the total number of sequences is greater than 10,000). We compared our model's performance with the Nextstrain prediction (based on a multinomial logistic model) and found that our model outperformed Nextstrain substantially, especially for newly emerged lineages. These results demonstrate that machine learning approaches, such as natural language processing models, are promising new methods for using genomic data to forecast SARS-CoV-2 lineage frequencies.
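To make the log10-scale error quoted above concrete: an error of 0.4 on a log10 scale means the predicted frequency is off by a factor of about 2.5 (10^0.4) in either direction. The sketch below is illustrative only and is not the authors' code or Nextstrain's implementation. It shows one plausible way to compute such an error, plus a toy multinomial logistic projection of the kind the baseline is described as using. The function names, the `floor` pseudo-frequency, the growth rates, and all numbers are assumptions made up for this example.

```python
import math


def log10_mae(predicted, observed, floor=1e-6):
    """Mean absolute error between frequencies on a log10 scale.

    `floor` is an assumed pseudo-frequency that keeps log10 defined
    when a frequency is zero; the abstract does not specify one.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [
        abs(math.log10(max(p, floor)) - math.log10(max(o, floor)))
        for p, o in zip(predicted, observed)
    ]
    return sum(errors) / len(errors)


def multinomial_logistic_forecast(freqs_now, growth_rates, days_ahead):
    """Project lineage frequencies forward under fixed relative growth rates.

    Each lineage's log-frequency advances linearly in time and a softmax
    renormalizes the result. Current frequencies must be strictly positive.
    """
    logits = [
        math.log(f) + r * days_ahead
        for f, r in zip(freqs_now, growth_rates)
    ]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Toy example with made-up numbers (not data from the study).
forecast = multinomial_logistic_forecast(
    [0.90, 0.09, 0.01],   # hypothetical current lineage frequencies
    [0.0, 0.02, 0.08],    # hypothetical per-day relative growth rates
    60,                   # roughly two months ahead
)
observed = [0.35, 0.15, 0.50]  # hypothetical frequencies seen later
print(forecast)
print(log10_mae(forecast, observed))
```

Scoring on a log scale weights a rare, newly emerged lineage the same as a dominant one, which is presumably why it suits the newly emerged lineages the abstract focuses on.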

Yinan Feng, Emma Goldberg (LANL), Youzuo Lin, Ruian Ke (Los Alamos National Laboratory)
