Jun 19 – 22, 2024
Squamish, BC, Canada
Canada/Pacific timezone

CHARACTERISATION OF NOVEL VIRUSES USING PROTEIN LANGUAGE MODELS

Not scheduled
20m

Poster Genomics & bioinformatics

Speaker

Kieran D. Lamb (University of Glasgow MRC Centre for Virus Research)

Description

The SARS-CoV-2 pandemic was an important example of how the spillover of a novel virus can go from a localised outbreak to a global pandemic in weeks. In the early stages of an outbreak, information is scarce, and what is available can be exceedingly valuable. Experimental data are time-consuming to produce and are often not available until much later in an outbreak.

Protein language models (PLMs) like ESM-2 utilise millions of protein sequences to develop an understanding of the properties of amino acid sequences. Recently, ESM-2 was shown to fold protein sequences into their 3D structure, with no alignment or other information about the sequence necessary. As such, it is clear that the embeddings PLMs produce contain information about protein structure and evolutionary constraints.

Here, we describe how ESM-2 can be used to represent meaningful information about a novel virus based on single sequences at various stages of an outbreak. Using the SARS-CoV-2 pandemic as a case study, we show how these models could have been applied, both in the early stages, before experimental observations were available, and in later stages for monitoring and horizon scanning. We show how the model outputs can supplement the available information, and we make a case for their application in future outbreaks and pandemics.
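
As an illustration of what representing a single sequence with ESM-2 might look like in practice, the following is a minimal sketch and not the authors' pipeline. It assumes the Hugging Face `transformers` port of ESM-2 and the small `facebook/esm2_t6_8M_UR50D` checkpoint. Mean pooling over residues and cosine similarity are illustrative choices that the abstract does not specify, and the helper names `mean_pool`, `cosine_similarity`, and `embed_sequence` are hypothetical.

```python
import math
from typing import List, Sequence


def mean_pool(residue_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue vectors into one fixed-length sequence embedding."""
    if not residue_vectors:
        raise ValueError("need at least one residue vector")
    n = len(residue_vectors)
    dim = len(residue_vectors[0])
    return [sum(vec[i] for vec in residue_vectors) / n for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embeddings (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def embed_sequence(
    sequence: str,
    model_name: str = "facebook/esm2_t6_8M_UR50D",  # assumed checkpoint
) -> List[float]:
    """Embed one amino acid sequence with ESM-2 (assumed workflow).

    Imports are local so the pure-Python helpers above work without
    torch/transformers installed. The first call downloads model weights.
    """
    import torch
    from transformers import AutoTokenizer, EsmModel

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = EsmModel.from_pretrained(model_name)
    model.eval()

    inputs = tokenizer(sequence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (tokens, hidden_dim)

    # Drop the <cls> and <eos> tokens the tokenizer adds around the sequence.
    residues = hidden[1:-1]
    return mean_pool(residues.tolist())
```

Under these assumptions, a newly sequenced viral protein could be embedded with `embed_sequence` and compared against embeddings of known reference proteins using `cosine_similarity`, with no alignment step required.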
