Jun 19 – 22, 2024
Squamish, BC, Canada

A NEW METHOD FOR FINDING CONSERVED FEATURES IN LOW-VARIABILITY RNA VIRUS GENE ALIGNMENTS, ILLUSTRATED WITH SARS-CoV AND SARS-CoV-2

Not scheduled
20m

Poster: Software, tools & methods

Speaker

Jordan Skittrall (University of Cambridge)

Description

One way of finding new features of biological importance in an organism is to look for regions where its genetic sequence is unexpectedly highly conserved, signifying that deviations from the conserved sequence affect the ability to produce viable progeny. Many methods implementing this approach implicitly assume that every locus in a sequence should be treated equally. In fact, some loci yield more information than others. For example, in nucleotide-level analysis of coding regions, information content differs between loci because codon biases impose different levels of constraint for different amino acids. Methods that assume equal information still work in organisms with high genetic diversity, but in organisms with low genetic diversity, accounting for information content may be crucial for detecting weak signals of conservation. In newly emerged pathogens, whose populations have low genetic diversity, early detection of candidate drug targets is crucial for the development of therapeutics.
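As a rough illustration of why loci in a coding region are not equally informative, the sketch below counts, for each nucleotide position, how many of the four bases would leave the encoded amino acid unchanged. It uses codon degeneracy under the standard genetic code as a simple stand-in for the codon-bias constraint described above; it is not the RNAdescent method, and the function names `synonymous_options` and `site_degeneracy` are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_options(codon, position):
    """Count the bases (including the current one) that can occupy
    `position` (0, 1 or 2) without changing the encoded amino acid."""
    codon = codon.upper().replace("U", "T")  # accept RNA or DNA letters
    amino_acid = CODON_TABLE[codon]
    count = 0
    for base in BASES:
        variant = codon[:position] + base + codon[position + 1:]
        if CODON_TABLE[variant] == amino_acid:
            count += 1
    return count


def site_degeneracy(coding_sequence):
    """Return one count per nucleotide of an in-frame coding sequence.

    Assumption for illustration: the sequence starts in frame; any
    trailing partial codon is ignored.
    """
    seq = coding_sequence.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    counts = []
    for start in range(0, usable, 3):
        codon = seq[start:start + 3]
        for position in range(3):
            counts.append(synonymous_options(codon, position))
    return counts


# Met (AUG) has no synonymous alternatives; the third positions of
# Gly (GGU) and Leu (CUA) tolerate all four bases.
print(site_degeneracy("AUGGGUCUA"))  # [1, 1, 1, 1, 1, 4, 2, 1, 4]
```

Under this simplification, a conserved base at a site with a count of 4 is not explained by the amino acid alone, so it says more about nucleotide-level constraint than a conserved base at a site with a count of 1.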

In newly emerged pathogens, a further problem in sequence analysis is overdispersion of measures of sequence variability, a result of early mutation patterns. In such scenarios, methods with underlying parametric assumptions may be inappropriate; even methods that appeal to the Central Limit Theorem may be unreliable because convergence to the limiting distribution is very slow.
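To make the contrast with parametric approaches concrete, here is a minimal sketch of one generic non-parametric idea: ranking a window's mean variability against the means of all windows of the same width along the same sequence, rather than against a fitted distribution. This is an assumed illustration only, not the procedure used by RNAdescent; the per-site scores, the window width, and the names `window_means` and `empirical_rank` are hypothetical.

```python
def window_means(scores, width):
    """Mean score of every contiguous window of `width` sites."""
    if width <= 0 or width > len(scores):
        raise ValueError("width must be between 1 and len(scores)")
    total = sum(scores[:width])
    means = [total / width]
    # Slide the window one site at a time, updating the running total.
    for i in range(width, len(scores)):
        total += scores[i] - scores[i - width]
        means.append(total / width)
    return means


def empirical_rank(scores, width, start):
    """Fraction of windows whose mean is at or below that of the window
    beginning at `start`. Small values mark unusually low variability.

    No distributional form is assumed. Overlapping windows are not
    independent, so this is a rank, not a calibrated p-value.
    """
    means = window_means(scores, width)
    observed = means[start]
    at_or_below = sum(1 for m in means if m <= observed)
    return at_or_below / len(means)


# Hypothetical per-site variability scores (e.g. counts of observed variants).
scores = [3, 4, 5, 0, 0, 0, 4, 3, 5, 4]
print(empirical_rank(scores, width=3, start=3))  # 0.125
```

Because the comparison set is drawn from the data themselves, overdispersion in the scores affects the observed window and the reference windows alike.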

We present a method, RNAdescent, that accounts for the variable information obtained as one traverses a viral genome, and that allows a non-parametric approach to finding conservation in coding regions. We apply this method to the large dataset of SARS-CoV-2 genomes (>5 million sequences) to characterise regions of nucleic acid that must remain conserved, such as a packaging signal and regions key to the formation of subgenomic RNAs. We further show that the method is sufficiently sensitive that it can identify some analogous regions in SARS-CoV, despite the much smaller and less varied set of viral sequences available (119 sequences).

Primary author

Jordan Skittrall (University of Cambridge)

Co-authors

Nerea Irigoyen (University of Cambridge), Ian Brierley (University of Cambridge), Julia Gog (University of Cambridge)
