Jun 19 – 22, 2024
Squamish, BC, Canada
Canada/Pacific timezone

A NOVEL THREE-PRONGED ALIGNMENT-FREE MACHINE LEARNING APPROACH FOR TAXONOMIC CLASSIFICATION: A CASE STUDY OF HUMAN ASTROVIRUSES

Not scheduled
20m

Poster Software, tools & methods

Speaker

Fatemeh Alipour (University of Waterloo)

Description

Astroviruses are a genetically diverse viral family linked to disease in both humans and birds, with substantial health impacts and economic burdens. They have traditionally been classified into the genera Avastrovirus and Mamastrovirus based on host species, but next-generation sequencing has revealed broader transmission patterns, necessitating a reevaluation of this classification approach. In response, a novel alignment-free taxonomic classification method is introduced that leverages whole-genome k-mer composition alongside host information. To control for the impact of genetic recombination, the method's pipeline includes an optional component for identifying recombinant sequences. This three-pronged classification approach integrates a supervised machine learning method (support vector machine), an unsupervised machine learning method (K-means++), and consideration of host species. Applied to 191 unclassified astrovirus genomes (with continuous updates for emergent genomes), the approach successfully proposes genus labels. Additionally, eight genomes show incompatibility with their reported host species, suggesting cross-species infections. Notably, the machine learning-based approach, enhanced by principal component analysis (PCA), supports the hypothesis of a human astrovirus (HAstV) subgenus within Mamastrovirus and a goose astrovirus (GoAstV) subgenus within Avastrovirus, based on differences in their genome compositions. In summary, this multipronged machine learning approach offers a rapid, reliable, and scalable method for predicting taxonomic labels. It addresses the challenges posed by emerging viruses and the exponential growth in genome sequencing output, and it facilitates viral taxonomy research.
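
As a rough illustration of the alignment-free idea described above, the following minimal Python sketch turns a genome sequence into a normalized k-mer frequency vector, the kind of feature a support vector machine or K-means++ clustering could consume. It is not the authors' pipeline: the function names, the choice of k = 3, the handling of ambiguous bases, and the "all three prongs must agree" rule in `consensus_genus` are assumptions made for this example only; the abstract does not state how the three prongs are combined.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(sequence, k=3):
    """Return normalized k-mer frequencies in lexicographic k-mer order.

    Windows containing characters outside ACGT (for example N) are
    ignored. The value of k is an illustrative assumption.
    """
    seq = sequence.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only count windows made entirely of unambiguous bases.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def consensus_genus(svm_label, cluster_label, host_label):
    """Hypothetical combination rule: propose a genus only when the
    supervised, unsupervised, and host-based prongs all agree;
    otherwise return None to flag the genome for closer inspection."""
    labels = {svm_label, cluster_label, host_label}
    return labels.pop() if len(labels) == 1 else None
```

Under these assumptions, a genome whose composition-based labels disagree with its host-based label would be returned as `None` rather than forced into a genus, which loosely mirrors how host-incompatible genomes are singled out in the description.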

Primary authors

Fatemeh Alipour (University of Waterloo); Connor Holmes (University of Western Ontario); Joseph Butler (University of Western Ontario); Prof. Yang Young Lu (University of Waterloo); Prof. Kathleen A. Hill (University of Western Ontario); Prof. Lila Kari (University of Waterloo)
