Jun 19 – 22, 2024
Squamish, BC, Canada
Canada/Pacific timezone

A NOVEL ALGORITHM FOR HIGHLY ACCURATE AND EFFICIENT HIV BIOINFORMATICS FROM RAW READ DATA VIA LOCATED K-MERS

Not scheduled
20m

Poster Genomics & bioinformatics

Speaker

Dr Nicholas Grayson (Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, UK.)

Description

Introduction:

Since 2015, the veSEQ method, a bait-capture metagenomic sequencing technique, has advanced HIV genomic research through the PANGEA consortium and is being adopted in Uganda, Botswana, and Zambia. Affordable and capable of high-throughput sequencing of diverse genomes, veSEQ shows promise for HIV drug resistance monitoring and pathogen studies. However, the high computational demands of metagenomic data processing pose challenges, particularly in low- and middle-income countries (LMICs). Here, we present a k-mer-based approach that reduces this computational complexity and the resulting financial cost.

Methods:

Our Located k-mer Assembler (LKA) begins with a local alignment of reads against a large set of reference HIV genomes, including HXB2. LKA's key innovation is tracking the k-mer composition of each read together with the position of each k-mer relative to HXB2, which makes k-mers, and therefore reads, easy to compare. This is especially useful for decontamination, whose cost grows quadratically (O(N^2)) with the number of samples compared. Decontamination is essential for quality assurance in clinical applications such as drug resistance monitoring.
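
To illustrate the idea of a located k-mer, the following Python sketch pairs each k-mer with its reference coordinate and compares two samples by hash lookup instead of read-by-read comparison. It is not the LKA implementation: the function names (`located_kmers`, `index_sample`, `shared_located_kmers`), the toy reads, and the assumption of ungapped alignments with a known start position are all hypothetical choices made for this example.

```python
from collections import Counter


def located_kmers(read, ref_start, k):
    """Yield (reference position, k-mer) pairs for one aligned read.

    Assumption: the alignment is ungapped, so read offset i maps to
    reference position ref_start + i.
    """
    for i in range(len(read) - k + 1):
        yield (ref_start + i, read[i:i + k])


def index_sample(aligned_reads, k):
    """Count located k-mers over all (read, ref_start) pairs of a sample."""
    counts = Counter()
    for read, ref_start in aligned_reads:
        counts.update(located_kmers(read, ref_start, k))
    return counts


def shared_located_kmers(index_a, index_b):
    """Return located k-mers present in both samples (set intersection)."""
    return index_a.keys() & index_b.keys()


# Toy data: (read sequence, alignment start on the reference).
sample_a = [("ACGTAC", 100), ("GTACGG", 102)]
sample_b = [("CGTACG", 101), ("TTTTTT", 500)]

k = 4
index_a = index_sample(sample_a, k)
index_b = index_sample(sample_b, k)
shared = shared_located_kmers(index_a, index_b)
```

Because a k-mer is only compared with k-mers at the same reference position, each lookup is a dictionary access, so comparing two samples scales with the number of distinct located k-mers, not with the product of their read counts.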

Results:

LKA accurately detects drug resistance by analyzing amino acid sequence abundance from aligned short k-mers. Decontamination of a sample can be achieved in minutes, whereas naively comparing the billions of reads from an Illumina sequencing run would take years. Additionally, LKA streamlines de Bruijn graph generation for genome assembly by prioritizing the most frequent k-mers.
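
As a generic illustration of frequency-prioritized de Bruijn graph construction, the Python sketch below counts k-mers, builds a graph from those above a count threshold, and follows the most frequent edge at each step. The abstract does not describe how LKA builds or traverses its graph, so the function names, the `min_count` parameter, the greedy walk, and the toy reads are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_graph(kmer_counts, min_count=1):
    """Map each (k-1)-mer prefix to a list of (suffix, count) edges.

    K-mers seen fewer than min_count times are dropped (hypothetical
    threshold, used here to discard likely sequencing errors).
    """
    graph = defaultdict(list)
    for kmer, n in kmer_counts.items():
        if n >= min_count:
            graph[kmer[:-1]].append((kmer[1:], n))
    return graph


def greedy_path(graph, start):
    """Walk from start, always taking the most frequent unused edge."""
    contig, node, used = start, start, set()
    while True:
        edges = [(n, nxt) for nxt, n in graph.get(node, [])
                 if (node, nxt) not in used]
        if not edges:
            return contig
        _, nxt = max(edges)  # highest count wins; ties broken by sequence
        used.add((node, nxt))
        contig += nxt[-1]
        node = nxt


# Toy reads: three support ACGTTG, one carries an erroneous final base.
reads = ["ACGTT", "ACGTT", "CGTTG", "ACGTA"]
counts = count_kmers(reads, 3)
contig = greedy_path(build_graph(counts), "AC")
```

In this toy example the low-frequency k-mer `GTA` from the erroneous read is never followed, because the competing edge `GTT` has a higher count.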

Conclusion:

LKA, integral to AMPHEUS, our solar-powered lab and genome sequencing platform in Zambia, marks a significant step towards affordable and rapid clinical and public health applications in LMICs. The k-merization concept not only simplifies bioinformatics workflows but also extends to genomic studies of other pathogens.

Primary authors

Mx AMPHEUS & PANGEA Consortiums
Ms Anna Jeffreys (Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, UK.)
Prof. Christophe Fraser (Pandemic Science Institute, Nuffield Department of Medicine, University of Oxford, UK.)
Dr David Bonsall (Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, UK.)
Mr George MacIntyre-Cockett (Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, UK.)
Prof. Helen Ayles (Zambart, School of Public Health, University of Zambia, Lusaka, Zambia; Department of Clinical Research, London School of Hygiene and Tropical Medicine, London, UK.)
Dr Iain Baudi (Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, UK.)
Ms Laura Thomson (Big Data Institute, Nuffield Department of Medicine, University of Oxford, UK.)
Dr Lucie Abeler-Dörner (Pandemic Science Institute, Nuffield Department of Medicine, University of Oxford, UK.)
Dr Nicholas Grayson (Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, UK.)
Dr Sandra Chaudron
