Description
Hepatitis C virus (HCV), which affects about 58 million people, is classified into 8 genotypes and >90 subtypes. Genotypes differ at >30-35% of nucleotide positions, while subtypes within a genotype differ at 15-25% of positions. Most subtypes can be cured with direct-acting antivirals, but some are resistant to treatment. The Antiviral Unit at the UK Health Security Agency established a national genomic surveillance program to characterize, by whole-genome sequencing, HCV samples that are difficult to subtype using standard methods. We developed a novel method to categorize these samples as belonging to a known subtype, a new subtype of a known genotype, or a new genotype. The method requires reference sequences from known (pre-existing) subtypes; we used 238 International Committee on Taxonomy of Viruses HCV reference sequences. First, for sliding windows of 500 nucleotides across the genome, we use the genetic distances observed for intra- and inter-genotype reference pairs to determine whether the sequence of interest is confidently more similar to its nearest reference genotype than to other genotypes. We then compare the percentage of windows with uncertain genotype classification in the genome of interest with the corresponding percentages for reference genomes whose genotype is assumed to be new (IQR: 75-92%) or pre-existing (IQR: 0-51%). If the genotype is pre-existing, we determine for each window whether the distance between the genome of interest and the most closely related reference sequence falls within the distance distribution observed for intra-subtype or inter-subtype reference pairs. Applying this method to three sequences that standard bioinformatic tools could not classify identified two new genotype 8 subtypes, each with >85% of its genome having genetic distances to genotype 8 sequences within the typical inter-subtype range, and one potentially new genotype with uncertain genotype classification across 94% of its genome.
The method developed here can be applied to subtype and genotype classification of other viruses.
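As a rough illustration of the windowing step described above, the following Python sketch computes per-window distances from a query to a set of aligned references and reports the fraction of windows with an uncertain genotype call. It is not the authors' implementation. It assumes the sequences are already aligned to a common coordinate system, uses a simple p-distance (proportion of differing sites) as the genetic distance, and replaces the comparison against intra- and inter-genotype reference-pair distributions with a hypothetical fixed `margin` between the two closest genotypes. The function names, the `step` size, and the `margin` value are all illustrative assumptions.

```python
from typing import List, Optional, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, counting only positions where both are A/C/G/T."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites in this window")
    return sum(x != y for x, y in pairs) / len(pairs)


def windows(length: int, size: int = 500, step: int = 100) -> List[Tuple[int, int]]:
    """Start/end coordinates of sliding windows (step size is an assumption)."""
    return [(s, s + size) for s in range(0, length - size + 1, step)]


def classify_windows(query: str,
                     refs: List[Tuple[str, str]],
                     size: int = 500,
                     step: int = 100,
                     margin: float = 0.05) -> List[Optional[str]]:
    """Per-window genotype call, or None when the call is uncertain.

    refs is a list of (genotype, aligned_sequence) pairs. A window is called
    uncertain when the nearest genotype is not closer than the runner-up by at
    least `margin` (a hypothetical stand-in for the distribution-based test).
    """
    calls: List[Optional[str]] = []
    for start, end in windows(len(query), size, step):
        best = {}  # genotype -> smallest distance to any of its references
        for genotype, seq in refs:
            d = p_distance(query[start:end], seq[start:end])
            if genotype not in best or d < best[genotype]:
                best[genotype] = d
        ranked = sorted(best.items(), key=lambda kv: kv[1])
        if len(ranked) > 1 and ranked[1][1] - ranked[0][1] < margin:
            calls.append(None)
        else:
            calls.append(ranked[0][0])
    return calls


def uncertain_fraction(calls: List[Optional[str]]) -> float:
    """Fraction of windows without a confident genotype call."""
    return sum(c is None for c in calls) / len(calls)
```

In this sketch, the value returned by `uncertain_fraction` plays the role of the percentage of uncertain windows that the abstract compares against the reference IQRs (75-92% for genotypes treated as new, 0-51% for pre-existing ones). In practice the thresholds would be derived from the reference-pair distance distributions rather than from a fixed margin.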