Speaker
Description
Genomic surveillance of poliovirus will prove critical to the next phase of the eradication effort and relies on fast and accurate viral detection. With direct sequencing methods, the time from sample to result has been drastically reduced relative to previous approaches. The piranha software developed within our group provides a framework for analysing high-throughput sequencing data generated by this method and gives clear and actionable results. While between-subtype resolution is easily achieved by the existing bioinformatics workflow, long reads such as those produced by Oxford Nanopore Technologies (ONT) platforms also allow finer-grained analysis of variation within subtypes, for example in cases of mixed infection or in environmental samples. This is particularly relevant to surveillance, where circulating vaccine-derived viruses must be detected against a background of the vaccine strain and only a handful of mutations may distinguish the two. Here we present a pipeline to resolve mixed populations of poliovirus reads that exploits existing methods for phasing polyploid genomes. We successfully resolve in silico mixtures of simulated and real ONT reads belonging to the same polio subtype into the expected haplotypes at the expected proportions. Accuracy drops when we attempt to resolve real mixtures, possibly because of errors introduced during amplification and sequencing. Long-read technologies are opening up more opportunities to fully characterise viral populations and gain insight into variation within samples. While we see promise in this method, caution is warranted, depending on the type of input data, to ensure that variation is not overestimated due to error.
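To illustrate the idea of resolving a mixture into haplotypes and proportions, the following is a minimal toy sketch in Python. It is not the pipeline described above and does not use a polyploid phasing tool. It assumes reads are already aligned, equal-length strings, and the function names, the `min_freq` threshold, and the example sequences are all hypothetical.

```python
from collections import Counter


def variant_sites(reads, min_freq=0.2):
    """Return positions where the second most common base is seen in
    at least min_freq of reads (a crude filter against sequencing error)."""
    sites = []
    for pos in range(len(reads[0])):
        counts = Counter(read[pos] for read in reads)
        if len(counts) < 2:
            continue  # every read agrees at this position
        second_count = counts.most_common(2)[1][1]
        if second_count / len(reads) >= min_freq:
            sites.append(pos)
    return sites


def haplotype_proportions(reads, min_freq=0.2):
    """Group reads by their bases at the variant sites and return the
    sites plus the fraction of reads supporting each haplotype."""
    sites = variant_sites(reads, min_freq)
    haps = Counter("".join(read[p] for p in sites) for read in reads)
    total = sum(haps.values())
    return sites, {hap: n / total for hap, n in haps.items()}


# Hypothetical in silico mixture: 7 reads from one strain (one of them
# carrying a single sequencing error) and 3 reads from a second strain
# that differs at two positions.
hap_a = "ACGTACGT"
hap_b = "ACTTACGA"
reads = [hap_a] * 6 + ["ACGTTCGT"] + [hap_b] * 3

sites, props = haplotype_proportions(reads)
print(sites, props)
```

With the default threshold, the single erroneous base is ignored and the reads split 70/30 across the two haplotypes. Lowering `min_freq` to 0.05 would promote that error to a variant site and report a spurious third haplotype, which is the kind of overestimation the abstract cautions against.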