Description
Determination of virus sequences from individual virions enables detailed studies of intra-host dynamics, evolution, and transmission. We previously introduced high-throughput single-genome sequencing (HT-SGS) as a method that alleviates the throughput limits of traditional single-genome sequencing approaches. By incorporating molecular barcodes during reverse transcription (RT) of RNA, using long-read sequencing with PacBio, and performing bioinformatic error correction, the method enables recovery of thousands of single-genome sequences (SGS) per sample. Importantly, SGS retain errors introduced by RT during cDNA synthesis (approximately 1 per 10,000 bases), because such errors cannot be detected at the single-sequence level. To better characterize the properties of RT errors, we performed HT-SGS on 42 reference RNA samples, yielding 6,391 SGS. These data were used to inform a simple latent model of RT errors, which was then used to call variants, and subsequently full-length haplotypes, in HT-SGS data with a well-controlled false discovery rate (FDR). Such control is valuable when assessing thousands of SGS across hundreds of samples. The optimized workflow was applied to a cohort study of people with HIV (PWH, N = 22) and people without HIV (PWOH, N = 25), all with SARS-CoV-2 infection. Up to ~1,000 single-genome SARS-CoV-2 Spike gene sequences were recovered from each of 184 longitudinal respiratory samples, corresponding to a total of 831 inferred haplotypes. Intra-host Spike haplotype diversity was significantly higher immediately after COVID-19 symptom onset in people with advanced HIV (defined by peripheral blood CD4 T cell counts <200 cells/µL) than in PWOH (p = 0.0001). Moreover, phylogenetic analyses of Spike haplotypes revealed convergent intra-host evolution of several non-synonymous mutations (G142D, G257S, H655Y, P681H/R) in people with advanced HIV, suggesting that positive selection was shaping the intra-host virus population.
Taken together, this work improves our understanding and handling of technical errors in HT-SGS and highlights the method’s ability to resolve detailed intra-host virus dynamics.
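To illustrate the idea of calling variants against a background of RT errors while controlling the FDR, here is a minimal Python sketch. It is not the latent model described above, whose details are not given in this abstract. As a stand-in, it assumes a plain binomial null in which each SGS carries an RT error at a given site with probability 1 in 10,000 (the approximate rate quoted above), and it applies the Benjamini-Hochberg procedure across sites. The function names, the per-site error rate, and the 5% FDR target are all assumptions chosen for illustration.

```python
import math


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k <= 0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(1.0, total)


def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank whose p-value is below its BH threshold.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    keep = [False] * m
    for idx in order[:cutoff_rank]:
        keep[idx] = True
    return keep


def call_variants(alt_counts, depth, rt_error_rate=1e-4, alpha=0.05):
    """Flag sites whose variant count exceeds what RT errors alone explain.

    alt_counts: number of SGS carrying the variant at each site (hypothetical input).
    depth: number of SGS recovered from the sample.
    rt_error_rate: assumed per-site RT error probability (illustrative value).
    """
    p_values = [binom_upper_tail(k, depth, rt_error_rate) for k in alt_counts]
    return benjamini_hochberg(p_values, alpha)
```

For example, `call_variants([1, 50, 3, 0], depth=1000)` flags the second and third sites: a variant seen once in 1,000 SGS is compatible with RT error under this null, whereas three or more copies are not.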