Speaker
Description
Through the NIH ACTIV-TRACE initiative, we have an automated database of mutations called from all publicly deposited SARS-CoV-2 raw sequence data in NCBI’s Sequence Read Archive (SRA). Our project explored minor variants, the alleles present at a given position at a minority frequency within a sample, in the VCF files that catalog each sample’s mutations. Although minor variants are often treated as a proxy for intrahost diversity, recurring variants at low minor allele frequency can be artifactual rather than biological, arising from mechanisms such as sample cross-contamination or systematic molecular interactions. Scanning 1.5M samples with minor variants above a minor allele frequency of 15%, we asked whether the variants identified were specific to a sequencing center or to a method (library preparation, or long- versus short-read sequencing), which would indicate that they are unlikely to reflect true homoplasy. We determined genomic hot spots of variability by mapping mutations onto the predicted 2D RNA folding and onto primer binding sites. From this study, we propose guidelines for contextualizing minor variants found in future retrospective single-center genomic studies.
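The center-specificity question above can be illustrated with a minimal sketch. It is not the pipeline used in the study: the record layout, the field names (`center`, `pos`, `alt`, `af`), the 50% upper bound used to define "minor", and the toy records are all assumptions made for illustration. Only the 15% lower threshold comes from the abstract.

```python
from collections import Counter, defaultdict

MIN_AF = 0.15  # lower threshold stated in the abstract
MAX_AF = 0.50  # assumption: "minor" means below 50% within a sample


def minor_variants(records, min_af=MIN_AF, max_af=MAX_AF):
    """Keep only calls whose allele frequency lies in the minor range."""
    return [r for r in records if min_af < r["af"] < max_af]


def center_share(records):
    """For each (position, alt) variant, return the sequencing center that
    contributes the most samples and the fraction of samples it contributes."""
    by_variant = defaultdict(Counter)
    for r in records:
        by_variant[(r["pos"], r["alt"])][r["center"]] += 1
    result = {}
    for variant, counts in by_variant.items():
        center, n = counts.most_common(1)[0]
        result[variant] = (center, n / sum(counts.values()))
    return result


# Toy records, invented purely for illustration.
records = [
    {"sample": "S1", "center": "A", "pos": 100, "alt": "T", "af": 0.20},
    {"sample": "S2", "center": "A", "pos": 100, "alt": "T", "af": 0.30},
    {"sample": "S3", "center": "A", "pos": 100, "alt": "T", "af": 0.18},
    {"sample": "S4", "center": "B", "pos": 100, "alt": "T", "af": 0.90},  # major allele, dropped
    {"sample": "S5", "center": "B", "pos": 200, "alt": "G", "af": 0.25},
    {"sample": "S6", "center": "C", "pos": 200, "alt": "G", "af": 0.40},
    {"sample": "S7", "center": "C", "pos": 300, "alt": "A", "af": 0.05},  # below threshold, dropped
]

shares = center_share(minor_variants(records))
for variant, (center, share) in sorted(shares.items()):
    print(variant, center, round(share, 2))
```

In this toy data, the variant at position 100 appears as a minor variant only in samples from center A, the kind of pattern that would suggest a center-specific artifact, whereas the variant at position 200 is split evenly across two centers. The same tally could be keyed on library preparation or sequencing platform instead of center.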