May 6 – 9, 2025
Abbaye de Royaumont, Asnières-sur-Oise, France
Europe/Paris timezone

SHUMI: Simulating High-Throughput UMI-based Single-Genome Amplification and Sequencing Data

Not scheduled
20m
Abbaye de Royaumont, 95270 Asnières-sur-Oise, France
Poster | Software, tools & methods | Virtual posters

Speaker

Pierce Radecki (Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health)

Description

High-Throughput Single-Genome Amplification and Sequencing (HT-SGS) enables detailed measurements of intra-host virus genotypes by barcoding individual virus genomes during reverse transcription (RT), followed by PCR amplification, sequencing, and bioinformatic error correction. However, the absence of “ground-truth” RNA reference samples makes it challenging to evaluate bioinformatic analyses of HT-SGS data. To support experimental design and the development and validation of bioinformatic methods, we present SHUMI, a pipeline for simulating sequencing data from HT-SGS experiments. SHUMI lets users simulate experiments on arbitrary virus populations using any Unique Molecular Identifier (UMI) scheme. It offers control over key variables, including RT and PCR error rates, PCR efficiency and cycle count, RT and PCR recombination rates, and sequencing movie time. We demonstrate that the pipeline produces datasets that closely resemble real experimental data in several key quantitative metrics: overall UMI bin error rates, recombination rates in obtained reads, and the mean and variance of observed single-copy UMI bin sizes. Consequently, the pipeline makes it possible to assess how different UMI schemes and error correction methods affect SGS recovery and accuracy. As a resource for designing and optimizing HT-SGS analysis pipelines, SHUMI helps researchers develop robust sequencing protocols and validate bioinformatic analyses.
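
The kind of process described above can be illustrated with a toy simulation. The sketch below is not SHUMI's implementation or interface; every function name, parameter, and default value is hypothetical and chosen only for illustration. It assumes substitution-only errors, one random UMI per template attached at the RT step, and a PCR model in which each molecule is duplicated with a fixed probability per cycle. It does not model recombination, UMI errors, or the sequencing step.

```python
import random
from collections import Counter, defaultdict

BASES = "ACGT"

def random_umi(rng, length=8):
    """Draw a uniformly random UMI (assumption: real schemes may be patterned)."""
    return "".join(rng.choice(BASES) for _ in range(length))

def copy_with_errors(seq, error_rate, rng):
    """Copy seq, substituting each base independently with probability error_rate."""
    out = []
    for base in seq:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def simulate_reads(templates, rt_error=1e-4, pcr_error=1e-5,
                   pcr_efficiency=0.8, cycles=10, umi_length=8, seed=0):
    """Return a list of (umi, sequence) pairs after a toy RT + PCR process."""
    rng = random.Random(seed)
    # RT step: each template genome gets one UMI and one error-prone cDNA copy.
    pool = [(random_umi(rng, umi_length), copy_with_errors(t, rt_error, rng))
            for t in templates]
    # PCR step: in each cycle, every molecule is duplicated with probability
    # pcr_efficiency, and the new copy may carry substitution errors.
    for _ in range(cycles):
        new_copies = [(umi, copy_with_errors(seq, pcr_error, rng))
                      for umi, seq in pool if rng.random() < pcr_efficiency]
        pool.extend(new_copies)
    return pool

def bin_by_umi(reads):
    """Group sequences that share a UMI into bins."""
    bins = defaultdict(list)
    for umi, seq in reads:
        bins[umi].append(seq)
    return bins

def consensus(seqs):
    """Per-position majority vote over equal-length sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))
```

Because the simulated templates are known, the consensus of each UMI bin can be compared directly against the sequence it was derived from, which is the kind of ground-truth comparison that real HT-SGS data does not allow.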

Primary authors

Pierce Radecki (Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health)
Sung Hee Ko (Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health)
Eli Boritz (Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health)
