Description
Background: By using genomic sequences, molecular epidemiology has the potential to reconstruct HIV transmission networks, provide insights into disease transmission dynamics, and inform public health strategies. It is believed that long sequences, such as near full-length HIV genome sequences, can improve the accuracy of phylogenetic inference. However, relatively short pol sequences are still broadly used for inferring molecular HIV clusters due to their availability through routine drug resistance testing. Whether a mix of long and short HIV-1 sequences can improve phylogenetic inference of molecular HIV clusters remains unanswered.
Methods: We investigated whether T-shaped alignments, which combine short pol sequences with near full-length HIV-1 genome sequences, can improve phylogenetic reconstruction of transmission networks. Assuming that clustering derived from 100% long sequences is the most accurate, we varied the proportion of long and short sequences in our T-shaped alignments. We used HIV-1 whole genome sequences from the Los Alamos National Laboratory Database and systematically masked all non-pol regions with missing characters in an increasing proportion of sequences, in increments of 10%. We then inferred clusters from each dataset and compared clusters across datasets.
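The masking step can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the "?" missing character, the pol column coordinates, and the choice to draw nested subsets of long sequences from a single seeded shuffle are all assumptions made for illustration.

```python
import random


def mask_non_pol(seq, pol_start, pol_end, missing="?"):
    """Replace every column outside [pol_start, pol_end) with the missing character.

    pol_start and pol_end are hypothetical 0-based alignment column indices.
    """
    left = missing * pol_start
    right = missing * (len(seq) - pol_end)
    return left + seq[pol_start:pol_end] + right


def t_shaped_alignment(alignment, pol_start, pol_end, long_fraction, seed=0):
    """Keep `long_fraction` of the sequences at full length and mask the rest to pol only.

    `alignment` maps sequence names to aligned strings of equal length.
    With a fixed seed the shuffled order is the same for every fraction,
    so the sets of long sequences are nested (an assumed design choice).
    """
    names = sorted(alignment)
    random.Random(seed).shuffle(names)
    n_long = round(long_fraction * len(names))
    keep_long = set(names[:n_long])
    return {
        name: seq if name in keep_long else mask_non_pol(seq, pol_start, pol_end)
        for name, seq in alignment.items()
    }


# Toy alignment: 10 sequences of 12 columns, with "pol" assumed at columns 4-7.
toy = {f"seq{i}": "ACGTACGTACGT" for i in range(10)}

# One dataset per 10% step, from 0% to 100% long sequences.
datasets = {pct: t_shaped_alignment(toy, 4, 8, pct / 100) for pct in range(0, 101, 10)}

for pct, aln in datasets.items():
    n_long = sum(1 for s in aln.values() if "?" not in s)
    print(f"{pct:3d}% long: {n_long} full-length, {len(aln) - n_long} pol-only")
```

Each resulting dataset would then be passed to whichever tree-building and cluster-inference tools are in use; those steps are outside the scope of this sketch.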
Results: Using 1196 HIV-1 subtype B sequences, we found that adding long sequences to the pol alignment does not necessarily increase the accuracy of phylogeny and cluster inference. A 50% threshold of long sequences was critical; once it was reached, the phylogenetic tree and inferred clusters gradually converged towards those obtained with 100% long sequences. Stringent bootstrap thresholds decreased the gap in clustering accuracy.
Conclusions: Our results suggest that T-shaped alignments with varied sequence lengths should be used cautiously. The outcome of phylogenetic reconstruction improves only when long sequences make up the majority of the alignment. Bootstrap thresholds should be selected carefully based on the goals of cluster inference. The T-shaped alignment provides a straightforward method for utilizing all available sequences to improve phylogenetic analysis.