Speaker
Description
Since the coining of the term "phylodynamics", the use of phylogenies to learn epidemiological information has steadily increased. As methods have proliferated and grown more computationally expensive, the epidemiological information they extract has also evolved to better complement what can be learned from traditional epidemiological data. However, for genomic epidemiology to continue to grow, and for the accumulating pathogen genome sequences to fulfill their potential, methodological advances are required to make the extraction of epidemiological information from phylogenies simpler and more efficient. Summary statistics provide a straightforward way of extracting information from a phylogenetic tree, but the relationship between these statistics and epidemiological quantities needs to be better understood.
In this work, we begin to address this need through a simulation study. Using two different models, we evaluate a large number of tree summary statistics and their relationship to epidemiological parameters. We examine not only what information can be inferred from each summary statistic, but also its computational cost, so that the choice of summary statistics can be optimized. Lastly, we evaluate the use of summary statistics as calibration targets and show that they provide an advantage over relying only on traditional epidemiological targets such as an incidence curve.
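As a rough illustration of what a tree summary statistic is, the sketch below computes three widely used tree-shape statistics (the Sackin index, the Colless index, and the number of cherries) in a single pass over a rooted binary tree. The abstract does not say which statistics the study evaluates, so these three are chosen only as familiar examples. The nested-tuple tree representation and the function name `tree_shape_stats` are assumptions made for this sketch.

```python
def tree_shape_stats(tree):
    """Return (n_leaves, sackin, colless, cherries) for a rooted binary tree.

    Assumed representation (hypothetical, for illustration only):
    a leaf is any non-tuple value; an internal node is a 2-tuple (left, right).
    """
    if not isinstance(tree, tuple):
        # A single leaf: one tip, no depth, no imbalance, no cherry.
        return 1, 0, 0, 0

    left, right = tree
    n_left, sackin_left, colless_left, cherries_left = tree_shape_stats(left)
    n_right, sackin_right, colless_right, cherries_right = tree_shape_stats(right)

    n_leaves = n_left + n_right
    # Sackin index: sum of leaf depths. Every leaf below this node
    # sits one edge deeper than it did in its own subtree.
    sackin = sackin_left + sackin_right + n_leaves
    # Colless index: sum over internal nodes of |leaves left - leaves right|.
    colless = colless_left + colless_right + abs(n_left - n_right)
    # Cherries: internal nodes whose two children are both leaves.
    is_cherry = 1 if (n_left == 1 and n_right == 1) else 0
    cherries = cherries_left + cherries_right + is_cherry

    return n_leaves, sackin, colless, cherries


# Two four-tip trees with different shapes.
caterpillar = ((("A", "B"), "C"), "D")   # maximally unbalanced
balanced = (("A", "B"), ("C", "D"))      # fully balanced

caterpillar_stats = tree_shape_stats(caterpillar)
balanced_stats = tree_shape_stats(balanced)
```

Each of these statistics costs one traversal of the tree, which is part of why statistics of this kind are cheap compared with full likelihood-based inference. Whether and how strongly any of them tracks a given epidemiological parameter is the kind of question the simulation study addresses.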