Speaker
Description
Cluster and source attribution methodologies in genomic epidemiology are primarily used to identify pairs or groups of infected individuals who are close to one another in the chain of transmission. It is generally overlooked that these methodologies also identify some number of individuals who are not linked to anyone else, and that this number is informative. If one takes a random sample of infected individuals and is able to identify who infected whom within the sample, the number of singletons is positively correlated with the total size of the infected population. This is similar to the principle underlying the mark-recapture methodology, but where mark-recapture requires multiple sampling rounds, this approach needs only one round and the ability to link individuals by transmission.
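The relationship between singletons and population size can be illustrated with a small simulation. The following is a minimal sketch and not the method presented in the talk: it assumes a toy outbreak in which each new case is infected by a uniformly chosen earlier case, and it assumes that direct infector-infectee links within the sample are identified perfectly. The function names are hypothetical.

```python
import random

def simulate_infectors(pop_size, rng):
    """Toy outbreak (assumption): case i is infected by a uniformly chosen earlier case."""
    infector = [None] * pop_size  # case 0 is the index case and has no infector
    for i in range(1, pop_size):
        infector[i] = rng.randrange(i)
    return infector

def count_singletons(infector, sample):
    """Count sampled cases with no direct infector or infectee in the sample."""
    sampled = set(sample)
    linked = set()
    for i in sampled:
        src = infector[i]
        if src is not None and src in sampled:
            linked.add(i)
            linked.add(src)
    return len(sampled) - len(linked)

def mean_singletons(pop_size, sample_size, reps=200, seed=1):
    """Average singleton count over repeated outbreaks and random samples."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        infector = simulate_infectors(pop_size, rng)
        sample = rng.sample(range(pop_size), sample_size)
        total += count_singletons(infector, sample)
    return total / reps

if __name__ == "__main__":
    # Same sample size, increasing infected population size.
    for pop_size in (200, 500, 2000):
        print(pop_size, mean_singletons(pop_size, sample_size=50))
```

With the sample size held fixed, the average number of singletons rises towards the sample size as the infected population grows, which is the signal the abstract proposes to exploit.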
Here, I show how to formalise this insight in order to estimate the total size of an infected population and/or the number of distinct lineage introductions to that population, using the observed numbers of pairs or clusters and of singletons. This is implemented in a Bayesian importance sampling framework. I outline three potential uses of the procedure: first, estimating those quantities; second, determining the sample size needed to obtain a given number of transmission pairs for a separate analysis; and third, testing whether assumptions of random sampling have been violated.
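To make the idea of importance sampling for population size concrete, here is a minimal sketch under strong simplifying assumptions; it is not the model presented in the talk. It assumes a uniform prior on the population size N between the sample size and a hypothetical upper bound `max_pop`, uses that prior as the proposal distribution, and approximates the likelihood of the observed number of within-sample transmission links as binomial, with each sampled case having its infector in the sample with probability (n - 1) / (N - 1). All function and parameter names are illustrative.

```python
import math
import random

def log_binom_pmf(k, n, p):
    """Log binomial probability, handling the p = 0 and p = 1 edge cases."""
    if p <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if p >= 1.0:
        return 0.0 if k == n else -math.inf
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)

def posterior_mean_pop_size(n_sampled, n_links, max_pop, draws=20000, seed=1):
    """Importance-sampling estimate of the posterior mean of N (needs n_sampled >= 2).

    Assumptions (not from the talk): uniform prior on N in [n_sampled, max_pop],
    the prior itself as the proposal, and a binomial approximation in which each
    sampled case has its infector in the sample with probability (n - 1) / (N - 1).
    """
    rng = random.Random(seed)
    proposals = [rng.randint(n_sampled, max_pop) for _ in range(draws)]
    # With the prior as proposal, the importance weight is just the likelihood.
    log_w = [
        log_binom_pmf(n_links, n_sampled, (n_sampled - 1) / (N - 1))
        for N in proposals
    ]
    top = max(log_w)  # subtract the maximum for numerical stability
    weights = [math.exp(lw - top) for lw in log_w]
    return sum(w * N for w, N in zip(weights, proposals)) / sum(weights)

if __name__ == "__main__":
    # 50 sampled cases: fewer observed links imply a larger infected population.
    print(posterior_mean_pop_size(50, n_links=12, max_pop=5000))
    print(posterior_mean_pop_size(50, n_links=2, max_pop=5000))
```

Using the prior as the proposal keeps the sketch short but is inefficient when the posterior is concentrated; with few observed links the estimate is also sensitive to the choice of `max_pop`.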