Molecular Infectious Disease Epidemiology: Why and How
It is clear that whole-genome sequences from pathogens are a promising new source of data in infectious disease epidemiology, but it is less clear how they will be used and to what end. It is important to distinguish between two scales of analysis. In phylodynamics, a sparse sample of pathogen sequences is used to reconstruct the large-scale spread of disease. In molecular infectious disease epidemiology, a dense sample of pathogen sequences is used to reconstruct who-infected-whom in a cohort followed over time. This reconstruction allows more precise estimation of covariate associations with infectiousness and susceptibility. Unlike phylodynamics, it requires data on both infected individuals and individuals who were exposed to infection but escaped. These data are analyzed using pairwise survival analysis. Each pair consists of an infectious individual A and a susceptible individual B who is at risk of infectious contact from A, that is, a contact sufficient to infect B if he or she is susceptible at the time. The failure time is the interval between the onset of infectiousness in A and infectious contact from A to B, which we call the contact interval. When who-infected-whom is observed, standard methods from survival analysis can be used to estimate transmission probabilities or hazard ratios. When who-infected-whom is not observed, estimation is based on sums or averages over possible transmission trees. Genetic sequence data improve precision by greatly reducing the number of possible trees.
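To make the pairwise setup concrete, the following is a minimal sketch of the simplest case, in which who-infected-whom is observed. It assumes a constant (exponential) hazard of infectious contact with no covariates, which is an illustrative simplification and not a model stated in the text. The function names, the `(time_at_risk, transmitted)` pair representation, and the toy data are all hypothetical.

```python
import math

def exponential_hazard_mle(pairs):
    """Estimate a constant hazard of infectious contact from pairwise data.

    Each pair is (time_at_risk, transmitted): the time B was at risk of
    infectious contact from A, and whether that contact was observed.
    Pairs without transmission are treated as right-censored.
    Assumption: exponential contact intervals, no covariates.
    """
    pairs = list(pairs)
    events = sum(1 for _, transmitted in pairs if transmitted)
    exposure = sum(time_at_risk for time_at_risk, _ in pairs)
    if exposure <= 0:
        raise ValueError("total time at risk must be positive")
    # Standard exponential MLE under right censoring: events / exposure.
    return events / exposure

def transmission_probability(hazard, infectious_period):
    """Probability of infectious contact within a fixed infectious period."""
    return 1.0 - math.exp(-hazard * infectious_period)

# Toy data (made up): four A-to-B pairs, two of which end in transmission.
pairs = [(2.0, True), (5.0, False), (3.0, True), (10.0, False)]
hazard = exponential_hazard_mle(pairs)  # 2 events / 20 time units = 0.1
prob = transmission_probability(hazard, infectious_period=5.0)
```

When who-infected-whom is not observed, the single `transmitted` flag per pair is no longer known, and the likelihood would instead sum over the candidate infectors of each case; restricting those candidates with sequence data is what narrows the set of possible transmission trees.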