We need to be cautious when interpreting phylogenetic studies from COVID-19 genomes

Coronaviruses, such as the newly discovered SARS-CoV-2, are RNA viruses with a single short strand of RNA about 30,000 letters long. Each letter is one of ‘A’, ‘C’, ‘G’ or ‘T’, and together they provide the genetic instructions the virus needs to replicate.

As the virus spreads, its genetic information, or genome, randomly changes a few letters at a time (referred to as mutation). These changes can help us to track the origin, spread and transmission of SARS-CoV-2 around the world.

One way in which we compare genomes is to use phylogenetics, which allows us to assess which genomes are most closely related to each other and which are more distantly related. While the technical jargon of phylogenetics can be confusing, it is helpful to think of a phylogenetic tree as a “family tree” in which the relationships between family members are depicted. For instance, your closest relative may be a sibling, with whom you share a common ancestor (your parents) to the exclusion of other individuals. Tracing our family lineage back further can reveal four grandparents, eight great-grandparents and so on. Similarly, with viruses we can reconstruct the evolutionary history between genomes.
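To make the "closest relative" idea concrete, here is a minimal sketch that counts letter differences between aligned sequences and picks the most similar one. The sample names and the 20-letter sequences are hypothetical toy data, not real SARS-CoV-2 genomes, and real phylogenetic methods use evolutionary models rather than a raw difference count, so treat this only as an illustration of the intuition.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def closest_relative(name: str, genomes: dict[str, str]) -> str:
    """Return the name of the other genome with the fewest differences."""
    others = [n for n in genomes if n != name]
    return min(others, key=lambda n: count_differences(genomes[name], genomes[n]))


# Hypothetical toy sequences (real genomes are about 30,000 letters long).
genomes = {
    "sample_A": "ACGTACGTACGTACGTACGT",
    "sample_B": "ACGTACGTACGTACGTACGA",  # differs from sample_A at 1 position
    "sample_C": "ACGTTCGTACGAACGTACGA",  # differs from sample_A at 3 positions
}

for name in genomes:
    print(name, "is closest to", closest_relative(name, genomes))
```

In this toy data, sample_A and sample_B are each other's closest relatives, much like siblings in the family-tree analogy, while sample_C is a more distant cousin.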

Using genome sequencing and phylogenetics as tools in the current pandemic has allowed us to examine when the virus emerged in the human population and where this virus came from. As of now, we know that this new coronavirus is most similar to viruses that naturally infect horseshoe bats – 96% of the genome sequence is identical – and that early cases were associated with a wet food market in Wuhan.

As the current pandemic has unfolded, the coronavirus genome has accumulated mutations at an average rate of about two per month. These subtle shifts in the virus’s genetic code have allowed us to track cases and illuminate how the virus has spread globally. However, because there are so few genetic differences between the first novel coronavirus sequenced from Wuhan and those currently circulating, comparing genomes via phylogenetics can be challenging: a lot can rest on a single difference in that sequence of 30,000 letters.
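A rough back-of-the-envelope calculation shows how few differences this rate produces. The sketch below uses only the two figures quoted above (about two mutations per month, 30,000 letters); the four-month elapsed time is an assumption chosen purely for illustration.

```python
GENOME_LENGTH = 30_000     # letters in the genome, as quoted in the article
MUTATIONS_PER_MONTH = 2    # average rate quoted in the article


def expected_mutations(months: float) -> float:
    """Expected number of mutations accumulated along one lineage."""
    return MUTATIONS_PER_MONTH * months


def fraction_changed(months: float) -> float:
    """Expected share of the genome that has changed after `months`."""
    return expected_mutations(months) / GENOME_LENGTH


months = 4  # assumed elapsed time, for illustration only
print(f"Expected mutations after {months} months: {expected_mutations(months):.0f}")
print(f"Share of genome changed: {fraction_changed(months):.3%}")
```

Under these assumptions, a lineage would carry only about eight changes, well under a tenth of a percent of the genome, which is why one ambiguous or erroneous letter can shift where a genome sits in the tree.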

Sampling matters: many places in the world are very poorly sampled, and even in well-sampled places there may be many asymptomatic individuals who are never sampled. Although a phylogenetic tree may suggest a connection, there are so many missing pieces in the transmission puzzle that there could be other explanations of what happened. Thus, any evidence of epidemiological linkage, sampling uncertainty and other sources of bias need to be carefully considered and reported alongside any phylogenetic interpretation of genomes.

While genetic data can provide clues about the transmission chain of events, definitively proving transmission is a lot more difficult, even when two people are infected with genetically identical SARS-CoV-2 viruses.

The pace of SARS-CoV-2 genome data generation is unprecedented, with over 30,000 genomes now available. But these still represent only a tiny fraction of COVID-19 cases worldwide. Given this limitation, we need to be cautious before we jump to conclusions and be responsible in our scientific communication.

Phylogenetic comparison of virus genomes can be used to track outbreaks in communities, hospitals and other care settings, and can assist public health authorities in understanding how the virus is spreading. However, such comparisons can easily be over-interpreted in the current COVID-19 pandemic. To avoid erroneous claims, we need to integrate our interpretations with in-depth epidemiological data, and we must do so in a manner that safeguards the privacy of infected individuals.


Ch. Julián Villabona-Arenas, William P. Hanage, Damien C. Tully. Phylogenetic interpretation during outbreaks requires caution. Nature Microbiology (2020). DOI: 10.1038/s41564-020-0738-5
