The coronavirus genome is like a shipping label that lets epidemiologists track where it’s been
April 27, 2020 | Bert Ely and Taylor Carter
As the coronavirus threatens health and upends daily life throughout the world, UofSC Today is turning to our faculty to help us make sense of it all. In this article originally published by The Conversation, biological sciences professor Bert Ely and doctoral student Taylor Carter discuss how the virus's genetic sequence can provide insight into where it has been.
Following the coronavirus’s spread through the population – and anticipating its next move – is an important part of the public health response to the new disease, especially since containment is our only defense so far.
Just looking at an infected person doesn’t tell you where their version of the coronavirus came from, and SARS-CoV-2 doesn’t have a bar code you can scan to allow you to track its travel history. However, its genetic sequence is almost as good for providing some insight into where the virus has been.
An organism’s genome is its complete genetic instructions. You can think of a genome as a book, containing words made up of letters. Each “letter” in the genome is a molecule called a nucleotide – in shorthand, an A, G, C, T or U.
Mutations can occur every time the virus replicates its genome, so they accumulate in the viral genome over time. For example, in place of the “word” CAT, the new virus has GAT. The virus carries these minor modifications as it moves from one host to the next.
These mutations behave like a passport stamp. No matter where you go next, previous stamps in your passport still show where you’ve been.
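To make the passport-stamp idea concrete, here is a minimal Python sketch that lists the letters that differ between two genomes. The function name `find_mutations` and the ten-letter sequences are invented for illustration. Real SARS-CoV-2 genomes are roughly 30,000 letters long and have to be aligned before they can be compared this way.

```python
def find_mutations(reference, sample):
    """List (position, old_letter, new_letter) for every difference.

    Positions are 1-based. Both sequences are assumed to be already
    aligned and of equal length, which is a simplification.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, old, new)
        for position, (old, new) in enumerate(zip(reference, sample), start=1)
        if old != new
    ]


# Toy sequences, invented for illustration only.
original = "CATGGTACCA"
first_host = "GATGGTACCA"   # CAT became GAT: C -> G at position 1
second_host = "GATGGTAGCA"  # keeps that change and adds C -> G at position 8

print(find_mutations(original, first_host))
print(find_mutations(original, second_host))
```

In this toy case the second host's virus still carries the change at position 1, just as an old passport stamp stays in place when a new one is added.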
Molecular geneticists like us can use this information to construct family trees for the coronavirus. That allows us to trace the routes the virus has traveled through space and time and start to answer questions such as how quickly and easily it spreads from one person to another.
Individual patient data help paint a big picture
Online databases have been collecting SARS-CoV-2 genomic nucleotide sequences since mid-December. Whenever a patient tests positive for SARS-CoV-2, a lab can determine the genome sequence of the infecting virus and upload it. As of late April, more than 1,500 genome sequence samples have been deposited in GenBank, a publicly available database run by the National Institutes of Health, and more than 3,000 are in GISAID, the open-access Global Initiative on Sharing All Influenza Data.
Since each sequence comes from a patient in a specific place in the world, scientists can compare these viral genome sequences and track where the virus has been. The more similar the sequences from two particular viruses are, the more closely related they are and the more recently they’ve shared a common ancestor. The first SARS-CoV-2 genomic sequence uploaded to GISAID’s website was collected from a patient in early December 2019.
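One simple way to put a number on "more similar" is to count the positions at which two aligned sequences differ. The sketch below does that and picks the closest match for a new sample. The helper names, sample labels and sequences are hypothetical. Counting differences is a stand-in for illustration, not the method used by the databases or by phylogenetic software, which rely on more sophisticated models.

```python
def count_differences(seq_a, seq_b):
    """Number of positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def closest_relative(query, candidates):
    """Return the name of the candidate sequence with the fewest differences.

    `candidates` maps a sample name to its sequence.
    """
    return min(candidates, key=lambda name: count_differences(query, candidates[name]))


# Toy samples, invented for illustration only.
samples = {
    "sample_A": "GATGGTACCA",
    "sample_B": "GATGGTAGCA",
    "sample_C": "CATGGTACCT",
}
new_sample = "GATGGTAGCT"

for name, sequence in samples.items():
    print(name, count_differences(new_sample, sequence))
print("closest:", closest_relative(new_sample, samples))
```

Here the new sample differs from sample_B at one position and from the other two at two positions, so sample_B is its closest relative in this toy set.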
Of course, the viral mutations themselves do not tell researchers which country they happened in. But since the databases record where particular patterns of mutations have been observed, scientists can determine the route that each viral strain has taken. The global map tracks the movement of the virus around the world.
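As a rough illustration of how location records turn mutation patterns into a route, the sketch below finds the earliest date on which a given mutation was seen in each place and sorts the places by that date. Everything in it is assumed for the example: the record layout, the region names, the dates and the mutation labels such as "C1G" (C replaced by G at position 1). The earliest sighting also depends on how much sequencing each place does, so an ordering like this is a hint about the route rather than proof of it.

```python
from datetime import date

# Hypothetical database records, invented for illustration only.
records = [
    {"location": "Region A", "collected": date(2020, 1, 5), "mutations": {"C1G"}},
    {"location": "Region B", "collected": date(2020, 1, 20), "mutations": {"C1G", "C8G"}},
    {"location": "Region A", "collected": date(2020, 1, 12), "mutations": {"C1G"}},
    {"location": "Region C", "collected": date(2020, 2, 2), "mutations": {"C1G", "C8G"}},
    {"location": "Region D", "collected": date(2020, 1, 25), "mutations": {"A10T"}},
]


def first_sightings(records, mutation):
    """Return (location, earliest_date) pairs for one mutation, oldest first."""
    earliest = {}
    for record in records:
        if mutation not in record["mutations"]:
            continue
        place = record["location"]
        if place not in earliest or record["collected"] < earliest[place]:
            earliest[place] = record["collected"]
    return sorted(earliest.items(), key=lambda item: item[1])


for place, seen in first_sightings(records, "C1G"):
    print(place, seen.isoformat())
```

With these made-up records, the mutation appears first in Region A, then Region B, then Region C, and never in Region D.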
The data recorded from thousands of patients show that SARS-CoV-2 originated in Wuhan, China, and spread from there to the rest of the world.