March 12, 2020 | Erin Bluvas, email@example.com
Guoshuai Cai was studying microbiology as a master’s student at Wuhan University when he first encountered the emerging field of bioinformatics. It was 2007, and bioinformatics – the development of software tools for understanding complex biological data – was riding a wave of explosive growth.
“This growth was largely driven by the Human Genome Project and the emergence of next-generation sequencing technology,” Cai says. “Scientists began to intensively use DNA sequencing technology to increase what we know about the biology and diseases from multi-level omics, including genome, methylome, transcriptome, proteome, metabolome and others. The knowledge expansion is dramatic.”
Cai’s introduction to the interdisciplinary field resulted from a global collaboration with a researcher at the University of Minnesota. Working on the miRecords project, he learned how to build websites from scratch – including the design of the database as well as the user interface. The miRecords platform was created to enable researchers to analyze human and animal miRNA-target interactions. It also sparked an interest in Cai that would shape the rest of his career.
“Many variables, such as environmental exposures, affect the interaction of these components and lead to or partially contribute to health problems,” Cai says. “miRecords identifies and categorizes miRNA-target interactions, which helps us to elucidate the pathogenesis and predict what will happen with different environmental exposures.”
After his 2009 graduation, Cai moved to the United States to continue studying bioinformatics. He enrolled in the Ph.D. in Biomathematics and Biostatistics program at The University of Texas Health Science Center at Houston and MD Anderson Cancer Center, adding another layer to his expertise in bioinformatics and its applications.
“Bioinformatics enables us to use methods and techniques to find patterns from big data of biomedicine and public health in large-scale and high-dimension,” Cai says. “We develop methods and apply them to analyze the large amount of omics data – which is now so large-scaled – and none of this was possible before we had this level of computational and statistical power.”
In Houston, Cai worked on The Cancer Genome Atlas data, developing methods for analyzing six levels of data for more than 10,000 patients with 31 types of cancer. By integrating the DNA methylation and mRNA expression data, he proposed a novel subtyping method for improving precision medicine in breast cancer. Following his graduation, he accepted a three-year fellowship at Dartmouth College, where he continued building his skill set and developed expertise in scleroderma, a rare autoimmune disease. Using machine learning techniques, he identified biomarkers that provide an accurate, objective and easy-to-apply way to measure scleroderma severity.
Though the bioinformatics field has changed rapidly over the past few decades, it is still in its infancy in terms of the potential it has to unlock answers to scientists’ biological questions, especially in systematically understanding the health effects from environmental exposures. Cai joined the Arnold School’s Department of Environmental Health Sciences (ENHS) to contribute to this important task.
Working with environmental health scientists, such as the researchers in the South Carolina SmartState Center for Environmental Nanoscience and Risk (CENR) and the Center for Oceans and Human Health and Climate Change Interactions (OHHC2I), Cai can help answer questions related to how nanoparticles, toxins, and other exposures affect human and environmental health. Currently, he is taking the next step in transcriptome data analysis, from bulk to single cell, by applying a new technique that enables biologists and toxicologists to profile single-cell omics.
“What this bioinformatics approach pioneered by Dr. Cai and others around the world does is help us transition from molecular biomarkers of environmental exposure to predicting disease outcomes from those exposures,” says ENHS Clinical Professor and Chair Geoff Scott.
“Ten years ago, we could pull biomolecules from a tumor biopsy,” he says. “Now we can dissect each single cell from this bulk of tissue – resulting in much higher-resolution data, analysis and understanding.” Cai’s goal is both to advance the field by developing databases and interfaces and to help close the gap between bioinformatics scientists like himself and the biologists who are conducting research.
“Biologists are generating lots of data through their scientific research, and we need to work closely together to help ensure that they can fully take advantage of what bioinformatics has to offer,” Cai says. “For example, I developed a user-friendly web server to provide a comprehensive platform to share, analyze, visualize and interpret scRNA-seq data, which enables efficient communication between us.”
The tool he is talking about is the Single Cell Transcriptomics Annotated Viewer (SCANNER). Using it with bulk and single-cell transcriptomics data, he recently studied lung cell type-specific gene expression of ACE2, the receptor for the 2019 novel coronavirus (2019-nCoV), which causes COVID-19. The study indicates that smokers, especially former smokers, have higher ACE2 expression and may therefore be more susceptible to 2019-nCoV, and that their infection paths may differ from those of non-smokers. Amid the current global emergency of the COVID-19 outbreak, this study provides valuable information for identifying susceptible populations and understanding the disease's pathology.
Cai has been harnessing the power of bioinformatics for a decade now, and he has no plans to stop. There are always new methods to develop and new applications to employ, and he will continue to enjoy exploring these avenues as long as there are more questions to answer.