Research Areas

Biostatistics is at the forefront of many of today’s big data challenges and uses rigorous methods to unravel the true story hidden within increasingly complex data. The Division’s primary areas of research expertise are listed below by methodology and by application area.

Methodological Research

Computational statistics

As the volume and complexity of data used in public health rapidly increase, more advanced methods in statistical modeling and computation are needed. For example, computer-intensive processing and analysis of high-frequency data require efficient programming approaches; Bayesian methods are becoming increasingly popular in infectious disease modeling and inference; and novel computational methods based on object-oriented and topological data analysis are being developed for biological signal and network analysis.

Statistical methods for spatially and temporally dependent data

In disease mapping, data tend to be correlated due to spatial and temporal proximity, while in social and neuroimaging networks, sites or regions of interest typically possess high spatial and temporal correlation. Multivariate statistics provide powerful tools for simultaneously analyzing multiple, interdependent outcomes.

Longitudinal and clustered data analysis

Longitudinal and clustered data consist of measurements collected repeatedly over time on the same subjects or on different subjects from the same cluster (e.g., family). Such data are very common in public health studies, including cohort studies and surveys.

Survival analysis

Survival analysis is designed for time-to-event data (e.g., time to disease onset, time to death, time in treatment) in which the outcome is only partially observed because of loss to follow-up or the end of the study. It is a powerful tool for examining mortality risk over time.
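
As an illustration of how partially observed outcomes can still contribute information, the following is a minimal sketch of the Kaplan-Meier product-limit estimator, one standard survival method (it is not claimed to be the Division's specific approach). The function name `kaplan_meier` and the follow-up times are hypothetical and exist only for this example. Subjects who are lost to follow-up stay in the risk set until their last observed time and then drop out without counting as events.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed event time.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if censored
    """
    # Number of observed events at each time point.
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    # Every subject leaves the risk set at their own follow-up time,
    # whether through the event or through censoring.
    exits = Counter(times)

    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the conditional
            # probability of surviving past time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


if __name__ == "__main__":
    # Hypothetical follow-up times in months; 0 marks a censored subject.
    times = [2, 3, 3, 5, 8, 8, 9, 12]
    events = [1, 1, 0, 1, 0, 1, 1, 0]
    for t, s in kaplan_meier(times, events):
        print(f"t={t:>2}  S(t)={s:.3f}")
```

Simply discarding the censored subjects would shrink the risk sets and tend to understate survival, which is why the estimator keeps them until their censoring time.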

Statistical methods for missing, mismeasured, limited dependent, bounded, and circular data

Missing data are data that were supposed to be collected but, for some reason, were not. This problem is very common in the social, behavioral, epidemiological, and medical sciences, and it may introduce bias into the analysis and increase the uncertainty of the estimates. Limited dependent outcomes (e.g., binary, count, categorical), bounded outcomes (e.g., clinical scores, academic grades), and circular outcomes (e.g., protein structures, circadian rhythm, sleep cycles) require specialized analytic approaches.

Nonparametric and semiparametric statistical methods

Non- and semi-parametric methods relax modeling assumptions that often limit the applicability of parametric models. For this reason, non- and semi-parametric models provide flexible analytical tools to discover patterns and associations in complex data.

Statistical genetics and bioinformatics

Modern biological technologies (such as microarrays and next-generation sequencing) call for the development of powerful and efficient statistical methods for drawing inferences from high-throughput genetic and genomic data generated in public health and biomedical research. The analysis covers family- and population-based human genetic and genomic data, such as single nucleotide polymorphism (SNP), gene expression, and protein expression data.

Application Areas

These methods are widely applied in the following areas.

Spatial data

• Infectious disease mapping
• Disease prevalence forecasting
• Disease cluster detection

Neuroimaging data

Novel statistical frameworks based on object-oriented and topological data analysis tools are being developed for electroencephalography (EEG) and diffusion and functional magnetic resonance imaging (dMRI and fMRI) data in post-stroke aphasia and epilepsy to better understand the neural deficits in these brain network disorders.

Genetics/genomics data

In collaboration with medical and scientific researchers at USC, as well as other national and international institutions, faculty are developing procedures for the analysis of genetics/genomics data such as microarray analysis, functional genomics, next-generation sequencing analysis, integrative analysis of omics data, epigenetics and complex bioinformatic approaches (RNA-seq analysis, pathway analysis and differential expression analysis).

High-frequency biometric data

Accelerometry for physical activity and sleep monitoring

Census and survey data

NHANES, UK cohort studies

Cancer surveillance

SEER, South Carolina Cancer Registry data

Electronic medical record (EMR) data

• Data from the SC-RFA Office
• ICD-9/ICD-10 codes

Arnold School of Public Health
