SISG MODULE 5

Bayesian Statistics

The use of Bayesian methods in genetics has a long history. This introductory module begins with basic probability. It then describes Bayesian approaches to binomial proportions, multinomial proportions, two-sample comparisons (binomial, Poisson, normal), the linear model, and Monte Carlo methods of summarization. Advanced topics include hierarchical models, generalized linear models, and missing data. Illustrative applications will include Hardy-Weinberg testing and estimation, detection of allele-specific expression, QTL mapping, testing in genome-wide association studies, mixture models, and multiple testing in high-throughput genomics.

For students new to genetic analysis, we recommend pairing with Module 1, which introduces basic statistical concepts, or Module 3, which provides experience in R and Python.  More advanced students will find this Module a great introduction to Modules 9 and 13, or to many other Modules in the other streams.

Learning Objectives: After attending this module, participants will be able to: 

  1. Understand, in principle, the roles of priors, likelihoods, and posterior probabilities in Bayesian analysis, and how these differ from default frequentist approaches.
  2. Apply these principles to simple conjugate analyses, e.g. beta-binomial, Dirichlet-multinomial, and Normal-Normal, and interpret their output in the analysis of genetic data.
  3. Understand the principles of how numerical calculations are done to evaluate the posterior in low- and fixed-dimensional modeling, including MCMC and INLA.
  4. Use standard packages to implement MCMC- and INLA-based analyses of low- and fixed-dimensional problems, emphasizing their use in genetic analyses.
  5. Justify Bayesian interpretations for multiple testing and model averaging, with application to high-throughput genetics.

Course Dates
  • Mon June 3, 8:30 a.m. – 5:00 p.m. EDT
  • Tue June 4, 8:30 a.m. – 5:00 p.m. EDT
  • Wed June 5, 8:30 a.m. – 12:00 p.m. EDT

Suggested Course Pairings

Statistical Analysis Stream 

  • Module 1: Probability and Statistics
  • Module 9: Statistical Genetics
  • Module 13: Multivariate Analysis

Course Materials

Please email sisg@biosci.gatech.edu for free access.

About the Instructors

Ken Rice is Professor in the Department of Biostatistics at The University of Washington in Seattle, where he leads the Data Coordinating Center and Analysis Committee for the NHLBI’s TOPMed project.  His research focuses primarily on developing and applying statistical methods for complex disease epidemiology, notably cardiovascular disease. Learn more about Ken’s work here.

Zhaohui “Steve” Qin is Professor of Biostatistics and Bioinformatics in the Rollins School of Public Health at Emory University in Atlanta. His recent research focuses on developing Bayesian model-based methods and software to analyze data generated by next-generation sequencing technologies such as ChIP-seq, RNA-seq, Hi-C, WGBS, and resequencing. Learn more about Steve’s work here.