MCMC Methods for Genetics
This module introduces Markov Chain Monte Carlo (MCMC) methods, using genetic examples (in particular, the problem of estimating population structure from genotype data) to motivate the material. It assumes a solid foundation in basic statistics and the concept of likelihood, as well as basic familiarity with the R statistical package.
The course provides an introduction to likelihood, Bayesian statistics, Monte Carlo, Markov chains, mixture models and MCMC methods, including both Metropolis-Hastings and Gibbs sampling. Some mathematical detail is given; however, the emphasis is on concepts and on practical issues that arise in applications. Mathematical ideas are illustrated with simple examples and reinforced with computer practicals in the R statistical language.
Learning Objectives: After attending this module, participants will be able to:
- Derive the (analytic) posterior distribution for a Binomial proportion given a conjugate (Beta) prior.
- Implement a Metropolis-Hastings algorithm to sample from this posterior distribution and check that it matches the analytic form.
- Derive the posterior distribution for cluster memberships given a prior on clusters and a likelihood for each cluster.
- Implement a Gibbs sampler to sample from cluster memberships given data from a mixture of product-Bernoulli distributions.
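To make the first two objectives concrete, here is a minimal sketch in Python (the course practicals themselves use R). It assumes k successes in n trials and a Beta(a, b) prior, so the posterior is proportional to p^(a+k-1) (1-p)^(b+n-k-1), which is the Beta(a + k, b + n - k) density. The function names, the normal random-walk proposal, the step size, the burn-in length and the example data (k = 7, n = 20) are illustrative assumptions, not course material.

```python
import math
import random


def log_posterior(p, k, n, a, b):
    """Unnormalized log posterior of a Binomial proportion p under a Beta(a, b)
    prior after observing k successes in n trials."""
    if not 0.0 < p < 1.0:
        return -math.inf  # proposals outside (0, 1) have zero posterior density
    return (a + k - 1) * math.log(p) + (b + n - k - 1) * math.log(1.0 - p)


def metropolis_hastings(k, n, a=1.0, b=1.0, n_iter=50_000, step=0.1, p0=0.5, seed=1):
    """Random-walk Metropolis-Hastings. The normal proposal is symmetric, so the
    Hastings correction cancels and only the posterior ratio is needed."""
    rng = random.Random(seed)
    p, current = p0, log_posterior(p0, k, n, a, b)
    samples = []
    for _ in range(n_iter):
        proposal = p + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, k, n, a, b)
        # Accept with probability min(1, post(proposal) / post(p)), on the log scale;
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            p, current = proposal, candidate
        samples.append(p)  # a rejected move repeats the current state
    return samples


# Illustrative data: 7 successes in 20 trials with a uniform Beta(1, 1) prior.
k, n, a, b = 7, 20, 1.0, 1.0
draws = metropolis_hastings(k, n, a, b)[5_000:]  # discard burn-in

# The analytic posterior is Beta(a + k, b + n - k); compare its mean and variance.
alpha, beta = a + k, b + n - k
analytic_mean = alpha / (alpha + beta)
analytic_var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))

mc_mean = sum(draws) / len(draws)
mc_var = sum((x - mc_mean) ** 2 for x in draws) / (len(draws) - 1)
print(f"mean: MCMC {mc_mean:.4f}, analytic {analytic_mean:.4f}")
print(f"variance: MCMC {mc_var:.5f}, analytic {analytic_var:.5f}")
```

The MCMC and analytic summaries should agree closely. Running longer chains or tuning step, which controls the acceptance rate, should tighten the agreement. A histogram of the draws overlaid with the Beta density is the usual visual check.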
Course Dates
- Wed June 11, 1:30 p.m. – 5:00 p.m. EDT
- Thu June 12, 8:30 a.m. – 5:00 p.m. EDT
- Fri June 13, 8:30 a.m. – 5:00 p.m. EDT
Instructors
- Matthew Stephens
- Eric Anderson
Suggested Course Pairings
Statistical Methods Stream
- Module INT2: Introduction to Programming in R and Python
- Module QG1: Quantitative Genetics
- Module ST2: Bayesian Statistics
- Module ST3: Regression and Regularization
Course Materials
Please email sisg@biosci.gatech.edu for free access.
About the Instructors
Matthew Stephens is Professor of Statistics and Human Genetics at the University of Chicago. His group develops novel statistical methods for a wide range of applications in genetics, typically adopting Bayesian approaches. He was a developer of STRUCTURE, a widely used program for inferring population structure and estimating individual admixture, and proposed the Li and Stephens model for studying linkage disequilibrium. His recent research also encompasses RNA-seq analysis, genetic association studies, and conservation genetics.
Eric Anderson is a Research Geneticist with the National Oceanic and Atmospheric Administration (NOAA) Southwest Fisheries Science Center and holds an affiliate faculty position at Colorado State University. His statistical genetics research focuses on applying molecular technologies to fisheries management and ecological research, such as the use of single nucleotide polymorphism (SNP) markers in Pacific salmon fishery management and the use of high-throughput sequencing data to study genetic influences on migratory behavior in fish.