SISG STATISTICAL ANALYSIS MODULE ST3

Regression and Regularization

In many genomic applications, correlations among features (such as linkage disequilibrium, or correlated expression among genes) complicate the analysis. A set of related approaches can help to moderate the impact of these issues and provide more reasonable estimates of genetic effects: regularization (e.g. lasso, ridge and elastic net), regularizing priors, penalization methods (such as those found in modern mixed models), and other constraints imposed during estimation.

This module will cover modern approaches in which introducing a small degree of bias can provide more meaningful estimates of genetic effects. We will begin with the classic regularization approaches, specifically lasso (L1 penalty) and ridge (L2 penalty), and how to use cross-validation with lasso- and ridge-type linear models. We will discuss how these methods relate to the optimization of classic objective functions (e.g. maximum likelihood, least squares), and then expand to elastic net, shrinkage of random effects in mixed models, the use of regularizing priors in Bayesian inference, and the estimation of sparse covariance matrices.
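
As a rough illustration of the bias-for-stability trade described above, the sketch below fits ordinary least squares and ridge to simulated data with two strongly correlated features. It is not taken from the course materials (which use R): the function name ridge_fit, the simulated data, and the penalty value 10.0 are all assumptions chosen for illustration, and in practice the penalty would be chosen by cross-validation.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate (X'X + lam*I)^{-1} X'y.

    With lam = 0 this reduces to ordinary least squares.
    Hypothetical helper for illustration only.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
n = 200

# Two features that are almost copies of each other (a stand-in for
# markers in strong linkage disequilibrium) plus one independent feature.
z = rng.normal(size=n)
X = np.column_stack([
    z + 0.05 * rng.normal(size=n),
    z + 0.05 * rng.normal(size=n),
    rng.normal(size=n),
])
X = X - X.mean(axis=0)  # center so no intercept is needed

# Simulated response: only the first and third features have effects.
y = X @ np.array([1.0, 0.0, 0.5]) + rng.normal(size=n)
y = y - y.mean()

beta_ols = ridge_fit(X, y, 0.0)     # least squares (no penalty)
beta_ridge = ridge_fit(X, y, 10.0)  # L2 penalty; 10.0 is an arbitrary choice

print("OLS coefficients:  ", np.round(beta_ols, 3))
print("Ridge coefficients:", np.round(beta_ridge, 3))
print("Coefficient norms: ", np.linalg.norm(beta_ols), np.linalg.norm(beta_ridge))
```

The ridge coefficient vector always has a smaller norm than the least-squares one for a positive penalty, which is the "small degree of bias" at work. Lasso replaces the squared (L2) penalty with an absolute-value (L1) penalty and has no closed form, so it is typically fit with an iterative solver.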

Learning Objectives: After attending this module, participants will be able to: 

  1. Describe the purpose of regularization, and how it compares to other optimization approaches used during estimation
  2. Compare and select appropriate regularization approaches
  3. Develop skills (in R) to perform regularization in a number of general genomic contexts
  4. Perform regularization in the context of GWAS
  5. Apply regularized estimation, including the estimation of sparse covariance matrices, in the context of variation in gene expression
Course Dates
  • Mon June 9, 8:30 a.m. – 5:00 p.m. EST
  • Tue June 10, 8:30 a.m. – 5:00 p.m. EST
  • Wed June 11, 8:30 a.m. – 12:00 p.m. EST
Suggested Course Pairings

Statistical Methods Stream 

  • Module INT2: Introduction to Programming in R and Python
  • Module ST2: Bayesian Statistics
  • Module QG2: Mixed Models
  • Module ST4: MCMC for Genetics
Course Materials

Please email sisg@biosci.gatech.edu for free access.

About the Instructor

Ian Dworkin is Professor of Biology at McMaster University in Hamilton, Ontario. One of the first-ever graduates of SISG, he is excited to now teach a new module focused on advanced regression and regularization methods for genetic research. Ian’s Drosophila evolutionary genomics lab is interested in the causes and consequences of context-dependent effects of mutations for genetic analyses, with a special focus on the evolution of robustness. Learn more about Ian’s work here.