
Regression and Regularization
In many genomic applications, correlations among features (such as linkage disequilibrium, or correlated expression among genes) complicate the analysis. A set of related approaches, including regularization (e.g. lasso, ridge and elastic net), regularizing priors, penalization methods (such as those found in modern mixed models) and other constraints imposed during estimation, can help to moderate the impact of these issues and provide more reasonable estimates of genetic effects.
This module will cover modern approaches in which introducing a small degree of bias can provide more meaningful estimates of genetic effects. We will begin by introducing classic regularization approaches, specifically lasso (L1 penalty) and ridge (L2 penalty), and how to use cross-validation with lasso- and ridge-type linear models. We will discuss how these methods relate to the optimization of classic objective functions (e.g. maximum likelihood, least squares) and then expand to elastic net, shrinkage of random effects in mixed models, the use of regularizing priors during Bayesian inference, and the estimation of sparse covariance matrices.
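For orientation, the lasso and ridge penalties named above can be written as additions to the least-squares objective. This is a sketch in one common parameterization, not necessarily the notation used in the module; the symbols (response $y$, feature matrix $X$, coefficients $\beta$, penalty strength $\lambda \ge 0$, mixing weight $\alpha \in [0, 1]$) are assumptions introduced here for illustration.

```latex
\begin{align*}
\hat{\beta}_{\text{ridge}} &= \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2 \\
\hat{\beta}_{\text{lasso}} &= \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1 \\
\hat{\beta}_{\text{elastic net}} &= \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2
  + \lambda \left( \alpha \lVert \beta \rVert_1 + (1 - \alpha) \lVert \beta \rVert_2^2 \right)
\end{align*}
```

Setting $\lambda = 0$ recovers ordinary least squares. Larger values of $\lambda$ shrink the coefficients toward zero, which is the small degree of bias traded for more stable estimates, and cross-validation is a typical way to choose $\lambda$.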
Learning Objectives: After attending this module, participants will be able to:
- Describe the purpose of regularization, and how it compares to other optimization approaches during estimation
- Compare and select appropriate regularization approaches
- Develop skills (in R) to perform regularization in a number of general genomic contexts
- Perform regularization in the context of GWAS
- Regularize estimates and estimate sparse covariance matrices in the context of variation in gene expression
Course Dates
- Wed June 3, 1:30 p.m. – 5:00 p.m. EST
- Thu June 4, 8:30 a.m. – 6:00 p.m. EST
- Fri June 5, 8:30 a.m. – 5:00 p.m. EST
Instructors
- Ian Dworkin
- Arbel Harpak
Suggested Course Pairings
Statistical Methods Stream
- Module ST1: Bayesian Statistics
- Module ST2: Regression and Regularization
- Module QG1: Quantitative Genetics
- Module QG2: Mixed Models
Course Materials
Course materials will be available shortly before the class.
Please email sisg@biosci.gatech.edu if you have questions or would like more details.
About the Instructors

Ian Dworkin is Professor of Biology at McMaster University in Hamilton, Ontario. One of the first-ever graduates of SISG, he is excited to now teach a new module focused on advanced regression and regularization methods for genetic research. Ian’s Drosophila evolutionary genomics lab is interested in the causes and consequences of context-dependent effects of mutations for genetic analyses, with a special focus on the evolution of robustness. Learn more about Ian’s work here.

Arbel Harpak is Assistant Professor of Integrative Biology at the University of Texas at Austin. His research combines large-scale statistical inference and computational modeling to understand how genotypes map onto phenotypes. He is particularly interested in the complexity of biological systems and its implications for understanding mechanisms of evolution, as well as the development and application of polygenic scores in human biomedicine. Learn more about Arbel’s work here.