SISG MODULE 1

Probability and Statistics

This module is a general introduction to statistical inference for researchers in the life sciences who use quantitative approaches. It introduces core elements of statistical modeling, including basic probability, common distributions, and how to assess confidence in estimates. We use examples from experimental and population data to demonstrate classical hypothesis tests for quantitative and categorical data, resampling methods, and multiple testing corrections such as false discovery rate control. It provides a foundation for almost all of the later modules.

Training in calculus is not a prerequisite for this module, but a willingness to attempt math problems and some comfort with basic algebra are necessary. Exercises use basic R scripts that will be taught in class.

Learning Objectives: After attending this module, participants will be able to: 

  1. Describe the assumptions underlying common distributions, e.g., binomial, multinomial, normal.
  2. Translate scientific questions into appropriate null and alternative hypotheses.
  3. Explain how the likelihood function can be used for estimation and model selection.
  4. Define sensitivity, specificity, and predictive values in the context of binary screening, e.g., a test for disease.
  5. Describe the assumptions underlying z-tests, t-tests, and chi-square tests, and use these tests to statistically compare samples.
  6. Explain and interpret p-values and confidence intervals.
  7. Explain the role of computer-intensive methods in hypothesis testing and confidence intervals, e.g., bootstrap, jackknife, permutation tests.
  8. Address the issue of multiple comparisons in hypothesis testing.

Course Dates
  • Wed May 29, 1:30 p.m. – 5:00 p.m. ET
  • Thu May 30, 8:30 a.m. – 5:00 p.m. ET
  • Fri May 31, 8:30 a.m. – 5:00 p.m. ET

Suggested Course Pairings

Statistical Analysis Stream 

  • Module 5: Bayesian Statistics
  • Module 9: Statistical Genetics
  • Module 13: Multivariate Analysis

Course Materials

Please email sisg@biosci.gatech.edu for free access.

About the Instructors

Annalise Paaby is an assistant professor in the School of Biological Sciences at Georgia Tech. Her primary research interests are in the quantitative genetics of evolution and development, using the nematode Caenorhabditis elegans as a model organism.

Toyya Pujol is an operations researcher in health and health care at the RAND Corporation in Washington, D.C. She applies advanced statistics and machine learning methods to large medical databases to study maternal and mental health, with a special interest in health disparities.