SISG STATISTICAL ANALYSIS MODULE ST4

Multivariate Analysis

This module provides an introduction to multivariate analysis, with a strong emphasis on data visualization by means of multivariate graphics known as biplots. The course covers principal component analysis (PCA), multidimensional scaling (MDS), correspondence analysis (CA), canonical analysis, cluster analysis, discriminant analysis (DA) and some multivariate inference, illustrating these methods with genetic data. Some genetic datasets are compositional in nature, and basic principles of compositional data analysis, such as log-ratio transformations, are considered. The use of multivariate methods for uncovering population substructure and cryptic relatedness is also addressed.

Learning Objectives: After attending this module, participants will be able to: 

  1. Describe the purpose of basic multivariate statistical methods and select appropriate methods for a given dataset.
  1. Apply appropriate multivariate transformations.
  1. Perform multivariate statistical analysis in the R programming environment.
  1. Visualize multivariate data by constructing biplots, and interpret their meaning and goodness of fit.
  1. Carry out basic multivariate hypothesis tests.
  1. State the peculiar nature of compositional data and account for it in analyses.
Course Dates
  • Wed June 10, 1:30 p.m. – 5:00 p.m. ET
  • Thu June 11, 8:30 a.m. – 5:00 p.m. ET
  • Fri June 12, 8:30 a.m. – 5:00 p.m. ET
Instructor
  • Jan Graffelman

Suggested Course Pairings

Quantitative Genetics Stream 

  • Module ST1: Bayesian Statistics 
  • Module ST2: Regression & Regularization
  • Module PE3: AI and ML for Genetics
  • Module QG2: Mixed Models 
Course Materials

Course materials will be available shortly before the class.

Please email sisg@biosci.gatech.edu if you have questions or would like more details.

About the Instructor

Jan Graffelman is Professor of Statistics and Operations Research at the Universitat Politècnica de Catalunya in Barcelona, Spain. He is broadly interested in applied statistics, multivariate analysis, and compositional data analysis. His most recent work explores methodology for compositional statistics in population genetics. His publications can be explored here.