Short Course

A short course entitled "Introduction to large-scale genetic association studies" will be taught by Thomas Lumley and held on Sunday, June 15, 2014, from 9 am to 5 pm.  

Fees for the short course include handouts, lunch, and morning and afternoon refreshments.


Annual Meeting of the Western North American Region of the International Biometric Society
University of Hawai‘i at Mānoa, Honolulu

One Day Course on
Introduction to large-scale genetic association studies
June 15, 2014

Thomas Lumley
Professor of Biostatistics, Dept. of Statistics, University of Auckland
Affiliate Professor, Dept. of Biostatistics, University of Washington

It is now feasible to measure hundreds of thousands of genetic variants on enough individuals for population-based epidemiology, and it is becoming feasible to do the same for all or part of the complete DNA sequence. This course will cover the statistics and some of the computing needed to analyse large-scale genetic data on large numbers of unrelated individuals. Apart from a few brief mentions, it will not cover family-based studies or organisms other than humans, and it will assume that raw measurements have already been turned into genotypes. The course will include hands-on exercises, and participants are asked to bring a laptop.

Thomas Lumley is Professor of Biostatistics at the University of Auckland and a Fellow of the American Statistical Association. Since 2008, Thomas has been a member of the Analysis Committee of the CHARGE (Cohorts for Heart and Aging Research in Genomic Epidemiology) Consortium.  CHARGE has published more than 250 papers on genome-wide SNP analysis and, more recently, DNA sequencing.  Thomas also has interests in semiparametrics, decision theory, clinical trials, and statistical computing. He has given popular short courses on statistical computing in 10 time zones, often as part of the University of Washington Summer Institute in Statistical Genetics; Honolulu will be number 11.

Outline of the short course:

1. In which we meet DNA and learn how it is measured

DNA is very stable and is the same in essentially all cells in the body, so blood samples taken in the 1990s can easily be used to study disease and biology today (unlike RNA, protein, methylation, etc.). How DNA varies, and how these variants are measured. What goes wrong with the measurements.

2. In which we perform simple tests many times

The basic genome-wide association study consists of millions of simple linear or logistic regressions.  The multiplicity leads to the resurrection of statistical issues that had previously been dismissed or settled.
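As a rough illustration of the scale involved, the Python sketch below (an illustrative assumption, not course material) fits one simple linear regression per SNP, with genotypes coded additively as 0, 1 or 2 copies of an allele, and flags SNPs that pass a Bonferroni threshold of 0.05 divided by the number of tests. With a million tests that threshold is 5 × 10^-8, the conventional genome-wide significance level. The function names are hypothetical, the p-value uses a normal approximation rather than the t distribution, the demo data are simulated, and logistic regression for binary outcomes is not shown.

```python
import math
import random

def snp_regression(genotypes, phenotype):
    """Least-squares fit of phenotype on additive genotype (0/1/2).

    Returns (beta, se, p); p is two-sided from a normal approximation.
    """
    n = len(genotypes)
    mg = sum(genotypes) / n
    my = sum(phenotype) / n
    sxx = sum((g - mg) ** 2 for g in genotypes)
    if sxx == 0:  # monomorphic SNP: no genotype variation to test
        nan = float("nan")
        return nan, nan, nan
    sxy = sum((g - mg) * (y - my) for g, y in zip(genotypes, phenotype))
    beta = sxy / sxx
    rss = sum((y - my - beta * (g - mg)) ** 2 for g, y in zip(genotypes, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p

def genome_scan(genotype_matrix, phenotype, alpha=0.05):
    """Test every SNP and flag those below the Bonferroni threshold alpha / m."""
    threshold = alpha / len(genotype_matrix)
    results = []
    for j, snp in enumerate(genotype_matrix):
        beta, se, p = snp_regression(snp, phenotype)
        results.append((j, beta, se, p, p < threshold))  # a NaN p is never flagged
    return threshold, results

if __name__ == "__main__":
    random.seed(1)
    n, m = 500, 200
    # Hypothetical simulated data: SNP 0 affects the trait, the other SNPs are null.
    snps = [[sum(random.random() < 0.3 for _ in range(2)) for _ in range(n)] for _ in range(m)]
    y = [0.5 * g + random.gauss(0, 1) for g in snps[0]]
    threshold, results = genome_scan(snps, y)
    print(f"Bonferroni threshold {threshold:.2e}; flagged SNPs: {[r[0] for r in results if r[4]]}")
```

A real scan would use optimised software, but the logic per SNP is no more than this; the difficulty lies in the number of tests.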

3. In which we go to the library

Annotation is the process of working out that your association between TLA levels and genotype at rs5551212 is uninteresting because that variant is in the TLA kinase gene and we already know about it. Or, occasionally, not.

4. In which we may not be broken down by age and sex, but do discriminate based on ancestry

Confounding works differently in genetics, because your genome gets fixed before you are born.  How model selection for main effects and interactions works in this context.  How confounding by ancestry can cause problems, and what can be done about it.
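One classical response to confounding by ancestry is genomic control: compare the median of the observed 1-df chi-square statistics with its expected null value (about 0.455) and, if the ratio λ exceeds 1, deflate every test statistic by it. The Python sketch below is an illustrative assumption, not necessarily the method the course teaches; adjusting for principal components of the genotype data is another widely used approach. It assumes most SNPs are truly null, and the function names are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

# Median of a chi-square distribution with 1 df, (Phi^-1(0.75))^2, about 0.455.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2

def genomic_inflation(z_scores):
    """Lambda: median observed 1-df chi-square divided by its null median."""
    return statistics.median([z * z for z in z_scores]) / CHI2_1DF_MEDIAN

def genomic_control(z_scores):
    """Divide every z-score by sqrt(lambda); a lambda below 1 is left uncorrected."""
    lam = max(genomic_inflation(z_scores), 1.0)
    return lam, [z / math.sqrt(lam) for z in z_scores]
```

A λ close to 1 suggests the scan is well calibrated; values noticeably above 1 point to confounding, although widespread true signal can also inflate λ and genomic control cannot tell the two apart.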

5. In which we make up data

You have measured a million SNPs, but that's a tiny fraction of all the known ones. How we impute unmeasured SNPs.

6. In which we have friends

Genetic effects tend to be really, really small. A single cohort study typically isn't big enough to be useful, so we need to work together in larger consortia.  Combining results without sharing individual data is important. So is playing nicely with the other children.
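A common way to combine results without sharing individual data is fixed-effect inverse-variance meta-analysis: each cohort shares only its per-SNP estimate and standard error, and the pooled estimate weights each cohort by 1/SE². The Python sketch below is a minimal illustration under that assumption. The function name is hypothetical, and it assumes every cohort reports its effect for the same coded allele (aligning alleles across cohorts is part of playing nicely).

```python
import math

def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis of per-cohort estimates.

    betas[k], ses[k]: estimate and standard error from cohort k, all for the
    same coded allele (assumed already aligned). Returns (beta, se, p).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    p = math.erfc(abs(beta / se) / math.sqrt(2))  # two-sided, normal approximation
    return beta, se, p
```

For example, two cohorts that each report a standard error of 0.1 give a pooled standard error of about 0.071, smaller than either cohort achieves alone.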

7. In which we do not take one thing at a time

In principle, a meaningful genetic difference could involve multiple SNPs in a complicated non-additive way. How might we tell?

Also, when looking at very rare genetic variants, there's no real point in studying them individually. How can we group them to increase power?
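One simple way to group rare variants is a burden (collapsing) test: add up each person's rare alleles across a set of variants, such as those in one gene, and regress the phenotype on that single score. The Python sketch below assumes a hypothetical 1% minor-allele-frequency cutoff, additive 0/1/2 genotype coding and a normal-approximation p-value; the function names are hypothetical. It is one illustrative grouping strategy, and burden tests lose power when variants in the set have effects in opposite directions.

```python
import math

def minor_allele_counts(genotypes):
    """Recode 0/1/2 alternate-allele counts so they count the minor allele.

    Returns (maf, recoded genotypes).
    """
    p = sum(genotypes) / (2 * len(genotypes))
    if p <= 0.5:
        return p, list(genotypes)
    return 1 - p, [2 - g for g in genotypes]

def burden_scores(genotype_matrix, maf_threshold=0.01):
    """Sum minor-allele counts over rare variants for each person.

    genotype_matrix[j][i] is the genotype (0/1/2) of person i at variant j.
    """
    scores = [0] * len(genotype_matrix[0])
    for variant in genotype_matrix:
        maf, minor = minor_allele_counts(variant)
        if 0 < maf < maf_threshold:  # skip monomorphic and common variants
            for i, g in enumerate(minor):
                scores[i] += g
    return scores

def burden_test(genotype_matrix, phenotype, maf_threshold=0.01):
    """Regress the phenotype on the burden score; return (beta, p)."""
    x = burden_scores(genotype_matrix, maf_threshold)
    n = len(x)
    mx, my = sum(x) / n, sum(phenotype) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:  # nobody (or everybody equally) carries rare alleles
        return float("nan"), float("nan")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, phenotype))
    beta = sxy / sxx
    rss = sum((yi - my - beta * (xi - mx)) ** 2 for xi, yi in zip(x, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    p = math.erfc(abs(beta / se) / math.sqrt(2))  # two-sided, normal approximation
    return beta, p
```

Collapsing trades resolution for power: individual variants too rare to test alone contribute jointly to one score.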

8. In which we worry about the future

If there is still time and energy, a brief lecture on some things we might have in the future, such as molecular haplotyping and reliable functional annotation.


09:00 Overview of the course.
09:15 Module 1. In which we meet DNA and learn how it is measured
  Module 2. In which we perform simple tests many times
  Module 3. In which we go to the library
  Module 4. In which we may not be broken down by age and sex, but do discriminate based on ancestry
12:30 Lunch break
14:00 Module 5. In which we make up data
  Module 6. In which we have friends
  Module 7. In which we do not take one thing at a time
  Module 8. In which we worry about the future
17:00 End of course
  *Morning snack will be served from 10:15 am, and afternoon snack from 3:30 pm to 3:45 pm.