Identifying differentially expressed genes using linear models (part 2, factorial designs)

Overview

Teaching: 20 min
Exercises: 20 min
Learning Objectives
  • Be able to use limma to identify differentially expressed genes.

  • Understand the formula class of objects in R, and use it to specify the appropriate model for linear modeling.

Experimental designs with more than one covariate

GSE66417 is an example of a factorial experimental design, in which two covariates are varied in a single experiment. In this case, the cell type is (Lymphoma or CTL cells) and the treatment is varied (control or Ixozamib). This is called a 2x2 factorial design.

In order to model the expression of each gene, we can model the group mean expression under each condition. We have four conditions, so the model is something like

\[Y = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \epsilon\]

where each value $\beta$ represents a particular condition. We would like to identify genes where othe contrasts represent changes between different groups, i.e., the effect of drug treatment in each cell type. In order to do that, we need to specify contrasts explicitly. One way to do this is to create a “dummy” variable that represents the four groups.

library(dplyr)
pd <- pData(gse66417_eset)
pd <- rename(pd,cell_type="cell type:ch1",treatment="treatment:ch1")
pd$treatment <- as.factor(pd$treatment)
levels(pd$treatment) <- c("Ixazomib","Control")
pd$group <- as.factor(paste(pd$cell_type,pd$treatment))
levels(pd$group) <- c("Hodgkins.Control","Hodgkins.Ixazomib","TCL.Control","TCL.Ixazomib")
  cell_type treatment group
GSM1622170 T-Cell Lymphoma Control TCL.Control
GSM1622189 T-Cell Lymphoma Control TCL.Control
GSM1622191 T-Cell Lymphoma Control TCL.Control
GSM1622194 T-Cell Lymphoma Ixazomib TCL.Ixazomib
GSM1622196 T-Cell Lymphoma Ixazomib TCL.Ixazomib
GSM1622198 T-Cell Lymphoma Ixazomib TCL.Ixazomib
GSM1622200 Hodgkin Lymphoma Control Hodgkins.Control
GSM1622202 Hodgkin Lymphoma Control Hodgkins.Control
GSM1622204 Hodgkin Lymphoma Control Hodgkins.Control
GSM1622206 Hodgkin Lymphoma Ixazomib Hodgkins.Ixazomib
GSM1622209 Hodgkin Lymphoma Ixazomib Hodgkins.Ixazomib
GSM1622211 Hodgkin Lymphoma Ixazomib Hodgkins.Ixazomib

Now we can create a design representing the different groups

design <- model.matrix(~ 0 + pd$group)
colnames(design) <- levels(pd$group)
design
   Hodgkins.Control Hodgkins.Ixazomib TCL.Control TCL.Ixazomib
1                 0                 0           1            0
2                 0                 0           1            0
3                 0                 0           1            0
4                 0                 0           0            1
5                 0                 0           0            1
6                 0                 0           0            1
7                 1                 0           0            0
8                 1                 0           0            0
9                 1                 0           0            0
10                0                 1           0            0
11                0                 1           0            0
12                0                 1           0            0
attr(,"assign")
[1] 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$`pd$group`
[1] "contr.treatment"

Our contrasts can be formed by the usual makeContrasts() function, but we can easily specify five contrasts that might be interesting.

contrasts_matrix <- makeContrasts(drug_in_hodgkins=Hodgkins.Ixazomib - Hodgkins.Control,
              drug_in_TCL=TCL.Ixazomib - TCL.Control,
              cell_in_control=Hodgkins.Control - TCL.Control,
              cell_w_drug=Hodgkins.Ixazomib - TCL.Ixazomib,
              interaction=(Hodgkins.Control - TCL.Control) - (Hodgkins.Ixazomib - TCL.Ixazomib),
              levels=design)
kable(contrasts_matrix)
  drug_in_hodgkins drug_in_TCL cell_in_control cell_w_drug interaction
Hodgkins.Control -1 0 1 0 1
Hodgkins.Ixazomib 1 0 0 1 -1
TCL.Control 0 -1 -1 0 -1
TCL.Ixazomib 0 1 0 -1 1

Now we can run the fit as usual

gse66417_fit <- lmFit(gse66417_eset,design)
gse66417_fit2 <- contrasts.fit(gse66417_fit,contrasts=contrasts_matrix)
gse66417_fit2 <- eBayes(gse66417_fit2)
summary(decideTests(gse66417_fit2,lfc=1))
       drug_in_hodgkins drug_in_TCL cell_in_control cell_w_drug interaction
Down                451           2            2330        2769         380
NotSig            52649       53531           49312       48453       52888
Up                  517          84            1975        2395         349

Key Points

  • The formula class of objects in R enables us to represent a wide range of models to identify differentially expressed genes.