Identifying differentially expressed genes using linear models (part 2, factorial designs)

Overview

Teaching: 20 min
Exercises: 20 min
Learning Objectives
  • Be able to use limma to identify differentially expressed genes.

  • Understand the formula class of objects in R, and use it to specify the appropriate model for linear modeling.

Experimental designs with more than one covariate

GSE66417 is an example of a factorial experimental design, in which two covariates are varied in a single experiment. In this case, the cell type is (Lymphoma or CTL cells) and the treatment is varied (control or Ixozamib). This is called a 2x2 factorial design.

In order to model the expression of each gene, we can model the group mean expression under each condition. We have four conditions, so the model is something like

\[Y = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \epsilon\]

where each value $\beta$ represents a particular condition. We would like to identify genes where othe contrasts represent changes between different groups, i.e., the effect of drug treatment in each cell type. In order to do that, we need to specify contrasts explicitly. One way to do this is to create a “dummy” variable that represents the four groups.

library(dplyr)
pd <- pData(gse66417_eset)
pd <- rename(pd,cell_type="cell type:ch1",treatment="treatment:ch1")
pd$treatment <- as.factor(pd$treatment)
levels(pd$treatment) <- c("Ixazomib","Control")
pd$group <- as.factor(paste(pd$cell_type,pd$treatment))
levels(pd$group) <- c("Hodgkins.Control","Hodgkins.Ixazomib","TCL.Control","TCL.Ixazomib")


|           |cell_type        |treatment |group             |
|:----------|:----------------|:---------|:-----------------|
|GSM1622170 |T-Cell Lymphoma  |Control   |TCL.Control       |
|GSM1622189 |T-Cell Lymphoma  |Control   |TCL.Control       |
|GSM1622191 |T-Cell Lymphoma  |Control   |TCL.Control       |
|GSM1622194 |T-Cell Lymphoma  |Ixazomib  |TCL.Ixazomib      |
|GSM1622196 |T-Cell Lymphoma  |Ixazomib  |TCL.Ixazomib      |
|GSM1622198 |T-Cell Lymphoma  |Ixazomib  |TCL.Ixazomib      |
|GSM1622200 |Hodgkin Lymphoma |Control   |Hodgkins.Control  |
|GSM1622202 |Hodgkin Lymphoma |Control   |Hodgkins.Control  |
|GSM1622204 |Hodgkin Lymphoma |Control   |Hodgkins.Control  |
|GSM1622206 |Hodgkin Lymphoma |Ixazomib  |Hodgkins.Ixazomib |
|GSM1622209 |Hodgkin Lymphoma |Ixazomib  |Hodgkins.Ixazomib |
|GSM1622211 |Hodgkin Lymphoma |Ixazomib  |Hodgkins.Ixazomib |

Now we can create a design representing the different groups

design <- model.matrix(~ 0 + pd$group)
colnames(design) <- levels(pd$group)
design
   Hodgkins.Control Hodgkins.Ixazomib TCL.Control TCL.Ixazomib
1                 0                 0           1            0
2                 0                 0           1            0
3                 0                 0           1            0
4                 0                 0           0            1
5                 0                 0           0            1
6                 0                 0           0            1
7                 1                 0           0            0
8                 1                 0           0            0
9                 1                 0           0            0
10                0                 1           0            0
11                0                 1           0            0
12                0                 1           0            0
attr(,"assign")
[1] 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$`pd$group`
[1] "contr.treatment"

Our contrasts can be formed by the usual makeContrasts() function, but we can easily specify five contrasts that might be interesting.

contrasts_matrix <- makeContrasts(drug_in_hodgkins=Hodgkins.Ixazomib - Hodgkins.Control,
              drug_in_TCL=TCL.Ixazomib - TCL.Control,
              cell_in_control=Hodgkins.Control - TCL.Control,
              cell_w_drug=Hodgkins.Ixazomib - TCL.Ixazomib,
              interaction=(Hodgkins.Control - TCL.Control) - (Hodgkins.Ixazomib - TCL.Ixazomib),
              levels=design)
kable(contrasts_matrix)


|                  | drug_in_hodgkins| drug_in_TCL| cell_in_control| cell_w_drug| interaction|
|:-----------------|----------------:|-----------:|---------------:|-----------:|-----------:|
|Hodgkins.Control  |               -1|           0|               1|           0|           1|
|Hodgkins.Ixazomib |                1|           0|               0|           1|          -1|
|TCL.Control       |                0|          -1|              -1|           0|          -1|
|TCL.Ixazomib      |                0|           1|               0|          -1|           1|

Now we can run the fit as usual

gse66417_fit <- lmFit(gse66417_eset,design)
gse66417_fit2 <- contrasts.fit(gse66417_fit,contrasts=contrasts_matrix)
gse66417_fit2 <- eBayes(gse66417_fit2)
summary(decideTests(gse66417_fit2,lfc=1))
       drug_in_hodgkins drug_in_TCL cell_in_control cell_w_drug interaction
Down                451           2            2330        2769         380
NotSig            52649       53531           49312       48453       52888
Up                  517          84            1975        2395         349

Key Points

  • The formula class of objects in R enables us to represent a wide range of models to identify differentially expressed genes.