Introduction to gene expression microarray analysis in R and Bioconductor: Glossary

Key Points

Working with Bioconductor
  • Bioconductor is an archive containing a wide range of packages for bioinformatics analysis.

  • Bioconductor packages are installed with the BiocManager::install() function rather than the usual install.packages().
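
A minimal sketch of the installation pattern described above. The helper name install_bioc_packages and the package names in the example call are illustrative assumptions, not part of the lesson. BiocManager itself is distributed on CRAN, so it is installed with install.packages() first.

```r
# Hypothetical helper: install one or more Bioconductor packages.
install_bioc_packages <- function(pkgs) {
  # BiocManager is a CRAN package, so the usual installer is used for it.
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  # Bioconductor packages are then installed through BiocManager.
  BiocManager::install(pkgs)
}

# Example call (not run here); the package names are only examples:
# install_bioc_packages(c("GEOquery", "limma"))
```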

Importing processed microarray data into R from GEO
  • GEO data types have enough similarities to allow data access, but enough differences to require type-specific steps and analysis.

  • The ExpressionSet class of object contains slots for different information associated with a microarray experiment.
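
To make the idea of slots concrete, the sketch below builds a tiny ExpressionSet by hand with the Biobase package and reads its parts back with the standard accessors. The probe names, sample names, and treatment labels are invented for illustration; data downloaded from GEO would arrive already assembled.

```r
library(Biobase)

# Toy expression matrix: 4 features (rows) by 3 samples (columns).
set.seed(1)
expr <- matrix(
  rnorm(12), nrow = 4,
  dimnames = list(paste0("probe", 1:4), paste0("sample", 1:3))
)

# Sample-level metadata; row names must match the matrix column names.
pheno <- data.frame(
  treatment = c("control", "treated", "treated"),
  row.names = colnames(expr)
)

eset <- ExpressionSet(
  assayData = expr,
  phenoData = AnnotatedDataFrame(pheno)
)

exprs(eset)  # the expression matrix
pData(eset)  # the sample metadata (phenoData)
fData(eset)  # the feature metadata (no columns in this toy example)
```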

Importing raw (unprocessed) Affymetrix microarray data
  • GEO data types have enough similarities to allow data access, but enough differences to require type-specific steps.

  • The ExpressionSet class of object contains slots for different information associated with a microarray experiment.

Working with experimental metadata
  • GEO metadata can be cast into R data objects for analysis. The details are up to the user.

  • Using proper phenoData to describe an experiment helps to ensure reproducibility and avoids reading in files out of order.
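
One way to guard against files being read out of order is to keep the file name in the metadata table and reorder the table to match the files explicitly. The sketch below uses only base R; the file names and treatment labels are invented, and cel_files stands in for whatever list.files() would return in a real project.

```r
# Hypothetical sample sheet, deliberately not in alphabetical order.
pheno <- data.frame(
  filename  = c("sampleC.CEL", "sampleA.CEL", "sampleB.CEL"),
  treatment = c("treated", "control", "treated"),
  stringsAsFactors = FALSE
)

# Stand-in for list.files(pattern = "\\.CEL$"), which returns sorted names.
cel_files <- sort(pheno$filename)

# Reorder the metadata so that row i describes file i.
pheno_ordered <- pheno[match(cel_files, pheno$filename), ]
rownames(pheno_ordered) <- pheno_ordered$filename

# Fail early if the metadata and the files do not line up.
stopifnot(identical(pheno_ordered$filename, cel_files))
```

Keeping this check next to the import step means a renamed or missing file stops the analysis instead of silently mislabelling samples.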

Microarray Data processing with RMA
  • RMA, the most widely used processing algorithm for Affymetrix data, is implemented in R using the rma() function in the oligo or affy packages, depending on how the data was imported.

  • The steps of background correction, quantile normalisation, and summarisation are performed in order to obtain feature-level data.
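
To illustrate just one of these steps, the sketch below implements a basic quantile normalisation in base R. It is not the code used inside rma(), which also performs background correction and summarisation; the function name quantile_normalise and the toy intensities are assumptions made for this example, and ties are handled in the simplest possible way.

```r
# Illustrative quantile normalisation: force every column (sample)
# to share the same distribution of values.
quantile_normalise <- function(x) {
  # Rank of each value within its own column.
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Reference distribution: mean of the sorted columns, row by row.
  reference <- rowMeans(apply(x, 2, sort))
  # Replace each value by the reference value with the same rank.
  normalised <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalised) <- dimnames(x)
  normalised
}

# Toy intensities: 4 features by 3 samples.
intensities <- matrix(
  c(2, 4, 6, 8,
    1, 3, 5, 7,
    10, 30, 20, 40),
  nrow = 4
)

normalised <- quantile_normalise(intensities)

# After normalisation, the sorted values are identical in every column.
apply(normalised, 2, sort)
```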

Identifying differentially expressed genes using linear models (part 1)
  • The formula class of objects in R enables us to represent a wide range of models to identify differentially expressed genes.
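
As a small sketch of how a formula becomes a model, base R's model.matrix() turns a formula and sample metadata into the design matrix that a linear-model fit would use. The treatment labels below are invented for illustration.

```r
# Hypothetical metadata for four samples in two groups.
pheno <- data.frame(
  treatment = factor(
    c("control", "control", "treated", "treated"),
    levels = c("control", "treated")
  )
)

# Intercept model: baseline (control) plus a treated-vs-control effect.
design <- model.matrix(~ treatment, data = pheno)
colnames(design)
# "(Intercept)" "treatmenttreated"

# Group-means model: one column per group, no intercept.
design_means <- model.matrix(~ 0 + treatment, data = pheno)
colnames(design_means)
# "treatmentcontrol" "treatmenttreated"
```

The two parameterisations describe the same fit; they differ only in which comparisons can be read directly from the coefficients.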

Identifying differentially expressed genes using linear models (part 2, factorial designs)
  • The formula class of objects in R enables us to represent a wide range of models to identify differentially expressed genes.
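
For a factorial design, the same formula syntax can express main effects and their interaction. The sketch below assumes a hypothetical 2 x 2 experiment (genotype by treatment) with one sample per cell, purely to show the resulting design matrix columns.

```r
# Hypothetical 2 x 2 factorial layout.
pheno <- data.frame(
  genotype  = factor(c("wt", "wt", "mutant", "mutant"),
                     levels = c("wt", "mutant")),
  treatment = factor(c("control", "treated", "control", "treated"),
                     levels = c("control", "treated"))
)

# genotype * treatment expands to both main effects plus the interaction.
design <- model.matrix(~ genotype * treatment, data = pheno)
colnames(design)
# "(Intercept)" "genotypemutant" "treatmenttreated"
# "genotypemutant:treatmenttreated"
```

The interaction column asks whether the treatment effect differs between genotypes, which is usually the question a factorial design is set up to answer.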

From features to annotated gene lists
  • Bioconductor has a rich annotation infrastructure, with different data types stored in different annotation packages.

  • The select() function allows us to efficiently query annotation databases.

  • Using topTable() in conjunction with rownames() allows us to retrieve all the probes which are differentially expressed between our experimental conditions.
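
The sketch below shows the topTable() and rownames() pattern on simulated data with limma. The probe names, group labels, and thresholds are assumptions for illustration. The select() call is left as a comment because the right annotation package depends on the array platform; hgu133plus2.db is named only as an example.

```r
library(limma)

# Simulated expression matrix: 100 probes by 6 samples.
set.seed(1)
expr <- matrix(
  rnorm(100 * 6), nrow = 100,
  dimnames = list(paste0("probe_", 1:100), paste0("s", 1:6))
)
# Shift the first five probes upwards in the treated samples.
expr[1:5, 4:6] <- expr[1:5, 4:6] + 3

group  <- factor(rep(c("control", "treated"), each = 3))
design <- model.matrix(~ group)

fit <- eBayes(lmFit(expr, design))

# number = Inf returns every probe passing the adjusted p-value cutoff.
top <- topTable(fit, coef = "grouptreated", number = Inf, p.value = 0.05)
de_probes <- rownames(top)

# With real probe IDs, annotation could then be looked up, for example:
# library(hgu133plus2.db)  # example platform package only
# select(hgu133plus2.db, keys = de_probes,
#        columns = c("SYMBOL", "GENENAME"), keytype = "PROBEID")
```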

Basic downstream analysis of microarray data
  • Volcano plots can be used to infer relationships between fold changes and statistical confidence.

  • Heatmaps can be used to visualize distinct trends in gene expression patterns between different experimental conditions by clustering.
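
Both plots can be sketched in base R. The data below are simulated; the column names logFC and adj.P.Val mirror those produced by limma's topTable(), and the cutoffs (absolute log fold change above 1, adjusted p-value below 0.05) are arbitrary choices for illustration.

```r
set.seed(42)

# Simulated results table for 500 probes.
results <- data.frame(
  logFC     = rnorm(500),
  adj.P.Val = runif(500)
)
significant <- abs(results$logFC) > 1 & results$adj.P.Val < 0.05

# Volcano plot: effect size on x, statistical confidence on y.
plot(
  results$logFC, -log10(results$adj.P.Val),
  col = ifelse(significant, "red", "grey40"), pch = 16,
  xlab = "log2 fold change", ylab = "-log10 adjusted p-value"
)
abline(v = c(-1, 1), h = -log10(0.05), lty = 2)

# Heatmap: 20 simulated probes across 6 samples in two groups.
expr <- matrix(
  rnorm(20 * 6), nrow = 20,
  dimnames = list(paste0("probe_", 1:20),
                  paste0(rep(c("ctrl", "trt"), each = 3), 1:3))
)
expr[1:10, 4:6] <- expr[1:10, 4:6] + 2  # induce a group difference

# heatmap() clusters rows and columns; scale = "row" centres each probe.
heatmap(expr, scale = "row")
```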

Glossary
