Introduction to gene expression microarray analysis in R and Bioconductor: Glossary

Key Points

Working with Bioconductor	Bioconductor is an archive containing a wide range of packages for bioinformatics analysis. Bioconductor packages are installed with the `BiocManager::install()` function rather than the usual `install.packages()`.
Importing processed microarray data into R from GEO	GEO data types are similar enough to allow common data access, but different enough to require type-specific steps for access and analysis. The ExpressionSet class of object contains slots for the different kinds of information associated with a microarray experiment.
Importing raw (unprocessed) Affymetrix microarray data	GEO data types are similar enough to allow common data access, but different enough to require type-specific steps. The ExpressionSet class of object contains slots for the different kinds of information associated with a microarray experiment.
Working with experimental metadata	GEO metadata can be cast into R data objects for analysis. The details are up to the user. Using proper phenoData to describe an experiment helps to ensure reproducibility and to avoid reading in files out of order.
Microarray Data processing with RMA	RMA, the most widely used processing algorithm for Affymetrix data, is implemented in R by the `rma()` function in the `oligo` or `affy` packages, depending on how the data was imported. Background correction, quantile normalisation, and summarisation are performed, in that order, to obtain feature-level data.
Identifying differentially expressed genes using linear models (part 1)	The `formula` class of objects in R enables us to represent a wide range of models to identify differentially expressed genes.
Identifying differentially expressed genes using linear models (part 2, factorial designs)	The `formula` class of objects in R enables us to represent a wide range of models to identify differentially expressed genes.
From features to annotated gene lists	Bioconductor has a rich annotation infrastructure, with different data types stored in different annotation packages. The `select()` function allows us to efficiently query annotation databases. Using `topTable()` in conjunction with `rownames()` allows us to retrieve all the probes which are differentially expressed between our experimental conditions.
Basic downstream analysis of microarray data	Volcano plots can be used to infer relationships between fold changes and statistical confidence. Heatmaps can be used to visualize distinct trends in gene expression patterns between different experimental conditions by clustering.
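
As a rough illustration of the linear-model key points above, the sketch below shows how a `formula` is expanded into a design matrix with base R's `model.matrix()`. The sample annotations (`treatment`, `genotype`) and their levels are hypothetical and are not taken from the lesson data.

```r
# Hypothetical sample annotations for a 2 x 2 factorial experiment
# (column names and levels are illustrative assumptions, not the lesson's data).
pheno <- data.frame(
  treatment = factor(c("control", "control", "drug", "drug",
                       "control", "control", "drug", "drug"),
                     levels = c("control", "drug")),
  genotype  = factor(c("wt", "wt", "wt", "wt",
                       "ko", "ko", "ko", "ko"),
                     levels = c("wt", "ko"))
)

# Single-factor model: an intercept plus a treatment effect.
design_single <- model.matrix(~ treatment, data = pheno)

# Factorial model: both main effects plus their interaction.
design_factorial <- model.matrix(~ treatment * genotype, data = pheno)

# Inspect which coefficients each formula produces.
colnames(design_single)
colnames(design_factorial)
```

A design matrix of this kind is what `limma::lmFit()` accepts as its `design` argument, so changing the formula is enough to move from a single-factor to a factorial analysis.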

Glossary

FIXME