Working with Bioconductor
Overview
Teaching: 10 min
Exercises: 10 min
Learning Objectives
Become familiar with the basic Bioconductor setup.
Be able to install the appropriate Bioconductor packages for microarray analysis
Be able to use the help system and vignettes for Bioconductor packages
Bioconductor is an open-source project and software repository hosting a wide range of packages tailored for the analysis of biological data. As a repository, Bioconductor is a complement to CRAN, which is used for R package hosting in general. Importantly, Bioconductor provides widely used object classes (such as the ExpressionSet class, which we will discuss later) for representing and manipulating genomic data, as well as data packages such as annotation for both microarray platforms and genomes.
Installing Bioconductor packages
Like CRAN, Bioconductor has its own package repository. To install packages from it, you first need to set up the Bioconductor installer on your computer. The Bioconductor installation page describes this in more detail, but the essential code is:
if (!requireNamespace("BiocManager"))
    install.packages("BiocManager")
BiocManager::install()
Notice the use of if. The BiocManager package will only be installed if it can't be loaded, because requireNamespace() returns a logical value indicating whether the named package could be loaded. After loading the BiocManager namespace, BiocManager::install() installs the required Bioconductor packages. To install an optional Bioconductor package named "foo", you can run BiocManager::install('foo') instead of install.packages().
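The guard in the installation snippet relies on requireNamespace() returning TRUE or FALSE rather than stopping with an error. The following is a minimal sketch of that behaviour. It uses stats, which ships with R, and a deliberately made-up package name, notARealPackage123, as stand-ins for real Bioconductor packages.

```r
# requireNamespace() reports success or failure as a logical value.
# "stats" ships with every R installation, so this should be TRUE.
has_stats <- requireNamespace("stats", quietly = TRUE)

# "notARealPackage123" is a hypothetical name chosen so that loading fails.
has_fake <- requireNamespace("notARealPackage123", quietly = TRUE)

# Only the missing package would trigger an installation step.
if (!has_fake) {
  message("notARealPackage123 is not available and would need installing")
}
```

Because the result is an ordinary logical value, it can be used directly inside if, as the installation code does.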
Setting up your computer for today’s lesson.
For today’s lesson we will be using data from GEO (the Gene Expression Omnibus). Bioconductor provides a package called GEOquery for retrieving GEO data, which you can install as follows:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("GEOquery")
You will also need to install the following packages, which we will be using for the rest of the practical today. Some of them may already be installed, so only install those that are not available.
- affy: a package for analysing Affymetrix platform data.
- oligo: a different package for analysing oligonucleotide platform data, including Affymetrix data.
- limma: a package for analysing linear models of microarray data.
- hgu133plus2.db: an annotation package for the Affymetrix microarrays of this lesson.
- org.Hs.eg.db: annotation of the human genome.
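One way to install only what is missing is to check each package name first. This is a sketch rather than the lesson's prescribed method: it uses system.file() from base R, which returns an empty string for a package that is not installed, and the variable names are chosen just for this illustration.

```r
# Packages needed for the rest of the lesson
pkgs <- c("affy", "oligo", "limma", "hgu133plus2.db", "org.Hs.eg.db")

# system.file(package = p) is "" when package p is not installed
is_installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                       logical(1))

# Names of the packages that still need to be installed
missing_pkgs <- pkgs[!is_installed]
print(missing_pkgs)

# Uncomment to install whatever is missing:
# BiocManager::install(missing_pkgs)
```

BiocManager::install() accepts a character vector of package names, so the missing packages can be installed in a single call.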
Once these packages are installed, load them using library(). Check that these packages are loaded using sessionInfo() or search().
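As a small illustration of that check, the sketch below attaches splines, a package that ships with R and is used here only as a stand-in for the Bioconductor packages listed above, and then looks for it on the search path.

```r
# Attach a package; substitute affy, limma, etc. once they are installed
library(splines)

# Attached packages appear on the search path as "package:<name>"
is_attached <- "package:splines" %in% search()
print(is_attached)

# sessionInfo() additionally reports the R version and package versions
info <- sessionInfo()
print(info$R.version$version.string)
```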
Masking objects with library()
When you attached a package with library() you may have noticed a bunch of messages beginning with "The following objects are masked ...". Remember, library() adds packages at position 2 of the search path by default. This means that any time you attach a package using library() you risk masking objects of the same name in other packages. You can always refer to an object foo from package mypackage using its fully qualified name: mypackage::foo.
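Masking can be reproduced without installing anything. The sketch below uses attach(), which also places its argument at position 2 of the search path, to put a made-up sd() in front of the one from stats. The name demo_masking and the replacement function are assumptions made purely for this illustration.

```r
# Attach a small list at position 2, the same place library() uses.
# R reports that sd is masked from package:stats.
attach(list(sd = function(x) "my sd"), name = "demo_masking")

# The unqualified name now finds the attached version first
masked_result <- sd(c(1, 2, 3))

# The fully qualified name always reaches the stats version
qualified_result <- stats::sd(c(1, 2, 3))

# Remove the demonstration entry from the search path again
detach("demo_masking")
```

Using the pkg::name form in scripts makes the intended function explicit regardless of the order in which packages were attached.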
Getting help for Bioconductor packages
Bioconductor workflows
Some great help is available at this page on Bioconductor, especially if you are unsure of how to approach a common problem. The site includes sample workflows for the most common analysis types, including sample code. Unlike vignettes (below) which deal with specific packages, these workflows often describe multiple packages commonly used together to answer a biological question.
Package vignettes
Many, if not most, packages deposited on Bioconductor include a vignette, which is a short example of how to use the package to address a given problem. Vignettes also provide example code and use self-contained datasets to demonstrate use of the package.
To find the vignettes available for a package, type
browseVignettes('packagename')
at the R console.
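browseVignettes() opens its listing in a web browser. As an alternative sketch, vignette() from the utils package returns the same kind of listing in the console. The grid package is used here only because it ships with R; substitute any installed Bioconductor package name.

```r
# List the vignettes that an installed package provides
vigs <- vignette(package = "grid")

# The results matrix holds one row per vignette
print(vigs$results[, c("Item", "Title"), drop = FALSE])
```

Passing one of the listed Item names as the first argument to vignette() opens that vignette.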
Find out what you can do with an object
Bioconductor objects may seem complicated. Many of them have specific methods available, but you need to know what functions you can call on them. You can find the methods available for an object by using its class, via
## find the methods available for an object named foo
methods(class=class(foo))
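The same call works on any R object, so it can be tried before a Bioconductor object is at hand. In this sketch a linear model fitted to the built-in cars dataset stands in for foo; the names fit and fit_methods are chosen just for the example.

```r
# A linear model object stands in for a Bioconductor object here
fit <- lm(dist ~ speed, data = cars)

# The class of the object determines which methods apply to it
print(class(fit))

# List the methods defined for that class
fit_methods <- methods(class = class(fit))
print(fit_methods)
```

Each entry in the listing is a function that can be called on the object, for example summary(fit).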
Package reference manuals
This is usually a more technical document that details the usage of different functions within the package. Usually, this is what people refer to when they want to know the full range of arguments that can be passed into the function. Also, the reference manuals will include information about default options that are used if a particular argument is not supplied by the user.
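The argument list and defaults that a reference manual documents can also be inspected from the console. This is a minimal sketch using sd() from the stats package as a stand-in for a Bioconductor function.

```r
# Show the argument list of a function, including default values
print(args(sd))

# formals() gives programmatic access to the same defaults
default_na_rm <- formals(sd)$na.rm
print(default_na_rm)
```

The full help page for a function, which is the same text that appears in the reference manual, is available with ?sd or help("sd").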
Bioconductor mailing list
This is a community-driven resource for help. The Bioconductor mailing list is very active, with people providing not just help with errors in scripts but also advice on statistical tests and methods for analyzing one's data. Questions on the mailing list can be found at https://support.bioconductor.org/. Please read the posting guide at http://bioconductor.org/help/support/posting-guide/ before posting. Users on the forum assume that you are very familiar with R.
Key Points
Bioconductor is an archive containing a wide range of packages for bioinformatics analysis.
Installation of Bioconductor packages is done using the BiocManager::install() function, rather than through the usual install.packages().