Getting started with R and RStudio

Overview

Teaching: 15 min
Exercises: 5 min
Learning Objectives
  • Understand and describe two ways of using R interactively: The R console and RStudio

  • Be able to create an R script in RStudio and send R commands to the R console

  • Be able to use the basic R help system in R and RStudio

What is R?

R is a high-level programming language that is extensively used by statisticians and bioinformaticians. While R is a general purpose programming language, R is designed to work with data, and is heavily used in the data science community.

Additionally, R is an interpreted language, which means that code written in R will be executed by a command interpreter instead of requiring compilation (as is the case with compiled languages such as C or Java). Interpreted languages are especially convenient for real-time problem solving and rapid prototyping.

Use of R has increased substantially over the last few years in large part due to the growth in data intensive fields and data science. Hence, knowing how to work in R is becoming increasingly important, and will be the main language we will work in for analysis of genomic data.

R features

Various features of R makes it attractive for data analysis. These includes the following:

  1. R is open-source, which means both that algorithms and methods can be checked for validity, and that users of R contribute to the code base and development of new packages.
  2. R runs on all major computer platforms.
  3. R can be extended using packages.
  4. R is free software (as in free speech, not free beer), available under the [GNU General Public License][GPL] version 2 or greater. R packages may be distributed under different license terms.
  5. R has outstanding graphics and data visualisation capabilities, supporting publication quality graphs/figures.
  6. R has an active community of users and developers, which facilitates the ease of getting help from others.

Disadvantages of R

Life is full of tradeoffs, and not everything about R is an unambiguous advantage over other choices.

The R Console

One way to run R is inside bash. Try running R from either an open shell window (type R) or from the window launcher. In either case, you should wind up at a console window that looks something like this:

The R Console

The R Console is your basic R interface, provided by the language itself. There is a prompt, but it’s no longer the bash prompt $ , it’s the R prompt > . When R starts, it tells you what version of the R program is running and what kind of hardware you are using to run it.

R can be used as a fancy calculator: you can get output just by typing some math at the console, like this:

# This is some R input. A line that begins with # is a comment
# 22/7 is pretty close to pi, but it's not pi!
22/7
[1] 3.142857
# it's silly to estimate the value of pi as a fraction; R already defines a value for pi
pi
[1] 3.141593

In these episodes, we can distinguish the input (what you type at the console) from what the R language returns by the color coding on the left.

you can perform a lot of actions at the prompt. Try this, and see a plot similar to what is shown below

x <- rnorm(1000)
hist(x)

plot of chunk basic

RStudio

R is the language and interpreter, whereas [RStudio][rstudio] is a professional development environment that includes a code editor, debugging tools, and a server available from a browser. A screenshot of the RStudio interface is shown below. We encourage the use of RStudio as an interface to R.

RStudioWindow

In RStudio, the console, graphics, help windows, scripts, and a data explorer tool are all integrated into one graphical interface.

Try it!

Run the following in the R Console window in RStudio:

x <- rnorm(1000)
hist(x)

You can type R commands just like at the bash command line: each time you hit Enter you will send a command to the R interpreter.

What happens?

Solution

  • You may notice that when you type ( RStudio automatically adds the ) and positions the cursor appropriately.
  • Your plot will open up in a subwindow on the right, not a new window as it would if you were using the console.

R’s help system

Changing the sample distribution

Using RStudio, create a new R script. Enter code into the script to create a few lines of R code to sample a normal distribution with mean 100 and standard deviation 5 (hint: use ?rnorm to figure this out), and plot a histogram of the results. Use Ctrl-Enter to send each line from the script editor to the console.

Solution

Let’s sample a large number of points to get a good smooth distribution.

x <- rnorm(n=10000,mean=100,sd=5)
hist(x)

plot of chunk newdistribution

Our last preparation step before moving on

  1. Using RStudio, create a New Project (File -> New Project) in a new directory (I suggest the Desktop). Call the project whatever you want.
  2. Under the Files tab on the right (!), create a new folder called “data”. We will use this later in the session.


Key Points

  • R is a general purpose programming language; both the R console and RStudio are interfaces to R

  • RStudio has features that help R users control and document their activities.

  • The RStudio script editor allows users to send code line by line to the console using Control-Enter

  • R has an extensive help system.