Introduction to R: Glossary

Key Points

Getting started with R and RStudio
  • R is a general-purpose programming language; both the R console and RStudio are interfaces to R.

  • RStudio has features that help R users control and document their activities.

  • The RStudio script editor lets users send code to the console line by line using Control-Enter.

  • R has an extensive help system.

Basic R language syntax
  • R objects, such as variables, provide abstraction for R programming and data analysis.

  • R functions encapsulate capabilities that we can re-use over and over with different input provided as arguments.
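
The two points above can be sketched as follows. The names `weight_kg` and `kg_to_lb`, and the conversion factor 2.2, are illustrative assumptions and not part of the lesson.

```r
# Assign a value to an object (variable) and reuse it by name.
weight_kg <- 55
weight_lb <- weight_kg * 2.2

# A hypothetical function that wraps the conversion so it can be
# reused with different inputs supplied as arguments.
kg_to_lb <- function(kg, factor = 2.2) {
  kg * factor
}

kg_to_lb(55)               # a single value
kg_to_lb(c(10, 20, 30))    # a whole vector, same function
kg_to_lb(55, factor = 2.2046)  # override the default argument
```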

R data types
  • Both data frames and matrices are two-dimensional objects with rows and columns, but data frame columns can be of different types.

  • Subsetting can be done with [] or [[]] in R.
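
A small sketch of the difference between the two containers and the two subsetting operators. The objects `df` and `m` and their contents are made up for illustration.

```r
# A data frame can hold columns of different types.
df <- data.frame(
  gene = c("a", "b", "c"),
  count = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# A matrix holds a single type; mixing types coerces every element.
m <- cbind(c("a", "b", "c"), c(10, 20, 30))

sapply(df, class)   # one class per column
class(m[1, 2])      # the number was coerced to character

# [ ] keeps the container; [[ ]] extracts a single element from it.
df["count"]         # a one-column data frame
df[["count"]]       # the underlying numeric vector
df[2, "count"]      # row 2 of the count column
```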

R packages and the environment
  • R functions and objects are stored in packages.

  • A user has access to the exported objects of every installed package, even without loading it, through the package::object syntax.

  • Loading a package with library() attaches it to the search path, so R can find its exported objects by name alone.
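
As a rough illustration, the sketch below uses the `tools` package, chosen here only because it ships with R and is normally not attached at startup. The file name is made up.

```r
# Call an exported function from an installed package without loading it.
tools::file_ext("counts.csv")

# Check whether the package is currently on the search path.
"package:tools" %in% search()

# library() attaches the package, so its functions resolve by bare name.
library(tools)
"package:tools" %in% search()
file_ext("counts.csv")
```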

Logical and arithmetic operations
  • Arithmetic operations are performed element-wise in R.

  • %in% is used to find exact matches, while grep is used for inexact (pattern) matches.

  • which returns the indices of TRUE values, such as the matches found using %in%.
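
The operations above can be sketched on toy vectors. The gene names are placeholders chosen for illustration.

```r
x <- c(1, 2, 3)
y <- c(10, 20, 30)
x + y    # element-wise: 11 22 33
x * 2    # the single value is recycled: 2 4 6

genes <- c("BRCA1", "TP53", "BRCA2", "EGFR")
targets <- c("TP53", "EGFR", "MYC")

hits <- genes %in% targets    # logical vector, exact matches only
idx <- which(hits)            # positions of the TRUE values: 2 4

brca <- grep("^BRCA", genes)        # pattern match, returns indices: 1 3
grep("^BRCA", genes, value = TRUE)  # returns the matching strings instead
```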

Let's do it again: using loops
  • Control structures allow us to automate tasks in R, as long as certain conditions are met.

  • If-else differs from for and while loops in that it does not repeat code: the result of the logical test determines which block of code is evaluated.

  • The apply family of functions is a powerful way of replacing loops in R.
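
A minimal side-by-side sketch of these control structures, with `sapply` standing in for the apply family. The variable names and values are made up.

```r
values <- c(4, 9, 16)

# for loop: repeat the block once per element.
roots <- numeric(length(values))
for (i in seq_along(values)) {
  roots[i] <- sqrt(values[i])
}

# while loop: repeat as long as the condition holds.
n <- 0
while (n < 3) {
  n <- n + 1
}

# if-else: evaluate one block or the other, once.
size <- if (sum(values) > 20) "large" else "small"

# sapply: call the same function on each element without an explicit loop.
roots_apply <- sapply(values, sqrt)
```

Here `roots` and `roots_apply` hold the same values; the `sapply` version avoids pre-allocating the result and managing the index by hand.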

Working with data
  • The summary function is used to provide a quick overview of numeric data frames.

  • The same function in R can give different results depending on the data type.

  • The mean and standard deviation of a vector are calculated using the mean and sd functions, respectively.

  • apply can be used to extend the mean and sd functions to perform summary statistics on data frames or matrices.

  • cor is used to calculate the correlation coefficient between two vectors or, if a matrix is provided, the pairwise correlation coefficients between its variables.

  • It is important to know what type of data one has on hand, as functions require specific data types as inputs.

  • Default options might not always be the most appropriate for the data at hand, and hence it is important to know what the defaults are when writing scripts.
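
The points above can be tried on a small made-up data frame. The object `scores` and its values are assumptions for illustration; the missing value in column `z` is there to show how default arguments affect the results.

```r
scores <- data.frame(
  x = c(1, 2, 3, 4),
  y = c(2, 4, 6, 8),
  z = c(5, NA, 7, 9)
)

summary(scores)        # per-column overview, including a count of NAs
summary(c("a", "b"))   # same function, different output for character data

mean(scores$x)         # 2.5
sd(scores$x)

# apply over columns (MARGIN = 2) to get one mean per column.
apply(scores, 2, mean)                # z is NA because na.rm defaults to FALSE
apply(scores, 2, mean, na.rm = TRUE)  # override the default to skip NAs

cor(scores$x, scores$y)            # two vectors; y is exactly 2 * x
cor(scores)                        # pairwise matrix; the x-z and y-z entries are NA
cor(scores, use = "complete.obs")  # drop rows with missing values first
```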
