Getting started with R and RStudio
|
R is a general-purpose programming language; both the R console and RStudio are interfaces to it.
RStudio has features that help R users control and document their activities.
The RStudio script editor allows users to send code to the console line by line using Control-Enter (Command-Enter on macOS).
R has an extensive help system.
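A minimal sketch of the help system, using only functions that ship with base R. The search terms are arbitrary examples chosen for illustration.

```r
# Open the help page for a function; the two forms are equivalent.
?mean
help("mean")

# Search the installed documentation when the function name is unknown.
help.search("standard deviation")

# List objects on the search path whose names match a pattern.
apropos("^read")

# Show a function's arguments without opening its help page.
args(sd)
```

In RStudio, the help pages open in the Help pane rather than in the console.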
|
Basic R language syntax
|
|
R data types
|
Both data frames and matrices are two-dimensional objects with rows and columns, but all elements of a matrix share one type, while data frame columns can be of different types.
Subsetting can be done with [] or [[]] in R.
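The difference between the two structures, and between the two subsetting operators, can be sketched as follows. The object and column names (m, mixed, df, id, name) are made up for illustration.

```r
# A matrix holds a single type.
m <- matrix(1:6, nrow = 2)

# Mixing types in a matrix coerces every element to character.
mixed <- cbind(1:2, c("a", "b"))
is.character(mixed)

# A data frame can hold a different type in each column.
df <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)
str(df)

# [ ] keeps the container: the result is still a data frame.
df["name"]

# [[ ]] extracts the element itself: here, the column as a vector.
df[["name"]]

# Two-dimensional indexing uses [row, column].
df[2, ]   # second row of the data frame
m[1, ]    # first row of the matrix, returned as a vector
```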
|
R packages and the environment
|
R functions and objects are stored in packages.
A user can access the exported objects of any installed package with the package::name syntax.
Loading a package attaches its objects to the search path, which R searches when looking up a name.
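A short sketch of how packages and the search path interact. The tools package is used only because it ships with every R installation, and the file name is hypothetical.

```r
# Packages and environments that R currently searches, in order.
search()

# Use a function from an installed package without attaching the package.
tools::file_ext("notes.txt")

# Attach the package so its exported objects can be used by bare name.
library(tools)
"package:tools" %in% search()
file_ext("notes.txt")
```

The package::name form is also useful when two attached packages export objects with the same name.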
|
Logical and arithmetic operations
|
Arithmetic operations are performed element-wise in R.
%in% is used to find exact matches, while grep is used for inexact (pattern) matches.
The which function returns the indices of the matches found using %in%.
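These operations can be illustrated with a few small vectors. The values below are invented for the example.

```r
x <- c(1, 2, 3)
y <- c(10, 20, 30)

# Arithmetic is applied to corresponding elements.
x + y   # 11 22 33
x * y   # 10 40 90

fruits <- c("apple", "banana", "pineapple", "cherry")

# %in% tests each element for an exact match and returns a logical vector.
is_wanted <- fruits %in% c("apple", "cherry")
is_wanted          # TRUE FALSE FALSE TRUE

# which converts the logical vector into the positions of the matches.
which(is_wanted)   # 1 4

# grep matches a pattern anywhere in the string, so "pineapple" also matches.
grep("apple", fruits)                 # 1 3
grep("apple", fruits, value = TRUE)   # "apple" "pineapple"
```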
|
Let's do it again: using loops
|
Control structures allow us to automate tasks in R, as long as certain conditions are met.
If-else differs from for and while loops in that it does not repeat code: the result of the logical test decides which block of code is evaluated, once.
The apply family of functions is a powerful way of replacing loops in R.
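A minimal sketch of each control structure, followed by an sapply call that replaces the for loop. All variable names and values are hypothetical.

```r
# if-else: exactly one of the two blocks is evaluated.
n <- 7
if (n %% 2 == 0) {
  parity <- "even"
} else {
  parity <- "odd"
}

# for: repeat a block once per element.
scores <- list(a = c(1, 2, 3), b = c(10, 20))
means_loop <- numeric(length(scores))
for (i in seq_along(scores)) {
  means_loop[i] <- mean(scores[[i]])
}

# while: repeat a block as long as the condition holds.
count <- 0
total <- 0
while (total < 10) {
  count <- count + 1
  total <- total + count
}

# sapply: apply a function to every element, replacing the for loop above.
means_apply <- sapply(scores, mean)
means_apply   # a = 2, b = 15
```

Unlike the loop, sapply allocates the result itself and keeps the names of the list elements.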
|
Working with data
|
The summary function is used to provide a quick overview of numeric data frames.
The same function in R can give different results depending on the data type.
The mean and standard deviation of a vector are calculated using the mean and sd functions, respectively.
The apply function can be used to run mean and sd over the rows or columns of a data frame or matrix.
The cor function calculates the correlation coefficient between two vectors or, if a matrix is provided, the pairwise correlation coefficients between its columns.
It is important to know what type of data one has on hand, as functions require specific data types as inputs.
Default options might not always be the most appropriate for the data at hand, and hence it is important to know what the defaults are when writing scripts.
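The points above can be sketched on a small, made-up data frame. The column names and values are hypothetical, and the missing value is included to show why defaults such as na.rm = FALSE matter.

```r
df <- data.frame(height = c(150, 160, 170, NA),
                 weight = c(50, 60, 70, 80))

# Quick overview of every numeric column.
summary(df)

# The same function gives a different kind of result for a different type.
summary(df$weight)                   # quartiles and mean
summary(factor(c("a", "b", "a")))    # counts per level

# Mean and standard deviation of a single vector.
mean(df$weight)   # 65
sd(df$weight)

# By default, missing values are not removed, so the result is NA.
mean(df$height)                  # NA
mean(df$height, na.rm = TRUE)    # 160

# apply runs a function over columns (MARGIN = 2) or rows (MARGIN = 1).
col_means <- apply(df, 2, mean, na.rm = TRUE)
col_sds <- apply(df, 2, sd, na.rm = TRUE)

# Correlation between two vectors, then pairwise between all columns.
r <- cor(df$height, df$weight, use = "complete.obs")
cor(df, use = "complete.obs")
```

Without the use argument, cor would return NA here for the same reason that mean does.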
|