Let's do it again: using loops
Overview
Teaching: 10 min
Exercises: 15 min
Learning Objectives
Write and use basic loop structures for performing repetitive computations.
Become aware of the functional alternatives to control structures.
Control structures
Control structures allow us to automate tasks that have to be done repetitively, or that should only be done when a given condition is met. In R, just like in other programming languages, there are three key control structures, namely:
- for loops;
- while loops;
- if-else statements.
We will discuss how each of these is used in R. The principles behind these three control structures are shared across most, if not all, programming languages, so understanding how they work is essential.
for loops
In programming, there is a principle called DRY, short for “Don’t Repeat Yourself”. Consider the following example, where one needs to print the statement “The year is [year]” for each year from 2010 to 2015. If we had to do it manually, we would end up doing the following:
# (Taken from https://www.r-bloggers.com/how-to-write-the-first-for-loop-in-r/)
print(paste("The year is", 2010))
[1] "The year is 2010"
print(paste("The year is", 2011))
[1] "The year is 2011"
print(paste("The year is", 2012))
[1] "The year is 2012"
print(paste("The year is", 2013))
[1] "The year is 2013"
print(paste("The year is", 2014))
[1] "The year is 2014"
print(paste("The year is", 2015))
[1] "The year is 2015"
As is evident, the above is tedious, as we need to type the same thing over and over again while only varying the year. This clearly violates the DRY principle. A better way to do it is to use a for loop, like this:
for (year in 2010:2015) {
  print(paste("The year is", year))
}
[1] "The year is 2010"
[1] "The year is 2011"
[1] "The year is 2012"
[1] "The year is 2013"
[1] "The year is 2014"
[1] "The year is 2015"
The general skeleton of a for loop is as follows:
for (i in <vector>) {
  <code chunk here>
}
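The vector in the skeleton does not have to be a numeric sequence. As a minimal sketch (the names `fruits` and `n_chars` are made up for illustration and are not part of the lesson), a for loop can iterate directly over the elements of a character vector, or over its indices with seq_along() when one result per element needs to be stored:

```r
# Hypothetical example data, not from the lesson
fruits <- c("apple", "banana", "cherry")

# Iterate directly over the elements of the vector
for (fruit in fruits) {
  print(paste("I like", fruit))
}

# Iterate over the indices instead, storing one result per element
n_chars <- integer(length(fruits))
for (i in seq_along(fruits)) {
  n_chars[i] <- nchar(fruits[i])
}
print(n_chars)
```

Looping over seq_along(fruits) rather than 1:length(fruits) also behaves sensibly when the vector is empty, because the loop body is then simply skipped.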
Writing your first for loop
Using a for loop, calculate and print the squares of the first 10 natural numbers.
Solution
for (i in 1:10) {
  print(i^2)
}
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
[1] 36
[1] 49
[1] 64
[1] 81
[1] 100
while loops
A while loop keeps running for as long as its logical test returns TRUE. This is unlike a for loop, which runs once for each element of a vector.
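As a small sketch of this behaviour (the variable names `value` and `steps` are illustrative only), the loop below keeps halving a number until the test `value >= 1` becomes FALSE. The number of iterations is not known in advance; it is determined entirely by the test:

```r
# Keep halving a value until it drops below 1
value <- 20
steps <- 0

while (value >= 1) {
  value <- value / 2   # this update is what eventually makes the test FALSE
  steps <- steps + 1   # count how many iterations were needed
}

print(steps)
print(value)
```

The test is checked before every iteration, so if it is FALSE at the start the body never runs at all.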
Infinite loops
A loop may continue forever when the stopping condition is never met. Consider the following example:
y <- 1
x <- 5
while (y <= 5) {
  x <- x + 1
  print(x)
}
This code will loop forever because y is never changed and so always remains 1. Meanwhile, x increases by 1 at each iteration until the evaluation is killed manually. Forgetting to increment y (which serves as the counter) is a common mistake that leads to an infinite loop. The correct code is as follows:
y <- 1
x <- 5
while (y <= 5) {
  x <- x + 1
  print(x)
  y <- y + 1
}
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
Writing your first while loop
Write a short script that uses a while loop to print all the integers between 10 and 100, inclusive.
Solution
i_start <- 10
i_stop <- 100
i <- i_start
while (i <= i_stop) {
  print(i)
  i <- i + 1
}
[1] 10
[1] 11
[1] 12
[1] 13
[1] 14
[1] 15
[1] 16
[1] 17
[1] 18
[1] 19
[1] 20
[1] 21
[1] 22
[1] 23
[1] 24
[1] 25
[1] 26
[1] 27
[1] 28
[1] 29
[1] 30
[1] 31
[1] 32
[1] 33
[1] 34
[1] 35
[1] 36
[1] 37
[1] 38
[1] 39
[1] 40
[1] 41
[1] 42
[1] 43
[1] 44
[1] 45
[1] 46
[1] 47
[1] 48
[1] 49
[1] 50
[1] 51
[1] 52
[1] 53
[1] 54
[1] 55
[1] 56
[1] 57
[1] 58
[1] 59
[1] 60
[1] 61
[1] 62
[1] 63
[1] 64
[1] 65
[1] 66
[1] 67
[1] 68
[1] 69
[1] 70
[1] 71
[1] 72
[1] 73
[1] 74
[1] 75
[1] 76
[1] 77
[1] 78
[1] 79
[1] 80
[1] 81
[1] 82
[1] 83
[1] 84
[1] 85
[1] 86
[1] 87
[1] 88
[1] 89
[1] 90
[1] 91
[1] 92
[1] 93
[1] 94
[1] 95
[1] 96
[1] 97
[1] 98
[1] 99
[1] 100
if-else statements
Unlike the earlier two control structures, which perform a task repetitively, an if-else statement functions as a switch: which block of code runs depends on whether a logical condition is TRUE or FALSE. For example:
x <- 5
if (x < 6) {
  # runs when the condition x < 6 is TRUE
  print("TRUE")
} else {
  # runs when the condition x < 6 is FALSE
  print("FALSE")
}
[1] "TRUE"
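An if-else statement is often placed inside a loop so that each iteration can take a different branch. The following is a minimal sketch of that pattern (the variable names `label` and `labels` are illustrative, not from the lesson), classifying the numbers 1 to 6 as odd or even:

```r
# Collect one label per number; starts empty and grows inside the loop
labels <- character(0)

for (i in 1:6) {
  # %% is the remainder operator: i %% 2 is 0 for even numbers
  if (i %% 2 == 0) {
    label <- "even"
  } else {
    label <- "odd"
  }
  print(paste(i, "is", label))
  labels <- c(labels, label)
}
```

The condition inside if () must evaluate to a single TRUE or FALSE, which is why it is tested once per iteration rather than on the whole vector 1:6 at once.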
Extra: using functional programming with the apply family of functions
Although loops are powerful, control structures like for loops and while loops do not return a useful value. If the purpose of a loop is to build up a vector of results, one for each iteration, you need to assign each value explicitly inside the loop. It might look like this:
foo <- numeric(10)  # create the result vector first (assumes my_func returns a single number)
for (i in 1:10) {
  foo[i] <- my_func(i) # assign the i-th value of the vector foo
}
Functions like my_func(), of course, can return values. R supports a functional programming style. In keeping with that style, the preferred approach for many tasks is to use a function from the apply family of functions instead of a looping control structure.
The apply-family version of the for loop above would look like this:
foo <- sapply(1:10, my_func)
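To make the comparison concrete, here is a runnable sketch. The text leaves my_func() undefined, so this example assumes, purely for illustration, that it squares its input; the names `foo_loop` and `foo_apply` are likewise made up here:

```r
# my_func is only a placeholder in the text; here we assume it squares its input
my_func <- function(x) x^2

# Loop version: create the result vector, then fill one element per iteration
foo_loop <- numeric(10)
for (i in 1:10) {
  foo_loop[i] <- my_func(i)
}

# sapply version: the results are collected and returned for us
foo_apply <- sapply(1:10, my_func)

print(foo_apply)
print(identical(foo_loop, foo_apply))
```

Both versions produce the same vector; the sapply() call simply removes the bookkeeping of creating and indexing the result.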
To find out what these functions are, simply type ??apply at the R console. The output looks something like this (the exact list depends on the packages you have installed):
base::apply Apply Functions Over Array Margins
Aliases: apply
base::by Apply a Function to a Data Frame Split by
Factors
base::eapply Apply a Function Over Values in an Environment
Aliases: eapply
base::lapply Apply a Function over a List or Vector
Aliases: lapply, sapply, vapply
base::mapply Apply a Function to Multiple List or Vector
Arguments
Aliases: mapply
base::rapply Recursively Apply a Function to a List
Aliases: rapply
base::.subset Internal Objects in Package 'base'
Aliases: .mapply
base::tapply Apply a Function Over a Ragged Array
Aliases: tapply
Although there are numerous functions in the apply family of functions, there are two functions that are used very often. They are (1) apply and (2) lapply. For that reason, we will focus on their usage below.
apply
From the manual,
apply package:base R Documentation
Apply Functions Over Array Margins
Description:
Returns a vector or array or list of values obtained by applying a
function to margins of an array or matrix.
Usage:
apply(X, MARGIN, FUN, ...)
From the usage, we know that apply() requires three components: a data frame or matrix X, a margin, and a function to apply along the margins. So what are margins?
Simply, margins refer to rows and/or columns. For example:
(Taken from https://nsaunders.wordpress.com/2010/08/20/a-brief-introduction-to-apply-in-r/)
# create a matrix of 10 rows x 2 columns
m <- matrix(c(1:10, 11:20), nrow = 10, ncol = 2)
# mean of the rows
apply(m, 1, mean)
[1] 6 7 8 9 10 11 12 13 14 15
# mean of the columns
apply(m, 2, mean)
[1] 5.5 15.5
If we want to apply the function along the rows, then the margin is 1. Similarly, if we want to apply the function along the columns, then the margin is 2. We can apply a function along both rows and columns (that is, to every single element) by specifying both margins, that is, c(1, 2).
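The element-wise case can be sketched as follows, reusing the matrix m from the example above. The anonymous doubling function and the names `doubled` and `col_range` are assumptions added for illustration:

```r
# Same matrix as in the example above: 10 rows x 2 columns
m <- matrix(c(1:10, 11:20), nrow = 10, ncol = 2)

# MARGIN = c(1, 2): the function is applied to every element,
# so the result has the same dimensions as m
doubled <- apply(m, c(1, 2), function(x) x * 2)
print(dim(doubled))
print(doubled[1, ])

# A function that returns several values per column gives a matrix:
# here each column of the result holds the min and max of a column of m
col_range <- apply(m, 2, range)
print(col_range)
```

For simple arithmetic like doubling, writing m * 2 directly is shorter, since R's arithmetic is already vectorised; the c(1, 2) margin is mainly useful for functions that only work on one value at a time.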
lapply
From the manual,
lapply package:base R Documentation
Apply a Function over a List or Vector
Description:
‘lapply’ returns a list of the same length as ‘X’, each element of
which is the result of applying ‘FUN’ to the corresponding element
of ‘X’.
Because it is quite well explained, it is easy to see what lapply() is used for. While apply() is used for data frames or matrices, lapply() is used for lists and vectors. The following example illustrates what lapply() does:
(Taken from https://nsaunders.wordpress.com/2010/08/20/a-brief-introduction-to-apply-in-r/)
# create a list with 2 elements
l <- list(a = 1:10, b = 11:20)
# the mean of the values in each element
lapply(l, mean)
$a
[1] 5.5
$b
[1] 15.5
# the sum of the values in each element
lapply(l, sum)
$a
[1] 55
$b
[1] 155
sapply
The sapply() function provides a wrapper around lapply() that returns a simple vector or matrix instead of a list, where possible.
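As a brief sketch of the difference, the example below runs the same computation with both functions on a two-element list. The names `as_list` and `as_vector` are chosen here for illustration:

```r
# A list with 2 named elements
l <- list(a = 1:10, b = 11:20)

# lapply() always returns a list, one element per input element
as_list <- lapply(l, mean)

# sapply() simplifies the same result to a named numeric vector
as_vector <- sapply(l, mean)

print(as_vector)
print(class(as_list))
print(class(as_vector))
```

The simplified form is convenient for further arithmetic or plotting, while the list form from lapply() is the safer choice when the results may differ in length or type.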
Key Points
Control structures allow us to automate repetitive tasks in R and to run code only when certain conditions are met.
If-else differs from for and while loops in that the block of code evaluated depends on the result of the logical test.
The apply family of functions is a powerful way of replacing loops in R.