Let's do it again: using loops

Overview

Teaching: 10 min
Exercises: 15 min
Learning Objectives
  • Write and use basic loop structures for performing repetitive computations.

  • Become aware of the functional alternatives to control structures.

Control structures

Control structures let us control the order in which code runs: they allow us to repeat a task a given number of times, repeat it for as long as a condition holds, or choose between alternative actions depending on a condition. In R, just like in other programming languages, there are three key control structures, namely:

  1. for loops;
  2. while loops;
  3. if-else statements.

We will discuss how each of these is used in R. The principles behind these three control structures are shared across most, if not all, programming languages, so understanding how they work is essential.

for loops

In programming, there is a principle called DRY, short for “Don’t Repeat Yourself”. Consider the following example, where we need to print the statement “The year is [year]” for every year from 2010 to 2015. If we did it manually, we would end up writing the following:

# (Taken from https://www.r-bloggers.com/how-to-write-the-first-for-loop-in-r/)
print(paste("The year is", 2010))
[1] "The year is 2010"
print(paste("The year is", 2011))
[1] "The year is 2011"
print(paste("The year is", 2012))
[1] "The year is 2012"
print(paste("The year is", 2013))
[1] "The year is 2013"
print(paste("The year is", 2014))
[1] "The year is 2014"
print(paste("The year is", 2015))
[1] "The year is 2015"

As is evident, this is tedious: we type the same thing over and over again, varying only the year. This is a clear violation of the DRY principle. A better way is to use a for loop:

for (year in 2010:2015) {
  print(paste("The year is", year))
}
[1] "The year is 2010"
[1] "The year is 2011"
[1] "The year is 2012"
[1] "The year is 2013"
[1] "The year is 2014"
[1] "The year is 2015"

The general skeleton of a for loop is:

for (i in <vector>) {
	<code chunk here>
}
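The skeleton above works with any vector, not just a sequence of numbers. A minimal sketch, using an illustrative character vector `fruits` that is not part of the original lesson, loops once over the elements themselves and once over their positions with seq_along(), which is handy when you need the index to fill a result vector:

```r
fruits <- c("apple", "banana", "cherry")  # illustrative data

# Loop over the elements themselves: `fruit` takes each value in turn
for (fruit in fruits) {
  print(paste(fruit, "has", nchar(fruit), "letters"))
}

# Loop over positions with seq_along() when you need the index
n_letters <- integer(length(fruits))  # pre-allocated result vector
for (i in seq_along(fruits)) {
  n_letters[i] <- nchar(fruits[i])
}
print(n_letters)
```

seq_along(fruits) is generally safer than 1:length(fruits): for an empty vector it gives an empty sequence, whereas 1:length(x) gives c(1, 0) and runs the loop body twice.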

Writing your first for loop

Using a for loop, calculate and print the squares of the first 10 natural numbers.

Solution

for (i in 1:10) { 
  print(i^2)
}
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
[1] 36
[1] 49
[1] 64
[1] 81
[1] 100

while loops

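A while loop keeps evaluating its body for as long as its logical test returns TRUE. This is unlike a for loop, which iterates over the elements of a vector and stops once it reaches the last one. A while loop is therefore useful when you do not know in advance how many iterations are needed.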
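By analogy with the for loop skeleton, a while loop has the form while (<condition>) { <code chunk here> }. As a small illustration (the doubling task is our own example, not from the lesson), the loop below doubles a number until it exceeds 100. The number of doublings is not known before the loop starts, which is exactly the situation a while loop suits:

```r
value <- 1  # starting value
steps <- 0  # counts how many times the body has run
while (value <= 100) {  # the test is re-checked before every iteration
  value <- value * 2
  steps <- steps + 1
}
print(paste("Reached", value, "after", steps, "doublings"))
```

This prints "Reached 128 after 7 doublings": after the seventh doubling value is 128, the test value <= 100 returns FALSE, and the loop ends.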

Infinite loops

A loop may continue forever when the stopping condition is never met. Consider the following example:

# Warning: this loop never stops; interrupt it manually if you run it
y <- 1
x <- 5
while (y <= 5) {
  x <- x + 1
  print(x)
}

This code loops forever because y is never updated: it stays at 1, so y <= 5 is always TRUE. Meanwhile, x grows by 1 at each iteration until the evaluation is interrupted manually (for example with Esc in RStudio or Ctrl+C in a terminal). Forgetting to update the counter (here, y) is a common mistake that leads to an infinite loop. The corrected code is:

y <- 1
x <- 5
while (y <= 5) {
  x <- x + 1
  print(x)
  y <- y + 1  # update the counter so the condition eventually becomes FALSE
}
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
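One defensive habit, sketched here as an illustration rather than a required pattern, is to cap the number of iterations with a counter and leave the loop early with break. Below, the buggy loop from above (y is never updated) is given an arbitrary limit, max_iter, so it stops after 1000 iterations instead of running forever; print() is dropped so the console is not flooded:

```r
y <- 1
x <- 5
max_iter <- 1000  # arbitrary safety limit chosen for this sketch
iter <- 0         # counts how many times the loop body has started
while (y <= 5) {  # y is never updated, so this test alone would never fail
  iter <- iter + 1
  if (iter > max_iter) {
    message("Stopped after ", max_iter, " iterations: check the loop condition")
    break         # exit the while loop immediately
  }
  x <- x + 1
}
print(x)
```

The guard does not fix the bug; it only turns a hang into a visible message, after which you still need to add the missing y <- y + 1.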

Writing your first while loop

Write a short script that prints out all the numbers between 10 and 100, inclusive.

Solution

i_start <- 10
i_stop <- 100
i <- i_start
while (i <= i_stop) {
  print(i)
  i <- i + 1
}
[1] 10
[1] 11
[1] 12
[1] 13
[1] 14
[1] 15
[1] 16
[1] 17
[1] 18
[1] 19
[1] 20
[1] 21
[1] 22
[1] 23
[1] 24
[1] 25
[1] 26
[1] 27
[1] 28
[1] 29
[1] 30
[1] 31
[1] 32
[1] 33
[1] 34
[1] 35
[1] 36
[1] 37
[1] 38
[1] 39
[1] 40
[1] 41
[1] 42
[1] 43
[1] 44
[1] 45
[1] 46
[1] 47
[1] 48
[1] 49
[1] 50
[1] 51
[1] 52
[1] 53
[1] 54
[1] 55
[1] 56
[1] 57
[1] 58
[1] 59
[1] 60
[1] 61
[1] 62
[1] 63
[1] 64
[1] 65
[1] 66
[1] 67
[1] 68
[1] 69
[1] 70
[1] 71
[1] 72
[1] 73
[1] 74
[1] 75
[1] 76
[1] 77
[1] 78
[1] 79
[1] 80
[1] 81
[1] 82
[1] 83
[1] 84
[1] 85
[1] 86
[1] 87
[1] 88
[1] 89
[1] 90
[1] 91
[1] 92
[1] 93
[1] 94
[1] 95
[1] 96
[1] 97
[1] 98
[1] 99
[1] 100

if-else statements

Unlike the two loops above, which perform a task repeatedly, an if-else statement acts as a switch: it evaluates one block of code if a logical condition is TRUE and another block if it is FALSE. For example:

x <- 5
if (x < 6) {
  print("x is less than 6")
} else {
  print("x is 6 or greater")
}
[1] "x is less than 6"
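if-else is often combined with a loop. A small sketch (the even/odd task is an illustrative choice) uses the remainder operator %% inside a for loop. Because if() expects a single TRUE or FALSE, R also provides the vectorised ifelse(), which applies the same test to every element of a vector at once:

```r
numbers <- 1:6
for (n in numbers) {
  if (n %% 2 == 0) {  # %% gives the remainder after division
    print(paste(n, "is even"))
  } else {
    print(paste(n, "is odd"))
  }
}

# Vectorised alternative: one call, one result per element
parity <- ifelse(numbers %% 2 == 0, "even", "odd")
print(parity)
```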

Extra: using functional programming with the apply family of functions

Although loops are powerful, control structures like for loops and while loops don’t return a useful value (both evaluate to an invisible NULL). If the purpose of a loop is to build up a vector of results, one for each iteration, you need to store each result yourself inside the loop. It might look like this:

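my_func <- function(x) x^2  # hypothetical example function
foo <- numeric(10)          # pre-allocate a vector for the results
for (i in 1:10) {
  foo[i] <- my_func(i)      # store the result of iteration i in foo
}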

Functions like my_func(), of course, can return values. R has strong support for a functional programming style, and in keeping with that style, the preferred approach for many tasks is to use a function from the apply family instead of a looping control structure.

The apply version of the for loop above looks like this:

foo <- sapply(1:10, my_func)
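To check that the two versions agree, here is a self-contained sketch in which my_func() is given a hypothetical definition (squaring its input); any function that returns a single value would do:

```r
my_func <- function(x) x^2  # hypothetical stand-in for my_func()

# Loop version: pre-allocate, then fill one element per iteration
foo_loop <- numeric(10)
for (i in 1:10) {
  foo_loop[i] <- my_func(i)
}

# Functional version: sapply() collects the return values for us
foo_apply <- sapply(1:10, my_func)

print(identical(foo_loop, foo_apply))
```

The sapply() version needs neither pre-allocation nor index bookkeeping, which is much of its appeal.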

To find out what these functions are, type ??apply at the console. The output will look something like this (the exact list depends on which packages you have installed):

base::apply             Apply Functions Over Array Margins
  Aliases: apply
base::by                Apply a Function to a Data Frame Split by
                        Factors
base::eapply            Apply a Function Over Values in an Environment
  Aliases: eapply
base::lapply            Apply a Function over a List or Vector
  Aliases: lapply, sapply, vapply
base::mapply            Apply a Function to Multiple List or Vector
                        Arguments
  Aliases: mapply
base::rapply            Recursively Apply a Function to a List
  Aliases: rapply
base::.subset           Internal Objects in Package 'base'
  Aliases: .mapply
base::tapply            Apply a Function Over a Ragged Array
  Aliases: tapply

Although the apply family has numerous members, two of them are used very often: (1) apply() and (2) lapply(). For that reason, we will focus on their usage below.

apply

From the manual (?apply):

apply                   package:base                   R Documentation

Apply Functions Over Array Margins

Description:

     Returns a vector or array or list of values obtained by applying a
     function to margins of an array or matrix.

Usage:

     apply(X, MARGIN, FUN, ...)

From the usage, we know that apply() requires 3 arguments: an array or matrix X (a data frame is first converted to a matrix), a MARGIN, and a function FUN to apply along that margin. So what are margins?

Put simply, margins are the dimensions of the array: for a matrix, margin 1 refers to the rows and margin 2 to the columns. For example:

# (Taken from https://nsaunders.wordpress.com/2010/08/20/a-brief-introduction-to-apply-in-r/)
# create a matrix of 10 rows x 2 columns
m <- matrix(c(1:10, 11:20), nrow = 10, ncol = 2)
# mean of the rows
apply(m, 1, mean)
 [1]  6  7  8  9 10 11 12 13 14 15
# mean of the columns
apply(m, 2, mean)
[1]  5.5 15.5

To apply the function to each row, set the margin to 1; to apply it to each column, set the margin to 2. To apply the function to every single element, specify both rows and columns, that is, MARGIN = c(1, 2).
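Recreating the matrix m from the example above, a short sketch applies a different function along the rows and then uses the c(1, 2) margin with an anonymous function (function(x) x * 10, our own illustrative choice) applied to every element:

```r
m <- matrix(c(1:10, 11:20), nrow = 10, ncol = 2)  # same matrix as above

# MARGIN = 1: one sum per row
row_sums <- apply(m, 1, sum)
print(row_sums)

# MARGIN = c(1, 2): FUN is called on each element, keeping the 10 x 2 shape
times_ten <- apply(m, c(1, 2), function(x) x * 10)
print(dim(times_ten))
```

For plain row or column sums and means, base R's rowSums(), colSums(), rowMeans() and colMeans() give the same results and are usually faster, and simple element-wise arithmetic like this is just m * 10. apply() pays off when FUN has no built-in shortcut.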

lapply

From the manual (?lapply):

lapply                  package:base                   R Documentation

Apply a Function over a List or Vector

Description:

     ‘lapply’ returns a list of the same length as ‘X’, each element of
     which is the result of applying ‘FUN’ to the corresponding element
     of ‘X’.

The description makes it easy to see what lapply() is used for. While apply() works on matrices and arrays (converting a data frame to a matrix first), lapply() works on lists and vectors, and always returns a list. The following example illustrates what lapply() does:

# (Taken from https://nsaunders.wordpress.com/2010/08/20/a-brief-introduction-to-apply-in-r/)
# create a list with 2 elements
l <- list(a = 1:10, b = 11:20)
# the mean of the values in each element
lapply(l, mean)
$a
[1] 5.5
 
$b
[1] 15.5
 
# the sum of the values in each element
lapply(l, sum)
$a
[1] 55
 
$b
[1] 155
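lapply() also accepts your own function. In this sketch, the anonymous function (an illustrative choice) keeps only the values greater than 7, so the two results have different lengths; a list is a natural container here because it can hold results of different lengths side by side:

```r
l <- list(a = 1:10, b = 11:20)  # same list as above

# Keep only the values greater than 7 in each element
big_values <- lapply(l, function(v) v[v > 7])
print(big_values)
print(lengths(big_values))  # number of values kept per element
```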

sapply

The sapply() function is a wrapper around lapply() that simplifies the result to a vector or matrix, where possible, instead of returning a list.
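Applied to the same list l used in the lapply() example, sapply() returns a named vector when each result is a single number, and a matrix when every result has the same length greater than one. vapply(), listed as an alias alongside lapply and sapply in the help output above, is a stricter variant in which you declare the type and length of each result through FUN.VALUE:

```r
l <- list(a = 1:10, b = 11:20)

means <- sapply(l, mean)    # each result has length 1: a named numeric vector
print(means)

ranges <- sapply(l, range)  # each result has length 2: a 2 x 2 matrix, one column per element
print(ranges)

# vapply(): like sapply(), but errors if any result is not a single number
means_v <- vapply(l, mean, FUN.VALUE = numeric(1))
print(identical(means, means_v))
```

Because sapply() decides the shape of its output from the results, its return type can change with the input; vapply() trades a little typing for a guaranteed shape.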





Key Points

  • Control structures let us automate repetitive tasks and make decisions in R based on logical conditions.

  • If-else differs from for and while loops in that it does not repeat anything: it chooses which block of code to evaluate based on the result of a logical test.

  • The functions in the apply family are a powerful way of replacing many loops in R.