Logical and arithmetic operations

Overview

Teaching: 10 min
Exercises: 10 min
Learning Objectives
  • Be able to perform basic arithmetic in R using standard symbols.

  • Be aware of different logical tests in R.

  • Perform value matching using %in% and grep, knowing the difference between the two operations.

Vector arithmetic in R

Basic arithmetic operations on numeric data use standard calculator symbols: +, -, *, and / for addition, subtraction, multiplication and division respectively. Other operations of interest include the modulus operator %%, which returns the remainder of a division, and %/%, which performs integer division (the quotient without the remainder).

5 %% 2
[1] 1
5 %/% 2
[1] 2
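
The two operators complement each other: the quotient from %/% and the remainder from %% together reconstruct the original number. The short sketch below checks this. The variable names x, y, quotient and remainder are ours, chosen only for illustration. It also shows that with a negative dividend R rounds the quotient down, so the remainder takes the sign of the divisor.

```r
x <- 17
y <- 5
quotient  <- x %/% y   # 3
remainder <- x %% y    # 2

# quotient and remainder together give back x
quotient * y + remainder == x   # TRUE

# with a negative dividend the quotient is rounded down (towards -Inf)
(-5) %/% 2   # -3
(-5) %% 2    # 1
```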

In order to raise a numeric value to a power, we simply use ^. For example,

3^3
[1] 27

More information on arithmetic can be found at https://stat.ethz.ch/R-manual/R-devel/library/base/html/Arithmetic.html.

What happens if we are operating on a vector of numbers, rather than a single number? In R even a single number is a vector of length one, so the operation is applied to every element:

a <- c(2,2,5,5,6)
a %% 2
[1] 0 0 1 1 0

When working with two vectors of the same length, arithmetic operations in R are performed element-wise. This is shown in the following:

a <- c(2,2,5,5,6)
b <- c(2,2,2,3,2)
a %% b
[1] 0 0 1 2 0

Be careful about recycling!

The element-wise nature of R operations has some unexpected implications. Consider the following:

a %% c(2,3)
Warning in a%%c(2, 3): longer object length is not a multiple of shorter object
length
[1] 0 2 1 2 0

R returns a somewhat cryptic warning message. What R did was repeat (or recycle) the shorter object until it had enough elements to match the longer object, and then do the computation element-wise. This is referred to as “element recycling”. It can be useful but, as you can see, it can also be counter-intuitive. The warning does not occur in every case of recycling; it appears here only because the length of the longer vector (5) is not a multiple of the length of the shorter one (2), so R had to cut the last repetition of the shorter vector short when doing the operation.
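
To make the recycling visible, the sketch below uses a longer vector whose length is a multiple of the shorter one, so R recycles silently. The names long_vec, short_vec and short_recycled are ours, introduced only for this illustration; rep() is used to spell out what R does implicitly.

```r
long_vec  <- c(2, 2, 5, 5, 6, 7)   # length 6
short_vec <- c(2, 3)               # length 2, and 6 is a multiple of 2

# rep() writes out explicitly what recycling does behind the scenes
short_recycled <- rep(short_vec, length.out = length(long_vec))
short_recycled   # 2 3 2 3 2 3

result <- long_vec %% short_vec   # no warning this time
result           # 0 2 1 2 0 1

# the implicit and the explicit version give the same answer
identical(result, long_vec %% short_recycled)   # TRUE
```

Because no warning is issued in this case, it is worth checking vector lengths yourself whenever recycling is not what you intend.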

Arithmetic operations on two dimensional objects

What happens when we try to divide a 2 dimensional object such as a matrix by:

  1. An object of equal dimension;
  2. A vector of length one (that is, a single number);
  3. An object of different dimension?

Solution

# let's create a 3x3 matrix of the numbers 1 to 9, filled row by row
m <- matrix(1:9, nrow = 3, byrow = TRUE)

m
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
# dividing by itself operates element-wise

m / m 
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    1    1    1
[3,]    1    1    1
# dividing by a single number does what you might expect

m / 2
     [,1] [,2] [,3]
[1,]  0.5  1.0  1.5
[2,]  2.0  2.5  3.0
[3,]  3.5  4.0  4.5
# dividing by a row or a column treats the second operand as a plain vector in both cases, so it is recycled down the columns of m

m / m[1,] # the second operand is a vector of the first row
         [,1]     [,2] [,3]
[1,] 1.000000 2.000000    3
[2,] 2.000000 2.500000    3
[3,] 2.333333 2.666667    3
m / m[,1] # the second operand is a vector of the first column
     [,1]     [,2]     [,3]
[1,]    1 2.000000 3.000000
[2,]    1 1.250000 1.500000
[3,]    1 1.142857 1.285714
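
If the intention is to divide every row of m by the first row (so that each column is scaled by its own first entry), recycling alone does not do that, as the result of m / m[1,] above shows. One possible approach, sketched here with base R's sweep() function, is to state explicitly which margin the vector should be matched against; transposing twice is an equivalent alternative. The names by_row and by_row_t are ours.

```r
m <- matrix(1:9, nrow = 3, byrow = TRUE)

# MARGIN = 2 matches the vector against the columns,
# so column j is divided by m[1, j]
by_row <- sweep(m, MARGIN = 2, STATS = m[1, ], FUN = "/")
by_row
#      [,1] [,2] [,3]
# [1,]    1  1.0    1
# [2,]    4  2.5    2
# [3,]    7  4.0    3

# same result: transpose, let recycling run down the columns, transpose back
by_row_t <- t(t(m) / m[1, ])
all.equal(by_row, by_row_t)   # TRUE
```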

One commonly used arithmetic operation in bioinformatics is the log operation. This can be done using the log() function.

Changing the base of a logarithm in R

What is the default base used for log in R? How could you take a logarithm using a different base?

Hint: Use help(log) for more information on the log function.

Solution

args(log)
function (x, base = exp(1)) 
NULL

log() takes an argument called base, which defaults to exp(1), that is, e. By default log() therefore computes the natural logarithm. The base can be changed, for example, with log(x, base = 10).
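
As a brief illustration, the sketch below compares the default base with other bases. The vector x is an arbitrary example of ours; log10() and log2() are base R shortcuts for the two most common alternative bases.

```r
x <- c(1, 10, 100, 1000)

log(x)              # natural logarithm (base e)
log(x, base = 10)   # 0 1 2 3
log10(x)            # shortcut for base 10
log2(8)             # shortcut for base 2: 3

# change of base by hand: log_b(x) = log(x) / log(b)
all.equal(log(x, base = 10), log(x) / log(10))   # TRUE
```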

Logical operators in R

Logical operations are used to determine if a condition is TRUE or FALSE. As such, logical operations return logical values only. The common logical operations are as follows:

Symbol  Interpretation
>       greater than
>=      greater than or equal to
<       less than
<=      less than or equal to
==      equal to
!=      not equal to
!       not (a negation operator that inverts the logical value of whatever follows)

All but the last are infix operators: they are placed between the variables or values on which they operate, e.g., x < y. The negation operator is a prefix operator; it negates the logical value of whatever immediately follows.

As shown above, most of the logical operators are similar to what we use in everyday mathematics. As in ordinary mathematics, parentheses can be used to enforce the order of evaluation. The major notational difference is the use of == to test for equality instead of =. This is common in programming languages because = is often used as an assignment operator. (R can use = for variable assignment, but <- is preferred by most R style guides.)
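
Like arithmetic, logical operators work element-wise on vectors and return one logical value per element. A minimal sketch, using an example vector x of our own choosing:

```r
x <- c(1, 5, 10)

x > 4      # FALSE  TRUE  TRUE
x == 5     # FALSE  TRUE FALSE
x != 5     #  TRUE FALSE  TRUE
!(x > 4)   #  TRUE FALSE FALSE
```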

Value matching in R

Suppose we are interested in whether a vector contains certain values. There are several ways to determine this. The first is the %in% operator, which searches for exact matches of elements. For example:

a <- (1:10)^2
a %in% c(22,25,49)
 [1] FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE

The above snippet can be read as “Do the elements in vector a match any of our desired values?” Notice that one of the desired values (22) is not a member of a at all, so it simply never matches. The result of %in% is a logical vector with one entry per element of a, where TRUE marks the entries that match a value of interest. We can use it directly for subsetting to keep only the entries that match the values of interest:

a[a %in% c(22,25,49)]
[1] 25 49

On the other hand, to get only the indexes of the matches, we need to combine %in% with another function, which(), as follows:

which(a %in% c(22,25,49))
[1] 5 7

This tells us that the values 25 and 49 are found at positions 5 and 7 of our vector.

We can combine the logical prefix operator ! with which() to find out which entries do not match our query. For example:

which(!(a %in% c(22,25,49)))
[1]  1  2  3  4  6  8  9 10
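
The order of the operands matters: the result of %in% always has the length of its left-hand side. The sketch below swaps the operands to ask which of the desired values actually occur in a, and uses the negated test for subsetting. The name query is ours, introduced for illustration.

```r
a <- (1:10)^2
query <- c(22, 25, 49)

# which of the desired values are present in a?
query %in% a             # FALSE  TRUE  TRUE

# the desired values that were not found
query[!(query %in% a)]   # 22

# the entries of a that match none of the desired values
a[!(a %in% query)]       # 1 4 9 16 36 64 81 100
```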

Inexact matches

The examples above all find exact matches, but grep() can be used to find inexact matches. For example:

genes <- c("TP63","TP53", "CDK1", "Ras", "pTP53")
grep("TP",genes)
[1] 1 2 5

In this example, we first created a vector of gene names. The grep() function can be understood as asking: which entries in the vector genes contain the pattern “TP”? grep() returns a numeric vector containing the indexes of the matches, which is very useful for subsetting (remember that subsetting a vector only requires us to provide the indexes of the entries of interest).
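
A short sketch of how the indexes returned by grep() can be used for subsetting. It also shows two base R alternatives: the value = TRUE argument of grep(), which returns the matching entries directly, and grepl(), which returns a logical vector much like %in% does.

```r
genes <- c("TP63", "TP53", "CDK1", "Ras", "pTP53")

# use the indexes for subsetting
genes[grep("TP", genes)]          # "TP63"  "TP53"  "pTP53"

# ask grep() for the matching values instead of their indexes
grep("TP", genes, value = TRUE)   # "TP63"  "TP53"  "pTP53"

# grepl() returns a logical vector, which is easy to negate
grepl("TP", genes)                # TRUE  TRUE FALSE FALSE  TRUE
genes[!grepl("TP", genes)]        # "CDK1" "Ras"
```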

A note on regular expressions

grep() is in fact searching using regular expressions. Regular expressions (regex for short) are sequences of characters that define a formal search pattern. Regular expressions provide powerful pattern matching capabilities in many computer languages, so it is useful to learn about the patterns. In this case, the ordinary letters “TP” search for the exact substring, but other patterns are possible. For example, grep("^TP", genes) anchors the pattern to the beginning of each string, and returns only 1 2. The fifth entry (pTP53) does not match because the “TP” pattern is not at the beginning of the string.
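
A few more patterns, as a sketch of what regular expressions allow. The specific patterns are our own examples and are applied to the same genes vector as before; ignore.case is a standard argument of grep().

```r
genes <- c("TP63", "TP53", "CDK1", "Ras", "pTP53")

grep("^TP", genes)                       # starts with "TP": 1 2
grep("3$", genes)                        # ends with "3": 1 2 5
grep("[0-9]$", genes)                    # ends with any digit: 1 2 3 5
grep("ras", genes, ignore.case = TRUE)   # case-insensitive match: 4
```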

A version of grep is also available on the command line in shells such as bash. For those interested, it is worth learning more about the different regular expressions that can be used with grep. More can be found at This link



Key Points

  • Arithmetic operations are performed element-wise in R.

  • %in% is used to find exact matches, while grep is used for inexact matches.

  • which is used to return the index of matches found using %in%.