Logical and arithmetic operations
Overview
Teaching: 10 min
Exercises: 10 min
Learning Objectives
Be able to perform basic arithmetic in R using standard symbols.
Be aware of different logical tests in R.
Perform value matching using %in% and grep, knowing the difference between the two operations.
Vector arithmetic in R
Basic arithmetic operations on numeric data use standard calculator symbols such as +, -, *, and / for addition, subtraction, multiplication and division, respectively. Other operations that might be of interest include the modulus operator %%, which calculates the remainder of an integer division, and %/%, which performs integer division and discards the remainder.
5 %% 2
[1] 1
5 %/% 2
[1] 2
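The two operators are linked: for numbers x and y (with y not zero), x equals y * (x %/% y) + x %% y. The short check below uses arbitrary example values chosen here for illustration. It confirms the identity and shows how R handles a negative dividend: %/% rounds down, so the remainder takes the sign of the divisor.

```r
# Example dividends and a divisor, chosen only for illustration
x <- c(5, 7, -5)
y <- 2

q <- x %/% y   # integer division, rounding down: 2 3 -3
r <- x %% y    # remainder: 1 1 1

# Reassembling quotient and remainder recovers the original values
all(y * q + r == x)   # TRUE
```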
To raise a numeric value to a power, we use ^. For example,
3^3
[1] 27
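Two details of ^ are easy to trip over. It is evaluated before a leading minus sign, and a chain of powers is evaluated from right to left. The values below are arbitrary illustrations.

```r
# ^ binds more tightly than unary minus, so this is -(2^2)
-2^2      # -4
(-2)^2    # 4

# ^ groups from right to left, so this is 2^(3^2) = 2^9
2^3^2     # 512

# Fractional powers work too: a power of 0.5 is a square root
9^0.5     # 3
```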
More information on arithmetic can be found at https://stat.ethz.ch/R-manual/R-devel/library/base/html/Arithmetic.html.
What happens if we operate on a vector of numbers rather than a single number? In R even a single number is a vector of length one, so the operation is simply applied to every element:
a <- c(2,2,5,5,6)
a %% 2
[1] 0 0 1 1 0
When two vectors are involved, arithmetic operations in R are performed element-wise. The first element of one vector is paired with the first element of the other, and so on:
a <- c(2,2,5,5,6)
b <- c(2,2,2,3,2)
a %% b
[1] 0 0 1 2 0
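The same element-wise pairing applies to the other arithmetic operators. The vectors below are the a and b used above.

```r
a <- c(2, 2, 5, 5, 6)
b <- c(2, 2, 2, 3, 2)

# Each operator pairs a[1] with b[1], a[2] with b[2], and so on
a + b   # 4 4 7 8 8
a * b   # 4 4 10 15 12
a - b   # 0 0 3 2 4
```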
Be careful about recycling!
The element-wise nature of R operations has some unexpected implications. Consider the following:
a %% c(2,3)
Warning in a%%c(2, 3): longer object length is not a multiple of shorter object length
[1] 0 2 1 2 0
Notice that R returns a somewhat cryptic warning message. R repeated (or recycled) the shorter object until it had enough elements to match the longer object. It then performed the computation element-wise. This is referred to as “element recycling”. It can be useful but, as you can see, it can also be counter-intuitive. Not every case of recycling triggers the warning. It appears here only because the length of the longer vector (5) is not a multiple of the length of the shorter one (2), so the last repetition of the shorter vector had to be cut short.
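Recycling is silent when the longer length is an exact multiple of the shorter one. That is exactly when it is easiest to miss. One way to make the intent visible is to expand the shorter vector yourself with base R's rep_len(). The vectors below are illustrations.

```r
# Longer length is a multiple of the shorter one, so R recycles silently:
# c(10, 20) is reused as c(10, 20, 10, 20)
c(1, 2, 3, 4) + c(10, 20)   # 11 22 13 24, no warning

# rep_len() makes the recycling explicit: c(2, 3) becomes c(2, 3, 2, 3, 2)
a <- c(2, 2, 5, 5, 6)
divisors <- rep_len(c(2, 3), length(a))
a %% divisors               # 0 2 1 2 0, same as a %% c(2, 3) but no warning
```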
Arithmetic operations on two dimensional objects
What happens when we try to divide a two-dimensional object such as a matrix by:
- An object of equal dimension;
- A vector of length one (that is, a single number);
- An object of different dimension?
Solution
# let's create a 3x3 matrix of the numbers 1:9, filled row by row
m <- matrix(1:9, nrow = 3, byrow = TRUE)
m
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
# dividing by itself operates element-wise
m / m
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    1    1    1
[3,]    1    1    1
# dividing by a single number does what you might expect
m / 2
     [,1] [,2] [,3]
[1,]  0.5  1.0  1.5
[2,]  2.0  2.5  3.0
[3,]  3.5  4.0  4.5
# dividing by a row or a column treats the second operand as a plain vector;
# it is recycled down each column, so row i is divided by its i-th element
m / m[1,] # the second operand is a vector of the first row
         [,1]     [,2] [,3]
[1,] 1.000000 2.000000    3
[2,] 2.000000 2.500000    3
[3,] 2.333333 2.666667    3
m / m[,1] # the second operand is a vector of the first column
     [,1]     [,2]     [,3]
[1,]    1 2.000000 3.000000
[2,]    1 1.250000 1.500000
[3,]    1 1.142857 1.285714
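Because R fills matrices column by column, a vector of length nrow(m) is matched to the rows: row i is divided by the i-th element. To divide each column by a different value instead, base R's sweep() lets you state the margin explicitly. In sweep(), MARGIN = 1 means rows and MARGIN = 2 means columns. The sketch below rebuilds the m from the solution.

```r
m <- matrix(1:9, nrow = 3, byrow = TRUE)

# MARGIN = 1: divide row i by the i-th value; same as m / m[, 1]
by_row <- sweep(m, 1, m[, 1], "/")
all.equal(by_row, m / m[, 1])   # TRUE

# MARGIN = 2: divide column j by the j-th value of the first row
by_col <- sweep(m, 2, m[1, ], "/")
by_col
#      [,1] [,2] [,3]
# [1,]    1  1.0    1
# [2,]    4  2.5    2
# [3,]    7  4.0    3
```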
One arithmetic operation commonly used in bioinformatics is the logarithm, which can be computed with the log() function.
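log() works element-wise like the arithmetic operators and, by default, returns the natural logarithm. One practical consequence for count data is that log(0) is -Inf. This is why a small pseudocount is often added before taking logs. The counts vector and the pseudocount of 1 below are illustrative assumptions, not part of this lesson.

```r
log(c(1, exp(1), exp(2)))   # natural logs: 0 1 2
log(0)                      # -Inf

# Hypothetical read counts, including a zero
counts <- c(0, 1, 3, 7)
log2(counts + 1)            # 0 1 2 3: the +1 keeps the zero count finite
```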
Changing the base of a logarithm in R
What is the default base used for log in R? How could you take a logarithm using a different base?
Hint: Use help(log) for more information on the log function.
Solution
args(log)
function (x, base = exp(1)) NULL
The base is controlled by an argument called base, which defaults to e (exp(1)), so log() returns the natural logarithm by default. This can be changed, for example, with log(x, base = 10).
Logical operators in R
Logical operations are used to determine if a condition is TRUE or FALSE. As such, logical operations return logical values only. The common logical operations are as follows:
| Symbol | Interpretation |
|---|---|
| > | greater than |
| >= | greater than or equal to |
| < | less than |
| <= | less than or equal to |
| == | equal to |
| != | not equal to |
| ! | not (a negation operator that inverts the logical value of whatever follows) |
All but the last are infix operators: they are placed between the variables or values on which they operate, e.g., x > y. The negation operator is a prefix operator; it negates the logical value of whatever immediately follows, e.g., !(x > y).
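Like the arithmetic operators, the comparison operators work element-wise and return one logical value per element. The vector x below is an arbitrary example.

```r
x <- c(1, 5, 10)

x > 3          # FALSE  TRUE  TRUE
x <= 5         #  TRUE  TRUE FALSE
x == 5         # FALSE  TRUE FALSE
x != 5         #  TRUE FALSE  TRUE
!(x > 3)       #  TRUE FALSE FALSE: ! flips each value
```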
As shown above, most of the logical operators are similar to what we use in everyday mathematics. Also as in ordinary mathematics, parentheses can be used to enforce the order of evaluation. The major notational difference is the use of == to test for equality instead of =. This convention is common in programming languages, because = is often used as the assignment operator. (R can use = for variable assignment, but <- is preferred by most R style guides.)
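The snippet below contrasts the two symbols, using an arbitrary variable x. It also shows one caveat that the table does not mention. Because == checks for exact equality, results of floating-point arithmetic are better compared with a tolerance, for example via base R's all.equal().

```r
x <- 5       # assignment with the preferred operator
x == 5       # comparison: TRUE
x = 6        # also assigns (works, but discouraged by most style guides)
x == 5       # FALSE, because x is now 6

# == tests exact equality, which can surprise with decimal fractions
0.1 + 0.2 == 0.3                   # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE: equal within a small tolerance
```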
Value matching in R
Suppose we are interested in whether a vector contains certain values. There are several ways to determine this. The first is the %in% operator, which searches for exact matches of elements. For example:
a <- (1:10)^2
a %in% c(22,25,49)
[1] FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE
The above snippet can be read as “Do the elements in vector a match any of our desired values?” Notice that one of the desired values (22) is not a member of a at all. It simply never produces a TRUE, because %in% reports only, for each element of a, whether it is found among the desired values. The result of %in% is therefore a logical vector showing which entries are TRUE (that is, which entries match the values of interest). We can use it directly for subsetting, keeping only the entries that match the values of interest:
a[a %in% c(22,25,49)]
[1] 25 49
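%in% is not symmetric: x %in% y returns one logical value per element of x. Swapping the operands answers the complementary question of which desired values occur in a. In that direction, the absent value 22 shows up as FALSE. The name wanted is introduced here for readability.

```r
a <- (1:10)^2
wanted <- c(22, 25, 49)

a %in% wanted               # length 10: one answer per element of a
wanted %in% a               # FALSE TRUE TRUE: 22 is not in a
wanted[!(wanted %in% a)]    # 22, the value with no match
```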
If we want only the indices of the matches, we need to combine %in% with another function, which(), as follows:
which(a %in% c(22,25,49))
[1] 5 7
This tells us that the values 25 and 49 are found at positions 5 and 7 of our vector.
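A related base R function, match(), answers the question the other way round. For each value of interest, it returns the position of its first match in a, or NA when there is none. It is a different function from %in% and is shown here only for comparison.

```r
a <- (1:10)^2

match(c(22, 25, 49), a)            # NA 5 7
# %in% is effectively a logical wrapper around match()
!is.na(match(a, c(22, 25, 49)))    # same as a %in% c(22, 25, 49)
```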
We can combine the logical prefix operator ! with which() to find out which entries do not match our query. For example:
which(!(a %in% c(22,25,49)))
[1] 1 2 3 4 6 8 9 10
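The same negated logical vector can be used directly for subsetting, which keeps the values rather than their positions. Base R's setdiff() gives a similar answer, with the caveat that it also drops duplicate values. The name wanted is introduced here for readability.

```r
a <- (1:10)^2
wanted <- c(22, 25, 49)

a[!(a %in% wanted)]    # 1 4 9 16 36 64 81 100
setdiff(a, wanted)     # same values here, because a has no duplicates
```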
Inexact matches
The examples above all find exact matches, but grep() can be used to find inexact matches. For example:
genes <- c("TP63","TP53", "CDK1", "Ras", "pTP53")
grep("TP",genes)
[1] 1 2 5
In this example, we first created a vector of gene names. The grep() call can be read as: which entries in the vector genes contain the pattern “TP”? grep() returns a numeric vector containing the indices of the matches. This is immensely useful for subsetting, because subsetting a vector only requires the indices of the entries of interest.
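grep() has a few options that are handy here. With value = TRUE it returns the matching strings instead of their indices, and ignore.case = TRUE ignores capitalization. The related grepl() returns a logical vector, like %in%, rather than indices.

```r
genes <- c("TP63", "TP53", "CDK1", "Ras", "pTP53")

genes[grep("TP", genes)]               # "TP63" "TP53" "pTP53"
grep("TP", genes, value = TRUE)        # same, without the extra subsetting step
grepl("TP", genes)                     # TRUE TRUE FALSE FALSE TRUE
grep("tp", genes, ignore.case = TRUE)  # 1 2 5
```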
A note on regular expressions
grep() actually searches using regular expressions. Regular expressions (regex for short) are sequences of characters that define a formal search pattern. They provide powerful pattern-matching capabilities in many computer languages, so it is useful to learn about them. In this case, the ordinary letters “TP” search for that exact substring, but other patterns are possible. For example, grep("^TP", genes) anchors the pattern to the beginning of each string and returns only 1 2. The fifth entry (pTP53) does not match because “TP” is not at the beginning of that string.
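Here are a few more patterns on the same genes vector:

- ^ anchors the pattern to the start of a string.
- $ anchors it to the end.
- [0-9]+ means one or more digits.
- An unescaped . matches any single character.

Setting fixed = TRUE switches off regular expressions so the pattern is matched literally.

```r
genes <- c("TP63", "TP53", "CDK1", "Ras", "pTP53")

grep("^TP", genes)              # 1 2: "TP" at the start
grep("53$", genes)              # 2 5: "53" at the end
grep("^TP[0-9]+$", genes)       # 1 2: "TP" followed only by digits
grep(".", genes)                # 1 2 3 4 5: "." matches any character
grep(".", genes, fixed = TRUE)  # integer(0): no gene name contains a literal "."
```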
A version of grep is also available as a command-line tool in Unix shells such as bash. For those interested, it is worth learning more about the different regular expressions that can be used with grep. More can be found at this link.
Key Points
Arithmetic operations are performed element-wise in R.
%in% is used to find exact matches, while grep is used for inexact matches.
which() is used to return the indices of matches found using %in%.