R Tutorial #4

Types of loops

Like if/else statements, loops are a useful form of control flow. Loops allow code to be executed repeatedly. When writing our code, we therefore only need to figure out how to perform an operation for a single value; the loop then performs that operation on all values.

In this tutorial, we will discuss for, while, and apply loops. The for and while loops are common to most programming languages, but the apply loop is more specific to R.

for loops

A for loop in R iterates over the given indices and performs the code within the loop for each index. Let’s take a vector

x = 11:20
x

 [1] 11 12 13 14 15 16 17 18 19 20

and a new empty vector of the same length

y = numeric(10)
y

 [1] 0 0 0 0 0 0 0 0 0 0

We can write a for loop to iterate over each element of the vector “x”, add 5, and store it in “y” using

for(i in 1:10){
  y[i] = x[i] + 5
}
y

 [1] 16 17 18 19 20 21 22 23 24 25

Like an if statement, the for loop has three parts: the for keyword, the index together with the values it runs over, and the code to be executed. In this case, we call the index “i”, but it could be named anything.

The for loop in R creates a variable “i”, and sets its value to the first number in the index, in this case 1. It then executes the code within the {}. Here, we take x[1], add 5, and store it in y[1].

Once the end of the {} is reached, the loop goes back to the beginning and moves to the next value of the index, in this case 2. It then changes “i” to 2 and repeats the process. So the next iteration takes x[2], adds 5, and stores it in y[2]. This process is repeated until the last value of the index, 10. After the code is executed for i=10, the loop terminates.
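To make the iteration order concrete, the sketch below prints the index and the computed value at every step. The index name k and the output vector z are illustrative choices that do not appear in the example above, and seq_along(x) is used in place of 1:10 so that the loop adapts to the length of “x”.

```r
x = 11:20
z = numeric(length(x))   # hypothetical output vector, same length as x

# seq_along(x) yields 1, 2, ..., length(x)
for (k in seq_along(x)) {
  z[k] = x[k] + 5
  # report the current index, the input value, and the stored result
  cat("k =", k, " x[k] =", x[k], " z[k] =", z[k], "\n")
}
```

Each printed line corresponds to one pass through the {}.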

Note that the “i” variable remains in the workspace after the loop has finished

i

[1] 10

and retains the value of the last index. However, if we write another loop and use “i” as an index again, it will be overwritten with the first value of the new index.

Indices are commonly consecutive integers, but this need not be the case. E.g. below we loop over only the indices 3, 5, and 7.

y = numeric(10)
y

 [1] 0 0 0 0 0 0 0 0 0 0

for(i in c(3,5,7)){
  y[i] = x[i] + 5
}
y

 [1]  0  0 18  0 20  0 22  0  0  0
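The values in a for loop do not have to be positions at all. As a small sketch, the loop below iterates directly over the elements of a vector and accumulates their sum; the vector values and the variable total are made up for illustration.

```r
values = c(2.5, 4, 10)   # hypothetical example data
total = 0

for (v in values) {
  # v takes each element of values in turn, not its position
  total = total + v
}
total
```

In practice sum(values) does the same job; the loop is only meant to show that the index can hold arbitrary values.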

while loops

The while loop is a more general version of the for loop. Instead of iterating sequentially over the indices, it continues while a condition is true, and terminates as soon as the condition is false. We can replicate the functionality of the for loop using a while loop like so

y = numeric(10)
y

 [1] 0 0 0 0 0 0 0 0 0 0

i = 1
while(i <= 10){
  y[i] = x[i] + 5
  i = i+1
}
y

 [1] 16 17 18 19 20 21 22 23 24 25

Notice that some features built into the for loop are not available for the while loop, so we must write them out explicitly. We start by setting the initial value of the index to 1. We then set the loop to continue so long as i \(\leq\) 10. For the first iteration, i = 1, so the condition holds, and the code within the loop is executed. In the final line of the loop, we must explicitly increment i by 1. This functionality is not built into the while loop the way it is in the for loop.

Now that the end of the {} has been reached, the loop cycles back to the beginning. Since i = 2 is still \(\leq\) 10, the loop continues. When it reaches i = 10, the code will execute and “i” will be increased to 11. At this point, “i” will no longer be \(\leq\) 10, so the loop will terminate.

Note that changing the index is incredibly important. If this line were left out, “i” would always be \(\leq\) 10, so the loop would continue to run forever! Or at least until the power went out.
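One defensive pattern, offered here as a sketch and not as a required idiom, is to cap the number of iterations so that a forgotten or faulty update cannot run forever. The names n_iter and max_iter and the cap of 1000 are illustrative assumptions.

```r
x = 11:20
y = numeric(10)

i = 1
n_iter = 0
max_iter = 1000   # hypothetical safety cap

while (i <= 10) {
  y[i] = x[i] + 5
  i = i + 1

  # count the passes and bail out if the cap is ever reached
  n_iter = n_iter + 1
  if (n_iter >= max_iter) {
    warning("Iteration cap reached; stopping the loop")
    break
  }
}
y
```

With the increment in place, the cap is never reached and the loop ends normally after 10 passes.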

while loops are more flexible than for loops because the condition for the loop to continue can be defined more creatively. A for loop in R automatically moves the index to the next element of its sequence after each iteration, whereas the index in a while loop can be changed in any way we choose.

E.g. we can write a while loop as follows

y = numeric(10)
y

 [1] 0 0 0 0 0 0 0 0 0 0

i = 1
flag = FALSE
while(!flag){
  y[i] = x[i] + 5
  i = i+1
  # raise the flag when the next element to be processed is 18
  if(x[i] == 18){
    flag = TRUE
  }
}
y

 [1] 16 17 18 19 20 21 22  0  0  0

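Here, the loop is controlled not by “i”, but by a variable called “flag”. The flag is initialized as FALSE, and the while loop continues until the flag is no longer FALSE, i.e. when it is changed to TRUE. Via an if statement, this happens when the next element of “x” to be processed, x[i], equals 18, at which point the flag is ‘raised’ and the while loop terminates. Because “i” is incremented before the check, the loop stops before 18 itself is processed, which is why the last three elements of “y” remain 0.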

This can be useful when the loop needs to run for an unknown length of time, until some event happens, at which point it should stop. Here we would not know when 18 would pop up in the data, so it would be impossible to define a range of indices in advance to terminate the loop at the precise moment the event occurred.
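The flag pattern above relies on 18 actually appearing in “x”. If it never did, x[i] would eventually be NA and the if statement would stop with an error. A possible variant, assuming the same “x”, adds a bounds check to the loop condition and uses break in place of the flag.

```r
x = 11:20
y = numeric(10)

i = 1
while (i <= length(x)) {
  # stop as soon as the event of interest is seen
  if (x[i] == 18) {
    break
  }
  y[i] = x[i] + 5
  i = i + 1
}
y
```

This gives the same “y” as the flag version, and if 18 were absent the loop would simply end once every element had been processed.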

Nested loops

Loops can also be used on two-dimensional arrays. Take

a = matrix(x, nrow=5, ncol=2)
a

     [,1] [,2]
[1,]   11   16
[2,]   12   17
[3,]   13   18
[4,]   14   19
[5,]   15   20

b = matrix(0, nrow=5, ncol=2)
b

     [,1] [,2]
[1,]    0    0
[2,]    0    0
[3,]    0    0
[4,]    0    0
[5,]    0    0

To loop over all elements in “a”, we can use 2 for loops, one nested within the other. E.g.

for(i in 1:nrow(a)){
  for(j in 1:ncol(a)){
    b[i,j] = a[i,j] + 5
  }
}
b

     [,1] [,2]
[1,]   16   21
[2,]   17   22
[3,]   18   23
[4,]   19   24
[5,]   20   25

The outer for loop (“i”) goes from 1 to the number of rows in “a”, in this case 5. The inner loop (“j”) goes from 1 to the number of columns in “a”, in this case 2. For each value of “i”, the inner loop runs through all values of “j” before “i” moves on to its next value. The code therefore starts at element (1,1), then moves to element (1,2), then (2,1), (2,2), then (3,1), (3,2), and so on until (5,2). In this way, the loops iterate over each element of the matrix.
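The visiting order can be checked by recording each (row, column) pair as it is reached. This is only a sketch: the vector visited and the counter step are illustrative names introduced here.

```r
a = matrix(11:20, nrow = 5, ncol = 2)

visited = character(nrow(a) * ncol(a))   # one slot per matrix element
step = 0

for (i in seq_len(nrow(a))) {
  for (j in seq_len(ncol(a))) {
    step = step + 1
    # store the position as text, e.g. "(1,2)"
    visited[step] = paste0("(", i, ",", j, ")")
  }
}
visited
```

The recorded positions show the column index cycling fastest, because the inner loop completes for every single value of the outer index.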

apply loops

R makes looping over matrices easier via apply loops. The apply() function applies a given function to either the rows or the columns of a matrix.

apply(a, 1, mean)

[1] 13.5 14.5 15.5 16.5 17.5

This applies the mean() function to each row of “a”. The 1 here stands for rows. Of course, the rowMeans() function is more efficient here

rowMeans(a)

[1] 13.5 14.5 15.5 16.5 17.5

but apply is not limited in terms of what function can be applied. Functions can be applied to columns by using the value 2 instead of 1.

apply(a, 2, mean)

[1] 13 18

which can be verified using

colMeans(a)

[1] 13 18
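Since apply() accepts any function, including one defined on the spot, a sketch with a custom function may help. Here the spread (maximum minus minimum) of each row of “a” is computed; the anonymous function and the name row_spread are illustrative choices.

```r
a = matrix(11:20, nrow = 5, ncol = 2)

# the anonymous function receives one row of a at a time as the vector r
row_spread = apply(a, 1, function(r) max(r) - min(r))
row_spread
```

Every row of “a” holds two values that are 5 apart, so each entry of the result is 5.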

apply() belongs to a general family of looping functions in R. Others may be more useful in context, e.g. sapply(), lapply(), etc. See their documentation for further details.
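As a brief illustration of two other members of the family, the sketch below squares each element of a short vector. lapply() always returns a list, while sapply() simplifies the result to a vector when it can. The input vector v and the result names are made up for the example.

```r
v = 1:3   # hypothetical input

squares_list = lapply(v, function(n) n^2)   # a list with one element per input
squares_vec = sapply(v, function(n) n^2)    # simplified to a numeric vector

squares_list
squares_vec
```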

Vectorized operations

Because R is an interpreted, high-level language, loops written in R generally run more slowly than equivalent loops in compiled languages such as C. In most cases, the difference is negligible, but when looping over hundreds of thousands (or more) values, the lag might become noticeable. For this reason, using vectorized operations is recommended. Vectorized operations are applied to an entire vector, performing the operation for all elements within it without having to loop over each one. E.g., we can write

x + 5

 [1] 16 17 18 19 20 21 22 23 24 25

without using a loop. Not only is this quicker to write, but it should also be faster to execute. This is because vectorized operations are implemented in compiled C code, which runs faster than interpreted R code. Use vectorized operations when possible, but keep loops in mind when flexibility is needed.
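The difference can be measured with system.time(). The sketch below compares the two approaches on a larger vector; the size n and the variable names are arbitrary choices, and the timings will vary from machine to machine, so none are quoted here.

```r
n = 1e5
big = seq_len(n)   # hypothetical larger input

# explicit loop with a preallocated result vector
loop_time = system.time({
  out_loop = numeric(n)
  for (i in seq_len(n)) {
    out_loop[i] = big[i] + 5
  }
})

# vectorized equivalent
vec_time = system.time({
  out_vec = big + 5
})

loop_time["elapsed"]
vec_time["elapsed"]
```

Both approaches produce the same values, so the choice between them is about speed and readability, not about the result.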