Programming functions

A function in programming takes one or more inputs, performs some operations on them, and returns an output. We have been using built-in functions throughout these tutorials, but this tutorial focuses on how to write our own. This is useful when no existing function performs the task we need to accomplish.

A function has four parts: the name of the function, its arguments (the inputs), the body (the code performing the operations), and the output to be returned. We will work through two examples by writing the functions ourselves: the mean and the median.

Mean

The arithmetic mean, or average, is the sum of all values in a vector divided by the number of values (the vector's length). Since R already has a mean() function, we will write our own and call it our.mean().

our.mean = function(x){
  sumx = sum(x)
  lengthx = length(x)
  meanx = sumx/lengthx
  return(meanx)
}

Let’s break down how this function works. First, we create a variable named “our.mean” and define it as a function. The arguments of the function are its inputs, which are usually provided by the user. In this case, the input is a single vector of numbers called “x”. The arguments are listed inside the parentheses that follow the word function. Multiple arguments can be provided, separated by commas.

Once the function is called by the user, it takes “x” and performs operations on it. First, it calculates the sum of all values in “x”, and saves it as sumx. Then it calculates the length of “x” to determine how many values are being summed over. Finally, it calculates the mean by dividing the sum by the length.

The final line is the return() line. This tells R what to output from this function. Note that sumx, lengthx, and meanx are not created as variables in the global environment. They exist only in an environment local to the function call, and are discarded once the function has finished. Thus, the return() statement tells R what should come out of the function. We don’t need the sum or the length, just the mean, so we return meanx.

If the value to be returned is not explicitly specified using a return() statement, the value of the last expression evaluated in the function is returned by default. I like to explicitly specify what I want to be returned so I can be sure I know exactly what it is.
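
To illustrate the default behaviour, here is a minimal sketch of the same calculation without a return() statement. The name our.mean.implicit is made up for this example; the rest of the tutorial continues to use our.mean().

```r
# Hypothetical variant of our.mean() with no return() statement.
# The value of the last evaluated expression is what the function returns.
our.mean.implicit = function(x){
  sum(x)/length(x)
}

our.mean.implicit(11:20)  # 15.5, the same as mean(11:20)
```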

Let’s apply our function to some data.

x = 11:20
x
 [1] 11 12 13 14 15 16 17 18 19 20
mean(x)
[1] 15.5
our.mean(x)
[1] 15.5

Notice how our.mean() returns the same value as R’s mean() function. We can also check our workspace, noting that the intermediate calculations are not saved in the global environment.

ls()
[1] "our.mean" "x"       
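
As mentioned earlier, a function can take several arguments, separated by commas. The sketch below is one possible illustration rather than part of the original example: our.mean.na and its second argument na.rm are names assumed here (na.rm mirrors the argument of the same name in R's own mean()), and the default value FALSE means the argument can be left out when calling the function.

```r
# Hypothetical two-argument version of our.mean().
# x:     a vector of numbers
# na.rm: if TRUE, missing values (NA) are dropped before averaging
our.mean.na = function(x, na.rm = FALSE){
  if(na.rm){
    x = x[!is.na(x)]   # keep only the non-missing values
  }
  meanx = sum(x)/length(x)
  return(meanx)
}

y = c(1, 2, NA, 3)
our.mean.na(y)                # NA, because the sum of a vector containing NA is NA
our.mean.na(y, na.rm = TRUE)  # 2, the mean of 1, 2 and 3
```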

Median

Let’s turn to a function to calculate the median.

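our.median = function(x){
  lengthx = length(x)
  sortedx = sort(x)   # sort from least to greatest
  if(lengthx %% 2 == 0){
    # even length: average the two middle values
    medianx = (sortedx[lengthx/2] + sortedx[lengthx/2 + 1])/2
  } else if(lengthx %% 2 != 0){
    # odd length: take the single middle value
    medianx = sortedx[(lengthx + 1)/2]
  }
  return(medianx)
}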

As with the mean, the function takes a vector of numbers, “x”, as input. The length is calculated in the same way as in our.mean(). To find the median, the values must be sorted from least to greatest, which is done using the sort() function.

An if statement and an else if statement are used to perform the calculation under two separate conditions. If the length of the vector is even, the median is calculated by averaging the two middle numbers. If the length is odd, the middle number is the median.

Whether the length is even or odd is determined using modulo arithmetic. Essentially, lengthx %% 2 divides the length by 2 and gives the remainder. If the length is even, the remainder will be 0. If the length is odd, the remainder will be 1. The if and else if statements check this condition.
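
A quick way to see the modulo operator at work is to apply it to a few lengths directly. The vector n below is just example data chosen for this sketch.

```r
n = c(4, 5, 10, 11)   # some example lengths
n %% 2                # remainders after dividing by 2: 0 1 0 1
n %% 2 == 0           # is each length even? TRUE FALSE TRUE FALSE
```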

Once the condition is checked, all that is left is to pick out the right elements of the sorted vector. For an even length n, the two middle values sit at positions n/2 and n/2 + 1, and their average is the median. For an odd length n, the single middle value sits at position (n + 1)/2. With 10 values, for example, the function averages the 5th and 6th sorted values.

median(x)
[1] 15.5
our.median(x)
[1] 15.5

As with the mean function, our.median() returns the same value as R’s median() function.
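
Because x has 10 values, the comparison above only exercises the even-length branch. The sketch below checks the odd-length branch as well. The vector y is example data made up for this check, and our.median() is repeated so the block runs on its own.

```r
# our.median() as defined above, repeated so this block is self-contained
our.median = function(x){
  lengthx = length(x)
  sortedx = sort(x)
  if(lengthx %% 2 == 0){
    medianx = (sortedx[lengthx/2] + sortedx[lengthx/2 + 1])/2
  } else if(lengthx %% 2 != 0){
    medianx = sortedx[(lengthx + 1)/2]
  }
  return(medianx)
}

y = c(7, 3, 9, 1, 5)   # odd length; sorted, this is 1 3 5 7 9
median(y)              # 5
our.median(y)          # 5, the 3rd of the 5 sorted values
```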

Tips for writing functions

As the median example demonstrates, functions can incorporate control flow statements. Writing a function lets us define a procedure once and then apply it easily to many different inputs. Like loops, functions save time by defining routines that are applied repeatedly.
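
As one way to see a function applied to several inputs, the sketch below runs our.mean() over a list of vectors with sapply(), a base R function that calls a function on each element of a list. The list samples and its element names are made up for this example, and our.mean() is repeated in a shortened form so the block runs on its own.

```r
# our.mean() in shortened form, repeated so this block is self-contained
our.mean = function(x){
  return(sum(x)/length(x))
}

# Hypothetical example data: three vectors of different lengths
samples = list(small = 1:5, tens = 11:20, mixed = c(2, 4, 9))

# Apply our.mean() to each vector in turn
sapply(samples, our.mean)   # small = 3, tens = 15.5, mixed = 5
```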

As seen in the examples, functions can call other functions: our.mean() uses sum() and length(), and our.median() uses sort(). The recommended practice when writing functions is to break the operation into small parts and write a short, simple function for each one. This way, if the function behaves unexpectedly, it is easier to find where the error occurs.
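
To make the idea of small parts concrete, here is a sketch of the median calculation split into short helper functions. The names is.even, middle.values, and our.median2 are hypothetical and chosen only for this illustration; this is one possible decomposition, not the only reasonable one.

```r
# Hypothetical helper: is n an even number?
is.even = function(n){
  return(n %% 2 == 0)
}

# Hypothetical helper: given an already sorted vector, return the single
# middle value (odd length) or the two middle values (even length).
middle.values = function(sortedx){
  n = length(sortedx)
  if(is.even(n)){
    return(sortedx[c(n/2, n/2 + 1)])
  }
  return(sortedx[(n + 1)/2])
}

# The median is the mean of the middle value(s) of the sorted vector.
our.median2 = function(x){
  return(mean(middle.values(sort(x))))
}

our.median2(11:20)             # 15.5
our.median2(c(7, 3, 9, 1, 5))  # 5
```

Each helper can be tested on its own, for example is.even(10) or middle.values(1:4), which narrows down where a problem lies.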

Functions in R can be debugged using the debug() function. This allows the user to walk through each step of the function as it is being executed to see the intermediate operations performed.
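
A minimal sketch of a debugging session might look like the following. The call that actually starts the step-by-step browser is left commented out because it only makes sense in an interactive R session. our.mean() is repeated so the block runs on its own.

```r
# our.mean() as defined earlier, repeated so this block is self-contained
our.mean = function(x){
  sumx = sum(x)
  lengthx = length(x)
  meanx = sumx/lengthx
  return(meanx)
}

debug(our.mean)        # flag our.mean() for debugging
isdebugged(our.mean)   # TRUE

# In an interactive session, the next call opens the browser.
# Type n to run the next line, the name of a variable (such as sumx)
# to print it, c to continue to the end, or Q to quit.
# our.mean(11:20)

undebug(our.mean)      # remove the flag so the function runs normally again
isdebugged(our.mean)   # FALSE
```

If the function only needs to be stepped through once, debugonce(our.mean) sets the flag for the next call only.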