29  Functions

Author

Jarad Niemi


R is a functional programming language. Thus, understanding how to use functions is essential to using the language effectively.

class(log)
[1] "function"

29.1 Function basics

Functions take in some input (\(x\)) and return some output (\(y\)). Mathematically, we might consider \[y = f(x)\] where \(x\) is the input and \(y\) is the output. While most mathematical functions return numbers, functions in R may return a wide variety of output including numbers or characters or more complicated objects like data frames and lists.

29.1.1 Arguments

# Arguments
log(10)
[1] 2.302585
log(x = 10)
[1] 2.302585
log(10, base = exp(1))
[1] 2.302585
log(10, base = 10)
[1] 1
log(x = 10, base = 10)
[1] 1

Take a look at the arguments.

args(log)
function (x, base = exp(1)) 
NULL

29.1.1.1 Default arguments

In the log function, the default value for the base argument is exp(1).

# Default arguments
all.equal(
  log(10),
  log(10, base = exp(1))
)
[1] TRUE

29.1.1.2 Positional matching

# Matching arguments by position
log(10, exp(1))
[1] 2.302585
log(exp(1), 10)
[1] 0.4342945

29.1.1.3 Name matching

# Matching arguments by name
log(x = 10, base = exp(1))
[1] 2.302585
log(base = exp(1), x = 10)
[1] 2.302585

29.1.1.4 Partial matching

# Matching arguments by partial names
log(10, b    = exp(1))
[1] 2.302585
log(10, ba   = exp(1))
[1] 2.302585
log(10, bas  = exp(1))
[1] 2.302585
log(10, base = exp(1))
[1] 2.302585
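
Partial matching only works when the abbreviation identifies a single argument. The following is a minimal sketch using a hypothetical function, describe_value(), that is not part of the text above; its two argument names deliberately share a prefix.

```r
# Hypothetical function with two arguments that both start with "v"
describe_value <- function(value, verbose = FALSE) {
  if (verbose) message("value is ", value)
  return(value)
}

# Unambiguous abbreviations work
describe_value(val = 3, verb = FALSE)

# "v" matches both `value` and `verbose`, so R throws an error;
# tryCatch() captures the error message instead of halting
ambiguous <- tryCatch(
  describe_value(v = 3),
  error = function(e) conditionMessage(e)
)
ambiguous
```

Spelling argument names out in full avoids this kind of ambiguity.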

29.1.1.5 R objects as input

# Use R object as input
y <- 100
log(y, base = 10)
[1] 2

29.1.2 Return value

Functions return objects and these objects can be a variety of types.

# Return values
class(log(10))
[1] "numeric"
class(as.data.frame(10))
[1] "data.frame"
# all.equal() returns TRUE or a character description of the difference
class(all.equal(1,1))
[1] "logical"
class(all.equal(1,2))
[1] "character"
# Return lists
m <- lm(len ~ dose, data = ToothGrowth)
class(m)
[1] "lm"
class(summary(m))
[1] "summary.lm"
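
Because the type of a return value can vary, it is often worth checking what came back before using it. The sketch below uses only base R functions and the ToothGrowth data; the object names are illustrative.

```r
# all.equal() returns TRUE or a character string, so wrap it in
# isTRUE() when a single logical value is needed
same      <- isTRUE(all.equal(1, 1))
different <- isTRUE(all.equal(1, 2))
c(same, different)

# An lm object is a list underneath, so its pieces can be extracted by name
fit <- lm(len ~ dose, data = ToothGrowth)
is.list(fit)
names(fit)[1:3]
fit$coefficients
```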

29.2 Building functions

It is often helpful to build your own functions.

29.2.1 Function definition

Define a function using the function() function.

# Create a function
add <- function(x, y) { # x and y are inputs
  x + y                 # definition within { }
}

# Argument by order
add(1, 2)
[1] 3
# Argument by name
add(x = 1, y = 2)
[1] 3
# Vector arguments
add(1:2, 3:4)
[1] 4 6
# Vector and scalar argument
add(1:2, 3) # 3 is recycled
[1] 4 5
# Warning due to different argument sizes
add(1:2, 3:5) # 1:2 is partially recycled
Warning in x + y: longer object length is not a multiple of shorter object
length
[1] 4 6 6

29.2.2 Default arguments

Default arguments are often provided in functions so the user does not have to specify all of the arguments.

# Default argument for `base` in the `log()` function is the number e
log(10)                # default
[1] 2.302585
log(10, base = exp(1)) # explicit
[1] 2.302585
# Define function with default arguments
add <- function(x = 1, y = 2) {
  x + y
}

# Uses both default arguments
add()
[1] 3
# Uses default argument for y (second argument)
add(3)
[1] 5
# Uses default argument for x since y is defined
add(y = 5)
[1] 6
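
An argument without a default must be supplied by the user. A minimal sketch, using a hypothetical function add_with_default() in which only y has a default:

```r
# Hypothetical function: `x` is required, `y` has a default
add_with_default <- function(x, y = 2) {
  return(x + y)
}

add_with_default(1)          # y falls back to 2
add_with_default(1, y = 10)  # default for y is overridden

# Calling without `x` is an error because there is nothing to fall back on;
# tryCatch() captures the error message instead of halting
missing_x <- tryCatch(
  add_with_default(),
  error = function(e) conditionMessage(e)
)
missing_x
```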

29.2.3 Explicit return

R functions return the value of the last expression evaluated, but better practice is to return explicitly using the return() function.

# Create function with explicit return
add <- function(x, y) {
  return(x + y)
}

# Run function
add(1, 2)
[1] 3

Suppose you want to return a TRUE/FALSE depending on whether a specific character is in a string. As soon as you find the character, you can immediately return TRUE. If you don’t find the character, you can return FALSE.

# Define function to check string for a character
is_char_in_string <- function(string, char) {
  for (i in seq_len(nchar(string))) { # seq_len() also handles an empty string
    if (char == substr(string, i, i))
      return(TRUE)
  }
  return(FALSE)
}

# Examples
is_char_in_string("this is my string", "a")
[1] FALSE
is_char_in_string("this is my string", "s")
[1] TRUE
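
The loop above illustrates an early return. For the task itself, base R's grepl() with fixed = TRUE should give the same answers without a loop. The wrapper name has_char() is hypothetical.

```r
# Hypothetical wrapper around grepl(); fixed = TRUE treats `char` as
# literal text rather than as a regular expression
has_char <- function(string, char) {
  return(grepl(char, string, fixed = TRUE))
}

has_char("this is my string", "a")
has_char("this is my string", "s")
```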

29.2.4 Error handling

Errors will automatically be produced by the underlying functions you used.

# Error since "a" is not numeric
add(1, "a") # add used the `+` function
Error in x + y: non-numeric argument to binary operator

29.2.4.1 message()

There are a variety of ways to communicate to the user. Use message() for communicating a message.

# Create function with message
add <- function(x, y) {
  message(paste(x, "+", y, "="))
  return(x + y)
}

# Example message
add(1, 2)
1 + 2 =
[1] 3

29.2.4.2 warning()

Use warning() when you think the results may not be what the user is expecting.

# Create function with warning
add <- function(x, y) {
  if (length(x) != length(y))
    warning("'x' and 'y' have unequal lengths.")
  return(x + y)
}

# No warning
add(1, 2)
[1] 3
# Warning
add(1:2, 3)
Warning in add(1:2, 3): 'x' and 'y' have unequal lengths.
[1] 4 5

29.2.4.3 stop()

Use stop() when it is clear the user is not doing what they intend.

# Define function with stop()
add <- function(x, y) {
  if (!is.numeric(x)) stop("'x' is not numeric!")
  if (!is.numeric(y)) stop("'y' is not numeric!")
  return(x + y)
}

# No Error
add(1, 2)
[1] 3
# Error
add("a", 2)
Error in add("a", 2): 'x' is not numeric!
add(1, "a")
Error in add(1, "a"): 'y' is not numeric!

As shown previously, a non-numeric argument also caused an error in our original add() function. This version of the error message, however, is arguably more helpful.
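
An error raised by stop() halts the calling code unless the caller handles it. The following is a minimal sketch using base R's tryCatch(); the function name add_checked() and the choice to fall back to NA are illustrative assumptions, not part of the text above.

```r
# Same checks as the stop() version of add(), under a different name
add_checked <- function(x, y) {
  if (!is.numeric(x)) stop("'x' is not numeric!")
  if (!is.numeric(y)) stop("'y' is not numeric!")
  return(x + y)
}

# On error, report the message and return NA instead of halting
safe_result <- tryCatch(
  add_checked("a", 2),
  error = function(e) {
    message("Caught: ", conditionMessage(e))
    NA_real_
  }
)
safe_result
```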

29.2.4.4 stopifnot()

The stopifnot() function can be used to construct reasonably informative error messages automatically.

# stopifnot() example
add <- function(x, y) {
  stopifnot(is.numeric(x))
  stopifnot(is.numeric(y))
  return(x + y)
}

# No error
add(1, 2)
[1] 3

Now with an error.

add(1, "b")
Error in add(1, "b"): is.numeric(y) is not TRUE
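
If the automatic message is too terse, the conditions passed to stopifnot() can be named, and the name is then used as the error message. This sketch assumes R 4.0.0 or later; the function name add_named_checks() is hypothetical.

```r
# Named conditions: the name becomes the error message (R >= 4.0.0)
add_named_checks <- function(x, y) {
  stopifnot(
    "'x' must be numeric" = is.numeric(x),
    "'y' must be numeric" = is.numeric(y)
  )
  return(x + y)
}

# No error
add_named_checks(1, 2)

# Capture the error message instead of halting
check_msg <- tryCatch(
  add_named_checks(1, "b"),
  error = function(e) conditionMessage(e)
)
check_msg
```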

29.3 Function issues

These are some issues I want you to be aware of so you (I hope) avoid issues in the future.

29.3.1 Argument vs object assignment

Argument values must be specified using =. You can also use <- inside a function call, which simultaneously creates an R object and passes its value as an argument. Generally, this should be avoided.

# Define function
my_fancy_function <- function(x, y) {
  return(x + y * 100)
}

What is the result of the following?

# Weird assignment
my_fancy_function(y <- 5, x <- 4)
[1] 405

What happened? We assigned y the value 5 and x the value 4 outside the function. Then, we passed y (5) as the first argument of the function and x (4) as the second argument of the function.

This was equivalent to

# Prefer assignment outside the function
y <- 5
x <- 4
my_fancy_function(x = y, y = x)
[1] 405

So, when assigning function arguments, use =. Also, it is probably helpful to avoid naming objects the same name as the argument names.
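
The difference between = and <- inside a call can be checked directly. This is a minimal sketch with a hypothetical function, times_ten(), and a hypothetical object name, side_effect; it assumes no object called value already exists in the current environment.

```r
# Hypothetical function for illustration
times_ten <- function(value) {
  return(value * 10)
}

# `=` only names the argument; no object called `value` is created
times_ten(value = 3)
exists("value", inherits = FALSE)

# `<-` creates an object called `side_effect` in the calling environment
# and passes its value by position
times_ten(side_effect <- 3)
side_effect
```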

29.3.2 Scoping

R functions will look outside their function if objects are missing.

# Define function with missing object
f <- function() {
  return(y) # y was never defined
}

What is the result of the following?

# What will this result be?
f()
[1] 5

R searches through a series of environments to find the variable called y. The first environment is the function itself. The next is the environment in which the function was defined, which may be another function. Eventually, the search reaches the global environment of your current R session, where y was assigned the value 5 in the previous section.

But if you change an object’s value inside the function, the change is not retained outside the function.

# Create function
f <- function() {
  a <- a + 1 # change object's value
  print(a)
}

# Create an object
a <- 1
f()
[1] 2
a # object's value is not changed outside the function
[1] 1
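
The search starts from where a function was defined, not from where it was called. A minimal sketch with hypothetical names (greeting, show_greeting(), call_show_greeting()):

```r
greeting <- "global"

# `greeting` is not defined inside this function, so R looks in the
# environment where show_greeting() was defined
show_greeting <- function() {
  return(greeting)
}

call_show_greeting <- function() {
  greeting <- "inside caller" # local to call_show_greeting()
  return(show_greeting())
}

# The caller's local `greeting` is not used
call_show_greeting()
```

Passing values in as arguments, rather than relying on this search, makes the function's behavior easier to predict.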

29.3.3 Closure errors

Sometimes you get baffling error messages due to closure errors or special errors.

mean[1]
Error in mean[1]: object of type 'closure' is not subsettable
log[1]
Error in log[1]: object of type 'special' is not subsettable

This is related to functions having a typeof closure or special.

# typeof
typeof(mean)
[1] "closure"
typeof(log)
[1] "special"

You will see closure errors much more commonly than special errors. Both of these errors indicate problems using a function.
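
A common way to trigger a closure error is to use a name that refers to a function when you meant a data object. The sketch below assumes no data frame called df exists in the session, in which case df refers to a function in the stats package.

```r
# With no data frame called `df`, `df$len` tries to subset a function;
# tryCatch() captures the error message instead of halting
closure_msg <- tryCatch(
  df$len,
  error = function(e) conditionMessage(e)
)
closure_msg
```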

29.3.4 Generic functions

Generic functions in R will use the class() of the first argument to determine what specific version of the function will be used.

29.3.4.1 mean()

The mean() function is an example of a generic function.

# UseMethod() indicates a generic function
print(mean)
function (x, ...) 
UseMethod("mean")
<bytecode: 0x104ac5260>
<environment: namespace:base>

Take a look at the help file.

# Look at the help file
?mean

Notice the words “Generic function”. This means what the function does will depend on the class of the first argument to the function.

# Using mean() on different object types
mean(1:5)
[1] 3
mean(as.Date(c("2023-01-01", "2022-01-01")))
[1] "2022-07-02"

I bring up generic functions primarily to point out that it can be hard to track down the appropriate helpfile. Generally you will look up <function>.<class>.

For example,

# Determine the class
class(as.Date(c("2023-01-01", "2022-01-01")))
[1] "Date"

So you need to look up the helpfile for mean.Date().

# Look up the function
?mean.Date

This did not bring up help specific to mean.Date(). Because the lookup went somewhere rather than failing, this appears to be the intended behavior. Why the method isn’t documented, I have no idea.

# Integer
class(1:5)
[1] "integer"
# Try mean.integer help
?mean.integer

No help page is found because there is no mean.integer method. When a specific method cannot be found, there is typically a default method that will be used instead.

# Try mean.default
?mean.default
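
To see how dispatch works, here is a minimal sketch of a hypothetical generic, describe(), with three methods. None of these functions exist in base R; they are defined here only for illustration.

```r
# Hypothetical generic: UseMethod() dispatches on the class of `x`
describe <- function(x, ...) {
  UseMethod("describe")
}

# Methods are named <function>.<class>
describe.numeric <- function(x, ...) "a number"
describe.Date    <- function(x, ...) "a date"
describe.default <- function(x, ...) "something else" # fallback

describe(2.5)
describe(as.Date("2023-01-01"))
describe("text") # no describe.character(), so the default is used

# List the methods available for an existing generic
methods(mean)
```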

29.3.4.2 summary()

Another function that we have used for multiple different data types is summary().

# Various uses of the generic summary function
summary(ToothGrowth$len)                    # numeric
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   4.20   13.07   19.25   18.81   25.27   33.90 
summary(ToothGrowth$supp)                   # factor
OJ VC 
30 30 
summary(ToothGrowth)                        # data.frame
      len        supp         dose      
 Min.   : 4.20   OJ:30   Min.   :0.500  
 1st Qu.:13.07   VC:30   1st Qu.:0.500  
 Median :19.25           Median :1.000  
 Mean   :18.81           Mean   :1.167  
 3rd Qu.:25.27           3rd Qu.:2.000  
 Max.   :33.90           Max.   :2.000  
summary(lm(len ~ supp, data = ToothGrowth)) # lm object

Call:
lm(formula = len ~ supp, data = ToothGrowth)

Residuals:
     Min       1Q   Median       3Q      Max 
-12.7633  -5.7633   0.4367   5.5867  16.9367 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   20.663      1.366  15.127   <2e-16 ***
suppVC        -3.700      1.932  -1.915   0.0604 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.482 on 58 degrees of freedom
Multiple R-squared:  0.05948,   Adjusted R-squared:  0.04327 
F-statistic: 3.668 on 1 and 58 DF,  p-value: 0.06039

Take a look at the helpfiles for the different summary functions.

# Summary function helpfiles
?summary
?summary.numeric
?summary.factor
?summary.data.frame
?summary.lm

29.3.5 … argument

Some functions have a ... argument. This argument collects any number of additional arguments, which the function then uses itself or passes on to other functions.

# Sum helpfile
?sum

For the sum() function, every value passed through ... is included in the sum.

# Sum scalars
sum(1, 2, 3)
[1] 6
# Sum a vector
sum(5:6)
[1] 11
# Sum scalars and vector
sum(1, 2, 3, 5:6)
[1] 17

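Typos in argument names are not caught. The misspelled na.mr = TRUE is absorbed by ... and summed as a value (TRUE counts as 1), so the NA is not removed.

# Typo in argument name
sum(c(1, 2, NA), na.mr = TRUE) # typo: na.mr is treated as a value to sum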
[1] NA
sum(c(1, 2, NA), na.rm = TRUE) 
[1] 3
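
You can use ... in your own functions to pass arguments along to another function. A minimal sketch with two hypothetical functions, rounded_mean() and show_dots():

```r
# Hypothetical wrapper: anything in ... is handed on to mean()
rounded_mean <- function(x, digits = 1, ...) {
  return(round(mean(x, ...), digits))
}

rounded_mean(c(1, 2, NA))               # NA is not removed by default
rounded_mean(c(1, 2, NA), na.rm = TRUE) # na.rm reaches mean() through ...

# list(...) shows what was captured, including any names
show_dots <- function(...) {
  return(names(list(...)))
}

show_dots(1, na.mr = TRUE)
```

Inspecting the names captured by ... is one way to catch misspelled argument names before they are silently absorbed.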

29.4 Suggestions

  • Define functions for tasks that you do 2 or more times
  • Use informative names (verbs) for the functions
  • Use consistent return values