R is a functional programming language, so understanding how to use functions is essential to using the language effectively.
class(log)
[1] "function"
Functions take in some input (\(x\)) and return some output (\(y\)). Mathematically, we might consider \[y = f(x)\] where \(x\) is the input and \(y\) is the output. While most mathematical functions return numbers, functions in R may return a wide variety of output including numbers or characters or more complicated objects like data frames and lists.
# Arguments
log(10)
[1] 2.302585
log(x = 10)
[1] 2.302585
log(10, base = exp(1))
[1] 2.302585
log(10, base = 10)
[1] 1
log(x = 10, base = 10)
[1] 1
Take a look at the arguments.
args(log)
function (x, base = exp(1))
NULL
In the log function, the default value for the base argument is exp(1).
# Default arguments
all.equal(
  log(10),
  log(10, base = exp(1))
)
[1] TRUE
# Matching arguments by position
log(10, exp(1))
[1] 2.302585
log(exp(1), 10)
[1] 0.4342945
# Matching arguments by name
log(x = 10, base = exp(1))
[1] 2.302585
log(base = exp(1), x = 10)
[1] 2.302585
# Matching arguments by partial names
log(10, b = exp(1))
[1] 2.302585
log(10, ba = exp(1))
[1] 2.302585
log(10, bas = exp(1))
[1] 2.302585
log(10, base = exp(1))
[1] 2.302585
# Use R object as input
y <- 100
log(y, base = 10)
[1] 2
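Partial matching, as shown above, only works when the abbreviation identifies a single argument. The following is a minimal sketch of what happens otherwise; `show_value()` and its arguments are hypothetical names made up for this illustration.

```r
# Hypothetical function whose two arguments both start with "v"
show_value <- function(value, verbose = FALSE) {
  if (verbose) message("value is ", value)
  value
}

# "val" is a prefix of `value` only, so partial matching succeeds
ok <- show_value(val = 10)

# "v" is a prefix of both `value` and `verbose`, so R raises an error;
# tryCatch() stores the error message instead of stopping the script
ambiguous <- tryCatch(
  show_value(v = 10),
  error = function(e) conditionMessage(e)
)
```

Spelling out full argument names avoids this kind of ambiguity.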
Functions return objects and these objects can be a variety of types.
# Return values
class(log(10))
[1] "numeric"
class(as.data.frame(10))
[1] "data.frame"
# Return logicals (all.equal() returns a character description when the values differ)
class(all.equal(1, 1))
[1] "logical"
class(all.equal(1, 2))
[1] "character"
# Return lists
m <- lm(len ~ dose, data = ToothGrowth)
class(m)
[1] "lm"
class(summary(m))
[1] "summary.lm"
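Because `all.equal()` returns either `TRUE` or a character description of the difference, it never returns `FALSE`. A small sketch of the usual way to turn its result into a strict logical, using `isTRUE()`; the object names are only illustrative.

```r
# Values within the default numerical tolerance compare as TRUE
same <- all.equal(1, 1 + 1e-10)

# Values that differ produce a character description, not FALSE
differs <- all.equal(1, 2)

# isTRUE() converts either result into a single TRUE/FALSE
same_flag <- isTRUE(same)
differs_flag <- isTRUE(differs)
```

This makes the result safe to use inside `if ()`.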
It is often helpful to build your own functions.
Define a function using the function() function.
# Create a function
add <- function(x, y) { # x and y are inputs
  x + y                 # definition within { }
}
# Argument by order
add(1, 2)
[1] 3
# Argument by name
add(x = 1, y = 2)
[1] 3
# Vector arguments
add(1:2, 3:4)
[1] 4 6
# Vector and scalar argument
add(1:2, 3) # 3 is recycled
[1] 4 5
# Warning due to different argument sizes
add(1:2, 3:5) # 1:2 is partially recycled
Warning in x + y: longer object length is not a multiple of shorter object
length
[1] 4 6 6
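The partial recycling in the last example can be made explicit. This sketch assumes `rep_len()` is an acceptable stand-in for what R does internally when it recycles the shorter vector.

```r
x <- 1:2
y <- 3:5

# Repeat x until it is as long as y: 1 2 1
recycled_x <- rep_len(x, length(y))

# Adding the manually recycled vector gives the same values as x + y
manual <- recycled_x + y
automatic <- suppressWarnings(x + y)
same_result <- identical(manual, automatic)
```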
Default arguments are often provided in functions so the user does not have to specify all of the arguments.
# Default argument for `base` in the `log()` function is the number e
log(10) # default
[1] 2.302585
log(10, base = exp(1)) # explicit
[1] 2.302585
# Define function with default arguments
add <- function(x = 1, y = 2) {
  x + y
}
# Uses both default arguments
add()
[1] 3
# Uses default argument for y (second argument)
add(3)
[1] 5
# Uses default argument for x since y is defined
add(y = 5)
[1] 6
R functions return the value of the last evaluated expression, but better practice is to return explicitly using the return() function.
# Create function with explicit return
add <- function(x, y) {
  return(x + y)
}
# Run function
add(1, 2)
[1] 3
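As a quick check, the implicit and explicit forms produce identical results. The function names below are hypothetical and exist only for this comparison.

```r
# Implicit return: the value of the last expression is returned
add_implicit <- function(x, y) {
  x + y
}

# Explicit return: return() states the returned value directly
add_explicit <- function(x, y) {
  return(x + y)
}

same_value <- identical(add_implicit(1, 2), add_explicit(1, 2))
```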
Suppose you want to return a TRUE/FALSE depending on whether a specific character is in a string. As soon as you find the character, you can immediately return TRUE. If you don’t find the character, you can return FALSE.
# Define function to check string for a character
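is_char_in_string <- function(string, char) {
  # seq_len() avoids iterating over 1:0 when the string is empty
  for (i in seq_len(nchar(string))) {
    if (char == substr(string, i, i)) {
      return(TRUE) # stop as soon as the character is found
    }
  }
  return(FALSE) # the character was never found
}
# Examples
is_char_in_string("this is my string", "a")
[1] FALSE
is_char_in_string("this is my string", "s")
[1] TRUE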
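The loop above illustrates early return. In practice, the same check could be sketched without a loop using `grepl()` with `fixed = TRUE`, which treats the pattern as a literal string. The name `has_char()` is hypothetical.

```r
# Loop-free alternative: fixed = TRUE disables regular-expression matching
has_char <- function(string, char) {
  grepl(char, string, fixed = TRUE)
}

absent <- has_char("this is my string", "a")
present <- has_char("this is my string", "s")

# grepl() is vectorized over `string`, so several strings can be checked at once
several <- has_char(c("abc", "xyz"), "a")
```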
Errors will automatically be produced by the underlying functions you used.
# Error since "a" is not numeric
add(1, "a") # add used the `+` function
Error in x + y: non-numeric argument to binary operator
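If you want a script to keep running after such an error, one option is `tryCatch()`. This is a minimal sketch; `add_basic()` is a hypothetical copy of the simple `add()` function.

```r
# Same behavior as the simple add() function
add_basic <- function(x, y) {
  x + y
}

# The error handler receives the condition object; conditionMessage()
# extracts the text of the error produced by `+`
error_text <- tryCatch(
  add_basic(1, "a"),
  error = function(e) conditionMessage(e)
)
```

Here `error_text` holds the message as a character string instead of the script stopping.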
There are a variety of ways to communicate to the user. Use message() for communicating a message.
# Create function with message
add <- function(x, y) {
  message(paste(x, "+", y, "="))
  return(x + y)
}
# Example message
add(1, 2)
1 + 2 =
[1] 3
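One reason to prefer `message()` over `print()` for this kind of communication is that the user can silence it. A small sketch, with `add_chatty()` as a hypothetical name:

```r
# Function that reports what it is doing via message()
add_chatty <- function(x, y) {
  message(x, " + ", y, " =")
  x + y
}

# suppressMessages() hides the message but keeps the returned value
quiet_result <- suppressMessages(add_chatty(1, 2))
```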
Use warning() when you think the results may not be what the user is expecting.
add <- function(x, y) {
  if (length(x) != length(y))
    warning("'x' and 'y' have unequal length.")
  return(x + y)
}
# No warning
add(1, 2)
[1] 3
# Warning
add(1:2, 3)
Warning in add(1:2, 3): 'x' and 'y' have unequal length.
[1] 4 5
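Unlike an error, a warning does not stop the computation, so the result is still returned. The sketch below records the warning text while keeping the result, using `withCallingHandlers()`; `add_checked()` and `caught` are hypothetical names.

```r
# Function that warns when the argument lengths differ
add_checked <- function(x, y) {
  if (length(x) != length(y)) {
    warning("'x' and 'y' have different lengths.")
  }
  x + y
}

caught <- NULL
result <- withCallingHandlers(
  add_checked(1:2, 3),
  warning = function(w) {
    caught <<- conditionMessage(w)   # store the warning text
    invokeRestart("muffleWarning")   # continue without printing it
  }
)
```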
Use stop() when it is clear the user is not doing what they intend.
# Define function with stop()
add <- function(x, y) {
  if (!is.numeric(x)) stop("'x' is not numeric!")
  if (!is.numeric(y)) stop("'y' is not numeric!")
  return(x + y)
}
# No Error
add(1, 2)
[1] 3
# Error
add("a", 2)
Error in add("a", 2): 'x' is not numeric!
add(1, "a")
Error in add(1, "a"): 'y' is not numeric!
As shown previously, this would also have caused an error in our original add() function, but this version of the error message is perhaps more helpful.
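An error message can be made more specific by pasting details into it. This sketch assumes you want to report the offending class and drop the call from the message with `call. = FALSE`; `add_strict()` is a hypothetical name.

```r
# stop() pastes its arguments together to build the message
add_strict <- function(x, y) {
  if (!is.numeric(x)) {
    stop("'x' must be numeric, not ", class(x)[1], call. = FALSE)
  }
  if (!is.numeric(y)) {
    stop("'y' must be numeric, not ", class(y)[1], call. = FALSE)
  }
  x + y
}

# Capture the message to inspect it without halting the script
strict_text <- tryCatch(
  add_strict("a", 2),
  error = function(e) conditionMessage(e)
)
```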
The stopifnot() function can be used to construct reasonably informative error messages automatically.
# stopifnot() example
add <- function(x, y) {
  stopifnot(is.numeric(x))
  stopifnot(is.numeric(y))
  return(x + y)
}
# No error
add(1, 2)
[1] 3
Now with an error.
add(1, "b")
Error in add(1, "b"): is.numeric(y) is not TRUE
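If the automatic message is too terse, `stopifnot()` also accepts named expressions, where the name is used as the error message. This sketch assumes R 4.0.0 or later, where that feature is available; `add_named()` is a hypothetical name.

```r
# Each name is the message shown when its expression is not TRUE
add_named <- function(x, y) {
  stopifnot(
    "'x' must be numeric" = is.numeric(x),
    "'y' must be numeric" = is.numeric(y)
  )
  x + y
}

named_text <- tryCatch(
  add_named(1, "b"),
  error = function(e) conditionMessage(e)
)
```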
These are some issues I want you to be aware of so you (I hope) avoid issues in the future.
Specifying argument values must be done using =. You can simultaneously define R objects using <- when specifying argument values. Generally, this should be avoided.
# Define function
my_fancy_function <- function(x, y) {
  return(x + y * 100)
}
What is the result of the following?
# Weird assignment
my_fancy_function(y <- 5, x <- 4)
[1] 405
What happened? We assigned y the value 5 and x the value 4 outside the function. Then, we passed y (5) as the first argument of the function and x (4) as the second argument of the function.
This was equivalent to
# Prefer assignment outside the function
y <- 5
x <- 4
my_fancy_function(x = y, y = x)
[1] 405
So, when assigning function arguments, use =. Also, it is probably helpful to avoid naming objects the same name as the argument names.
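The difference between = and <- inside a call can be seen side by side. In this sketch, `scale_up()` is a hypothetical function with the same body as `my_fancy_function()`.

```r
scale_up <- function(x, y) {
  x + y * 100
}

# `=` matches by name: x is 4 and y is 5 inside the function (4 + 500)
by_name <- scale_up(y = 5, x = 4)

# `<-` assigns in the calling environment, then the values are matched
# by position: x is 5 and y is 4 inside the function (5 + 400)
by_arrow <- scale_up(y <- 5, x <- 4)

# Side effect of `<-`: y and x now exist outside the function
created <- c(x = x, y = y)
```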
R functions will look outside their function if objects are missing.
# Define function with missing object
f <- function() {
  return(y) # y was never defined
}
What is the result of the following?
# What will this result be?
f()
[1] 5
R searches through a series of environments to find the variable called y. The first environment is within the function. The second environment is outside the current function, possibly within another function. Eventually, the environment becomes your current R session.
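The search order described above can be sketched with a function defined inside another function. The names `outer_fun()` and `inner_fun()` are hypothetical.

```r
y <- 5  # defined in the current session

outer_fun <- function() {
  y <- 10  # defined inside outer_fun
  inner_fun <- function() {
    y  # not defined here, so R looks in the enclosing function first
  }
  inner_fun()
}

found <- outer_fun()
```

Here `inner_fun()` finds the value 10 from the enclosing function before R ever reaches the session-level `y`.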
However, if you change an object’s value inside a function, the change is not retained outside the function.
# Create function
f <- function() {
  a <- a + 1 # change object's value
  print(a)
}
# Create an object
a <- 1
f()
[1] 2
a # object's value is not changed outside the function
[1] 1
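If the updated value is needed outside the function, a common approach is to return it and reassign. A minimal sketch, with `increment()` as a hypothetical name:

```r
# Return the new value instead of relying on a change inside the function
increment <- function(a) {
  a + 1
}

a <- 1
a <- increment(a)  # reassign to keep the updated value
```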
Sometimes you get baffling error messages due to closure errors or special errors.
mean[1]
Error in mean[1]: object of type 'closure' is not subsettable
log[1]
Error in log[1]: object of type 'special' is not subsettable
This is related to functions having a typeof closure or special.
# typeof
typeof(mean)
[1] "closure"
typeof(log)
[1] "special"
You will see closure errors much more commonly than special errors. Both of these errors indicate problems using a function.
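A frequent source of the closure error is using a name you believe is a data object when it actually refers to a function. In this sketch, the assumption is that no data frame named `df` has been created, so `df` resolves to the F-distribution density function from the stats package.

```r
# `df` is a function in the stats package, not a data frame
df_type <- typeof(df)

# Subsetting it with $ triggers an error about subsetting a function
closure_text <- tryCatch(
  df$len,
  error = function(e) conditionMessage(e)
)

# The intended data object works as expected
len_ok <- is.numeric(ToothGrowth$len)
```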
Generic functions in R will use the class() of the first argument to determine what specific version of the function will be used.
The mean() function is an example of a generic function.
# UseMethod() indicates a generic function
print(mean)
function (x, ...)
UseMethod("mean")
<bytecode: 0x104ac5260>
<environment: namespace:base>
Take a look at the help file.
# Look at the help file
?mean
Notice the words “Generic function”. This means that what the function does will depend on the class of the first argument to the function.
# Using mean() on different object types
mean(1:5)
[1] 3
mean(as.Date(c("2023-01-01", "2022-01-01")))
[1] "2022-07-02"
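To see the dispatch mechanism at a small scale, here is a sketch of a user-defined generic. The generic `describe()` and its methods are hypothetical and written only for this illustration.

```r
# The generic calls UseMethod(), which picks a method based on class(x)
describe <- function(x, ...) {
  UseMethod("describe")
}

# Method used when no class-specific method exists
describe.default <- function(x, ...) {
  "something else"
}

# Method for numeric input (integer vectors also dispatch here)
describe.numeric <- function(x, ...) {
  "a numeric vector"
}

# Method for data frames
describe.data.frame <- function(x, ...) {
  "a data frame"
}

d_num <- describe(1:5)
d_df <- describe(ToothGrowth)
d_other <- describe("text")
```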
I bring up generic functions primarily to point out that it can be hard to track down the appropriate helpfile. Generally you will look up <function>.<class>.
For example,
# Determine the class
class(as.Date(c("2023-01-01", "2022-01-01")))
[1] "Date"
So you need to look up the helpfile for mean.Date().
# Look up the function
?mean.Date
This didn’t provide the actual help information. Because it went somewhere, this was the intended behavior. Why it isn’t documented, I have no idea.
# Integer
class(1:5)
[1] "integer"
# Try mean.integer help
?mean.integer
There is typically a default method that will be used if a specific method can’t be found.
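Rather than guessing from help pages, you can ask R whether a method exists for a given class. This sketch uses `getS3method()` from the utils package with `optional = TRUE`, which returns `NULL` instead of raising an error when no method is found.

```r
# No integer-specific method exists, so NULL is returned
integer_method <- utils::getS3method("mean", "integer", optional = TRUE)

# The default method and the Date method both exist
default_method <- utils::getS3method("mean", "default", optional = TRUE)
date_method <- utils::getS3method("mean", "Date", optional = TRUE)

has_integer <- !is.null(integer_method)
has_default <- is.function(default_method)
has_date <- is.function(date_method)
```

Calling `methods("mean")` lists all of the available methods at once.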
# Try mean.default
?mean.default
Another function that we have used for multiple different data types is summary().
# Various uses of the generic summary function
summary(ToothGrowth$len) # numeric
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   4.20   13.07   19.25   18.81   25.27   33.90
summary(ToothGrowth$supp) # factor
OJ VC
30 30
summary(ToothGrowth) # data.frame
      len        supp         dose
 Min.   : 4.20   OJ:30   Min.   :0.500
 1st Qu.:13.07   VC:30   1st Qu.:0.500
 Median :19.25           Median :1.000
 Mean   :18.81           Mean   :1.167
 3rd Qu.:25.27           3rd Qu.:2.000
 Max.   :33.90           Max.   :2.000
summary(lm(len ~ supp, data = ToothGrowth)) # lm object
Call:
lm(formula = len ~ supp, data = ToothGrowth)
Residuals:
Min 1Q Median 3Q Max
-12.7633 -5.7633 0.4367 5.5867 16.9367
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 20.663 1.366 15.127 <2e-16 ***
suppVC -3.700 1.932 -1.915 0.0604 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.482 on 58 degrees of freedom
Multiple R-squared: 0.05948, Adjusted R-squared: 0.04327
F-statistic: 3.668 on 1 and 58 DF, p-value: 0.06039
Take a look at the helpfiles for the different summary functions.
# Summary function helpfiles
?summary
?summary.numeric
?summary.factor
?summary.data.frame
?summary.lm
Some functions have a ... argument. This argument will get expanded by the underlying code and treated appropriately.
# Sum helpfile
?sum
The sum() function sums everything passed to it.
# Sum scalars
sum(1, 2, 3)
[1] 6
# Sum a vector
sum(5:6)
[1] 11
# Sum scalars and vector
sum(1, 2, 3, 5:6)
[1] 17
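You can use ... in your own functions as well. The following is a minimal sketch in which `total_squares()` is a hypothetical function that collects everything in ... with `c()` and passes `na.rm` on to `sum()`.

```r
# Collect all unnamed inputs, square them, and sum the squares
total_squares <- function(..., na.rm = FALSE) {
  values <- c(...)  # expand ... into a single vector
  sum(values^2, na.rm = na.rm)
}

scalars <- total_squares(1, 2, 3)                  # 1 + 4 + 9
mixed <- total_squares(1:2, NA, na.rm = TRUE)      # 1 + 4, NA dropped
```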
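Typos in argument names are not flagged: the misspelled argument is absorbed by ... and treated as another value to sum, so no error is raised.
# Typo in argument name
sum(c(1, 2, NA), na.mr = TRUE) # vs
[1] NA
sum(c(1, 2, NA), na.rm = TRUE)
[1] 3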
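The sketch below makes the effect of the typo visible and contrasts it with a function that has no ... argument. `strict_sum()` is a hypothetical wrapper written for this comparison.

```r
# The misspelled argument is summed as a value: TRUE counts as 1
typo_total <- sum(1, 2, na.mr = TRUE)

# Without ... in the signature, a misspelled argument name is an error
strict_sum <- function(x, na.rm = FALSE) {
  sum(x, na.rm = na.rm)
}

typo_text <- tryCatch(
  strict_sum(c(1, 2, NA), na.mr = TRUE),
  error = function(e) conditionMessage(e)
)

correct_total <- strict_sum(c(1, 2, NA), na.rm = TRUE)
```

This is one reason to double check argument names when calling functions that accept ....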