── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4.9000 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
This page provides an introduction to commonly used R object types and their dimensions. The page begins with a discussion of object dimensions including scalars, vectors, matrices, and arrays. It then discusses some of the basic object types including logical, numeric, and character. The page finishes with a discussion of advanced objects including factors, dates, data frames, and lists.
28.1 Dimensions
28.1.1 Scalars
So far we have dealt with all R objects as scalars, i.e. a single value of that data type. For example, the following are all scalars.
# Scalars
TRUE
[1] TRUE
2.5
[1] 2.5
"a scalar"
[1] "a scalar"
Note that “a scalar” is considered a scalar character.
28.1.2 Vectors
If we want to combine multiple scalars, we will typically put them into a vector. We construct vectors using the c() function.
# Vectors
c(TRUE, FALSE)
[1] TRUE FALSE
c(2.5, 6, 7.2)
[1] 2.5 6.0 7.2
c("a", "character", "vector")
[1] "a" "character" "vector"
Vector length can be determined using the length() function.
# Vector length
vl <- c(TRUE, FALSE)
length(vl)
[1] 2
vn <- c(1, 2, 3, 4)
length(vn)
[1] 4
vc <- c("a", "character", "vector")
length(vc)
[1] 3
The data type can be determined using the class() function.
# Vector type
class(vl)
[1] "logical"
class(vn)
[1] "numeric"
class(vc)
[1] "character"
Sometimes it is useful to give names to elements of the vector. Names can be assigned using the names() function.
# Surface area of Great Lakes
area <- c(82100, 57800, 59600, 25670, 19010) # km^2
# Add names
names(area) <- c("Superior", "Ontario", "Huron", "Michigan", "Erie")
area
Superior Ontario Huron Michigan Erie
82100 57800 59600 25670 19010
Sometimes these names are created from a function.
# Named vector
table(OrchardSprays$treatment)
A B C D E F G H
8 8 8 8 8 8 8 8
28.1.2.1 Accessing
Accessing vector elements can be done using indices, a logical vector, or names. Indexing in R starts at 1 (rather than 0 like in some languages), and a negative index drops the element at that position. A logical vector keeps the elements in the positions where it is TRUE.
# Construct vector
v <- c("one", "two", "three")
# Access using indices
v[1]
[1] "one"
v[-2]
[1] "one" "three"
v[2:3]
[1] "two" "three"
v[-c(1,2)]
[1] "three"
# Access using logical vector
v[c( TRUE, FALSE, FALSE)]
[1] "one"
v[c( TRUE, FALSE, TRUE)]
[1] "one" "three"
v[c(FALSE, TRUE, TRUE)]
[1] "two" "three"
v[c(FALSE, FALSE, FALSE)]
character(0)
When using indices, you can repeat elements.
# Index repeats
v[c(1,1,2,2,3)]
[1] "one" "one" "two" "two" "three"
Care is needed when using a logical vector because the logical vector will be (if possible) replicated to be the same length as the original vector. This replication is known as recycling.
# TRUE is replicated 3 times
v[TRUE]
[1] "one" "two" "three"
# New vector
v <- LETTERS[1:4]
v[c(TRUE, FALSE)] # Replicated
[1] "A" "C"
If the vector has names, the names can be used to access member elements.
area[c("Ontario", "Erie")]
Ontario Erie
57800 19010
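In practice, the logical vector is usually produced by a comparison rather than typed by hand. The sketch below reuses the area vector from above; the threshold of 50000 is an arbitrary value chosen only for illustration.

```r
# Surface areas as defined earlier on this page
area <- c(82100, 57800, 59600, 25670, 19010)
names(area) <- c("Superior", "Ontario", "Huron", "Michigan", "Erie")

# A comparison returns a logical vector of the same length as `area`
big <- area > 50000
big

# Keep only the elements where `big` is TRUE
area[big]

# which() converts the logical vector into (named) indices
which(big)
```

Because the comparison always has the same length as the vector being indexed, no replication is involved.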
28.1.3 Matrices
We can extend from a vector, a one-dimensional object, to a matrix, a two-dimensional object. There are a variety of ways to construct a matrix including matrix(), cbind(), and rbind().
# Construct a matrix
matrix(1:6, nrow = 3, ncol = 2)
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
# Construct a matrix inputting elements by rows
matrix(1:6, nrow = 3, ncol = 2, byrow = TRUE)
[,1] [,2]
[1,] 1 2
[2,] 3 4
[3,] 5 6
# Construct a matrix by binding columns
cbind(1:3, 4:6)
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
# Construct a matrix by binding rows
rbind(1:3, 4:6)
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
The dimensions of the matrix can be found using dim(), nrow(), and ncol().
# Dimensions
m <- matrix(1:6, nrow = 2)
dim(m) # row by column
[1] 2 3
nrow(m)
[1] 2
ncol(m)
[1] 3
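The type of the elements in a matrix can be determined using typeof(). Rows and columns of a matrix can also be named, as in the following matrix m, which holds the area and volume of each Great Lake.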
area volume
Superior 82100 12100
Ontario 57800 4920
Huron 59600 3540
Michigan 25670 484
Erie 19010 1640
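A matrix like the one printed above could be built by binding two named vectors as columns. This is only a sketch: the object name volume is an assumption, and its values are copied from the printed output.

```r
# Surface areas as defined earlier on this page
area <- c(82100, 57800, 59600, 25670, 19010)
names(area) <- c("Superior", "Ontario", "Huron", "Michigan", "Erie")

# Assumed `volume` vector; values copied from the printed matrix
volume <- c(12100, 4920, 3540, 484, 1640)

# cbind() takes row names from the vector names and
# column names from the object names
m <- cbind(area, volume)

typeof(m)    # storage type of the matrix elements
rownames(m)
colnames(m)
```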
28.1.3.1 Accessing
Matrices can be accessed in the same way vectors were, but now there are two dimensions. If a dimension is blank, then we will get all elements in that dimension.
# Matrix accessing by numbers
m[1,2]
[1] 12100
m[1:2,]
area volume
Superior 82100 12100
Ontario 57800 4920
m[-3,]
area volume
Superior 82100 12100
Ontario 57800 4920
Michigan 25670 484
Erie 19010 1640
m[, "volume"]
Superior Ontario Huron Michigan Erie
12100 4920 3540 484 1640
m["Ontario",]
area volume
57800 4920
It is common to forget the comma. If you do, R treats the matrix as a single vector, and you will likely get something you don't expect.
# Forgot the comma
m[4]
[1] 25670
m[7]
[1] 4920
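With a single index, the elements are counted down the first column and then down the second. The sketch below illustrates this with a 5 x 2 matrix of made-up values (the object x is not the lake data).

```r
# A 5 x 2 matrix filled column by column with the values 1 to 10
x <- matrix(1:10, nrow = 5)

# Index 7 is past the 5 elements of column 1,
# so it refers to row 7 - 5 = 2 of column 2
x[7]
x[2, 2]

# arrayInd() converts a single index into a (row, column) pair
arrayInd(7, dim(x))
```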
28.1.4 Arrays
Arrays can be constructed that extend beyond the two-dimensional objects to higher dimensions. These are not commonly used and thus we will only show a quick example of how to construct one.
# Construct an array
a <- array(1:12, dim = 4:2)
# Print the array
a
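To see how the dimensions of an array work, the following sketch builds a similar array and accesses it with three indices. It assumes 24 values rather than 12, so that each of the 4 x 3 x 2 cells holds a distinct number.

```r
# An array with 4 rows, 3 columns, and 2 layers
a <- array(1:24, dim = c(4, 3, 2))

dim(a)       # length of each dimension

# Access uses one index per dimension: row, column, layer
a[2, 3, 1]

# Leaving a dimension blank returns everything in that dimension;
# here, the whole second layer as a 4 x 3 matrix
a[, , 2]
```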
28.2 Types
28.2.1 Logical
Logical (or boolean) variables have only two values: TRUE and FALSE.
# Logical
TRUE
[1] TRUE
FALSE
[1] FALSE
is.logical(TRUE)
[1] TRUE
is.logical(FALSE)
[1] TRUE
Other variable types are not logicals.
is.logical(1)
[1] FALSE
is.logical("a")
[1] FALSE
28.2.1.1 Use TRUE and FALSE
By default, the objects T and F are assigned the values TRUE and FALSE.
# Use TRUE and FALSE
T
[1] TRUE
F
[1] FALSE
is.logical(T)
[1] TRUE
is.logical(F)
[1] TRUE
Caution: the objects T and F can be redefined, so it is safer to always write TRUE and FALSE in full.
# Confusing redefinition
T <- "a"
is.logical(T)
[1] FALSE
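By contrast, TRUE and FALSE are reserved words, so R refuses to assign to them. The sketch below illustrates the difference; the failing assignment is wrapped in tryCatch() only so that the example keeps running, and the message text is made up.

```r
# T is an ordinary object, so it can be overwritten
T <- "a"
is.logical(T)

# Removing our copy makes the built-in value visible again
rm(T)
is.logical(T)

# TRUE is a reserved word; assigning to it is an error
result <- tryCatch(
  eval(parse(text = 'TRUE <- "a"')),
  error = function(e) "assignment to TRUE failed"
)
result
```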
28.2.2 Numeric
Numeric is the default data type for any number.
# Numeric
4
[1] 4
is.numeric(4)
[1] TRUE
2.5
[1] 2.5
is.numeric(2.5)
[1] TRUE
is.numeric(pi)
[1] TRUE
x <- sqrt(2)
is.numeric(x)
[1] TRUE
28.2.2.1 Constructing
Numeric vectors can be constructed in a variety of ways.
# Sequential integer vector construction
2:5
[1] 2 3 4 5
5:1
[1] 5 4 3 2 1
-1:3
[1] -1 0 1 2 3
1:5.99
[1] 1 2 3 4 5
# General vector construction
seq(from = 2, to = 10, by = 2)
[1] 2 4 6 8 10
seq(from = 0, to = 1, by = 0.2)
[1] 0.0 0.2 0.4 0.6 0.8 1.0
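Two related base R helpers are often useful here: the length.out argument of seq(), which fixes the number of values instead of the step size, and rep(), which repeats values. A brief sketch:

```r
# Five equally spaced values from 0 to 1
seq(from = 0, to = 1, length.out = 5)

# seq_len(n) gives 1, 2, ..., n and is empty when n is 0
seq_len(3)
seq_len(0)

# Repeat the whole vector, or repeat each element
rep(c(1, 2), times = 2)
rep(c(1, 2), each = 2)
```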
28.2.3 Integer
In R, a numeric can be thought of as equivalent to a double in other languages. You can also explicitly create and use integers. To create an integer, append L to the number.
# Integer
3L
[1] 3
is.integer(3L)
[1] TRUE
I mention integers so that if you see code where a number is followed by an L you will know what is going on. Otherwise, this distinction is rarely important because 1) an integer is treated as a numeric by R and 2) calculations that can produce a non-integer result, such as division or sqrt(), return a numeric.
# Integer conversion
x <- 3L
is.integer(x)
[1] TRUE
is.numeric(x)
[1] TRUE
is.integer(sqrt(x^2))
[1] FALSE
is.numeric(sqrt(x^2))
[1] TRUE
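The typeof() function shows how a number is actually stored. As a small illustration, arithmetic between two integers stays an integer, while division and mixing with a double return a double:

```r
typeof(3L)        # "integer"
typeof(3)         # "double"
typeof(1:3)       # the colon operator produces integers

typeof(3L + 2L)   # "integer": addition of two integers
typeof(3L / 2L)   # "double": division always returns a double
typeof(3L + 2)    # "double": mixing integer and double
```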
28.2.4 Character
A character is the default data type for any non-logical and non-numeric value. A character value must be enclosed in quotes. You can use either single or double quotes, but the opening and closing quotes must match. Without quotes, R will look for an object with that name.
# Character
is.character("a")
[1] TRUE
is.character("2b")
[1] TRUE
is.character("2023-07-06")
[1] TRUE
is.character("Is this really a character?")
[1] TRUE
As you can see from the example above, the character refers to single characters but also strings of multiple characters.
28.2.4.1 Construct
To construct character vectors, the paste() and paste0() functions are often useful. By default, paste() separates its arguments with a space while paste0() uses no separator.
# Example paste() usage
paste("group", 1)
[1] "group 1"
paste("group", 1:2)
[1] "group 1" "group 2"
paste0("group", 1:2)
[1] "group1" "group2"
paste("group", 1:2, sep = "-")
[1] "group-1" "group-2"
28.2.4.2 Extract
To extract individual characters, use the substr() function.
# Extract characters
s <- "This is my very long character."
substr(s, start = 1, stop = 4)
[1] "This"
substr(s, start = nchar(s) - 9, stop = nchar(s))
[1] "character."
As we will see, characters form the basis for many advanced data types including dates and factors.
The stringr package provides many helpful functions for interacting with characters or, equivalently, strings. This package is included in the tidyverse.
28.3 Advanced Types
Here we introduce a variety of advanced data types including factors, dates, data frames, and lists.
28.3.1 Factors
Factors are a special type of character vector that allows the user more control over how the elements of the vector are used in visualizations and modeling.
Character vectors are inherently ordered alphabetically with lowercase letters coming before their uppercase equivalents.
If we are creating a visualization or performing an analysis, the order we probably wanted was “control” followed by treatment A doses 10, 20, and 100 followed by treatment D doses 10, 20, and 100.
To create a factor vector use the factor() function.
# Construct factor
f <- factor(c("A","A","B","B","B"))
# Check if `f` is a factor
is.factor(f)
[1] TRUE
# Check class of `f`
class(f)
[1] "factor"
# Look at `f`
f
[1] A A B B B
Levels: A B
Compare the output above to a character vector:
# Convert factor to character
as.character(f)
[1] "A" "A" "B" "B" "B"
Notice that, when the factor is printed, there are no quotes and there is a second line that indicates the levels of the factor.
The internal representation of factors in R is an integer vector with a lookup table for the levels of the factor.
# Numeric vector
as.numeric(f)
[1] 1 1 2 2 2
# Lookup table
levels(f)
[1] "A" "B"
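One consequence of this representation is worth a short sketch: when the levels themselves look like numbers, as.numeric() returns the internal codes rather than the numbers. Converting to character first recovers the values. The dose values below are made up for illustration.

```r
# Hypothetical factor whose levels look like numbers
dose <- factor(c("10", "20", "100", "20"))

levels(dose)                    # sorted as text, so "100" precedes "20"
as.numeric(dose)                # internal codes, not the doses
as.numeric(as.character(dose))  # the actual numbers
```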
You can use the levels function to change the values for the factor.
# Change levels
levels(f) <- c("C","D")
f
[1] C C D D D
Levels: C D
Rather than changing the levels this way, I suggest you use forcats::fct_recode() or something similar.
28.3.1.1 Reorder levels
By default, factor levels will be ordered alphabetically. For example,
# Factor default ordering
f <- factor(rep(c("Dose10","Dose20","Dose100"), each = 2))
f
In the factor function, there is an argument called ordered. This serves a different purpose and is beyond the scope of this course. Thus, just ignore the ordered argument.
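One way to control the order is the levels argument of factor(), which lists the levels in the desired order. A minimal sketch for the dose example (the object name doses is introduced here only for illustration):

```r
doses <- rep(c("Dose10", "Dose20", "Dose100"), each = 2)

# Default: levels are sorted as text, so Dose100 comes before Dose20
levels(factor(doses))

# Explicit levels keep the intended order
f <- factor(doses, levels = c("Dose10", "Dose20", "Dose100"))
levels(f)
```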
Sometimes, you just need to move one level to the first position. This is particularly true when performing a regression analysis and you are trying to set the reference level.
To move a single level to the first position, use the relevel function. For example,
# Construct a factor
f <- factor(c("Control","A10","A20","A30"))
# Check factor order
levels(f)
[1] "A10" "A20" "A30" "Control"
# Put `Control` as first factor level
f <- relevel(f, ref = "Control")
levels(f)
[1] "Control" "A10" "A20" "A30"
In regression modeling, the first factor level will be treated as the reference level, i.e. the level associated with the intercept.
28.3.2 Dates
Dates can be extremely difficult to work with, for reasons ranging from inconsistency in how dates are formatted to dealing with time zones.
For any organization that utilizes dates, those dates should be recorded using the ISO 8601 standard. Specifically, the date should be represented in
YYYY-MM-DD
where YYYY represents the four-digit year, MM represents the two-digit month, and DD represents the two-digit day. For example, 2023-07-02 is July 2, 2023. Note that the leading zeros are required.
This date can be converted to a Date object by using the as.Date() function.
# Date
d1 <- "2023-07-02"
d2 <- as.Date(d1)
d2
[1] "2023-07-02"
class(d2)
[1] "Date"
The date will be printed out (by default) using the YYYY-MM-DD standard.
# ISO Format
as.Date("2023-07-02")
[1] "2023-07-02"
Most other formats will not give the date you are expecting, because by default as.Date() only tries the formats %Y-%m-%d and %Y/%m/%d.
# Alternative formats
as.Date("07-02-2023")
[1] "0007-02-20"
as.Date("02/07/2023")
[1] "0002-07-20"
as.Date("07-02-23")
[1] "0007-02-23"
as.Date("02/07/23")
[1] "0002-07-23"
If you need to read dates that are not in the standard YYYY-MM-DD format, you can use the format argument to specify the correct format.
# Read dates using format
as.Date("07-02-2023", format = "%m-%d-%Y")
[1] "2023-07-02"
as.Date("07/02/2023", format = "%m/%d/%Y")
[1] "2023-07-02"
as.Date("07-02-23", format = "%m-%d-%y")
[1] "2023-07-02"
as.Date("07/02/23", format = "%m/%d/%y")
[1] "2023-07-02"
Read the help file for strptime() for more options for setting the format of date (and time) objects.
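When the format string does not match the input, as.Date() returns NA rather than raising an error, so it is worth checking the result. The sketch below also shows that as.Date() works on a whole vector at once; the input strings and object names are made up for illustration.

```r
# Format matches the input
as.Date("07/02/2023", format = "%m/%d/%Y")

# Format does not match the input: the result is NA, not an error
bad <- as.Date("2023-07-02", format = "%m/%d/%Y")
is.na(bad)

# A whole character vector can be converted at once
dates <- as.Date(c("07/02/2023", "12/25/2023"), format = "%m/%d/%Y")
dates
diff(dates)  # number of days between the two dates
```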
28.3.2.2 Write
When printing out the date, the default is to print out using the YYYY-MM-DD standard. If you want to print out in a different format, you will need to specify the desired format using the format function.
# Format date output
d2 # default
[1] "2023-07-02"
format(d2, "%m/%d/%y")
[1] "07/02/23"
format(d2, "%a %b %d, %Y")
[1] "Sun Jul 02, 2023"
28.3.3 Data Frame
Vectors, matrices, and arrays can only have one type of data.
# Mixing types
c(TRUE, FALSE, 2) # logicals converted to numeric
[1] 1 0 2
c("a", 2, pi) # numeric converted to character
[1] "a" "2" "3.14159265358979"
c(TRUE, "a", FALSE) # logicals converted to character
[1] "TRUE" "a" "FALSE"
c(TRUE, "a", 2) # everything converted to character
[1] "TRUE" "a" "2"
c(c(TRUE, 1), "a") # logical converted to numeric first
[1] "1" "1" "a"
Data frames are similar to matrices but allow each column to be a different data type.
28.3.3.1 Construct
There are a variety of ways to construct a data.frame within R.
# Construct using data.frame
d <- data.frame(var1 = c(1:2),
                var2 = c(TRUE, FALSE),
                var3 = c("a","b"))
is.data.frame(d)
[1] TRUE
# Construct using tibble
d2 <- tibble(var1 = c(1:2),
             var2 = c(TRUE, FALSE),
             var3 = c("a","b"))
is.data.frame(d2)
[1] TRUE
# Construct using tribble (note the `r`)
d3 <- tribble(
  ~var1, ~var2, ~var3,
  1, TRUE, "a",
  2, FALSE, "b"
)
is.data.frame(d3)
[1] TRUE
Sometimes, you want a data.frame that contains every combination of the values of the variables in a collection of vectors. This is particularly useful for designing experiments, especially Monte Carlo simulation experiments.
# Every combination
eg <- expand.grid(var1 = c(1:2),
                  var2 = c(TRUE, FALSE),
                  var3 = c("a","b"))
eg
var1 var2 var3
1 1 TRUE a
2 2 TRUE a
3 1 FALSE a
4 2 FALSE a
5 1 TRUE b
6 2 TRUE b
7 1 FALSE b
8 2 FALSE b
28.3.3.2 Access
The techniques to access matrix elements can also be used with data frames.
# Access using indices
d[1, 2]
[1] TRUE
d[1, ]
var1 var2 var3
1 1 TRUE a
d[ ,-2]
var1 var3
1 1 a
2 2 b
# Access using names
rownames(d)
[1] "1" "2"
colnames(d)
[1] "var1" "var2" "var3"
d$var1
[1] 1 2
d[,c("var2","var3")]
var2 var3
1 TRUE a
2 FALSE b
# Type conversion
is.vector(d$var1) # converted to vector
[1] TRUE
is.data.frame(d[,c("var2","var3")]) # still a data frame
[1] TRUE
is.vector( d[,"var1"])
[1] TRUE
is.data.frame(d[,"var1"])
[1] FALSE
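If a single column should stay a data frame, base R offers two options: the drop = FALSE argument, or indexing with a single bracket and no comma. A short sketch using the same data frame d as above:

```r
d <- data.frame(var1 = c(1:2),
                var2 = c(TRUE, FALSE),
                var3 = c("a", "b"))

# Selecting one column with a comma simplifies to a vector
is.data.frame(d[, "var1"])

# drop = FALSE keeps the data frame structure
is.data.frame(d[, "var1", drop = FALSE])

# Single-bracket indexing without a comma also returns a data frame
is.data.frame(d["var1"])
```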
Other functions can make constructing data frames easier. For example, as shown above, the expand.grid() function constructs a data frame with every combination of the values provided.
28.3.3.3 Read
When we read data from files, e.g. using the read_csv function, the result is a data.frame. This leads directly into using these data frames for statistical analysis.
d <- read_csv("ToothGrowth.csv") # returns a data.frame
Generally tidyverse functions are constructed to 1) have a data.frame as the first argument and 2) return a data.frame (or tibble). This allows for a tidyverse data pipeline.
28.3.4 List
Vectors in R can only contain one data type. A list is an alternative type of vector that can contain any other type of object in each element of the list.
To construct a list, you can use the list() function.
# Construct a simple list
l <- list(1, "a", TRUE)
# Check if this is a list
is.list(l)
[1] TRUE
List elements can be named.
# By default there are no names
names(l)
NULL
# Assign some names
names(l) <- c("one", "two", "three")
# View the assigned names
names(l)
[1] "one" "two" "three"
If the list elements are named, they can be accessed using those names and a $ sign.
l$one # first element of the list
[1] 1
l$two # second element of the list
[1] "a"
Lists can also be constructed by using the names during construction.
# Construct a list using names
l <- list(numbers = c(1, 2, 3.5),
          characters = c("a", "character", "vector", "in", "a", "list"),
          logicals = c(TRUE, FALSE, TRUE))
l[[1]] # first list element
[1] 1.0 2.0 3.5
l$characters # list element named `characters`
[1] "a" "character" "vector" "in" "a" "list"
l[['characters']]
[1] "a" "character" "vector" "in" "a" "list"
As we have already seen, list elements can be any type of object. Thus you can have lists within lists.
# Construct a list within a list
l <- list(this = 1,
          that = list(a = "list element within a list"))
is.list(l)
[1] TRUE
is.list(l$that)
[1] TRUE
Lists can be accessed in a variety of ways similar to vectors, but you need to use double square brackets.
# Access using indices
l[[1]]
[1] 1
l[[2]]
$a
[1] "list element within a list"
# Access using names
l[['that']]
$a
[1] "list element within a list"
l[['that']][['a']]
[1] "list element within a list"
# Accessing using $
l$that
$a
[1] "list element within a list"
l$that$a
[1] "list element within a list"
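The difference between single and double square brackets can be seen in a short sketch: single brackets return a shorter list, while double brackets return the element itself.

```r
l <- list(this = 1,
          that = list(a = "list element within a list"))

# Single brackets: the result is still a list (of length 1)
is.list(l[1])
is.list(l["this"])

# Double brackets: the result is the element itself
is.list(l[[1]])
is.numeric(l[[1]])
```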
Many functions that perform statistical analyses return lists.
# Regression
m <- lm(breaks ~ wool + tension, data = warpbreaks)
is.list(m)
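Because the fitted model is a list, its components can be inspected and extracted with the same tools used for any other list. A brief sketch; the component names shown are the ones that lm() itself stores in its result.

```r
# Regression on the built-in warpbreaks data
m <- lm(breaks ~ wool + tension, data = warpbreaks)

is.list(m)
names(m)            # components stored in the fitted model
m$coefficients      # extract one component by name
m[["df.residual"]]  # double brackets work as well
```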