Monday, March 9, 2015

Basics of Lists

Lists are a data type in R that can seem a bit daunting at first, but soon become amazingly useful. They are especially powerful once you combine them with the apply() family of functions. This post is part 1 of a two-part series on the uses of lists. In this post, we will discuss the basics - how to create lists, manipulate them, describe them, and convert them. In part 2, we’ll see how using lapply() and sapply() on lists can really improve your R coding.

Constructing a list

Let’s start with what lists are and how to construct them. A list is a data structure that can hold any number of other data structures, of any type. If you have a vector, a dataframe, and a character object, you can put all of those into one list object like so:

# create three different classes of objects
vec <- 1:4
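# stringsAsFactors = TRUE keeps x a factor (the default before R 4.0), matching the output shown later
df <- data.frame(y = 1:3, x = c("m", "m", "f"), stringsAsFactors = TRUE)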
char <- "Hello!"

# add all three objects to one list using list() function
list1 <- list(vec, df, char)

# print list
list1
## [[1]]
## [1] 1 2 3 4
## 
## [[2]]
##   y x
## 1 1 m
## 2 2 m
## 3 3 f
## 
## [[3]]
## [1] "Hello!"

We can also turn an object into a list by using the as.list() function. Notice how every element of the vector becomes a different component of the list.

# coerce vector into a list
as.list(vec)
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 2
## 
## [[3]]
## [1] 3
## 
## [[4]]
## [1] 4
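
The same coercion works on other structures. As a small sketch (the names df_demo and cols are made up for this illustration), calling as.list() on a dataframe gives one component per column rather than one per cell:

```r
# hypothetical dataframe used only for this illustration
df_demo <- data.frame(y = 1:3, x = c("m", "m", "f"))

# each column becomes one named component of the list
cols <- as.list(df_demo)

# component names come from the column names
names(cols)

# one component per column
length(cols)
```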

Manipulating a list

  • We can put names on the components of a list using the names() function. This is useful for extracting components. We could have also named the components when we created the list. See below:
# name the components of the list
names(list1) <- c("Numbers", "Some.data", "Letters")
list1
## $Numbers
## [1] 1 2 3 4
## 
## $Some.data
##   y x
## 1 1 m
## 2 2 m
## 3 3 f
## 
## $Letters
## [1] "Hello!"
# could have named them when we created list
another.list <- list(Numbers = vec, Letters = char)
  • Extract components from a list (there are several ways): the first is the [[ ]] operator (notice two brackets, not just one). We can also use the single [ ] operator on a list, but it returns a list containing the component rather than the component itself, which is usually not what we want. See what I mean here:
# extract 3rd component using [[]] -> this returns a *string*
list1[[3]]
## [1] "Hello!"
# print a list containing the 3rd component -> this returns a *list*
list1[3]
## $Letters
## [1] "Hello!"
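
To make the difference concrete, here is a minimal sketch that checks the class of each result. The list demo and the names single and double are assumptions for this illustration, not objects from the post:

```r
# illustrative list; the names are made up for this sketch
demo <- list(Numbers = 1:4, Letters = "Hello!")

single <- demo["Letters"]     # single bracket: a list of length 1
double <- demo[["Letters"]]   # double bracket: the component itself

class(single)   # "list"
class(double)   # "character"

# functions that expect the underlying data need the [[ ]] form
nchar(double)
```

Calling nchar() on single would still run, but it would operate on the list wrapper rather than on the string you meant, which is the kind of quiet mistake the [[ ]] form avoids.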

It is also possible to extract components by name, using either the $ operator or the component’s name inside [[ ]], as we see below. Again, be careful about the [ ] vs [[ ]] operator in the second way. You need the [[ ]] to return the data structure of the component.

# extract 3rd component using $
list1$Letters
## [1] "Hello!"
# extract 3rd component using [[ ]] and the name of the component
list1[["Letters"]]
## [1] "Hello!"
  • Subsetting a list - use the single [ ] operator and c() to choose the components
# subset the first and third components
list1[c(1, 3)]
## $Numbers
## [1] 1 2 3 4
## 
## $Letters
## [1] "Hello!"
  • We can also add a new component to the list or replace a component using the $ or [[ ]] operators. This time I’ll add a linear model to the list (remember we can put anything into a list).
# add new component to existing list using $
list1$newthing <- lm(y ~ x, data = df)

# add a new component to existing list using [[ ]]
list1[[5]] <- "new component"
  • Finally, we can delete a component of a list by setting it equal to NULL.
# delete a component of existing list
list1$Letters <- NULL
list1
## $Numbers
## [1] 1 2 3 4
## 
## $Some.data
##   y x
## 1 1 m
## 2 2 m
## 3 3 f
## 
## $newthing
## 
## Call:
## lm(formula = y ~ x, data = df)
## 
## Coefficients:
## (Intercept)           xm  
##         3.0         -1.5  
## 
## 
## [[4]]
## [1] "new component"

The Letters component is gone, so there are now only 4. Notice how the 4th component doesn’t have a name because we didn’t assign it one when we added it in.
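
One consequence of deleting with NULL is that you cannot store NULL in a component by plain assignment. A minimal sketch of the difference, using a made-up list called demo:

```r
# illustrative list; names are made up for this sketch
demo <- list(a = 1, b = "two", c = 3)

# assigning NULL with $ (or [[ ]]) removes the component
demo$b <- NULL
length(demo)

# to keep a component whose value is NULL, assign list(NULL) with single brackets
demo["b"] <- list(NULL)
length(demo)
names(demo)
```

After the second assignment the list has three components again, with b added at the end and holding NULL.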

More extracting: If we want to extract the dataframe we have in the list and look at just its first row, we would use list1[[2]][1, ]. This code takes the second component of the list (the dataframe) using the [[ ]] operator, and then subsets the first row and all columns using the single [ ] operator, since that is what is used to subset dataframes (or matrices).

For help on subsetting matrices and dataframes, check out this post (http://rforpublichealth.blogspot.com/2012/10/quick-and-easy-subsetting.html).

# extract first row of dataframe that is in a list
list1[[2]][1, ]
##   y x
## 1 1 m
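
The same chaining works with names instead of positions, which tends to be easier to read. A short sketch, assuming a list called demo built just for this example:

```r
# illustrative list holding a vector and a dataframe
demo <- list(Numbers = 1:4,
             Some.data = data.frame(y = 1:3, x = c("m", "m", "f")))

# the first row of the dataframe, reached by name instead of position
first_row <- demo$Some.data[1, ]

# one column of the dataframe
y_col <- demo[["Some.data"]]$y

# a single cell: row 2 of column y
cell <- demo$Some.data[2, "y"]
```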

Describing a list

To describe a list, we may want to know the following:

  • the class of the list (which is a list class!) and the class of the first component of the list.
# describe class of the whole list
class(list1)
## [1] "list"
# describe the class of the first component of the list
class(list1[[1]])
## [1] "integer"
  • the number of components in the list - use the length() function
# find out how many components are in the list
length(list1)
## [1] 4
  • a short summary of each component in the list - use str(). (I take out the model because the output is really long)
# take out the model from list and then show summary of what's in the list
list1$newthing <- NULL
str(list1)
## List of 3
##  $ Numbers  : int [1:4] 1 2 3 4
##  $ Some.data:'data.frame':   3 obs. of  2 variables:
##   ..$ y: int [1:3] 1 2 3
##   ..$ x: Factor w/ 2 levels "f","m": 2 2 1
##  $          : chr "new component"
  • Now we can combine these functions to append a component to the end of the list, by assigning it to the length of the list plus 1:
# construct new list of two components
new.list <- list(vec, char)

# notice that it has two components
length(new.list)
## [1] 2
# append a component to the end and print
new.list[[length(new.list) + 1]] <- "Appended"

new.list
## [[1]]
## [1] 1 2 3 4
## 
## [[2]]
## [1] "Hello!"
## 
## [[3]]
## [1] "Appended"

Notice you could keep doing this as the length is now 3. You could also use the $ operator to name a new component and that would append it at the end, as we saw above.
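
Base R also offers c() and append() for the same job. This is a sketch of those alternatives rather than the approach used above; base.list, longer, and inserted are names made up for the example:

```r
# illustrative starting list
base.list <- list(1:4, "Hello!")

# c() joins two lists; wrap the new item in list() so it stays one component
longer <- c(base.list, list("Appended"))
length(longer)

# append() can also insert at a position other than the end
inserted <- append(base.list, list("Middle"), after = 1)
inserted[[2]]
```

Wrapping the new item in list() matters: c(base.list, 1:3) would add three separate components, one per element of the vector.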

Initializing a list

To initialize a list with a certain number of NULL components, we use the vector() function like this:

# initialize list to have 3 null components and print
list2 <- vector("list", 3)
list2
## [[1]]
## NULL
## 
## [[2]]
## NULL
## 
## [[3]]
## NULL
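
A common reason to initialize a list this way is to fill it inside a loop. The following is a minimal sketch of that pattern; n and results are hypothetical names, and the squares are just placeholder values:

```r
# pre-allocate a list with n empty slots
n <- 3
results <- vector("list", n)

for (i in seq_along(results)) {
  # any object can go in each slot; here, the squares of 1 to i
  results[[i]] <- (1:i)^2
}

results
```

Pre-allocating avoids growing the list one component at a time, and seq_along() keeps the loop safe even if the list has length zero.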

Converting a list into matrix or dataframe

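Finally, we can convert a list into a matrix, dataframe, or vector in a number of different ways. The first, most basic way is to use unlist(), which flattens the whole list into one long vector. Because a vector can hold only one type, all components are coerced to a common type, which is why the numbers below come out as character strings: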

# convert to one long vector - use unlist
unlist(list1)
##        Numbers1        Numbers2        Numbers3        Numbers4 
##             "1"             "2"             "3"             "4" 
##    Some.data.y1    Some.data.y2    Some.data.y3    Some.data.x1 
##             "1"             "2"             "3"             "2" 
##    Some.data.x2    Some.data.x3                 
##             "2"             "1" "new component"
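
To see the coercion in isolation, here is a small sketch with two made-up lists, nums and mixed. When every component is numeric the result stays numeric; one character component turns everything into character:

```r
# all-numeric list: the result is a numeric vector
nums <- list(a = 1:2, b = 3.5)
unlist(nums)

# drop the generated names if they are not needed
unlist(nums, use.names = FALSE)

# one character component forces the whole result to character
mixed <- list(a = 1:2, b = "x")
class(unlist(mixed))   # "character"
```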

But often we have matrices or dataframes as components of a list and we would like to combine them or stack them into one dataframe. The following shows the two good ways I’ve found to do this (from this StackOverflow page: http://stackoverflow.com/questions/4227223/r-list-to-data-frame) using ldply() from the plyr package or rbind().

First, we create a list of matrices and then convert the list of matrices into a dataframe.

# create list of matrices and print
mat.list <- list(mat1 = matrix(c(1, 2, 3, 4), nrow = 2),
                 mat2 = matrix(c(5, 6, 7, 8), nrow = 2))
mat.list
## $mat1
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
## 
## $mat2
##      [,1] [,2]
## [1,]    5    7
## [2,]    6    8
#convert to data frame
#1. use ldply
library(plyr)  # stops with an error if plyr is not installed
ldply(mat.list, data.frame)
##    .id X1 X2
## 1 mat1  1  3
## 2 mat1  2  4
## 3 mat2  5  7
## 4 mat2  6  8
#2. use rbind
do.call(rbind.data.frame, mat.list)
##        V1 V2
## mat1.1  1  3
## mat1.2  2  4
## mat2.1  5  7
## mat2.2  6  8
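
If you only need the stacked values and not the dataframe row labels, a plain rbind() is a third option. This is a sketch of that variation, not one of the two methods above; mats, stacked, and stacked.df are names made up for the example:

```r
# illustrative list of two 2 x 2 matrices
mats <- list(mat1 = matrix(1:4, nrow = 2), mat2 = matrix(5:8, nrow = 2))

# do.call() passes every component of the list to rbind() as an argument
stacked <- do.call(rbind, mats)

# the result is a matrix; convert it if a dataframe is needed
stacked.df <- as.data.frame(stacked)
dim(stacked.df)
```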

Get ready for part 2 next time, when we’ll see what we can use lists for and why we should use them at all.

