Prerequisites

Introduction

Oh hi, and welcome! We are going to pack our toolbox for a new programming language called R.

R is a programming language used for statistics and data modeling, a field popularly known as data science.

Without much talking, let's get started.

What We'll Need

To begin using R, let's download and install it from https://www.r-project.org/ .

And this is actually all you need to get started. But to get useful features such as syntax highlighting, code auto-completion suggestions, and more, I'd suggest also installing the free R integrated development environment (IDE) RStudio from http://www.rstudio.com/ide/ . And that's all we need to pack into our toolbox for now.

Welcome

Welcome to this course on R, a programming language for statistics and data modeling!

To warm up, let's calculate the average temperature of two days.

> (88 + 72) / 2

//Output Below

[1] 80

Exactly! At its core, we can use R as a calculator.

Why R?

Seems like we can just use a simple calculator instead of R, doesn't it? But what if you love barbecues? Could R help you plan your BBQs?

How would you calculate the average weekend temperature throughout an entire year?

Yup! R can do all of that in a jiffy. We'll find out more about how R can help us plan our BBQs after we've learned a few of the basics.

Input

When using R on a computer, we'll need to use a terminal. At the very beginning of the command line you'll always see the >.

> "Hello there!"

//Output Below

[1] "Hello there!"

That sign is a prompt that lets us know that R expects a command from us.

Output

R is made for large collections of numbers and can sometimes get confusing to read. Luckily, R helps us out when providing output.

What might be the correct order for this sequence of numbers?

[1]  5  6  7  8  9 10 11 12
[9] 13 14 15 16 17 18 19 20
[17] 21 22 23 24 25

R prints out the index of the first number in each line: 13 is the 9th number in our collection and 21 is the 17th. We'll see [1] a lot in our results.
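Psst: here's a small sketch to check that the bracketed numbers really are positions. The name numbers is made up for this example, and it peeks ahead at square brackets, which we'll look at properly later.

```r
# 'numbers' is a made-up name for this illustration
numbers <- 5:25

# Ask for the 9th and 17th values directly
numbers[9]   # 13
numbers[17]  # 21
```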

Assigning values

As we know, programming languages often use variables to store different values.

Let's guess how variables are assigned in R.

> variable <- 5

Exactly! We can read that as "variable gets 5".

Psst: once something is created in R, it's called an object.

Operators

We can also use arithmetic operators with variables. To display the contents of a variable, we just need to write it out in the prompt.

Let's add the two variables.

> var_1 <- 9
> var_2 <- 3
> result <- var_1 + var_2
> result

//Output Below

[1] 12

Wahey! We can perform basic arithmetic operations with +, -, *, and /. We printed out the sum by simply typing in result.

Data structures

However, R was designed for collections of data, not simple numbers.

To combine different values into a collection, we can use a function called c().

> var <- c(0,1,1,2,3)
> var

//Output Below

[1] 0 1 1 2 3

Awesome! We just created a collection of data and assigned it to var.

Data structures II

Which of these do you think are collections in R?

Yup! R treats everything as a collection. Single values are collections with just 1 element.
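Here's a quick sketch of that idea, with the made-up names single and several. It only uses is.vector() and length(), and the comments show what we'd expect to see.

```r
# A single number is a vector with just 1 element
single <- 5
is.vector(single)   # TRUE
length(single)      # 1

# A collection built with c() is the same kind of object, just longer
several <- c(5, 6, 7)
is.vector(several)  # TRUE
length(several)     # 3
```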

Atomic vectors

These collections have a special name in R: atomic vectors. They can also be made out of more than just numbers.

Let's use a function called class to check out other atomic vectors.

> a <- c(1,2)
> b <- c("a","bee","cdefg")
> c <- c(TRUE,FALSE)
> class(a)
> class(b)
> class(c)

//Output Below

[1] "numeric"
[1] "character"
[1] "logical"

Great! a is a numeric, b is a character, and c is a logical vector.

Psst: atomic means that it's the smallest component of something bigger.

Other structures

Besides atomic vectors, data can be stored in many other ways in R: matrices, factors, data frames, and lists.

Why would we need so many different structures?

Yup! Data comes in all shapes and sizes, so we need to store it in understandable ways.

Functions

Just like in other languages, we can also make our own functions in R. These are sets of instructions we can reuse as often as we want.

function_name <- function(x,y){
  return(x * y)
}
function_name(2,4)

//Output Below

[1] 8

We call a function by typing its name followed by (). Here, we're taking 2 values as input, multiplying them and displaying the result.
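As a sketch of how we might reuse this idea, here's the temperature average from the warm-up wrapped in a function. The name average_of_two is hypothetical, not something built into R.

```r
# 'average_of_two' is a hypothetical helper, not a built-in R function
average_of_two <- function(a, b) {
  # Add the two inputs and divide by how many there are
  return((a + b) / 2)
}

average_of_two(88, 72)  # 80
```

Psst: R hands back the value of the last expression in a function, so return() is optional here. Writing it out just makes the intent easier to see.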

Comments

In R, we can use # to comment out parts of our code.

> # comments begin with a '#'
> # comments are ignored by the program
> a = 2
> # a = 3
> a

//Output Below

[1] 2

See how the line # a = 3 was ignored? You can use comments to write notes for yourself or explanations for others.

Creating a vector

Let's take a closer look at numeric vectors.

Do you remember how to combine different values into one vector?

> x <- c(29,3,17)

Nice! We can use the c() function to combine values into one vector.

Sequences

A lot of times, we might want a list of numbers generated for us. If that's the case, we can use the colon operator : to create so-called sequences.

> my_sequence <- 10:15
> my_sequence

//Output Below

[1] 10 11 12 13 14 15

Nice! The colon operator will create a sequence by adding 1 until it reaches 15.

Seq()

R also has a function that can create specific sequences: the seq() function. What might the following code print out?

> x <- seq(10,20,by=5)
> x

//Output Below

[1] 10 15 20

Yass! The seq() function creates a sequence from 10 to 20. The by=5 part means we increment by 5 to get the next value.

Decimal

What if we want the increment to be a decimal number?

> my_sequence <- seq(2.1,4,by=0.5)
> my_sequence

//Output Below

[1] 2.1 2.6 3.1 3.6

Awesome! The seq() function allows us to be very specific with decimal numbers. Here, we used the 0.5 decimal to increment our sequence.

Vector types

An important thing to remember is that R vectors can only contain elements of the same data type.

> my_vector <- c(8,3,11)
> my_vector

//Output Below

[1]  8  3 11

Awesome! To combine different types of elements in R we have to use lists! We'll explore lists a bit later.

Type coercion

We can't keep strings and numbers as separate types in one vector, but what about different kinds of numbers, such as whole numbers and decimals?

> y <- c(1.1, 3.14)
> x <- c(y, 555)
> x

//Output Below

[1]   1.10   3.14 555.00

Woah, good job! If we add a whole number to a decimal vector, R will automatically treat it as a decimal, so every element has the same type. This is called type coercion.

Psst: type coercion happens when there are values of different types. R converts to a type that works for all cases and doesn't lose any data.
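To see coercion in action, here's a minimal sketch with made-up variable names. It assumes nothing beyond c(), class() and typeof(). Note that a true integer in R is written with an L, as in 1L.

```r
# Logical + number: TRUE becomes 1
mixed_logical <- c(TRUE, 2)
class(mixed_logical)   # "numeric"

# Integer (written with an L) + decimal: everything becomes a decimal (double)
mixed_numbers <- c(1L, 2.5)
typeof(mixed_numbers)  # "double"

# Number + string: everything becomes a string
mixed_text <- c(1, "a")
class(mixed_text)      # "character"
mixed_text             # "1" "a"
```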

Subsetting

Smaller pieces of our collected data are called subsets. To get a subset, we need to use the square brackets [] and an index of the value we want.

How would we get the 3rd value of a vector?

> x <- c(75,80,83)
> x[3]

//Output Below

[1] 83

Great! The 3rd element is in the 3rd position in the vector.

Psst: R is designed for humans, which is why the index starts at 1, rather than 0 as in most programming languages.

Bigger subsets

Remember how we used the colon operator to create sequences? Using :, let's try selecting the three values in the middle of the vector.

> x <- c(90,85,81,84,84) 
> x[2:4]

//Output Below

[1] 85 81 84

Boom! R makes it super easy to get the values we need.
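The positions don't have to be next to each other. As a small sketch using the same x, we can pass any vector of positions inside the brackets, or use a negative position to leave one out.

```r
x <- c(90, 85, 81, 84, 84)

# Pick the 1st and 5th values with a vector of positions
first_and_last <- x[c(1, 5)]  # 90 84

# A negative index drops that position and keeps the rest
without_first <- x[-1]        # 85 81 84 84
```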

Name attribute

Vectors can also have attributes, extra information which helps clarify data. The names() function allows us to give a name attribute to each value.

> x <- c(75,80,83)
> days <- c("Mon","Tue","Wed")
> names(x) <- days
> x

//Output Below

Mon Tue Wed
 75  80  83

Sweet! Notice how both the days and x vectors are of the same length.

Psst: R doesn't number each line when we print out vectors with a name attribute.

Subsetting by name

Now that we've given names to our values, let's try subsetting the value recorded on Wednesday.

> x <- c(75,80,83)
> names(x) <- c("Mon","Tue","Wed")
> x["Wed"]

//Output Below

Wed
 83

Woop woop! We simply need to type in the word "Wed" as a string and we'll get our result.

Length

Because R is made for large collections of data, it can come in handy to know how many elements a vector has.

Can you guess the name of the function that does that?

> year <- 365
> even_days <- seq(2,year,by=2)
> length(even_days)

//Output Below

[1] 182

Awesome! We can use length() to find out how many elements a vector has.

Hello again

Data can come in many more forms, including words. To store a string in R, we need to use a vector of type character.

> sentence <- "Ay caramba"
> sentence

//Output Below

[1] "Ay caramba"

Boom! We just created a character vector.

Length

We can also use the length() function on character vectors. What do you think is the length of sentence?

> sentence <- "Bird is the word."
> length(sentence)

//Output Below

[1] 1

Wahey! sentence is a character vector with 1 element. The element is made out of 4 words and 17 characters, but that doesn't affect the vector length.

Characters

If we want to measure the number of characters, we can use the nchar() function.

> sentence <- c("Hello", "my name is", "the real")
> nchar(sentence)

//Output Below

[1]  5 10  8

Super! The output is the number of characters for each element belonging to the vector.

Adding words

Adding other entries is the same process as with numeric vectors.

> flavors <- c("blueberry","kiwi")
> flavors <- c(flavors, "banana")
> flavors

//Output Below

[1] "blueberry" "kiwi"      "banana"

See that? We can easily add another entry with the c() function.

Paste

A very handy feature is the paste() function. It lets us combine all of our vector elements into one. We just need to decide on how to link them.

Use the paste() function to link the words with spaces in-between.

> word <- c("We","all","scream","for","ice","cream")
> paste(word,collapse = " ")

//Output Below

[1] "We all scream for ice cream"

Perfect! We can use the paste() function with a character vector and a collapse argument to combine vector elements into one.
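Psst: paste() also has a sep argument, which is easy to mix up with collapse. Here's a short sketch of the difference, with made-up variable names.

```r
# sep joins separate arguments into one string
with_sep <- paste("ice", "cream", sep = "-")
with_sep       # "ice-cream"

# collapse joins the elements of one vector into one string
with_collapse <- paste(c("ice", "cream"), collapse = "-")
with_collapse  # "ice-cream"

# Without collapse, a vector stays a vector: "!" is pasted onto each element
no_collapse <- paste(c("ice", "cream"), "!")
no_collapse    # "ice !" "cream !"
```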

Logical

Let's move on to another vector type: logical.

> answers <- c(TRUE,FALSE)
> answers

//Output Below

[1]  TRUE FALSE

See that? Logical vectors only have two main values: TRUE or FALSE.

Psst: in R, all vectors can also have a value called NA. We'll get to that soon!

Operators

Logical vectors are usually the result of operations carried out on other vectors. Common comparison operators are >, <, and ==, among others.

How might we check if x is longer than y?

> x <- c(1,2,3,4)
> y <- c(5,6,7)
> length(x) > length(y)

//Output Below

[1] TRUE

Awesome work! We use the > to see if the value on the left is greater than the one on the right.

Operations

What might happen if we checked the vector values instead of their lengths?

> x <- c(517,234,10)
> y <- c(-38,307,10)
> result <- x > y
> result

//Output Below

[1]  TRUE FALSE FALSE

Nice! We compared the values of two vectors. We can do this with other comparison operators such as < or == as well.

Comparing

Logical vectors can also be a result of comparing character vectors. Let's see if the two vectors have equal values.

> name <- c("Mr.","Bond")
> name2 <- c("Mrs.","Bond")
> result <- name == name2
> result

//Output Below

[1] FALSE  TRUE

Excellent! "Mr." and "Mrs." are not the same, so the result is FALSE. The result is a logical vector of the same length as the compared vectors.

Sum

Remember talking about how R is a calculator at its core? Well, doing math with vectors is where R really shines.

Let's see how many friends and family we'll invite over for 3 different BBQs.

> friends <- c(7,10,24)
> family <- c(20,15,3)
> guests <- friends + family
> guests

//Output Below

[1] 27 25 27

Sweet! We used the + operator to sum two vectors. The result is a vector containing sums of the corresponding individual elements.

Adding a constant

To make sure we have enough for everyone, we can assume that we'll always have at least 5 extra guests for each BBQ.

We'll do so by adding a constant to our guests vector.

> guests <- c(27,25,27)
> guests <- guests + 5
> guests

//Output Below

[1] 32 30 32

Awesome! We used 5 as a constant and now we'll be able to plan for our party.
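Why does adding a single number work on a whole vector? R reuses ("recycles") the shorter vector until it matches the longer one. Here's a minimal sketch of that behavior.

```r
guests <- c(27, 25, 27)

# The single value 5 is reused for every element
plus_constant <- guests + 5
plus_constant  # 32 30 32

# Writing the 5 out three times gives the same result
plus_vector <- guests + c(5, 5, 5)
plus_vector    # 32 30 32

# A shorter vector is reused from the start: 1+10, 2+20, 3+10, 4+20
recycled <- c(1, 2, 3, 4) + c(10, 20)
recycled       # 11 22 13 24
```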

Multiplying

We can now figure out how many supplies we need for each guest, whether it's food, drinks or even balloons.

Let's see what our food budget would look like.

> guests <- c(32,30,32)
> food <- c(1.5,1.5,2.5)
> drinks <- c(2,2,5)
> balloons <- c(0,0,2)
> result <- guests * food
> result

//Output Below

[1] 48 45 80

Boom! We're able to figure out all of our party needs with R. Let's see more about how we can plan our budget.

Balloons

The last BBQ is also a birthday party, so we want to celebrate properly. Let's see how many balloons we'll need for the summer.

> guests <- c(32,30,32)
> food <- c(1.5,1.5,2.5)
> drinks <- c(2,2,5)
> balloons <- c(0,0,2)
> result <- guests * balloons
> result

//Output Below

[1]  0  0 64

The evidence is clear. We need more balloons at parties.

Dividing

After calculating the total cost of each party, you want to know how many BBQs of each type you could afford each month.

You have a monthly budget of $500. Let's see how many times we can light up the grill.

> bbq_cost <- c(200,190,530)
> budget <- 500
> bbq <- budget/bbq_cost
> bbq

//Output Below

[1] 2.5000000 2.6315789 0.9433962

Yass! We divided our budget by the price of each BBQ. However, those numbers seem a bit confusing.

Rounding

Let's use a function called floor() to round down our previous result.

> bbq <- c(2.5000,2.6315,0.9433)
> floor(bbq)

//Output Below

[1] 2 2 0

Great! We can use functions like floor() to simplify our results. Unfortunately, it seems like we need a bigger budget for birthday BBQs.
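floor() isn't the only rounding function. As a quick sketch, here is how it compares with ceiling() and round() on the same numbers.

```r
bbq <- c(2.5000, 2.6315, 0.9433)

rounded_down <- floor(bbq)       # 2 2 0        (always down)
rounded_up <- ceiling(bbq)       # 3 3 1        (always up)
one_decimal <- round(bbq, 1)     # 2.5 2.6 0.9  (nearest, to 1 decimal place)
```

For counting how many BBQs we can afford, floor() is the right choice: we can't host 0.9 of a party.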

Missing values

But what if we sometimes forget a value? How would we write down a missing value?

> bbq_cost <- c(200,190,530,NA)

Exactly! Rather than using 0, we use NA to write down values we do not know. NAs can give us valuable insight about our data.

NA operations

NA stands for "Not Available" and helps keep our data consistent. Arithmetic with an NA value almost always results in NA.

> x <- c(5,NA,4,8)
> y <- c(2,4,5,NA)
> x * y

//Output Below

[1] 10 NA 20 NA

Nice! NA multiplied or divided with another number still results in NA.

NA operations II

What do you think is the result of the following expression?

> result <- NA != 0
> result

//Output Below

[1] NA

Great work! NA is a placeholder for a missing value, so it doesn't make sense comparing it with other values.

Checking for NA

We've invited some friends and family to one last BBQ, but some didn't confirm if they can make it.

Let's use a function called is.na() to see how many guests have not confirmed.

> rsvp_family <- c(TRUE,FALSE,NA,NA,NA,FALSE)
> rsvp_friends <- c(NA,NA,TRUE,TRUE,TRUE)
> rsvp <- c(rsvp_family,rsvp_friends)
> is.na(rsvp)

//Output Below

[1] FALSE FALSE  TRUE  TRUE
[5]  TRUE FALSE  TRUE  TRUE
[9] FALSE FALSE FALSE

Boom! The result is a logical vector which displays TRUE for every NA value. Five TRUE values means that 5 people did not get back to us.
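Counting the TRUE values by eye gets tedious with a long guest list. One possible shortcut, sketched below with the made-up names missing_count and confirmed, is to let sum() do the counting.

```r
rsvp_family <- c(TRUE, FALSE, NA, NA, NA, FALSE)
rsvp_friends <- c(NA, NA, TRUE, TRUE, TRUE)
rsvp <- c(rsvp_family, rsvp_friends)

# TRUE counts as 1 and FALSE as 0, so sum() counts the NAs
missing_count <- sum(is.na(rsvp))
missing_count  # 5

# na.rm = TRUE skips the NAs: 4 of the 6 answers we received are TRUE
confirmed <- sum(rsvp, na.rm = TRUE)
confirmed      # 4
```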

Matrices

A matrix is a two-dimensional object. In other words, we can store data in a matrix in both rows and columns.

> my_matrix

//Output Below

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

See that? We just printed out a matrix with 2 rows and 3 columns.

Creating a matrix

We use the matrix() function to create a matrix from vectors. Similar to vectors, a matrix can only contain one type of value. 

Let's create a matrix that has only two rows.

> data <- c(1, 2, 3, 4, 5, 6)
> my_matrix <- matrix(data,nrow=2,ncol=3)
> my_matrix

//Output Below

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

Awesome! To create a matrix we need to provide a vector and specify the number of rows and columns.

Creating a matrix II

Now let's make a matrix that only has two columns.

> data <- c(1, 2, 3, 4, 5, 6)
> my_matrix <- matrix(data,nrow=3,ncol=2)
> my_matrix

//Output Below

     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

Whop whop! Creating matrices is easy in R. Let's see what else we can find out about matrices.

Creating by row

We can fill our matrices in two ways: by rows or by columns. To do so, we just need to add the byrow argument and set it to TRUE or FALSE.

> data <- 1:6
> my_matrix <- matrix(data,nrow=2,ncol=3,byrow=TRUE)
> my_matrix

//Output Below

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6

Sweet! If byrow=TRUE, the first row is filled, followed by the second.

By column

Let's figure out how to add the values by column.

> data <- 1:6
> my_matrix <- matrix(data,nrow=2,ncol=3,byrow=FALSE)
> my_matrix

//Output Below

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

Wohoo! When byrow is FALSE, the first column is filled, followed by the second and so on.

Column names

We can also rename the columns of our matrix using the colnames() function.

> my_matrix <- matrix(1:6,3)
> colnames(my_matrix) <- c("run","cycle")
> my_matrix

//Output Below

     run cycle
[1,]   1     4
[2,]   2     5
[3,]   3     6

Amazing! Notice how we don't have to type nrow = 3. The second argument in matrix() is automatically treated as the number of rows.

Row names

We can rename the rows of the matrix in a similar fashion using the rownames() function.

> my_matrix <- matrix(1:6, 3)
> colnames(my_matrix) <- c("run","cycle")
> rownames(my_matrix) <- c("Anne","Luke","Emma")
> my_matrix

//Output Below

     run cycle
Anne   1     4
Luke   2     5
Emma   3     6

Nice! We can now see who's a better runner or biker.

Row sums

Let's calculate the total distance per person by computing a sum of each row using our previous matrix.

> my_matrix
> rowSums(my_matrix)

//Output Below

     run cycle
Anne   1     4
Luke   2     5
Emma   3     6

Anne Luke Emma 
   5    7    9 

Awesome! rowSums() returns a vector containing sums for each row in the matrix.

Column sums

We can now try computing the distance covered by all individuals per activity.

> my_matrix
> colSums(my_matrix)

//Output Below

     run cycle
Anne   1     4
Luke   2     5
Emma   3     6

  run cycle
    6    15

Nice! colSums() returns a vector containing sums for each column in the matrix.

Subsetting

We subset matrices by typing the row and then the column indices inside [].

Let's try to get the element in the 3rd row and 2nd column of my_matrix.

> my_matrix
> my_matrix[3,2]

//Output Below

     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

[1] 6

Nice! As with vectors, we subset matrices using brackets in the following format: [row_index , column_index].
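What if we leave one of the two indices empty? As a small sketch, an empty slot means "everything", so we get a whole row or a whole column back.

```r
my_matrix <- matrix(1:6, 3)

# Leave the column empty to get the whole 3rd row
third_row <- my_matrix[3, ]
third_row      # 3 6

# Leave the row empty to get the whole 2nd column
second_column <- my_matrix[, 2]
second_column  # 4 5 6
```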

Adding a row

If we want to add a row to our matrix, we can do so by simply using the rbind() function.

> my_matrix <- rbind(my_matrix, c(7,8))
> my_matrix

//Output Below

     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
[4,]    7    8

Nice! Remember that the row being added must have the same length as the number of columns in the matrix.

Adding a column

To add a column, we need to use the cbind() function. We also have to keep a careful eye on the length of the added column.

> my_matrix <- matrix(1:6, 3)
> cbind(my_matrix, c(9, 8, 7))

//Output Below

     [,1] [,2] [,3]
[1,]    1    4    9
[2,]    2    5    8
[3,]    3    6    7

Awesome! The column we are adding should have the same length as the number of rows in our matrix.

Matrix Arithmetics

Dividing or multiplying a matrix by a constant will divide or multiply every matrix element by that constant.

Let's see what happens when we divide a matrix by 2.

> my_matrix <- matrix(1:6, 3)
> my_matrix / 2

//Output Below

     [,1] [,2]
[1,]  0.5  2.0
[2,]  1.0  2.5
[3,]  1.5  3.0

Sweet! *, +, and - work the same way. The operation is carried out between the constant and each individual element.

More Arithmetics

Doing arithmetic with two matrices of the same size combines the elements that sit in the same position.

> my_matrix <- matrix(1:6, 3)
> my_matrix + my_matrix

//Output Below

     [,1] [,2]
[1,]    2    8
[2,]    4   10
[3,]    6   12

Great! The arithmetic operation is performed between each corresponding element.
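Psst: this element-by-element behavior also applies to *. If you know matrix multiplication from linear algebra, that one is a separate operator in R, %*%. Here's a tiny sketch of the difference, using a made-up 2 by 2 matrix m.

```r
m <- matrix(1:4, 2)   # columns are (1, 2) and (3, 4)

# Element-wise: each element times the element in the same position
elementwise <- m * m
elementwise      # rows: 1 9 / 4 16

# Matrix product from linear algebra
matrix_product <- m %*% m
matrix_product   # rows: 7 15 / 10 22
```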

Factors

If we have a lot of repeating values that can be grouped into a limited number of distinct categories, we can store them in a factor.

> os <- c("Android", "iOS", "Android")
> os_factor <- factor(os)
> os_factor

//Output Below

[1] Android iOS     Android
Levels: Android iOS

Nice! The levels row lets us know we only have two distinct categories, Android and iOS.

Unordered factors

Let's see what R thinks about different operating systems.

> os <- c("Android", "iOS", "Android")
> os_factor <- factor(os)
> android <- os_factor[1]
> ios <- os_factor[2]
> android > ios

//Output Below

[1] NA
Warning message:
In Ops.factor(android, ios) : ‘>’ not meaningful for factors

Phew! By default, a factor has no order between its levels.

Ordered factors

We can also have categories that have an order. We can specify this by creating so-called ordinal categorical variables.

> length <- c("medium", "short", "long", "short", "medium")
> l_fctr <- factor(length, ordered = TRUE, levels = c("short", "medium", "long"))
> l_fctr

//Output Below

[1] medium short  long   short  medium
Levels: short < medium < long

Awesome! Note how the levels are now displayed in our given order.

Ordered factors II

Let's now see if "medium" is greater than "short" in our ordered factor.

> length <- c("medium", "short", "long")
> l_fctr <- factor(length, ordered = TRUE, levels = c("short", "medium", "long"))
> l_fctr[1] > l_fctr[2]

//Output Below

[1] TRUE

Nice! Once we create the factor as ordered, we can compare different levels from our factor.

Factor summary

R has a cool ability to summarize the data from our objects.

> group_vector <- c("C", "B", "A", "C", "A")
> group_fac <- factor(group_vector)
> summary(group_fac)

//Output Below

A B C
2 1 2

Nice! summary() displays the categories and the number of so-called observations per category.

Renaming levels

Sometimes our data might contain factors with long level names and we may want to rename them.

> vector <- c("A_grp", "B_grp", "A_grp")
> fctr <- factor(vector)
> levels(fctr) <- c("A", "B")
> fctr

//Output Below

[1] A B A
Levels: A B

Nice! We rename factor levels with the levels() function.

Data frames

Data frames are similar to matrices, but can also contain elements of different types.

> quantity <- c(200, 300, 100)
> crop <- c("corn", "leek", "pea")
> subsidy <- c(TRUE, FALSE, TRUE)
> my_df <- data.frame(quantity, crop, subsidy)
> my_df

//Output Below

  quantity crop subsidy
1      200 corn    TRUE
2      300 leek   FALSE
3      100  pea    TRUE

Nice! Data frames are two-dimensional objects where variables are stored as columns and observations as rows.

Data frame structure

A great way to explore your data is by using the str() function. Let's apply it to our data frame.

> quantity <- c(200, 300, 100)
> crop <- c("corn", "leek", "pea")
> subsidy <- c(TRUE, FALSE, TRUE)
> my_df <- data.frame(quantity, crop, subsidy)
> str(my_df)

//Output Below

'data.frame':  3 obs. of  3 variables:
 $ quantity: num  200 300 100
 $ crop    : Factor w/ 3 levels "corn","leek",..: 1 2 3
 $ subsidy : logi  TRUE FALSE TRUE

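See that? We can now look at the number of observations and variable types. Notice that crop is of type factor instead of character. That is the default in R versions before 4.0.0; since R 4.0.0, strings stay as character unless we pass stringsAsFactors = TRUE.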

Strings as factors

We saw that strings were saved as factors in our data frame, which is the default in R versions before 4.0.0. Sometimes we may want to override this and store them just as strings.

> farmer <- c("Bob", "Sam", "Mike")
> my_df <- data.frame(quantity, crop, farmer, subsidy, stringsAsFactors = FALSE)
> class(my_df$crop)

//Output Below

[1] "character"

It makes sense to leave "crop" as a factor since it's a finite category. Personal names, however, are better saved as character.
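Because the default differs between R versions (factors before R 4.0.0, character from 4.0.0 on), it can be safer to spell out what we want. A minimal sketch, with the made-up names df_factor and df_text:

```r
crop <- c("corn", "leek", "pea")

# Explicitly ask for factors
df_factor <- data.frame(crop, stringsAsFactors = TRUE)
class(df_factor$crop)  # "factor"

# Explicitly ask for plain strings
df_text <- data.frame(crop, stringsAsFactors = FALSE)
class(df_text$crop)    # "character"
```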

Adding a new variable

We can add a new variable to our data frame and name it at the same time. Let's add the farmer column.

> my_df$farmer <- c("Bob", "Sam", "Mike")
> my_df

//Output Below

  quantity crop subsidy farmer
1      200 corn    TRUE    Bob
2      300 leek   FALSE    Sam
3      100  pea    TRUE   Mike

Yass! With the $ symbol, we can append and name a column. Adding a string vector in this way saves it as character.

Selecting variables

We can subset a particular variable column by typing the data frame name followed by $ and the variable name.

Let's see if we can also subset crop in this code snippet.

> my_df$quantity
> my_df$crop

//Output Below

[1] 200 300 100
[1] "corn" "leek" "pea"

Nice! Output is an atomic vector of a particular type.

Selecting variables II

Another way to select a data frame variable is with double square brackets [[ ]].

> my_df[["quantity"]]
> my_df[[1]]

//Output Below

[1] 200 300 100
[1] 200 300 100

Note that we can either type the column index or the column name inside the [[ ]].

Subsetting using index

As with vectors and matrices, we can call various data frame subsets by using simple square brackets.

How would we select all of the rows, but only 2 columns?

> my_df[ , 3:4]

//Output Below

  subsidy farmer
1    TRUE    Bob
2   FALSE    Sam
3    TRUE   Mike

Yass! We've just subsetted 2 columns. By using different index combinations we can subset single elements, rows or two-dimensional arrays.

Selecting 1 column

What might be the best way to only select 1 column?

> my_df[ , 3]

//Output Below

[1]  TRUE FALSE  TRUE

Awesome! By leaving the row empty and only writing the index of one column we are able to get the column.

Subsetting with boolean

Let's select only the rows that are receiving subsidy. Remember, subsidy is a logical variable.

> my_df[my_df$subsidy, ]

//Output Below

  quantity crop subsidy farmer
1      200 corn    TRUE    Bob
3      100  pea    TRUE   Mike

Awesome! We see only the rows containing TRUE in the subsidy column.
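The logical vector doesn't have to be a column that already exists. As a sketch, we can build one on the spot with a comparison. The name big_harvest and the cutoff of 150 are made up for this example.

```r
my_df <- data.frame(
  quantity = c(200, 300, 100),
  crop = c("corn", "leek", "pea"),
  subsidy = c(TRUE, FALSE, TRUE)
)

# The comparison produces a logical vector: TRUE TRUE FALSE
big_harvest <- my_df$quantity > 150

# Keep only the rows where the condition is TRUE (rows 1 and 2)
large_rows <- my_df[big_harvest, ]
large_rows
```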

Sorting

We can sort our data frame by a particular column using the order() function. Sort my_df by the quantity column.

> order(my_df$quantity)
> my_df[order(my_df$quantity), ]

//Output Below

[1] 3 1 2

  quantity crop subsidy farmer
3      100  pea    TRUE   Mike
1      200 corn    TRUE    Bob
2      300 leek   FALSE    Sam

See that? The order() function returns the positions of the values in sorted order, and we use those positions to rearrange the rows. By default, it sorts in ascending order.
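To sort the other way around, order() accepts a decreasing argument. Here's a short sketch on a plain vector, alongside sort(), which returns the sorted values themselves rather than their positions.

```r
quantity <- c(200, 300, 100)

# Positions from largest to smallest: 300 is 2nd, 200 is 1st, 100 is 3rd
positions <- order(quantity, decreasing = TRUE)
positions      # 2 1 3

# sort() returns the sorted values themselves
sorted_values <- sort(quantity)
sorted_values  # 100 200 300
```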

What are lists?

Unlike vectors, R lists can hold components of different data types.

> my_list <- list(2,"c",TRUE)
> my_list

//Output Below

[[1]]
[1] 2

[[2]]
[1] "c"

[[3]]
[1] TRUE

Nice! We just created a list using the list() function and gave it a numeric, a character, and a logical element.

More lists

Lists can also contain more complex data components such as whole vectors, matrices or data frames.

> vec <- 1:6
> mat <- matrix(1:6,nrow=2)
> df <- data.frame(mat)
> list(vec, mat, df)

//Output Below

[[1]]
[1] 1 2 3 4 5 6

[[2]]
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

[[3]]
  X1 X2 X3
1  1  3  5
2  2  4  6

Great! We can see that lists are very flexible. We could even store a list inside another list if we wanted to.

Renaming list components

By default, list components have no names and are printed with their indices.

Let's name the components of our list, calling the first one vector.

> vec <- 1:6
> mat <- matrix(1:6,nrow=2)
> df <- data.frame(mat)
> my_list <- list(vector=vec,matrix=mat,dframe=df)
> my_list

//Output Below

$vector
[1] 1 2 3 4 5 6

$matrix
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

$dframe
  X1 X2 X3
1  1  3  5
2  2  4  6

Pretty straightforward, right?

Extracting list components

We can extract list components by using double brackets around the index or the name. We can also use the $ sign with the name.

Let's see if we can extract a component by using its index.

> my_list <- list(int=2,vec=1:6,bool=TRUE)
> my_list[[2]]

//Output Below

[1] 1 2 3 4 5 6

Nice! Lists provide us with many different ways of extracting components.

Extracting by name

Let's extract the same element, only this time, let's use its name.

> my_list <- list(int=2,vec=1:6,bool=TRUE)
> my_list$vec

//Output Below

[1] 1 2 3 4 5 6

Exactly! We just need to use the $ symbol alongside the name.

Further subsetting

What if we want to extract a 3rd element of a vector that belongs to a list?

> my_list <- list(int=2,vec=1:6,bool=TRUE)
> print(my_list[[2]][3])

//Output Below

[1] 3

Awesome! The first [[2]] indicates the vector element in the list, while the [3] indicates the element we want from that vector.
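Psst: single brackets work on lists too, but they hand back a smaller list rather than the component itself. A quick sketch of the difference:

```r
my_list <- list(int = 2, vec = 1:6, bool = TRUE)

# Single brackets return a smaller list
as_list <- my_list[2]
class(as_list)       # "list"

# Double brackets return the component itself
as_component <- my_list[[2]]
class(as_component)  # "integer"
```

That's why we use [[2]] before [3] above: we need the vector itself before we can pick its 3rd value.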

Appending components

We can add a component of any type to our list by using the c() function.

> my_list <- list(int=2,vec=1:6,bool=TRUE)
> my_list <- c(my_list,TRUE)
> my_list

//Output Below

$int
[1] 2

$vec
[1] 1 2 3 4 5 6

$bool
[1] TRUE

[[4]]
[1] TRUE

See that? R automatically converts other elements in the c() function into list elements.
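One thing to watch out for, sketched below with made-up names: because c() turns each element into its own list component, a longer vector gets split up. Wrapping the new value in list() keeps it together, and also lets us give it a name.

```r
my_list <- list(int = 2, vec = 1:6, bool = TRUE)

# Wrap the new value in list() to give it a name ('answer' is a made-up name)
named <- c(my_list, list(answer = TRUE))
names(named)        # "int" "vec" "bool" "answer"

# A bare vector is split up: 1:3 adds three components, not one
split_up <- c(my_list, 1:3)
length(split_up)    # 6

# Wrapping it in list() keeps it together as one component
kept_whole <- c(my_list, list(1:3))
length(kept_whole)  # 4
```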