- A computer or other device
- A brain

Hi, and welcome! We are going to pack our toolbox for a new programming language called R.

R is a programming language used for statistics and data modeling, a field popularly known as data science.

Without much talking, let's get started.

To begin using R, download and install it from https://www.r-project.org/ .

This is actually all you need to get started. For useful extras such as syntax highlighting and code auto-completion, I'd also suggest installing RStudio, a free integrated development environment (IDE) for R, from http://www.rstudio.com/ide/ . And that's all we need to pack into our toolbox for now.

Welcome to this course on **R**, a programming language for **statistics** and **data modeling**!

To warm up, let's calculate the average temperature of two days.

> (88 + 72) / 2

//Output Below

[1] 80

Exactly! At its core, we can use R as a calculator.

Seems like we could just use a simple calculator instead of R, doesn't it? But what if you love **barbecues**? Could R help you plan your BBQs?

How would you calculate the average weekend temperature throughout an **entire year**?

- Use R to find the result

Yup! R can do all of that in a jiffy. We'll find out more about how R can help us plan our BBQs after we've learned a few of the basics.

When using R on a computer, we'll need to use a **terminal**. At the very beginning of the command line you'll always see the `>` sign.

> "Hello there!"

//Output Below

[1] "Hello there!"

That sign is a **prompt** that lets us know that R expects a command from us.

R is made for large collections of numbers, which can sometimes get confusing to read. Luckily, R helps us out when providing **output**.

What might be the correct order for this sequence of numbers?

[1] 5 6 7 8 9 10 11 12
[9] 13 14 15 16 17 18 19 20
[17] 21 22 23 24 25

R prints out the index of the first number in each line: `13` is the 9th number in our collection and `21` is the 17th. We'll see `[1]` a *lot* in our results.

As we know, programming languages often use **variables** to store different values.

Let's guess how variables are assigned in R.

> variable <- 5

Exactly! We can read that as `variable` *gets* `5`.

Psst: once something is created in R, it's called an **object**.

We can also use **arithmetic operators** with variables. To display the contents of a variable, we just need to write it out in the prompt.

Let's add the two variables.

> var_1 <- 9
> var_2 <- 3
> result <- var_1 + var_2
> result

//Output Below

[1] 12

Wahey! We can perform basic arithmetic operations with `+`, `-`, `*`, and `/`. We printed out the sum by simply typing in `result`.

However, R was designed for collections of data, not simple numbers.

To combine different values into a collection, we can use a **function** called `c()`.

> var <- c(0,1,1,2,3)
> var

//Output Below

[1] 0 1 1 2 3

Awesome! We just created a collection of data and assigned it to `var`.

Which of these do you think are **collections** in R?

- var <- 3
- var <- c(5,-3,27)

Yup! R treats *everything* as a collection. Single values are collections with just **1 element**.

These collections have a special name in R: **atomic vectors**. They can also be made out of more than just numbers.

Let's use a function called `class()` to check out other atomic vectors.

> a <- c(1,2)
> b <- c("a","bee","cdefg")
> c <- c(TRUE,FALSE)
> class(a)
> class(b)
> class(c)

//Output Below

[1] "numeric"
[1] "character"
[1] "logical"

Great! `a` is a **numeric**, `b` is a **character**, and `c` is a **logical** vector.

Psst: atomic means that it's the smallest component of something bigger.

Besides atomic vectors, data can be stored in many other ways in R: matrices, factors, data frames, and lists.

Why would we need so many different structures?

- To define relationships between data
- To get better results
- We don't, they are all basically the same

Yup! Data comes in all shapes and sizes, and we need to store it in **understandable** ways.

Just like in other languages, we can also make our own functions in R. These are sets of instructions we can reuse as often as we want.

function_name <- function(x,y){
  return(x * y)
}
function_name(2,4)

//Output Below

[1] 8

We call a function by typing its name followed by `()`, with any inputs placed inside the parentheses. Here, we're taking 2 **variables** as **input**, multiplying them and displaying the result. Note that `return` needs its own parentheses in R: `return(x * y)`.
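Functions can also have default values for their inputs. Here's a small sketch; `average_temp` and its `digits` input are made-up names for illustration, not something built into R.

```r
# Hypothetical helper: average of two temperatures, rounded to `digits` places.
average_temp <- function(day_1, day_2, digits = 0) {
  round((day_1 + day_2) / 2, digits)
}

avg_default <- average_temp(88, 72)             # digits falls back to 0
avg_named <- average_temp(88, 73, digits = 1)   # default overridden by name
print(avg_default)
print(avg_named)
```

Naming the input (`digits = 1`) makes the call easier to read than passing a bare third number.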

In R, we can use `#` to comment out parts of our code.

> # comments begin with a '#'
> # comments are ignored by the program
> a = 2
> # a = 3
> a

//Output Below

[1] 2

See how the line `# a = 3` was ignored? You can use comments to write notes for yourself or explanations for others.

Let's take a closer look at **numeric vectors**.

Do you remember how to combine different values into one vector?

> x <- c(29,3,17)

Nice! We can use the `c()` function to combine values into one vector.

A lot of times, we might want a series of numbers generated for us. If that's the case, we can use the colon operator `:` to create so-called **sequences**.

> my_sequence <- c(10:15)
> my_sequence

//Output Below

[1] 10 11 12 13 14 15

Nice! The **colon operator** will create a sequence by adding 1 until it reaches `15`.

R also has a **function** that can create specific sequences: the `seq()` function. What might the following code print out?

> x <- seq(10,20,by=5)
> x

- [1] 10 15 20

Yass! The `seq()` function creates a sequence from `10` to `20`. The `by=5` part means we increment by 5 to get the next value.

What if we want to create a **sequence** with an increment that isn't a whole number?

> my_sequence <- seq(2.1,4,by=0.5)
> my_sequence

//Output Below

[1] 2.1 2.6 3.1 3.6

Awesome! The `seq()` function allows us to be very specific with **decimal** numbers. Here, we used `0.5` as the increment, and the sequence stops at `3.6` because the next value, `4.1`, would go past `4`.
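Two more ways to use `seq()` are worth a quick look. This is only a sketch: the variable names are made up, and `length.out` is a standard `seq()` argument that asks for a number of values instead of a step size.

```r
# Five evenly spaced values from 0 to 1 (R works out the step for us).
evenly_spaced <- seq(0, 1, length.out = 5)
print(evenly_spaced)

# Counting down needs a negative step.
countdown <- seq(10, 0, by = -5)
print(countdown)
```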

An important thing to remember is that R **vectors** can only contain elements of the **same data type**.

> my_vector <- c(8,3,11)
> my_vector

//Output Below

[1] 8 3 11

Awesome! To combine elements of **different types** in R we have to use **lists**! We'll explore lists a bit later.

We can't keep strings and numbers as separate types in one vector, but what about different kinds of numbers, such as integer and decimal?

> y <- c(1.1, 3.14)
> x <- c(y, 555L)
> x

- [1] 1.10 3.14 555.00

Woah, good job! If we add an **integer** (written with an `L` suffix, as in `555L`) to a **decimal** vector, R will automatically change its **type** to decimal. This is called **type coercion**. A plain `555` without the suffix is already stored as a decimal.

Psst: type coercion happens when there are values of different types. R converts to a type that works for all cases and doesn't lose any data.
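To see coercion in action, here's a minimal sketch using `typeof()`, which reports how R stores a vector. The variable names are made up for this example.

```r
# Logical + integer -> integer (TRUE becomes 1L)
lgl_int <- c(TRUE, 2L)
# Integer + decimal -> double
int_dbl <- c(2L, 0.5)
# Number + string -> character (the number becomes text)
num_chr <- c(1.5, "a")

print(typeof(lgl_int))
print(typeof(int_dbl))
print(num_chr)
```

The general direction is logical, then integer, then double, then character: R picks the most flexible type present.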

Smaller pieces of our collected data are called **subsets**. To get a subset, we need to use the square brackets `[]` and an index of the value we want.

How would we get the 3rd value of a vector?

> x <- c(75,80,83)
> x[3]

//Output Below

[1] 83

Great! The 3rd element is in the 3rd position in the vector.

Psst: R is designed for humans, which is why the index starts at 1, rather than 0 as in most programming languages.

Remember how we used the **colon operator** to create sequences? Using `:`, let's try selecting the three values in the middle of the vector.

> x <- c(90,85,81,84,84)
> x[2:4]

//Output Below

[1] 85 81 84

Boom! R makes it super easy to get the values we need.
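Square brackets accept more than positive positions. The sketch below shows two other common patterns; the variable names are just for illustration.

```r
x <- c(90, 85, 81, 84, 84)

# A negative index drops that position instead of selecting it.
without_first <- x[-1]
print(without_first)

# A logical vector keeps only the positions that are TRUE.
above_84 <- x[x > 84]
print(above_84)
```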

Vectors can also have **attributes**, extra information which helps clarify data. The `names()` function allows us to give a name attribute to each value.

> x <- c(75,80,83)
> days <- c("Mon","Tue","Wed")
> names(x) <- days
> x

//Output Below

Mon Tue Wed
75 80 83

Sweet! Notice how both the `days` and `x` vectors are of the same length.

Psst: R doesn't number each line when we print out vectors with a name attribute.

Now that we've given names to our values, let's try subsetting the value recorded on Wednesday.

> x <- c(75,80,83)
> names(x) <- c("Mon","Tue","Wed")
> x["Wed"]

//Output Below

Wed
83

Woop woop! We simply need to type in the name `"Wed"` as a string and we'll get our result.

Because R is made for large collections of data, it can come in handy to know how many elements a vector has.

Can you guess the name of the function that does that?

> year <- 365
> even_days <- seq(2,year,by=2)
> length(even_days)

//Output Below

[1] 182

Awesome! We can use `length()` to find out how many elements a vector has.

Data can come in many more forms, including words. To store a string in R, we need to use a vector of type **character**.

> sentence <- "Ay caramba"
> sentence

//Output Below

[1] "Ay caramba"

Boom! We just created a **character vector**.

We can also use the `length()` function on character vectors. What do you think is the length of `sentence`?

> sentence <- "Bird is the word."
> length(sentence)

- [1] 1

Wahey! `sentence` is a character vector with 1 element. The element is made out of 4 words and 17 characters, but that doesn't affect the **vector length**.

If we want to measure the number of characters, we can use the `nchar()` function.

> sentence <- c("Hello", "my name is", "the real")
> nchar(sentence)

- [1] 5 10 8

Super! The output is the number of characters for **each element** belonging to the vector.

Adding other entries is the same process as with numeric vectors.

> flavors <- c("blueberry","kiwi")
> flavors <- c(flavors, "banana")
> flavors

//Output Below

[1] "blueberry" "kiwi" "banana"

See that? We can easily add another entry with the `c()` function.

A very handy feature is the `paste()` function. It lets us combine all of our vector elements into one. We just need to decide on how to link them.

Use the `paste()` function to link the words with **spaces** in-between.

> word <- c("We","all","scream","for","ice","cream")
> paste(word,collapse = " ")

//Output Below

[1] "We all scream for ice cream"

Perfect! We can use the `paste()` function with a character vector and a `collapse` **argument** to combine vector elements into one.
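`paste()` also has a `sep` argument, which is easy to mix up with `collapse`. Here's a small sketch of the difference, using made-up vectors.

```r
first <- c("ice", "hot")
second <- c("cream", "dog")

# `sep` joins the matching elements of several vectors.
joined <- paste(first, second, sep = "-")
print(joined)

# `collapse` then squeezes the resulting vector into one string.
menu <- paste(joined, collapse = ", ")
print(menu)

# paste0() is paste() with no separator at all.
labels <- paste0("BBQ_", 1:3)
print(labels)
```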

Let's move on to another vector type: **logical**.

> answers <- c(TRUE,FALSE)
> answers

//Output Below

[1] TRUE FALSE

See that? **Logical vectors** only have two main values: `TRUE` or `FALSE`.

Psst: in R, all vectors can also have a value called `NA`. We'll get to that soon!

Logical vectors are usually the result of operations carried out on other vectors. Common **comparison operators** are `>`, `<`, and `==`, among others.

How might we check if `x` is longer than `y`?

> x <- c(1,2,3,4)
> y <- c(5,6,7)
> length(x) > length(y)

//Output Below

[1] TRUE

Awesome work! We use the `>` operator to see if the value on the left is greater than the one on the right.

What might happen if we checked the vector values instead of their lengths?

> x <- c(517,234,10)
> y <- c(-38,307,10)
> result <- x > y
> result

//Output Below

[1] TRUE FALSE FALSE

Nice! We compared the values of two vectors element by element. We can do this with other comparison operators such as `<` or `==` as well.

Logical vectors can also be a result of comparing character vectors. Let's see if the two vectors have **equal** values.

> name <- c("Mr.","Bond")
> name2 <- c("Mrs.","Bond")
> result <- name == name2
> result

- [1] FALSE TRUE

Excellent! `"Mr."` and `"Mrs."` are not the same, so the first result is `FALSE`. The result is a logical vector of the **same length** as the compared vectors.
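Once we have a logical vector, a few built-in functions help us summarize it. The temperatures below are made-up numbers, used only to show the idea.

```r
temps <- c(75, 80, 83, 68, 91)  # hypothetical temperatures
warm <- temps > 78

warm_count <- sum(warm)       # TRUE counts as 1, so this counts warm days
warm_days <- which(warm)      # positions of the warm days
print(warm_count)
print(warm_days)
print(any(warm))              # is at least one day warm?
print(all(warm))              # is every day warm?
```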

Remember talking about how R is a calculator at its core? Well, doing math with **vectors** is where R really shines.

Let's see how many friends and family we'll invite over for 3 different BBQs.

> friends <- c(7,10,24)
> family <- c(20,15,3)
> guests <- friends + family
> guests

//Output Below

[1] 27 25 27

Sweet! We used the `+` operator to sum two vectors. The result is a vector containing sums of the corresponding individual elements.

To make sure we have enough for everyone, we can assume that we'll always have at least 5 extra guests for each BBQ.

We'll do so by adding a **constant** to our `guests` vector.

> guests <- c(27,25,27)
> guests <- guests + 5
> guests

//Output Below

[1] 32 30 32

Awesome! We used `5` as a constant and now we'll be able to plan for our party.
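Adding a constant works because R reuses (or "recycles") the shorter vector until it matches the longer one. A minimal sketch with a made-up four-party guest list:

```r
guests <- c(27, 25, 27, 30)

# A single number is reused for every element.
plus_five <- guests + 5
print(plus_five)

# A shorter vector is reused in a cycle: 1, 2, 1, 2.
plus_cycle <- guests + c(1, 2)
print(plus_cycle)
```

If the longer length isn't a multiple of the shorter one, R still recycles but prints a warning, which is usually a sign of a mistake.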

We can now figure out how many supplies we need for each guest, whether it's food, drinks or even balloons.

Let's see what our food budget would look like.

> guests <- c(32,30,32)
> food <- c(1.5,1.5,2.5)
> drinks <- c(2,2,5)
> balloons <- c(0,0,2)
> result <- guests * food
> result

//Output Below

[1] 48 45 80

Boom! We're able to figure out all of our party needs with R. Let's see more about how we can plan our budget.

The last BBQ is also a birthday party, so we want to celebrate properly. Let's see how many balloons we'll need for the summer.

> guests <- c(32,30,32)
> food <- c(1.5,1.5,2.5)
> drinks <- c(2,2,5)
> balloons <- c(0,0,2)
> result <- guests * balloons
> result

//Output Below

[1] 0 0 64

The evidence is clear. We need more balloons at parties.

After calculating the total cost of each party, you want to know how many BBQs of each type you could afford each month.

You have a monthly budget of $500. Let's see how many times we can light up the grill.

> bbq_cost <- c(200,190,530)
> budget <- 500
> bbq <- budget/bbq_cost
> bbq

//Output Below

[1] 2.5000000 2.6315789 0.9433962

Yass! We divided our budget by the price of each BBQ. However, we can't hold a fraction of a BBQ, so those decimals aren't very useful.

Let's use a function called `floor()` to **round down** our previous result.

> bbq <- c(2.5000,2.6315,0.9433)
> floor(bbq)

//Output Below

[1] 2 2 0

Great! We can use functions like `floor()` to simplify our results. Unfortunately, it seems like we need a bigger budget for birthday BBQs.
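`floor()` has a few relatives that round in other ways. A quick sketch for comparison, reusing the numbers from above:

```r
bbq <- c(2.5, 2.6315, 0.9433)

rounded_down <- floor(bbq)      # always rounds down
rounded_up <- ceiling(bbq)      # always rounds up
one_decimal <- round(bbq, 1)    # rounds to 1 decimal place
print(rounded_down)
print(rounded_up)
print(one_decimal)

# Integer division gives the whole-number count in one step.
whole_bbqs <- 500 %/% c(200, 190, 530)
print(whole_bbqs)
```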

But what if we sometimes forget a value? How would we write down a **missing value**?

- > bbq_cost <- c(200,190,530,NA)

Exactly! Rather than using `0`, we use `NA` to write down values we do not know. `NA`s can give us valuable insight about our data.

`NA` stands for "Not Available" and helps keep our data consistent. Arithmetic operations with an `NA` value will almost **always** result in `NA`.

> x <- c(5,NA,4,8)
> y <- c(2,4,5,NA)
> x * y

//Output Below

[1] 10 NA 20 NA

Nice! `NA` multiplied or divided by another number still results in `NA`.

What do you think is the result of the following expression?

> result <- NA != 0
> result

- [1] NA

Great work! `NA` is a placeholder for a missing value, so it doesn't make sense comparing it with other values.
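Since `NA` spreads through calculations, many summary functions offer an `na.rm` argument to skip missing values. A short sketch using the cost vector from earlier:

```r
bbq_cost <- c(200, 190, 530, NA)

# By default, the missing value makes the whole result missing.
with_na <- mean(bbq_cost)
print(with_na)

# na.rm = TRUE drops missing values before calculating.
known_mean <- mean(bbq_cost, na.rm = TRUE)
known_total <- sum(bbq_cost, na.rm = TRUE)
print(known_mean)
print(known_total)
```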

We've invited some friends and family to one last BBQ, but some didn't confirm if they can make it.

Let's use a function called `is.na()` to see how many guests have not confirmed.

> rsvp_family <- c(TRUE,FALSE,NA,NA,NA,FALSE)
> rsvp_friends <- c(NA,NA,TRUE,TRUE,TRUE)
> rsvp <- c(rsvp_family,rsvp_friends)
> is.na(rsvp)

//Output Below

[1] FALSE FALSE TRUE TRUE
[5] TRUE FALSE TRUE TRUE
[9] FALSE FALSE FALSE

Boom! The result is a logical vector which displays `TRUE` for every `NA` value. Five `TRUE` values means that 5 people did not get back to us.
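Instead of counting the `TRUE` values by eye, we could let R do the counting. This sketch rebuilds the same `rsvp` vector; `no_reply` and `coming` are made-up names.

```r
rsvp <- c(TRUE, FALSE, NA, NA, NA, FALSE, NA, NA, TRUE, TRUE, TRUE)

no_reply <- sum(is.na(rsvp))       # each TRUE from is.na() counts as 1
coming <- sum(rsvp, na.rm = TRUE)  # confirmed guests, ignoring missing replies
print(no_reply)
print(coming)
```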

A **matrix** is a two-dimensional object. In other words, we can store data in a matrix in both **rows** and **columns**.

> my_matrix

//Output Below

[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6

See that? We just printed out a matrix with 2 rows and 3 columns.

We use the `matrix()` function to create a matrix from vectors. Similar to vectors, a matrix can only contain **one type** of value.

Let's create a matrix that has only **two rows**.

> data <- c(1, 2, 3, 4, 5, 6)
> my_matrix <- matrix(data,nrow=2,ncol=3)
> my_matrix

//Output Below

[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6

Awesome! To create a matrix we need to provide a **vector** and specify the number of **rows** and **columns**.

Now let's make a matrix that only has two columns.

> data <- c(1, 2, 3, 4, 5, 6)
> my_matrix <- matrix(data,nrow=3,ncol=2)
> my_matrix

//Output Below

[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6

Whop whop! Creating matrices is easy in R. Let's see what else we can find out about matrices.

We can fill our matrices in two ways: by rows or by columns. To do so, we just need to add the `byrow` argument and set it to `TRUE` or `FALSE`.

> data <- 1:6
> my_matrix <- matrix(data,nrow=2,ncol=3,byrow=TRUE)
> my_matrix

//Output Below

[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6

Sweet! If `byrow=TRUE`, the first **row** is filled, followed by the second.

Let's figure out how to add the values by column.

> data <- 1:6
> my_matrix <- matrix(data,nrow=2,ncol=3,byrow=FALSE)
> my_matrix

//Output Below

[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6

Wohoo! When `byrow` is `FALSE`, which is the default, the first **column** is filled, followed by the second and so on.

We can also **rename** the **columns** of our matrix using the `colnames()` function.

> my_matrix <- matrix(1:6,3)
> colnames(my_matrix) <- c("run","cycle")
> my_matrix

//Output Below

run cycle
[1,] 1 4
[2,] 2 5
[3,] 3 6

Amazing! Notice how we don't have to type `nrow = 3`. The second argument in `matrix()` is automatically treated as the number of rows.

We can rename the rows of the matrix in a similar fashion using the `rownames()` function.

> my_matrix <- matrix(1:6, 3)
> colnames(my_matrix) <- c("run","cycle")
> rownames(my_matrix) <- c("Anne","Luke","Emma")
> my_matrix

//Output Below

run cycle
Anne 1 4
Luke 2 5
Emma 3 6

Nice! We can now see who's a better runner or biker.

Let's calculate the total distance per person by computing a **sum** of each **row** using our previous matrix.

> my_matrix
> rowSums(my_matrix)

//Output Below

run cycle
Anne 1 4
Luke 2 5
Emma 3 6
Anne Luke Emma
5 7 9

Awesome! `rowSums()` returns a vector containing sums for each **row** in the matrix.

We can now try computing the distance covered by all individuals **per activity**.

> my_matrix
> colSums(my_matrix)

//Output Below

run cycle
Anne 1 4
Luke 2 5
Emma 3 6
run cycle
6 15

Nice! `colSums()` returns a vector containing sums for each **column** in the matrix.
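Sums aren't the only summaries available. Here's a sketch with averages and with `apply()`, which runs any function across rows or columns; `longest` is a made-up name.

```r
my_matrix <- matrix(1:6, 3)
colnames(my_matrix) <- c("run", "cycle")
rownames(my_matrix) <- c("Anne", "Luke", "Emma")

# Average distance per person and per activity.
per_person <- rowMeans(my_matrix)
per_activity <- colMeans(my_matrix)
print(per_person)
print(per_activity)

# apply() runs a function over rows (1) or columns (2).
longest <- apply(my_matrix, 2, max)
print(longest)
```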

We subset matrices by typing the **row** and then the **column** indices inside `[]`.

Let's try to get the element in the 3rd row and 2nd column of `my_matrix`.

> my_matrix
> my_matrix[3,2]

//Output Below

[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
[1] 6

Nice! As with vectors, we subset matrices using brackets in the following format: `[row_index , column_index]`.

If we want to **add a row** to our matrix, we can do so by simply using the `rbind()` function.

> my_matrix <- rbind(my_matrix, c(7,8))
> my_matrix

//Output Below

[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
[4,] 7 8

Nice! `rbind()` returns a new matrix rather than changing the original, so we assign the result back to `my_matrix`. Remember that the row being added must have the same **length** as the number of columns in the matrix.

To **add a column**, we need to use the `cbind()` function. We also have to keep a careful eye on the **length** of the added column.

> my_matrix <- matrix(1:6, 3)
> cbind(my_matrix, c(9, 8, 7))

//Output Below

[,1] [,2] [,3]
[1,] 1 4 9
[2,] 2 5 8
[3,] 3 6 7

Awesome! The column we are adding should have the same **length** as the number of rows in our matrix.

Dividing or multiplying a matrix by a **constant** will divide or multiply every matrix element by that constant.

Let's see what happens when we divide a matrix by `2`.

> my_matrix <- matrix(1:6, 3)
> my_matrix / 2

//Output Below

[,1] [,2]
[1,] 0.5 2.0
[2,] 1.0 2.5
[3,] 1.5 3.0

Sweet! `*`, `+`, and `-` work the same way. The operation is carried out between the constant and each **individual element**.

Doing arithmetic with two matrices of the same size combines their corresponding elements.

> my_matrix <- matrix(1:6, 3)
> my_matrix + my_matrix

//Output Below

[,1] [,2]
[1,] 2 8
[2,] 4 10
[3,] 6 12

Great! The arithmetic operation is performed between each **corresponding element**.
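One thing that often trips people up: `*` between two matrices is also element by element. The matrix product from linear algebra uses a separate operator, `%*%`. A small sketch with a made-up 2x2 matrix:

```r
m <- matrix(1:4, 2)    # columns are (1, 2) and (3, 4)

elementwise <- m * m   # multiplies matching positions
print(elementwise)

product <- m %*% m     # rows-times-columns matrix product
print(product)
```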

If we have a lot of repeating values that can be grouped into a limited number of distinct **categories**, we can store them in a **factor**.

> os <- c("Android", "iOS", "Android")
> os_factor <- factor(os)
> os_factor

//Output Below

[1] Android iOS Android
Levels: Android iOS

Nice! The **levels** row lets us know we only have two **distinct** categories, `Android` and `iOS`.

Let's see what R thinks about different operating systems.

> os <- c("Android", "iOS", "Android")
> os_factor <- factor(os)
> android <- os_factor[1]
> ios <- os_factor[2]
> android > ios

//Output Below

[1] NA
Warning message:
In Ops.factor(android, ios) : ‘>’ not meaningful for factors

Phew! By default, the **levels** of a **factor** have no order, so R refuses to rank one above another.

We can also have categories that have an **order**. We can specify this by creating so-called **ordinal categorical** variables.

> length <- c("medium", "short", "long", "short", "medium")
> l_fctr <- factor(length, ordered = TRUE, levels = c("short", "medium", "long"))
> l_fctr

//Output Below

[1] medium short long short medium
Levels: short < medium < long

Awesome! Note how the **levels** are now displayed in our given **order**.

Let's now see if `"medium"` is greater than `"short"` in our **ordered** factor.

> length <- c("medium", "short", "long")
> l_fctr <- factor(length, ordered = TRUE, levels = c("short", "medium", "long"))
> l_fctr[1] > l_fctr[2]

//Output Below

[1] TRUE

Nice! Once we set `ordered` to `TRUE` we can compare different **levels** from our factor.
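Ordered factors support a few more handy operations. This sketch uses made-up names (`sizes`, `size_fctr`) so it doesn't reuse `length`, which is also the name of a built-in function.

```r
sizes <- c("medium", "short", "long", "short", "medium")
size_fctr <- factor(sizes, ordered = TRUE,
                    levels = c("short", "medium", "long"))

# Each value is stored as its position in `levels`.
codes <- as.integer(size_fctr)
print(codes)

# Comparing against a level name works on the whole factor at once.
at_least_medium <- size_fctr >= "medium"
print(at_least_medium)

# table() counts the observations in level order.
counts <- table(size_fctr)
print(counts)
```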

R has a cool ability to **summarize** the data from our objects.

> group_vector <- c("C", "B", "A", "C", "A")
> group_fac <- factor(group_vector)
> summary(group_fac)

//Output Below

A B C
2 1 2

Nice! `summary()` displays the categories and the number of so-called **observations** per category.

Sometimes our data might contain factors with long **level** names and we may want to rename them.

> vector <- c("A_grp", "B_grp", "A_grp")
> fctr <- factor(vector)
> levels(fctr) <- c("A", "B")
> fctr

//Output Below

[1] A B A
Levels: A B

Nice! We **rename** factor levels with the `levels()` function.

**Data frames** are similar to matrices, but can also contain elements of **different types**.

> quantity <- c(200, 300, 100)
> crop <- c("corn", "leek", "pea")
> subsidy <- c(TRUE, FALSE, TRUE)
> my_df <- data.frame(quantity, crop, subsidy)
> my_df

//Output Below

quantity crop subsidy
1 200 corn TRUE
2 300 leek FALSE
3 100 pea TRUE

Nice! Data frames are two-dimensional objects where **variables** are stored as columns and **observations** as rows.

A great way to explore your data is by using the `str()` function. Let's apply it to our data frame.

> quantity <- c(200, 300, 100)
> crop <- c("corn", "leek", "pea")
> subsidy <- c(TRUE, FALSE, TRUE)
> my_df <- data.frame(quantity, crop, subsidy)
> str(my_df)

//Output Below

'data.frame': 3 obs. of 3 variables:
$ quantity: num 200 300 100
$ crop : Factor w/ 3 levels "corn","leek",..: 1 2 3
$ subsidy : logi TRUE FALSE TRUE

See that? We can now look at the number of observations and variable types. Notice that `crop` is of type **factor** instead of **character**. This output comes from a version of R older than 4.0.0; from R 4.0.0 onward, `crop` shows up as `chr` instead.

We saw that in older versions of R, **strings** were saved as factors in our data frame by default. Sometimes we may want to override this, and store them just as strings. Since R 4.0.0 the default is `stringsAsFactors = FALSE`, so strings stay as strings unless we ask otherwise.

> farmer <- c("Bob", "Sam", "Mike")
> my_df <- data.frame(quantity, crop, farmer, subsidy, stringsAsFactors = FALSE)
> class(my_df$crop)

//Output Below

[1] "character"

It makes sense to leave `crop` as a **factor** since it's a finite **category**. Personal names, however, are better saved as **character**.

We can add a **new variable** to our data frame and **name** it at the same time. Let's add the `farmer` column.

> my_df$farmer <- c("Bob", "Sam", "Mike")
> my_df

//Output Below

quantity crop subsidy farmer
1 200 corn TRUE Bob
2 300 leek FALSE Sam
3 100 pea TRUE Mike

Yass! With the `$` symbol, we can append and name a column. Adding a string vector in this way saves it as **character**.

We can **subset** a particular **variable** column by typing the data frame name followed by `$` and the variable name.

Let's see if we can also subset crop in this code snippet.

> my_df$quantity
> my_df$crop

//Output Below

[1] 200 300 100
[1] "corn" "leek" "pea"

Nice! The output is an **atomic vector** of a particular **type**.

Another operator we can use to select a data frame variable is the **double square bracket** `[[ ]]`.

> my_df[["quantity"]]
> my_df[[1]]

//Output Below

[1] 200 300 100
[1] 200 300 100

Note that we can either type the column **index** or the column **name** inside the `[[ ]]`.

As with vectors and matrices, we can call various data frame **subsets** by using simple **square brackets**.

How would we select all of the rows, but only 2 columns?

> my_df[ , 3:4]

//Output Below

subsidy farmer
1 TRUE Bob
2 FALSE Sam
3 TRUE Mike

Yass! We've just subsetted 2 columns. By using different **index** combinations we can subset single elements, rows or two-dimensional arrays.

What might be the best way to only select 1 column?

> my_df[ , 3]

//Output Below

[1] TRUE FALSE TRUE

Awesome! By leaving the **row** empty and only writing the index of one column we are able to get the column.
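Notice that selecting a single column gave us back a plain vector, not a data frame. If we'd rather keep the data frame shape, `drop = FALSE` is one option. A sketch with a rebuilt `my_df`; `as_vector` and `as_frame` are made-up names.

```r
my_df <- data.frame(
  quantity = c(200, 300, 100),
  crop = c("corn", "leek", "pea"),
  subsidy = c(TRUE, FALSE, TRUE)
)

as_vector <- my_df[, 3]                # plain logical vector
as_frame <- my_df[, 3, drop = FALSE]   # still a data frame with 1 column

print(class(as_vector))
print(class(as_frame))
print(dim(as_frame))
```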

Let's select only the **rows** that are receiving `subsidy`. Remember, `subsidy` is a logical variable.

> my_df[my_df$subsidy, ]

//Output Below

quantity crop subsidy farmer
1 200 corn TRUE Bob
3 100 pea TRUE Mike

Awesome! We see only the rows containing `TRUE` in the `subsidy` column.

We can **sort** our data frame by a particular column using the `order()` function. Sort `my_df` by the `quantity` column.

> order(my_df$quantity)
> my_df[order(my_df$quantity), ]

//Output Below

[1] 3 1 2
quantity crop subsidy farmer
3 100 pea TRUE Mike
1 200 corn TRUE Bob
2 300 leek FALSE Sam

See that? The `order()` function returns the positions that would sort a vector, and we use those positions to rearrange the rows. By default, it sorts in **ascending** order.
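To sort the other way around, `order()` takes a `decreasing` argument. Here's a sketch on a smaller, rebuilt data frame; `largest_first` is a made-up name.

```r
quantity <- c(200, 300, 100)
crop <- c("corn", "leek", "pea")
my_df <- data.frame(quantity, crop)

# decreasing = TRUE puts the largest quantity first.
largest_first <- my_df[order(my_df$quantity, decreasing = TRUE), ]
print(largest_first)

# sort() returns the sorted values themselves rather than positions.
sorted_values <- sort(quantity)
print(sorted_values)
```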

Unlike vectors, R **lists** can hold components of different **data types**.

> my_list <- list(2,"c",TRUE)
> my_list

//Output Below

[[1]]
[1] 2

[[2]]
[1] "c"

[[3]]
[1] TRUE

Nice! We just created a **list** using the `list()` function and gave it a numeric, a character, and a logical element.

Lists can also contain more complex data components such as whole vectors, matrices or data frames.

> vec <- 1:6
> mat <- matrix(1:6,nrow=2)
> df <- data.frame(mat)
> list(vec, mat, df)

//Output Below

[[1]]
[1] 1 2 3 4 5 6

[[2]]
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6

[[3]]
X1 X2 X3
1 1 3 5
2 2 4 6

Great! We can see that **lists** are very flexible. We could even store a list inside another list if we wanted to.

By default, list components have no **names** and are shown by their indices.

Let's name the components of our list, calling the first one `vector`.

> vec <- 1:6
> mat <- matrix(1:6,nrow=2)
> df <- data.frame(mat)
> my_list <- list(vector=vec,matrix=mat,dframe=df)
> my_list

//Output Below

$vector
[1] 1 2 3 4 5 6

$matrix
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6

$dframe
X1 X2 X3
1 1 3 5
2 2 4 6

Pretty straightforward, right?

We can extract list components by using double brackets around the **index** or the **name**. We can also use the `$` sign with the name.

Let's see if we can extract a component by using its index.

> my_list <- list(int=2,vec=1:6,bool=TRUE)
> my_list[[2]]

//Output Below

[1] 1 2 3 4 5 6

Nice! Lists provide us with many different ways of extracting components.

Let's extract the same element, only this time, let's use its name.

> my_list <- list(int=2,vec=1:6,bool=TRUE)
> my_list$vec

//Output Below

[1] 1 2 3 4 5 6

Exactly! We just need to use the `$` symbol alongside the name.

What if we want to extract the 3rd element of a **vector** that belongs to a list?

> my_list <- list(int=2,vec=1:6,bool=TRUE)
> print(my_list[[2]][3])

//Output Below

[1] 3

Awesome! The first `[[2]]` selects the vector component of the list, while the `[3]` selects the element we want from that vector.

We can add a component of any type to our list by using the `c()` function.

> my_list <- list(int=2,vec=1:6,bool=TRUE)
> my_list <- c(my_list,TRUE)
> my_list

//Output Below

$int
[1] 2

$vec
[1] 1 2 3 4 5 6

$bool
[1] TRUE

[[4]]
[1] TRUE

See that? R automatically converts the other arguments in the `c()` function into **list elements**.
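The new component above ended up without a name. If we want it named, one option is to assign with `$`. The sketch below also shows the difference between single and double brackets, plus `lapply()`; the `note` component is made up for illustration.

```r
my_list <- list(int = 2, vec = 1:6, bool = TRUE)

# Assigning to a new name appends a named component.
my_list$note <- "extra"
print(names(my_list))

# Single brackets keep the list wrapper; double brackets unwrap it.
wrapped <- my_list["vec"]
unwrapped <- my_list[["vec"]]
print(class(wrapped))
print(class(unwrapped))

# lapply() applies a function to every component and returns a list.
sizes <- lapply(my_list, length)
print(sizes)
```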