Factors

When a vector contains many repeated values that fall into a limited number of distinct categories, we can store it as a factor.

> os <- c("Android", "iOS", "Android")
> os_factor <- factor(os)
> os_factor

//Output Below

[1] Android iOS Android
Levels: Android iOS

Nice! The Levels line tells us there are only two distinct categories, Android and iOS.
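
To look at what the factor stores internally, here is a small sketch using the base R functions levels(), nlevels() and as.integer(). It assumes the same os vector as in the example above.

```r
# Rebuild the factor from the example above.
os <- c("Android", "iOS", "Android")
os_factor <- factor(os)

# The distinct categories, in sorted order by default.
levels(os_factor)

# The number of distinct categories.
nlevels(os_factor)

# Internally, each value is an integer index into the levels.
as.integer(os_factor)
```

Each element is stored as a small integer code pointing at one of the levels, rather than as a repeated string.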

Unordered factors

Let's see what R thinks about different operating systems.

> os <- c("Android", "iOS", "Android")
> os_factor <- factor(os)
> android <- os_factor[1]
> ios <- os_factor[2]
> android > ios

//Output Below

[1] NA
Warning message:
In Ops.factor(android, ios) : ‘>’ not meaningful for factors

Phew! By default, a factor does not rank its levels, so R returns NA with a warning instead of declaring one level greater than another.
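
Equality checks, by contrast, remain meaningful for unordered factors. A minimal sketch, reusing the names from the example above:

```r
os <- c("Android", "iOS", "Android")
os_factor <- factor(os)
android <- os_factor[1]
ios <- os_factor[2]

# == and != work on unordered factors; only <, >, <= and >= are rejected.
android == ios

# Comparing against a plain string also works, element by element.
os_factor == "Android"

# is.ordered() reports whether a factor carries an order.
is.ordered(os_factor)
```

The first comparison asks only whether the two values are the same category, which needs no ranking.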

Ordered factors

Categories can also have a natural order. We express this by creating an ordered factor, which represents a so-called ordinal categorical variable.

> length <- c("medium", "short", "long", "short", "medium")
> l_fctr <- factor(length, ordered = TRUE, levels = c("short", "medium", "long"))
> l_fctr

//Output Below

[1] medium short  long   short  medium
Levels: short < medium < long

Awesome! Note how the levels are now displayed in our given order.
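
The levels argument matters here. As a cautionary sketch, leaving it out makes R fall back to the sorted level names, which is rarely the intended order. The variable names sizes, alphabetical and explicit are assumptions for this illustration, and the argument is spelled out in full as ordered.

```r
sizes <- c("medium", "short", "long", "short", "medium")

# Without levels, the order follows the sorted level names.
alphabetical <- factor(sizes, ordered = TRUE)
levels(alphabetical)   # "long" "medium" "short"

# With explicit levels, the order is the one we specify.
explicit <- factor(sizes, ordered = TRUE,
                   levels = c("short", "medium", "long"))
levels(explicit)       # "short" "medium" "long"
```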

Ordered factors II

Let's now see if "medium" is greater than "short" in our ordered factor.

> length <- c("medium", "short", "long")
> l_fctr <- factor(length, ordered = TRUE, levels = c("short", "medium", "long"))
> l_fctr[1] > l_fctr[2]

//Output Below

[1] TRUE

Nice! Once a factor is ordered, we can compare its values with relational operators such as > and <.
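
Ordering also affects other base R operations. The following sketch, which assumes the same three values as above under the hypothetical name sizes, shows sorting, max() and a comparison against a level name.

```r
sizes <- c("medium", "short", "long")
l_fctr <- factor(sizes, ordered = TRUE,
                 levels = c("short", "medium", "long"))

# Sorting follows the level order, not the alphabet.
sort(l_fctr)

# min() and max() are defined for ordered factors.
max(l_fctr)

# An ordered factor can be compared against a level name directly.
l_fctr >= "medium"
```

The last line is a convenient way to filter, for example l_fctr[l_fctr >= "medium"].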

Factor summary

R's summary() function gives a quick overview of the data in an object.

> group_vector <- c("C", "B", "A", "C", "A")
> group_fac <- factor(group_vector)
> summary(group_fac)

//Output Below

A B C
2 1 2

Nice! For a factor, summary() displays each category together with the number of observations in it.
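
As a related sketch, the base R functions table() and prop.table() give the same counts and their shares of the total. The name counts is an assumption for this illustration.

```r
group_vector <- c("C", "B", "A", "C", "A")
group_fac <- factor(group_vector)

# table() reports the same counts as summary() does for a factor.
counts <- table(group_fac)
counts

# prop.table() converts the counts into shares of the total.
prop.table(counts)

# On the raw character vector, summary() gives no counts,
# only length, class and mode.
summary(group_vector)
```

The last call illustrates why converting to a factor first is useful: the per-category counts only appear once R knows the values are categories.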

Renaming levels

Sometimes our data might contain factors with long level names and we may want to rename them.

> vector <- c("A_grp", "B_grp", "A_grp")
> fctr <- factor(vector)
> levels(fctr) <- c("A", "B")
> fctr

//Output Below

[1] A B A
Levels: A B

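Nice! We rename factor levels by assigning to levels(). The new names are matched to the existing levels by position, so they must be given in the same order that levels(fctr) returns them.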
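
Because positional renaming depends on the current level order, a slightly more defensive sketch names each old level explicitly. It relies on the list form of the levels() replacement function in base R; the variable name codes is an assumption for this illustration.

```r
codes <- c("A_grp", "B_grp", "A_grp")
fctr <- factor(codes)

# Each list element maps a new name (left) to an old level (right).
levels(fctr) <- list(A = "A_grp", B = "B_grp")
fctr
```

With this form, the mapping stays correct even if the existing levels happen to be stored in a different order.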