Factors
If we have a lot of repeating values that can be grouped into a limited number of distinct categories, we can store them in a factor.
> os <- c("Android", "iOS", "Android")
> os_factor <- factor(os)
> os_factor
//Output Below
[1] Android iOS     Android
Levels: Android iOS
Nice! The Levels row tells us that we only have two distinct categories, Android and iOS.
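Under the hood, a factor stores each value as an integer code that points into its vector of levels. The short sketch below reuses the os example and inspects those codes with the base R functions levels(), nlevels() and as.integer(); it is only an illustration of the idea.

```r
# Same data as in the example above
os <- c("Android", "iOS", "Android")
os_factor <- factor(os)

levels(os_factor)      # "Android" "iOS" (unique values, sorted by default)
nlevels(os_factor)     # 2 distinct categories
as.integer(os_factor)  # 1 2 1: each value is an index into levels(os_factor)
```

Storing small integer codes plus one copy of each label is what makes factors a compact choice for columns with many repeated values.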
Unordered factors
Let's see what R thinks about different operating systems.
> os <- c("Android", "iOS", "Android")
> os_factor <- factor(os)
> android <- os_factor[1]
> ios <- os_factor[2]
> android > ios
//Output Below
[1] NA
Warning message:
In Ops.factor(android, ios) : ‘>’ not meaningful for factors
Phew! By default, factors are unordered, so R has no basis for saying one level is greater than another: it returns NA and issues a warning.
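Unordered factors are not useless for comparisons, though: the equality operators == and != are defined for them, while <, >, <= and >= are not. A minimal sketch, assuming the same os data; the names same, different and is_android are hypothetical and chosen only for illustration:

```r
os <- c("Android", "iOS", "Android")
os_factor <- factor(os)

# Equality is meaningful even when the levels have no order
same <- os_factor[1] == os_factor[3]       # TRUE: both are "Android"
different <- os_factor[1] != os_factor[2]  # TRUE: "Android" vs "iOS"

# Comparing against a plain string with == also works
is_android <- os_factor == "Android"       # TRUE FALSE TRUE
```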
Ordered factors
We can also have categories that have an order. We can specify this by creating so-called ordinal categorical variables.
> length <- c("medium", "short", "long", "short", "medium")
> l_fctr <- factor(length, ordered = TRUE, levels = c("short", "medium", "long"))
> l_fctr
//Output Below
[1] medium short  long   short  medium
Levels: short < medium < long
Awesome! Note how the levels are now displayed in our given order.
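Passing levels explicitly matters. Without it, factor() sorts the unique values alphabetically, which here would give the nonsensical order long < medium < short. The sketch below contrasts the two; alpha_fctr is a hypothetical name, and max() is the base R function, which respects level order for ordered factors.

```r
length <- c("medium", "short", "long", "short", "medium")

# Without 'levels', the unique values are sorted alphabetically
alpha_fctr <- factor(length, ordered = TRUE)
levels(alpha_fctr)  # "long" "medium" "short", i.e. long < medium < short

# With explicit levels, the order matches what the words mean
l_fctr <- factor(length, ordered = TRUE, levels = c("short", "medium", "long"))
levels(l_fctr)      # "short" "medium" "long"

# Summaries such as max() follow the level order, not the alphabet
max(l_fctr)         # long
```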
Ordered factors II
Let's now see if "medium" is greater than "short" in our ordered factor.
> length <- c("medium", "short", "long")
> l_fctr <- factor(length, ordered = TRUE, levels = c("short", "medium", "long"))
> l_fctr[1] > l_fctr[2]
//Output Below
[1] TRUE
Nice! Once we make a factor ordered, we can compare its levels with operators such as > and <.
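An ordered factor can also be compared with a plain string, which R matches against the levels before comparing. This makes it easy to filter values by rank. A small sketch, assuming the three-element example above; longer_than_short and at_least_medium are hypothetical names:

```r
length <- c("medium", "short", "long")
l_fctr <- factor(length, ordered = TRUE, levels = c("short", "medium", "long"))

# "short" is looked up among the levels, then compared by position
longer_than_short <- l_fctr > "short"           # TRUE FALSE TRUE

# The logical result can drive subsetting: keep values of at least "medium"
at_least_medium <- l_fctr[l_fctr >= "medium"]   # medium long
```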
Factor summary
R has a cool ability to summarize the data in our objects with the summary() function.
> group_vector <- c("C", "B", "A", "C", "A")
> group_fac <- factor(group_vector)
> summary(group_fac)
//Output Below
A B C
2 1 2
Nice! summary() displays each category and the number of so-called observations in it.
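The counts returned by summary() form a named integer vector, so a single category can be looked up by name. A sketch assuming the same group data; counts and tab are hypothetical names, and table() is the base R function that produces the same counts as a contingency table:

```r
group_vector <- c("C", "B", "A", "C", "A")
group_fac <- factor(group_vector)

counts <- summary(group_fac)  # named counts: A = 2, B = 1, C = 2
counts[["C"]]                 # 2

# table() gives the same counts per level
tab <- table(group_fac)

# On the raw character vector, summary() reports only Length, Class and Mode,
# one reason to convert categorical data to a factor first
summary(group_vector)
```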
Renaming levels
Sometimes our data might contain factors with long level names and we may want to rename them.
> vector <- c("A_grp", "B_grp", "A_grp")
> fctr <- factor(vector)
> levels(fctr) <- c("A", "B")
> fctr
//Output Below
[1] A B A
Levels: A B
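Nice! We rename factor levels by assigning new names to levels(). The new names replace the old levels position by position, so c("A", "B") turns A_grp into A and B_grp into B.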
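Because that assignment works by position, it can be safer to rename a level by its current name. The sketch below does this with ordinary vector indexing on levels(); it also shows that giving two levels the same new name merges them into one. The names fctr2 and "AB" are hypothetical, used only for illustration.

```r
vector <- c("A_grp", "B_grp", "A_grp")
fctr <- factor(vector)

# Rename only the level currently called "B_grp"; other levels are untouched
levels(fctr)[levels(fctr) == "B_grp"] <- "B"
levels(fctr)   # "A_grp" "B"

# Assigning the same new name to two levels merges them
fctr2 <- factor(c("A_grp", "B_grp", "A_grp"))
levels(fctr2) <- c("AB", "AB")
fctr2          # AB AB AB, with a single level "AB"
```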