library(tidyverse)
library(forcats)

Day 05: fct_collapse()
Easily combine multiple factor levels with forcats::fct_collapse().
The forcats package (Wickham 2022) provides several functions for changing the values of factor levels. The most general of these is fct_recode(), which lets you change factor levels by hand.
fct_collapse() is a variant of fct_recode() that lets you combine (or collapse) a vector of old levels into a new level (rather than specifying the new level for each individual old level, as you’d have to do with fct_recode()).
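To make the difference concrete, here is a minimal sketch on a small made-up factor of clothing sizes (the `sizes` vector and its levels are hypothetical, not from the GSS data). With fct_recode(), every old level needs its own `new = old` pair. With fct_collapse(), you pass one vector of old levels per new level. The two calls below should produce the same factor.

```r
library(forcats)

# Hypothetical toy factor, for illustration only
sizes <- factor(c("XS", "S", "M", "L", "XL", "S", "M"))

# fct_recode(): one `new = old` pair for every old level
recoded <- fct_recode(sizes,
  small  = "XS", small = "S",
  medium = "M",
  large  = "L",  large = "XL"
)

# fct_collapse(): one `new = c(old, ...)` entry per new level
collapsed <- fct_collapse(sizes,
  small  = c("XS", "S"),
  medium = "M",
  large  = c("L", "XL")
)

# Both approaches should give the same three-level factor
identical(recoded, collapsed)
```

The fct_collapse() version grows with the number of new levels rather than the number of old ones, which is what makes it convenient when many specific levels map onto a few broad ones.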
It’s a function I use when I want to effectively lower the resolution of my data, i.e., when the original levels are more specific than I need for the purposes of my analysis.¹
In this simple example, we’ll combine political affiliations (partyid) from the General Social Survey dataset (gss_cat) included in forcats.
Let’s look at the levels we have to begin with:
levels(gss_cat$partyid)

 [1] "No answer"          "Don't know"         "Other party"
 [4] "Strong republican"  "Not str republican" "Ind,near rep"
 [7] "Independent"        "Ind,near dem"       "Not str democrat"
[10] "Strong democrat"
Perhaps I just want to know whether each respondent is a Republican, a Democrat, an Independent, or something else: a perfect use case for fct_collapse().
gss_cat |>
  mutate(
    # Collapse the 10 partyid levels into 4 broader groups
    party_generic = fct_collapse(partyid,
      "Other"       = c("No answer", "Don't know", "Other party"),
      "Republican"  = c("Strong republican", "Not str republican"),
      "Independent" = c("Independent", "Ind,near dem", "Ind,near rep"),
      "Democrat"    = c("Strong democrat", "Not str democrat")
    )
  ) |>
  count(party_generic)

# A tibble: 4 × 2
  party_generic     n
  <fct>         <int>
1 Other           548
2 Republican     5346
3 Independent    8409
4 Democrat       7180
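If you'd rather not list the leftover levels by hand, fct_collapse() also accepts an `other_level` argument that lumps every level you don't mention into a single catch-all level. This sketch assumes a forcats version recent enough to support `other_level`. According to the function reference, that catch-all level is always placed at the end of the levels, so "Other" should appear at the bottom of the counts rather than the top, with the same totals as above.

```r
library(dplyr)
library(forcats)

party_counts <- gss_cat |>
  mutate(
    # Name only the groups we care about; everything else
    # ("No answer", "Don't know", "Other party") becomes "Other"
    party_generic = fct_collapse(partyid,
      Republican  = c("Strong republican", "Not str republican"),
      Independent = c("Independent", "Ind,near dem", "Ind,near rep"),
      Democrat    = c("Strong democrat", "Not str democrat"),
      other_level = "Other"
    )
  ) |>
  count(party_generic)

party_counts
```

The trade-off is that a typo in one of the named levels would silently fall into "Other", so listing every level explicitly, as in the example above, is the safer choice when you want each mapping checked.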
Learn more
For more information, see the fct_collapse() function reference, as well as the section on modifying factor levels in R for Data Science (Wickham, Grolemund, and Çetinkaya-Rundel 2022).