library(tidyverse)
library(forcats)
Day 05: fct_collapse()
Easily combine multiple factor levels with forcats::fct_collapse().
The forcats package (Wickham 2022) provides several functions for changing the values of factor levels. The most generic is fct_recode(), which lets you change factor levels by hand.
fct_collapse() is a variant of fct_recode() that lets you combine (or collapse) a vector of old levels into a new level, rather than specifying the new level for each individual old level, as you’d have to do with fct_recode().
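To make the difference concrete, here is a minimal sketch on a made-up factor. The `sizes` vector and its levels are hypothetical and not part of the data used in this post. Both calls should give the same factor; fct_collapse() just needs one entry per new level instead of one per old level.

```r
library(forcats)

# Hypothetical toy factor, for illustration only
sizes <- factor(
  c("xs", "s", "m", "l", "xl"),
  levels = c("xs", "s", "m", "l", "xl")
)

# fct_recode(): one new = old pair for every old level you want to change
recoded <- fct_recode(sizes,
  "small" = "xs",
  "small" = "s",
  "large" = "l",
  "large" = "xl"
)

# fct_collapse(): one new level per vector of old levels
collapsed <- fct_collapse(sizes,
  "small" = c("xs", "s"),
  "large" = c("l", "xl")
)

levels(recoded)
levels(collapsed)

# Compare the two results directly
identical(recoded, collapsed)
```

Levels that are not mentioned (here `"m"`) are left unchanged by both functions.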
It’s a function I use when I want to effectively lower the resolution of my data, i.e. when the original levels are more specific than I need for the purposes of my analysis.
In this simple example, we’ll combine political affiliations (partyid) from the General Social Survey dataset (gss_cat) included in forcats.
Let’s look at the levels we have to begin with:
levels(gss_cat$partyid)
[1] "No answer" "Don't know" "Other party"
[4] "Strong republican" "Not str republican" "Ind,near rep"
[7] "Independent" "Ind,near dem" "Not str democrat"
[10] "Strong democrat"
Perhaps I just want to know if they’re a Republican, Democrat, Independent, or something else: a perfect use case for fct_collapse().
gss_cat |>
  mutate(
    # Map each vector of old levels onto a single new level
    party_generic = fct_collapse(partyid,
      "Other" = c("No answer", "Don't know", "Other party"),
      "Republican" = c("Strong republican", "Not str republican"),
      "Independent" = c("Independent", "Ind,near dem", "Ind,near rep"),
      "Democrat" = c("Strong democrat", "Not str democrat")
    )
  ) |>
  count(party_generic)
# A tibble: 4 × 2
party_generic n
<fct> <int>
1 Other 548
2 Republican 5346
3 Independent 8409
4 Democrat 7180
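One way to check that every original level landed in the intended group is to count by both the new and the old column. This is a sketch rather than part of the original example: the object name `check` is my own, and it assumes dplyr is attached (it is loaded as part of the tidyverse).

```r
library(dplyr)
library(forcats)

# Count each combination of collapsed level and original level
check <- gss_cat |>
  mutate(
    party_generic = fct_collapse(partyid,
      "Other" = c("No answer", "Don't know", "Other party"),
      "Republican" = c("Strong republican", "Not str republican"),
      "Independent" = c("Independent", "Ind,near dem", "Ind,near rep"),
      "Democrat" = c("Strong democrat", "Not str democrat")
    )
  ) |>
  count(party_generic, partyid)

check
```

Each row pairs an original partyid level with the party_generic level it was collapsed into, so a misassigned level is easy to spot.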
Learn more
For more information, see the fct_collapse() function reference, as well as the section on modifying factor levels in R for Data Science (Wickham, Grolemund, and Çetinkaya-Rundel 2022).