Day 05: fct_collapse()

Easily combine multiple factor levels with forcats::fct_collapse().

Published December 5, 2022

The forcats package (Wickham 2022) provides several functions for changing the values of factor levels. The most general is fct_recode(), which lets you change factor levels by hand.

fct_collapse() is a variant of fct_recode() that lets you combine (or collapse) a vector of old levels into a single new level, rather than specifying the new level for each old level individually, as you’d have to do with fct_recode().
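To make the difference concrete, here is a minimal sketch using a small made-up factor (the size vector and its levels are hypothetical, not part of any dataset used in this post). Both calls should give the same result; fct_collapse() just needs less typing.

```r
library(forcats)

# Hypothetical toy factor, made up for illustration
size <- factor(
  c("xs", "s", "m", "l", "xl"),
  levels = c("xs", "s", "m", "l", "xl")
)

# fct_recode(): one new = old pair for every old level you want to change
recoded <- fct_recode(size,
  small = "xs",
  small = "s",
  large = "l",
  large = "xl"
)

# fct_collapse(): one new level per vector of old levels
collapsed <- fct_collapse(size,
  small = c("xs", "s"),
  large = c("l", "xl")
)

# Compare the resulting levels of the two approaches
levels(recoded)
levels(collapsed)
```

Levels you don’t mention (here, "m") are left untouched in both cases.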

It’s a function I use when I want to effectively lower the resolution of my data, i.e. when the original levels are more specific than I need for the purposes of my analysis.

In this simple example, we’ll combine political affiliations (partyid) from the General Social Survey dataset (gss_cat) included in forcats.

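library(tidyverse) # attaches dplyr, used below for mutate() and count()
library(forcats)   # already attached by tidyverse; loaded explicitly for clarity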

Let’s look at the levels we have to begin with:

levels(gss_cat$partyid)
 [1] "No answer"          "Don't know"         "Other party"       
 [4] "Strong republican"  "Not str republican" "Ind,near rep"      
 [7] "Independent"        "Ind,near dem"       "Not str democrat"  
[10] "Strong democrat"   

Perhaps I just want to know if they’re a Republican, Democrat, Independent, or something else: a perfect use case for fct_collapse().

gss_cat |>
  mutate(
    party_generic = fct_collapse(partyid,
      "Other" = c("No answer", "Don't know", "Other party"),
      "Republican" = c("Strong republican", "Not str republican"),
      "Independent" = c("Independent", "Ind,near dem", "Ind,near rep"),
      "Democrat" = c("Strong democrat", "Not str democrat")
    )
  ) |> 
  count(party_generic)
# A tibble: 4 × 2
  party_generic     n
  <fct>         <int>
1 Other           548
2 Republican     5346
3 Independent    8409
4 Democrat       7180
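If listing every leftover level by hand feels tedious, fct_collapse() also takes an other_level argument that lumps every level you did not mention into a single level. The sketch below is a variation on the example above, assuming a version of forcats recent enough to support other_level (0.5.0 or later, as far as I know). The party_counts name is just a placeholder for illustration.

```r
library(dplyr)
library(forcats)

party_counts <- gss_cat |>
  mutate(
    party_generic = fct_collapse(partyid,
      "Republican" = c("Strong republican", "Not str republican"),
      "Independent" = c("Independent", "Ind,near dem", "Ind,near rep"),
      "Democrat" = c("Strong democrat", "Not str democrat"),
      # every level not named above is collapsed into this one
      other_level = "Other"
    )
  ) |>
  count(party_generic)

party_counts
```

The counts should match the hand-written version; the main difference is that the catch-all level is placed at the end of the levels rather than at the start.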

Learn more

For more information, see the fct_collapse() function reference, as well as the section on modifying factor levels in R for Data Science (Wickham, Grolemund, and Çetinkaya-Rundel 2022).

References

Wickham, Hadley. 2022. forcats: Tools for Working with Categorical Variables (Factors). https://CRAN.R-project.org/package=forcats.
Wickham, Hadley, Garrett Grolemund, and Mine Çetinkaya-Rundel. 2022. R for Data Science. 2nd ed. O’Reilly. https://r4ds.hadley.nz.