Day 14: `c_across()`

Designed to work with rowwise(), c_across() uses tidyselect semantics to make it easier to perform row-wise aggregations.

Published

December 14, 2022

Columns are special in R, which means that operating on data by row takes a bit more thought. The pros and cons of various approaches featuring purrr have been very well elucidated by Jenny Bryan in Row-oriented workflows in R with the tidyverse (which includes code examples, a slide deck, and a webinar recording), and I won’t seek to re-create that here.

c_across() came to dplyr (Wickham et al. 2022) in 2020 (after Jenny’s webinar) to make it easier to select variables when doing operations with rowwise(). Powered by tidyselect (Henry and Wickham 2022), it uses the same semantics as dplyr::select(), allowing you to do things like use : to select a range of consecutive variables, and use selection helpers such as everything() and where().

First, we’ll load our library and create some data to play with.

library(tidyverse)

df <- tibble(
  id = 1:5,
  w = seq(5, 25, 5),
  x = seq(10, 50, 10),
  y = seq(60, 100, 10),
  z = seq(100, 500, 100)
)

df

# A tibble: 5 × 5
     id     w     x     y     z
  <int> <dbl> <dbl> <dbl> <dbl>
1     1     5    10    60   100
2     2    10    20    70   200
3     3    15    30    80   300
4     4    20    40    90   400
5     5    25    50   100   500

As a reminder, rowwise() (like group_by()) changes how the other verbs work. Without rowwise(), the output of our mutate() call below will be the same for every row, since each column name refers to the whole column and mean() is computed once over all of the values. With rowwise(), each column name refers to a single value from the current row, so the mean is computed row by row.

# WITHOUT `rowwise()`
df |>
  mutate(m = mean(c(w, x, y, z)))

# A tibble: 5 × 6
     id     w     x     y     z     m
  <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     5    10    60   100  106.
2     2    10    20    70   200  106.
3     3    15    30    80   300  106.
4     4    20    40    90   400  106.
5     5    25    50   100   500  106.

# WITH `rowwise()`
df |>
  rowwise() |>
  mutate(m = mean(c(w, x, y, z)))

# A tibble: 5 × 6
# Rowwise: 
     id     w     x     y     z     m
  <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     5    10    60   100  43.8
2     2    10    20    70   200  75  
3     3    15    30    80   300 106. 
4     4    20    40    90   400 138. 
5     5    25    50   100   500 169.
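
To see where those two sets of numbers come from, the arithmetic can be reproduced by hand. This is a minimal sketch that rebuilds the same df (loading only dplyr rather than the full tidyverse, which is an assumption made to keep the block small); all_values and row_1 are names introduced here purely for illustration.

```r
library(dplyr)

df <- tibble(
  id = 1:5,
  w = seq(5, 25, 5),
  x = seq(10, 50, 10),
  y = seq(60, 100, 10),
  z = seq(100, 500, 100)
)

# Without rowwise(): w, x, y, and z each refer to a whole column,
# so c() pools all 20 values and every row gets the same mean.
all_values <- c(df$w, df$x, df$y, df$z)
mean(all_values)  # 106.25, shown as 106. in the tibble output

# With rowwise(): each name refers to one value from the current row.
# Here we mimic that for the first row only.
row_1 <- c(df$w[1], df$x[1], df$y[1], df$z[1])
mean(row_1)       # 43.75, shown as 43.8 in the tibble output
```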

Typing all those variable names can get pretty annoying. Let’s do our rowwise() mutate again, this time using c_across() to select our range of variables with :.

df |>
  rowwise() |>
  mutate(m = mean(c_across(w:z)))

# A tibble: 5 × 6
# Rowwise: 
     id     w     x     y     z     m
  <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     5    10    60   100  43.8
2     2    10    20    70   200  75  
3     3    15    30    80   300 106. 
4     4    20    40    90   400 138. 
5     5    25    50   100   500 169.
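
Because c_across() accepts the same selection syntax as select(), the range w:z is only one way to pick these columns. The sketch below rebuilds df and compares three spellings of the same selection; by_range, by_negation, and by_name are illustrative names, not part of dplyr. Each selection runs in its own pipeline so that no newly created column can end up inside a later selection.

```r
library(dplyr)

df <- tibble(
  id = 1:5,
  w = seq(5, 25, 5),
  x = seq(10, 50, 10),
  y = seq(60, 100, 10),
  z = seq(100, 500, 100)
)

# A range of consecutive columns
by_range <- df |>
  rowwise() |>
  mutate(m = mean(c_across(w:z))) |>
  pull(m)

# Everything except id, using tidyselect negation
by_negation <- df |>
  rowwise() |>
  mutate(m = mean(c_across(!id))) |>
  pull(m)

# Columns listed explicitly
by_name <- df |>
  rowwise() |>
  mutate(m = mean(c_across(c(w, x, y, z)))) |>
  pull(m)

# All three selections cover the same four columns
all.equal(by_range, by_negation)
all.equal(by_range, by_name)
```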

rowwise() is a special form of grouping (it groups by row). So, like group_by(), you can give it a grouping variable to preserve for each row. Below, we’ll preserve id for each row. As always, you can use mutate() to add a new column to the data frame, or summarise() to get just the summary value. We can also use other tidyselect helpers in c_across(), such as everything() and where(). Because id is now a grouping variable, c_across() leaves it out of the selection, so both helpers pick up w through z and produce the same results in the cases below.

df |> 
  rowwise(id) |> 
  mutate(total = sum(c_across(everything())))

# A tibble: 5 × 6
# Rowwise:  id
     id     w     x     y     z total
  <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     5    10    60   100   175
2     2    10    20    70   200   300
3     3    15    30    80   300   425
4     4    20    40    90   400   550
5     5    25    50   100   500   675
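
Note that id is not part of total in the output above (5 + 10 + 60 + 100 = 175 for the first row). A short sketch of what changes if id is not passed to rowwise(), assuming the same df as before; grouped_total and ungrouped_total are names made up for this comparison.

```r
library(dplyr)

df <- tibble(
  id = 1:5,
  w = seq(5, 25, 5),
  x = seq(10, 50, 10),
  y = seq(60, 100, 10),
  z = seq(100, 500, 100)
)

# id is a grouping variable here, so everything() skips it
grouped_total <- df |>
  rowwise(id) |>
  mutate(total = sum(c_across(everything()))) |>
  pull(total)

# No grouping variable: everything() now includes id in the sum
ungrouped_total <- df |>
  rowwise() |>
  mutate(total = sum(c_across(everything()))) |>
  pull(total)

# The two results differ by exactly the value of id in each row
ungrouped_total - grouped_total
```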

df |> 
  rowwise(id) |> 
  summarise(total = sum(c_across(where(is.numeric))))

`summarise()` has grouped output by 'id'. You can override using the `.groups`
argument.

# A tibble: 5 × 2
# Groups:   id [5]
     id total
  <int> <dbl>
1     1   175
2     2   300
3     3   425
4     4   550
5     5   675
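
The message above points at the .groups argument of summarise(). As a hedged sketch (assuming the same df, with totals and df_with_m as illustrative names), passing .groups = "drop" returns an ungrouped tibble and silences the message, and ungroup() does the equivalent job after a row-wise mutate() so that later verbs go back to working on whole columns.

```r
library(dplyr)

df <- tibble(
  id = 1:5,
  w = seq(5, 25, 5),
  x = seq(10, 50, 10),
  y = seq(60, 100, 10),
  z = seq(100, 500, 100)
)

# Drop the grouping in the summarise() call itself
totals <- df |>
  rowwise(id) |>
  summarise(total = sum(c_across(w:z)), .groups = "drop")

is_grouped_df(totals)  # FALSE: the result is a plain tibble

# After mutate(), remove the row-wise behaviour explicitly
df_with_m <- df |>
  rowwise() |>
  mutate(m = mean(c_across(w:z))) |>
  ungroup()

inherits(df_with_m, "rowwise_df")  # FALSE: no longer row-wise
```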

Learn more

To learn more about working with rows in dplyr, see the Row-wise operations vignette. For more on row-oriented workflows in the tidyverse with purrr, see Jenny Bryan’s linked resources, as well as the purrr documentation for the pmap() family of functions.

References

Henry, Lionel, and Hadley Wickham. 2022. tidyselect: Select from a Set of Strings. https://tidyselect.r-lib.org.

Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2022. dplyr: A Grammar of Data Manipulation. https://dplyr.tidyverse.org.