Day 06: glimpse()

An easy way to take a quick peek at all of the columns in your data frame (and as much data as fits on a row), glimpse() is like a transposed version of print() that lets you see your columns as rows.
Published December 6, 2022

A function so nice we’ve exported (and documented) it twice, glimpse() is provided by the pillar package (Müller and Wickham 2022), but is re-exported by dplyr (Wickham et al. 2022) for your convenience. Heck, we even named a newsletter after it!

What exactly does glimpse() do? According to the function reference:

glimpse() is like a transposed version of print(): columns run down the page, and data runs across. This makes it possible to see every column in a data frame. It’s a little like str() applied to a data frame but it tries to show you as much data as possible. (And it always shows the underlying data, even when applied to a remote data source.)

This is a case where it’s easier to show than tell, so let’s take a look at some real data. We’ll use today’s 2022-2023 NBA Player Stats: Totals, acquired from Basketball Reference.

library(tidyverse)
bref_data <- read_csv(here::here("data", "bref-data.csv"))

Our data is in a tibble, so the default print formatting is pretty good. It shows as many columns as fit on the screen (as determined by the width option), along with their types and the first ten rows, followed by a summary of the remaining rows and columns.

bref_data
#> # A tibble: 480 × 30
#>    player  position   age team  games game_starts minutes   fgm   fga fg_percent
#>    <chr>   <chr>    <dbl> <chr> <dbl>       <dbl>   <dbl> <dbl> <dbl>      <dbl>
#>  1 Kevin … PF          34 BRK      25          25     922   265   479      0.553
#>  2 Jayson… PF          24 BOS      24          24     887   242   504      0.48 
#>  3 Luka D… PG          23 DAL      22          22     810   254   498      0.51 
#>  4 Stephe… PG          34 GSW      23          23     800   233   467      0.499
#>  5 Shai G… SG          24 OKC      22          22     786   235   464      0.506
#>  6 Devin … SG          26 PHO      24          24     859   244   500      0.488
#>  7 Donova… SG          26 CLE      22          22     803   216   443      0.488
#>  8 Jaylen… SF          26 BOS      23          23     826   228   452      0.504
#>  9 Trae Y… PG          24 ATL      22          22     781   194   476      0.408
#> 10 Gianni… PF          28 MIL      19          19     626   223   410      0.544
#> # … with 470 more rows, and 20 more variables: fg3m <dbl>, fg3a <dbl>,
#> #   fg3_percent <dbl>, fg2m <dbl>, fg2a <dbl>, fg2_percent <dbl>,
#> #   e_fg_percent <dbl>, ftm <dbl>, fta <dbl>, ft_percent <dbl>, oreb <dbl>,
#> #   dreb <dbl>, treb <dbl>, ast <dbl>, stl <dbl>, blk <dbl>, tov <dbl>,
#> #   pf <dbl>, pts <dbl>, player_additional <chr>

However, because this tibble is quite wide, a lot of information is relegated to that dense summary at the bottom. glimpse() is the perfect way to get a bit more detail about what’s in there before doing my analysis.

Below, I’ll put the data directly into the function, but you can also use the pipe if you prefer.

glimpse(bref_data)
#> Rows: 480
#> Columns: 30
#> $ player            <chr> "Kevin Durant", "Jayson Tatum", "Luka Dončić", "Step…
#> $ position          <chr> "PF", "PF", "PG", "PG", "SG", "SG", "SG", "SF", "PG"…
#> $ age               <dbl> 34, 24, 23, 34, 24, 26, 26, 26, 24, 28, 33, 25, 29, …
#> $ team              <chr> "BRK", "BOS", "DAL", "GSW", "OKC", "PHO", "CLE", "BO…
#> $ games             <dbl> 25, 24, 22, 23, 22, 24, 22, 23, 22, 19, 23, 26, 20, …
#> $ game_starts       <dbl> 25, 24, 22, 23, 22, 24, 22, 23, 22, 19, 23, 26, 20, …
#> $ minutes           <dbl> 922, 887, 810, 800, 786, 859, 803, 826, 781, 626, 80…
#> $ fgm               <dbl> 265, 242, 254, 233, 235, 244, 216, 228, 194, 223, 20…
#> $ fga               <dbl> 479, 504, 498, 467, 464, 500, 443, 452, 476, 410, 41…
#> $ fg_percent        <dbl> 0.553, 0.480, 0.510, 0.499, 0.506, 0.488, 0.488, 0.5…
#> $ fg3m              <dbl> 39, 82, 61, 117, 22, 57, 85, 55, 48, 15, 10, 69, 9, …
#> $ fg3a              <dbl> 115, 225, 178, 271, 67, 146, 202, 157, 162, 63, 32, …
#> $ fg3_percent       <dbl> 0.339, 0.364, 0.343, 0.432, 0.328, 0.390, 0.421, 0.3…
#> $ fg2m              <dbl> 226, 160, 193, 116, 213, 187, 131, 173, 146, 208, 19…
#> $ fg2a              <dbl> 364, 279, 320, 196, 397, 354, 241, 295, 314, 347, 38…
#> $ fg2_percent       <dbl> 0.621, 0.573, 0.603, 0.592, 0.537, 0.528, 0.544, 0.5…
#> $ e_fg_percent      <dbl> 0.594, 0.562, 0.571, 0.624, 0.530, 0.545, 0.584, 0.5…
#> $ ftm               <dbl> 178, 172, 165, 106, 197, 136, 108, 104, 171, 145, 16…
#> $ fta               <dbl> 194, 198, 230, 117, 212, 156, 122, 125, 190, 234, 18…
#> $ ft_percent        <dbl> 0.918, 0.869, 0.717, 0.906, 0.929, 0.872, 0.885, 0.8…
#> $ oreb              <dbl> 9, 27, 19, 12, 21, 20, 18, 22, 17, 39, 11, 57, 68, 2…
#> $ dreb              <dbl> 157, 172, 168, 140, 84, 97, 67, 141, 47, 176, 90, 16…
#> $ treb              <dbl> 166, 199, 187, 152, 105, 117, 85, 163, 64, 215, 101,…
#> $ ast               <dbl> 135, 101, 188, 162, 132, 142, 107, 84, 212, 104, 107…
#> $ stl               <dbl> 18, 24, 40, 24, 39, 24, 30, 22, 16, 16, 21, 15, 27, …
#> $ blk               <dbl> 44, 25, 15, 6, 25, 10, 10, 10, 3, 20, 9, 18, 48, 5, …
#> $ tov               <dbl> 86, 66, 79, 72, 73, 61, 68, 71, 76, 70, 45, 56, 40, …
#> $ pf                <dbl> 62, 54, 64, 49, 54, 64, 54, 63, 30, 65, 63, 53, 57, …
#> $ pts               <dbl> 747, 738, 734, 689, 689, 681, 625, 615, 607, 606, 58…
#> $ player_additional <chr> "duranke01", "tatumja01", "doncilu01", "curryst01", …

Behold! My columns are all lined up (as rows), with their types and previews of the data therein. It’s not a detailed summary, but it’s a nice glimpse of what your data hold.
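If you prefer the pipe, here is a minimal sketch. It uses the built-in mtcars data set rather than the CSV from this post so that it runs anywhere. The heavy_cars name and the wt > 4 filter are illustrative assumptions, and the native pipe needs R 4.1 or later.

```r
library(dplyr)

# Assumption: mtcars stands in for bref_data so no CSV file is needed.
# Piping into glimpse() is equivalent to calling glimpse(mtcars).
mtcars |>
  glimpse()

# glimpse() returns its input invisibly, so it can sit in the middle of a
# pipeline: the preview is printed, then the data continues on to filter().
heavy_cars <- mtcars |>
  tibble::as_tibble(rownames = "model") |>
  glimpse() |>
  filter(wt > 4)  # hypothetical filter, for illustration only

heavy_cars
```

Because glimpse() hands back its input unchanged, the second pipeline prints the preview and still passes the full tibble on to filter().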

Note

If you’d really like the nitty-gritty details behind glimpse()’s formatting, pillar’s format_glimpse() implements the logic it uses to format vectors.
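To poke at that formatting yourself, the sketch below calls format_glimpse() on a few small vectors. The input values are made up for illustration (loosely echoing the columns above), and the exact strings returned may differ between pillar versions.

```r
library(pillar)

# Assumption: these small vectors are illustrative stand-ins for columns.
# format_glimpse() returns a character vector of formatted elements, which
# glimpse() draws on when it builds each column's preview.
nums <- format_glimpse(c(0.553, 0.48, 0.51))
chrs <- format_glimpse(c("PF", "PG", "SG"))
lst  <- format_glimpse(list(1:3, "a"))

print(nums)
print(chrs)
print(lst)
```

Comparing the three results shows how numbers, strings, and list elements are each rendered before they are joined into a single preview line.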

References

Müller, Kirill, and Hadley Wickham. 2022. pillar: Coloured Formatting for Columns. https://pillar.r-lib.org.
Waring, Elin, Michael Quinn, Amelia McNamara, Eduardo Arino de la Rubia, Hao Zhu, and Shannon Ellis. 2022. skimr: Compact and Flexible Summaries of Data. https://CRAN.R-project.org/package=skimr.
Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2022. dplyr: A Grammar of Data Manipulation. https://dplyr.tidyverse.org.