library(tidyverse)
bref_data <- read_csv(here::here("data", "bref-data.csv"))
Day 06: glimpse()
glimpse() is like a transposed version of print() that lets you see your columns as rows.
A function so nice we’ve exported (and documented) it twice, glimpse() is provided by the pillar package (Müller and Wickham 2022), but is re-exported by dplyr (Wickham et al. 2022) for your convenience. Heck, we even named a newsletter after it!
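If you'd like to see the re-export for yourself, here is a quick check. It assumes both dplyr and pillar are installed (pillar normally comes along with dplyr), and that a re-exported function is the very same object in both namespaces.

```r
# Compare the function dplyr exports with the one pillar exports.
# If dplyr simply re-exports pillar's glimpse(), the two are the same object.
same_function <- identical(dplyr::glimpse, pillar::glimpse)
same_function
```

Either `dplyr::glimpse()` or `pillar::glimpse()` should therefore behave the same way. Which package you load is a matter of convenience.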
What exactly does glimpse do? Well, according to the function reference:
glimpse() is like a transposed version of print(): columns run down the page, and data runs across. This makes it possible to see every column in a data frame. It’s a little like str() applied to a data frame but it tries to show you as much data as possible. (And it always shows the underlying data, even when applied to a remote data source.)
This is a case where it’s easier to show than tell. Let’s take a look at some real data. We’ll use today’s 2022-2023 NBA Player Stats: Totals, acquired through Basketball Reference.
Our data is in a tibble, so the default print formatting is pretty good. It shows us as many columns as can fit on the screen (as determined by the width option), along with their types and first ten rows, followed by a summary of the remaining rows and columns.
bref_data
#> # A tibble: 480 × 30
#> player position age team games game_starts minutes fgm fga fg_percent
#> <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Kevin … PF 34 BRK 25 25 922 265 479 0.553
#> 2 Jayson… PF 24 BOS 24 24 887 242 504 0.48
#> 3 Luka D… PG 23 DAL 22 22 810 254 498 0.51
#> 4 Stephe… PG 34 GSW 23 23 800 233 467 0.499
#> 5 Shai G… SG 24 OKC 22 22 786 235 464 0.506
#> 6 Devin … SG 26 PHO 24 24 859 244 500 0.488
#> 7 Donova… SG 26 CLE 22 22 803 216 443 0.488
#> 8 Jaylen… SF 26 BOS 23 23 826 228 452 0.504
#> 9 Trae Y… PG 24 ATL 22 22 781 194 476 0.408
#> 10 Gianni… PF 28 MIL 19 19 626 223 410 0.544
#> # … with 470 more rows, and 20 more variables: fg3m <dbl>, fg3a <dbl>,
#> # fg3_percent <dbl>, fg2m <dbl>, fg2a <dbl>, fg2_percent <dbl>,
#> # e_fg_percent <dbl>, ftm <dbl>, fta <dbl>, ft_percent <dbl>, oreb <dbl>,
#> # dreb <dbl>, treb <dbl>, ast <dbl>, stl <dbl>, blk <dbl>, tov <dbl>,
#> # pf <dbl>, pts <dbl>, player_additional <chr>
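The number of columns that make the cut depends on the print width, and you can override it for a single call. Here's a small sketch of that. It uses the built-in mtcars data as a stand-in (the `cars` name is just for illustration) so it runs without the CSV file.

```r
library(tibble)

# mtcars ships with R; converted to a tibble as a stand-in for bref_data
cars <- as_tibble(mtcars)

# A narrow width pushes more columns into the footer summary
print(cars, n = 3, width = 40)

# width = Inf asks print() to show every column, wrapping as needed
print(cars, n = 3, width = Inf)
```

Printing with `width = Inf` works, but on a 30-column tibble the wrapped output gets hard to scan quickly.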
However, because this tibble is quite wide, there’s a lot of information being relegated to that dense summary at the bottom. glimpse() is the perfect way to get a bit more detail about what’s in there before doing my analysis.
Below, I’ll put the data directly into the function, but you can also use the pipe if you prefer.
glimpse(bref_data)
#> Rows: 480
#> Columns: 30
#> $ player <chr> "Kevin Durant", "Jayson Tatum", "Luka Dončić", "Step…
#> $ position <chr> "PF", "PF", "PG", "PG", "SG", "SG", "SG", "SF", "PG"…
#> $ age <dbl> 34, 24, 23, 34, 24, 26, 26, 26, 24, 28, 33, 25, 29, …
#> $ team <chr> "BRK", "BOS", "DAL", "GSW", "OKC", "PHO", "CLE", "BO…
#> $ games <dbl> 25, 24, 22, 23, 22, 24, 22, 23, 22, 19, 23, 26, 20, …
#> $ game_starts <dbl> 25, 24, 22, 23, 22, 24, 22, 23, 22, 19, 23, 26, 20, …
#> $ minutes <dbl> 922, 887, 810, 800, 786, 859, 803, 826, 781, 626, 80…
#> $ fgm <dbl> 265, 242, 254, 233, 235, 244, 216, 228, 194, 223, 20…
#> $ fga <dbl> 479, 504, 498, 467, 464, 500, 443, 452, 476, 410, 41…
#> $ fg_percent <dbl> 0.553, 0.480, 0.510, 0.499, 0.506, 0.488, 0.488, 0.5…
#> $ fg3m <dbl> 39, 82, 61, 117, 22, 57, 85, 55, 48, 15, 10, 69, 9, …
#> $ fg3a <dbl> 115, 225, 178, 271, 67, 146, 202, 157, 162, 63, 32, …
#> $ fg3_percent <dbl> 0.339, 0.364, 0.343, 0.432, 0.328, 0.390, 0.421, 0.3…
#> $ fg2m <dbl> 226, 160, 193, 116, 213, 187, 131, 173, 146, 208, 19…
#> $ fg2a <dbl> 364, 279, 320, 196, 397, 354, 241, 295, 314, 347, 38…
#> $ fg2_percent <dbl> 0.621, 0.573, 0.603, 0.592, 0.537, 0.528, 0.544, 0.5…
#> $ e_fg_percent <dbl> 0.594, 0.562, 0.571, 0.624, 0.530, 0.545, 0.584, 0.5…
#> $ ftm <dbl> 178, 172, 165, 106, 197, 136, 108, 104, 171, 145, 16…
#> $ fta <dbl> 194, 198, 230, 117, 212, 156, 122, 125, 190, 234, 18…
#> $ ft_percent <dbl> 0.918, 0.869, 0.717, 0.906, 0.929, 0.872, 0.885, 0.8…
#> $ oreb <dbl> 9, 27, 19, 12, 21, 20, 18, 22, 17, 39, 11, 57, 68, 2…
#> $ dreb <dbl> 157, 172, 168, 140, 84, 97, 67, 141, 47, 176, 90, 16…
#> $ treb <dbl> 166, 199, 187, 152, 105, 117, 85, 163, 64, 215, 101,…
#> $ ast <dbl> 135, 101, 188, 162, 132, 142, 107, 84, 212, 104, 107…
#> $ stl <dbl> 18, 24, 40, 24, 39, 24, 30, 22, 16, 16, 21, 15, 27, …
#> $ blk <dbl> 44, 25, 15, 6, 25, 10, 10, 10, 3, 20, 9, 18, 48, 5, …
#> $ tov <dbl> 86, 66, 79, 72, 73, 61, 68, 71, 76, 70, 45, 56, 40, …
#> $ pf <dbl> 62, 54, 64, 49, 54, 64, 54, 63, 30, 65, 63, 53, 57, …
#> $ pts <dbl> 747, 738, 734, 689, 689, 681, 625, 615, 607, 606, 58…
#> $ player_additional <chr> "duranke01", "tatumja01", "doncilu01", "curryst01", …
Behold! My columns are all lined up (as rows), with their types and previews of the data therein. It’s not a detailed summary¹, but it’s a nice glimpse of what your data hold.
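As for the pipe, here's a minimal sketch of how that might look. It again uses the built-in mtcars data as a stand-in, and assumes R 4.1 or later for the native `|>` pipe (swap in `%>%` otherwise). Because glimpse() invisibly returns its input, it can sit in the middle of a pipeline without interrupting it.

```r
library(dplyr)

# mtcars is a stand-in for bref_data; `heavy` is a hypothetical name
heavy <- mtcars |>
  glimpse() |>        # prints the transposed preview, then passes the data on
  filter(wt > 3.5)    # the rest of the pipeline continues as usual

nrow(heavy)
```

This is handy for a quick look at an intermediate step without breaking the chain into separate assignments.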
If you’d really like the nitty-gritty details behind glimpse()’s formatting, pillar’s format_glimpse() provides the logic for its printing of vectors.
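To poke at it directly, you can hand format_glimpse() a plain vector. The sketch below assumes pillar is installed, and the input values are borrowed from the first few rows of the output above. The `pct` and `pos` names are just for illustration.

```r
library(pillar)

# Format a numeric vector the way glimpse() would format its values
pct <- format_glimpse(c(0.553, 0.48, 0.51))
pct

# Character vectors go through the same function
pos <- format_glimpse(c("PF", "PG", "SG"))
pos
```

The result is a character vector with one formatted element per input value, which is then collapsed and truncated to fit the available width.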