Day 20: fs::dir_ls()

Equivalent to the ls command, dir_ls() returns filenames as a named character vector of fs_paths, making it perfect for passing into the .id argument of purrr::map_df() or the names_to argument of purrr::list_rbind().

Published

December 20, 2022

The fs package (Hester, Wickham, and Csárdi 2022) is built on the libuv C library and provides a cross-platform, uniform interface to file-system operations. fs functions fall into four main categories, each of which has its own function prefix:

  • path_ for path manipulation
  • file_ for files
  • dir_ for directories
  • link_ for links
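To make the prefix convention concrete, here's a small sketch that touches three of the four families in a throwaway directory under the session's temp folder (the directory and file names are made up). I've left link_ out because creating symbolic links can need extra permissions on Windows.

```r
library(fs)

# A throwaway directory under the session's temp folder (name is made up)
root <- path_temp("fs-prefix-demo")

# dir_*: directory operations (an existing directory is left as is)
dir_create(root)

# path_*: build and inspect paths without touching the disk
csv_path <- path(root, "example", ext = "csv")
path_file(csv_path)  # the file name, example.csv
path_ext(csv_path)   # the extension, csv

# file_*: create an empty file, then check that it exists
file_create(csv_path)
file_exists(csv_path)

# dir_* again: list what is now in the directory
dir_ls(root)
```
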

dir_ls() is equivalent to the shell ls command. It returns the filenames in the directory it is applied to as a named fs_path character vector (see note 1) in which the names are the same as the values. That makes it a natural input for functions like purrr::map_df(), whose .id argument adds a column to the output storing either the name of each element (if .x is named) or its index (if it is not).
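Here's a quick way to see both properties for yourself. This sketch creates two empty files in a hypothetical scratch directory (via fs::path_temp()), checks that the names of the dir_ls() result mirror its values, and then shows path_tidy() producing the kind of 'tidy' path that fs returns.

```r
library(fs)

# Hypothetical scratch directory containing two empty files
demo_dir <- path_temp("dir-ls-demo")
dir_create(demo_dir)
file_create(path(demo_dir, c("a.csv", "b.csv")))

files <- dir_ls(demo_dir)

# The result is an fs_path vector; this checks that its names repeat its values
class(files)
identical(names(files), unname(unclass(files)))

# path_tidy() collapses the doubled slash and drops the trailing one
path_tidy("data//nba-playoffs/")
```
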

Note 1: fs returns ‘tidy’ paths that always use / to delimit directories and never have multiple / or trailing /.

Let’s look at how we can use these functions together to read in a directory of CSVs with their filenames in the resulting data frame. In this example, I’ll use data from Basketball Reference with player statistics from several years of the NBA Playoffs.

I’ll start by building the path to the directory with here::here(), since it’s located below my project root (as opposed to in the directory in which I’m writing this post).

The target directory happens to contain only .csv files, but if it contained other file types, I could set the glob argument to "*.csv" to restrict the results.
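If you want to see the glob argument in action, here's a sketch using a hypothetical directory that mixes two CSVs with a README.md. The regexp argument does the same job if you'd rather write a regular expression than a glob.

```r
library(fs)

# Hypothetical directory mixing CSVs with a non-CSV file
mixed_dir <- path_temp("glob-demo")
dir_create(mixed_dir)
file_create(path(mixed_dir, c("2021-player-totals.csv",
                              "2022-player-totals.csv",
                              "README.md")))

# Without a filter: all three files
all_files <- dir_ls(mixed_dir)

# With a glob: just the two CSVs
csv_only <- dir_ls(mixed_dir, glob = "*.csv")

# The same filter written as a regular expression
csv_regexp <- dir_ls(mixed_dir, regexp = "\\.csv$")

csv_only
```
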

    library(fs)
    
    # get the files
    playoff_files <- dir_ls(here::here("data", "nba-playoffs"))
    
    # see the files
    playoff_files
    /Users/maraaverick/2022-tidyverse-advent/data/nba-playoffs/2018-player-totals.csv
    /Users/maraaverick/2022-tidyverse-advent/data/nba-playoffs/2019-player-totals.csv
    /Users/maraaverick/2022-tidyverse-advent/data/nba-playoffs/2020-player-totals.csv
    /Users/maraaverick/2022-tidyverse-advent/data/nba-playoffs/2021-player-totals.csv
    /Users/maraaverick/2022-tidyverse-advent/data/nba-playoffs/2022-player-totals.csv

    # read them into a single data frame with the filenames
    playoff_df <- playoff_files |> 
      purrr::map_df(readr::read_csv, .id = "file", show_col_types = FALSE)

    # glimpse the resulting data frame
    dplyr::glimpse(playoff_df)
    Rows: 1,095
    Columns: 32
    $ file                <chr> "/Users/maraaverick/2022-tidyverse-advent/data/nba…
    $ Rk                  <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,…
    $ Player              <chr> "Álex Abrines", "Steven Adams", "Bam Adebayo", "La…
    $ Pos                 <chr> "SG", "C", "C", "C", "PF", "SF", "SF", "PF", "PF",…
    $ Age                 <dbl> 24, 24, 20, 32, 27, 24, 24, 29, 23, 33, 20, 32, 28…
    $ Tm                  <chr> "OKC", "OKC", "MIA", "SAS", "POR", "PHI", "SAS", "…
    $ G                   <dbl> 6, 6, 5, 5, 4, 7, 5, 11, 7, 6, 10, 17, 2, 4, 1, 19…
    $ GS                  <dbl> 0, 6, 0, 5, 4, 0, 1, 0, 7, 6, 10, 17, 0, 0, 0, 12,…
    $ MP                  <dbl> 110, 200, 77, 177, 131, 33, 73, 95, 280, 194, 238,…
    $ FG                  <dbl> 8, 27, 7, 37, 27, 3, 12, 7, 69, 27, 29, 50, 0, 1, …
    $ FGA                 <dbl> 20, 46, 15, 80, 52, 8, 20, 20, 121, 72, 52, 139, 1…
    $ `FG%`               <dbl> 0.400, 0.587, 0.467, 0.463, 0.519, 0.375, 0.600, 0…
    $ `3P`                <dbl> 6, 0, 0, 3, 13, 2, 0, 5, 4, 6, 13, 26, 0, 1, 0, 11…
    $ `3PA`               <dbl> 13, 0, 1, 5, 30, 7, 4, 15, 14, 28, 29, 91, 0, 1, 0…
    $ `3P%`               <dbl> 0.462, NA, 0.000, 0.600, 0.433, 0.286, 0.000, 0.33…
    $ `2P`                <dbl> 2, 27, 7, 34, 14, 1, 12, 2, 65, 21, 16, 24, 0, 0, …
    $ `2PA`               <dbl> 7, 46, 14, 75, 22, 1, 16, 5, 107, 44, 23, 48, 1, 3…
    $ `2P%`               <dbl> 0.286, 0.587, 0.500, 0.453, 0.636, 1.000, 0.750, 0…
    $ `eFG%`              <dbl> 0.550, 0.587, 0.467, 0.481, 0.644, 0.500, 0.600, 0…
    $ FT                  <dbl> 2, 9, 3, 41, 2, 0, 3, 0, 38, 11, 8, 23, 0, 1, 0, 1…
    $ FTA                 <dbl> 2, 13, 14, 42, 2, 0, 4, 0, 55, 15, 11, 31, 0, 2, 0…
    $ `FT%`               <dbl> 1.000, 0.692, 0.214, 0.976, 1.000, NA, 0.750, NA, …
    $ ORB                 <dbl> 3, 19, 9, 13, 12, 1, 5, 4, 8, 3, 9, 9, 0, 0, 0, 45…
    $ DRB                 <dbl> 13, 26, 11, 33, 24, 8, 8, 9, 59, 31, 12, 56, 0, 3,…
    $ TRB                 <dbl> 16, 45, 20, 46, 36, 9, 13, 13, 67, 34, 21, 65, 0, …
    $ AST                 <dbl> 2, 9, 0, 12, 5, 0, 3, 6, 44, 2, 7, 22, 0, 3, 0, 19…
    $ STL                 <dbl> 5, 4, 0, 3, 4, 1, 6, 3, 10, 10, 6, 19, 0, 1, 0, 4,…
    $ BLK                 <dbl> 2, 4, 2, 2, 2, 0, 1, 1, 6, 4, 4, 2, 0, 1, 0, 11, 2…
    $ TOV                 <dbl> 1, 4, 2, 9, 6, 2, 3, 3, 17, 6, 6, 10, 0, 3, 0, 8, …
    $ PF                  <dbl> 11, 15, 8, 9, 8, 8, 10, 10, 28, 14, 23, 44, 0, 1, …
    $ PTS                 <dbl> 24, 63, 17, 118, 69, 8, 27, 19, 180, 71, 79, 149, …
    $ `Player-additional` <chr> "abrinal01", "adamsst01", "adebaba01", "aldrila01"…

As you can see, the file paths are stored in the file column of my result, which means I can use them to extract important data I would otherwise have lost (in this case, the year).
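As one way to pull the year out of those paths, here's a sketch on toy stand-in files: the file names mimic the playoff files, but the rows are made up. It assumes the year is always the first four characters of the file name, which holds for these files but isn't a general rule.

```r
library(fs)

# Toy stand-ins for the playoff files (made-up rows, for illustration only)
toy_dir <- path_temp("year-demo")
dir_create(toy_dir)
writeLines(c("Player,PTS", "A,10", "B,20"), path(toy_dir, "2021-player-totals.csv"))
writeLines(c("Player,PTS", "C,30"), path(toy_dir, "2022-player-totals.csv"))

toy_df <- dir_ls(toy_dir, glob = "*.csv") |>
  purrr::map_df(readr::read_csv, .id = "file", show_col_types = FALSE) |>
  # Assumes every file name starts with a four-digit year
  dplyr::mutate(year = as.integer(substr(path_file(file), 1, 4)))

toy_df
```
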

This works the same way if I use purrr::map() in combination with purrr::list_rbind() (see note 2) and its names_to argument to keep the file names. Here’s what that looks like using the same playoff_files paths we retrieved with fs::dir_ls() before.

Note 2: This is the recommended method starting with purrr 1.0.0, in which the map_df*() functions are superseded.

    playoff_files |>
      purrr::map(readr::read_csv, show_col_types = FALSE) |> 
      purrr::list_rbind(names_to = "file") |> 
      dplyr::glimpse()
    Rows: 1,095
    Columns: 32
    $ file                <chr> "/Users/maraaverick/2022-tidyverse-advent/data/nba…
    $ Rk                  <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,…
    $ Player              <chr> "Álex Abrines", "Steven Adams", "Bam Adebayo", "La…
    $ Pos                 <chr> "SG", "C", "C", "C", "PF", "SF", "SF", "PF", "PF",…
    $ Age                 <dbl> 24, 24, 20, 32, 27, 24, 24, 29, 23, 33, 20, 32, 28…
    $ Tm                  <chr> "OKC", "OKC", "MIA", "SAS", "POR", "PHI", "SAS", "…
    $ G                   <dbl> 6, 6, 5, 5, 4, 7, 5, 11, 7, 6, 10, 17, 2, 4, 1, 19…
    $ GS                  <dbl> 0, 6, 0, 5, 4, 0, 1, 0, 7, 6, 10, 17, 0, 0, 0, 12,…
    $ MP                  <dbl> 110, 200, 77, 177, 131, 33, 73, 95, 280, 194, 238,…
    $ FG                  <dbl> 8, 27, 7, 37, 27, 3, 12, 7, 69, 27, 29, 50, 0, 1, …
    $ FGA                 <dbl> 20, 46, 15, 80, 52, 8, 20, 20, 121, 72, 52, 139, 1…
    $ `FG%`               <dbl> 0.400, 0.587, 0.467, 0.463, 0.519, 0.375, 0.600, 0…
    $ `3P`                <dbl> 6, 0, 0, 3, 13, 2, 0, 5, 4, 6, 13, 26, 0, 1, 0, 11…
    $ `3PA`               <dbl> 13, 0, 1, 5, 30, 7, 4, 15, 14, 28, 29, 91, 0, 1, 0…
    $ `3P%`               <dbl> 0.462, NA, 0.000, 0.600, 0.433, 0.286, 0.000, 0.33…
    $ `2P`                <dbl> 2, 27, 7, 34, 14, 1, 12, 2, 65, 21, 16, 24, 0, 0, …
    $ `2PA`               <dbl> 7, 46, 14, 75, 22, 1, 16, 5, 107, 44, 23, 48, 1, 3…
    $ `2P%`               <dbl> 0.286, 0.587, 0.500, 0.453, 0.636, 1.000, 0.750, 0…
    $ `eFG%`              <dbl> 0.550, 0.587, 0.467, 0.481, 0.644, 0.500, 0.600, 0…
    $ FT                  <dbl> 2, 9, 3, 41, 2, 0, 3, 0, 38, 11, 8, 23, 0, 1, 0, 1…
    $ FTA                 <dbl> 2, 13, 14, 42, 2, 0, 4, 0, 55, 15, 11, 31, 0, 2, 0…
    $ `FT%`               <dbl> 1.000, 0.692, 0.214, 0.976, 1.000, NA, 0.750, NA, …
    $ ORB                 <dbl> 3, 19, 9, 13, 12, 1, 5, 4, 8, 3, 9, 9, 0, 0, 0, 45…
    $ DRB                 <dbl> 13, 26, 11, 33, 24, 8, 8, 9, 59, 31, 12, 56, 0, 3,…
    $ TRB                 <dbl> 16, 45, 20, 46, 36, 9, 13, 13, 67, 34, 21, 65, 0, …
    $ AST                 <dbl> 2, 9, 0, 12, 5, 0, 3, 6, 44, 2, 7, 22, 0, 3, 0, 19…
    $ STL                 <dbl> 5, 4, 0, 3, 4, 1, 6, 3, 10, 10, 6, 19, 0, 1, 0, 4,…
    $ BLK                 <dbl> 2, 4, 2, 2, 2, 0, 1, 1, 6, 4, 4, 2, 0, 1, 0, 11, 2…
    $ TOV                 <dbl> 1, 4, 2, 9, 6, 2, 3, 3, 17, 6, 6, 10, 0, 3, 0, 8, …
    $ PF                  <dbl> 11, 15, 8, 9, 8, 8, 10, 10, 28, 14, 23, 44, 0, 1, …
    $ PTS                 <dbl> 24, 63, 17, 118, 69, 8, 27, 19, 180, 71, 79, 149, …
    $ `Player-additional` <chr> "abrinal01", "adamsst01", "adebaba01", "aldrila01"…

The result is exactly the same!
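If you'd rather check that than eyeball two glimpse() printouts, here's a sketch that runs both pipelines on two toy CSVs (made-up values) and compares the pieces that matter: the column names, the file column, and a data column. It assumes purrr 1.0.0 or later (for list_rbind()) and that dplyr is installed, since map_df() calls dplyr::bind_rows() under the hood.

```r
library(fs)

# Two toy CSVs with made-up values in a temporary directory
cmp_dir <- path_temp("compare-demo")
dir_create(cmp_dir)
writeLines(c("x,y", "1,a", "2,b"), path(cmp_dir, "one.csv"))
writeLines(c("x,y", "3,c"), path(cmp_dir, "two.csv"))
files <- dir_ls(cmp_dir, glob = "*.csv")

# Superseded approach: map_df() with .id
old_way <- purrr::map_df(files, readr::read_csv, .id = "file",
                         show_col_types = FALSE)

# purrr >= 1.0.0 approach: map() followed by list_rbind() with names_to
new_way <- files |>
  purrr::map(readr::read_csv, show_col_types = FALSE) |>
  purrr::list_rbind(names_to = "file")

# Compare column names, the file labels, and one data column
identical(names(old_way), names(new_way))
identical(old_way$file, new_way$file)
identical(old_way$x, new_way$x)
```
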

Learn more

There’s much more to fs than this one function, so it’s well worth your time to check out the fs intro. Also be sure to peep “Comparison of fs functions, base R, and shell commands” for how those systems stack up.
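For a taste of how base R compares, here's a sketch using a hypothetical scratch directory: list.files() returns an unnamed character vector, while dir_ls() returns one named by its values. If you're starting from base R output, purrr::set_names() with no second argument names a vector after itself, which recovers the shape that .id and names_to rely on.

```r
library(fs)

# Hypothetical scratch directory with two empty CSVs
cmp_dir <- path_temp("base-vs-fs")
dir_create(cmp_dir)
file_create(path(cmp_dir, c("a.csv", "b.csv")))

base_files <- list.files(cmp_dir, pattern = "\\.csv$", full.names = TRUE)
fs_files   <- dir_ls(cmp_dir, glob = "*.csv")

names(base_files)  # NULL: list.files() returns an unnamed vector
names(fs_files)    # the paths themselves

# Name the base R result after its own values
base_named <- purrr::set_names(base_files)
```
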

I would be remiss not to mention that there are lots of ways to read and combine multiple .csv files into one data frame in R. For an extensive (though not exhaustive) exploration of options, see the responses to the Stack Overflow question “How to import multiple .csv files at once?”
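As one example of those alternatives, readr 2.0.0 and later can read a whole vector of files in a single read_csv() call, with its id argument recording each row's source path. This sketch uses toy files with made-up values in a temporary directory.

```r
library(fs)

# Toy CSVs (made-up values) in a temporary directory
id_dir <- path_temp("readr-id-demo")
dir_create(id_dir)
writeLines(c("x", "1", "2"), path(id_dir, "first.csv"))
writeLines(c("x", "3"), path(id_dir, "second.csv"))

# One call reads every file; `id` names the column holding each file's path
combined <- readr::read_csv(dir_ls(id_dir, glob = "*.csv"),
                            id = "file", show_col_types = FALSE)

combined
```
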

References

Hester, Jim, Hadley Wickham, and Gábor Csárdi. 2022. fs: Cross-Platform File System Operations Based on ‘libuv’. https://fs.r-lib.org.