Day 16: rbind_pages()

Got paginated JSON data? The rbind_pages() function from Jeroen Ooms’ jsonlite package is here to save the day.

Published

December 16, 2022

Note

The jsonlite package is authored and maintained by Jeroen Ooms. I’ve decided to include it since it’s installed alongside the tidyverse package, which recommends it for reading JSON data, and it’s also super helpful!

To borrow from its description on GitHub, jsonlite (Ooms 2014):

Offers simple, flexible tools for working with JSON in R, and is particularly powerful for building pipelines and interacting with a web API.

Basically, it provides mapping between JSON data and R classes and objects (like data frames).

When working with web APIs, you’ll often encounter paginated data, where the data returned is broken into pages with a given number of records each. Once you’ve read in the data with the fromJSON() function, chances are that you’ll want to combine those pages into a single data frame for your analysis. This is where rbind_pages() comes in. It uses vctrs::vec_rbind() to combine a list of data frames into a single data frame.
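To make that concrete before hitting a real API, here is a minimal offline sketch. The two JSON strings are made up for illustration (they are not from any API), and the example assumes the vctrs package is installed, since rbind_pages() relies on it. The second "page" has an extra network field, to show that the pages don't need identical columns.

```r
library(jsonlite)

# Two made-up "pages" of records, parsed from JSON text (illustrative only)
page1 <- fromJSON('[{"id": 1, "name": "Show A"}, {"id": 2, "name": "Show B"}]')
page2 <- fromJSON('[{"id": 3, "name": "Show C", "network": "HBO"}]')

# Combine the list of data frames into a single data frame
combined <- rbind_pages(list(page1, page2))

nrow(combined)   # one row per record across both pages
names(combined)  # union of the columns found in the pages
combined$network # NA where a page had no network field
```

Rows from the first page get NA in the network column, since that field only exists in the second page.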

Let’s do a quick example using data on the most popular TV shows from the EpisoDate API. We’ll get the first three pages of data.

library(jsonlite)

# base URL of the page
baseurl <- "https://www.episodate.com/api/most-popular?page="

mydata1 <- fromJSON(paste0(baseurl, "1"), flatten = TRUE)
mydata2 <- fromJSON(paste0(baseurl, "2"), flatten = TRUE)
mydata3 <- fromJSON(paste0(baseurl, "3"), flatten = TRUE)

# look at the data for one of the pages
dplyr::glimpse(mydata1$tv_shows)
Rows: 20
Columns: 9
$ id                   <int> 35624, 23455, 29560, 43467, 43234, 46692, 24010, …
$ name                 <chr> "The Flash", "Game of Thrones", "Arrow", "Lucifer…
$ permalink            <chr> "the-flash", "game-of-thrones", "arrow", "lucifer…
$ start_date           <chr> "2014-10-07", "2011-04-17", "2012-10-10", "2016-0…
$ end_date             <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ country              <chr> "US", "US", "US", "US", "US", "US", "US", "US", "…
$ network              <chr> "The CW", "HBO", "The CW", "Netflix", "The CW", "…
$ status               <chr> "Running", "Ended", "Ended", "Ended", "Ended", "E…
$ image_thumbnail_path <chr> "https://static.episodate.com/images/tv-show/thum…

Each page contains 20 records, stored in its tv_shows element. We’ll use rbind_pages() to combine the tv_shows data from the three pages, then check the number of rows to confirm that we get the expected 60 records.

tv_shows <- rbind_pages(
  list(mydata1$tv_shows, mydata2$tv_shows, mydata3$tv_shows)
)

nrow(tv_shows)
[1] 60
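Copying a line per page gets tedious quickly, so here is a rough sketch of doing the same thing in a loop. The helper name combine_pages() and its fetch argument are my own inventions for illustration, not part of jsonlite. The fetch argument defaults to calling fromJSON() on the same EpisoDate URL used above, and can be swapped for a stand-in function if you want to try the logic without network access.

```r
library(jsonlite)

# base URL of the page (same endpoint as above)
baseurl <- "https://www.episodate.com/api/most-popular?page="

# Hypothetical helper: fetch each requested page, pull out its tv_shows
# element, and bind the results into one data frame.
combine_pages <- function(page_numbers,
                          fetch = function(p) {
                            fromJSON(paste0(baseurl, p), flatten = TRUE)
                          }) {
  pages <- lapply(page_numbers, function(p) fetch(p)$tv_shows)
  rbind_pages(pages)
}

# Usage (requires network access):
# tv_shows <- combine_pages(1:3)
# nrow(tv_shows)
```

Keeping the fetching step as an argument is just a convenience for testing. If you only ever call the one API, you could inline the fromJSON() call instead.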

Learn more

For more details on using rbind_pages(), including example code for automatically combining many pages, see the jsonlite vignette, Combining pages of JSON data with jsonlite. Also be sure to check out the package’s other vignettes.

You can learn the details of how jsonlite works in the paper The jsonlite Package: A Practical and Consistent Mapping Between JSON Data and R Objects (Ooms 2014).

References

Ooms, Jeroen. 2014. “The jsonlite Package: A Practical and Consistent Mapping Between JSON Data and R Objects.” arXiv:1403.2805 [Stat.CO]. https://arxiv.org/abs/1403.2805.