Day 03: parse_number()

readr’s parsing functions can come in handy even after you’ve imported data. parse_number() is flexible enough to deal with things like leading characters, trailing white space, and formatting from different locales.

Published

December 3, 2022

Numeric data do not always arrive cleanly formatted. In fact, what constitutes “correct” formatting varies with location and convention. For example, the ISO 31-0 standard allows the decimal separator to be represented as either a period or a comma, while the grouping mark (which divides numbers into groups of thousands) is usually a comma in the US but a period in many non-English-speaking countries (Wikipedia 2022).

Similarly, your numeric data may come with leading or trailing non-numeric characters representing units or currency (e.g. $1,000 or 1,000USD). Depending on the situation, you might need this information, which is why the parse_guess() function in readr (Wickham, Hester, and Bryan 2022) interprets these as character strings.
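As a quick illustration of that guessing behavior, here is a minimal sketch using readr's guess_parser() and parse_guess(). The input strings are made up for illustration.

```r
library(readr)

# Illustrative inputs, not taken from a real data set
guess_parser("1,000")    # grouping mark only: guessed as a number
guess_parser("$1,000")   # leading currency symbol: guessed as character

# Strings with units or currency attached are kept as character,
# so the non-numeric information survives
guessed <- parse_guess(c("$1,000", "1,000USD"))
class(guessed)
```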

parse_number() is your flexible-number-parsing friend for when you know you just want the number part of the data. It drops non-numeric characters before the number, and all characters after the first number.

library(readr)
parse_number("$1,000")
[1] 1000
parse_number("1000USD")
[1] 1000
parse_number("t1000t1000") # you only get the first number here
[1] 1000
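
Because parse_number() is vectorised, the same call can clean an entire column that was imported as character. The sketch below assumes a hypothetical `prices` vector in which "n/a" marks a missing value; the `na` argument tells readr which strings to treat as missing, and surrounding white space is trimmed by default.

```r
library(readr)

# Hypothetical price column that was imported as character
prices <- c("$1,000", "2,500 USD", "  750  ", "n/a")

# Treat "n/a" as missing in addition to the defaults ("" and "NA")
clean_prices <- parse_number(prices, na = c("", "NA", "n/a"))
clean_prices
```

With these inputs the result should be 1000, 2500, 750, and NA.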

To specify decimal and grouping marks, pass a call to readr’s locale() function to the locale argument.

format(
  c(
    parse_number("1,234,567.89"), # default `locale()` uses US English conventions
    parse_number("1 234 567.89", locale = locale(decimal_mark = ".", grouping_mark = " ")),
    parse_number("1.234.567,89", locale = locale(decimal_mark = ",", grouping_mark = "."))
  ),
  nsmall = 2
)
[1] "1234567.89" "1234567.89" "1234567.89"
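
A locale object can also be stored once and reused across a whole vector. The sketch below uses made-up values and a hypothetical `eu` locale object, and parses the same strings twice: once with the matching locale and once with the default. A mismatched locale typically does not raise an error, so comparing the two results is a cheap sanity check.

```r
library(readr)

# Reusable locale: comma as decimal mark, period as grouping mark
eu <- locale(decimal_mark = ",", grouping_mark = ".")

# Made-up values written with those conventions
amounts <- c("1.234.567,89", "12,5", "3.000")

with_locale  <- parse_number(amounts, locale = eu)  # intended reading
with_default <- parse_number(amounts)               # US English conventions

# Side-by-side comparison makes any mismatch easy to spot
data.frame(amounts, with_locale, with_default)
```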

References

Wickham, Hadley, Jim Hester, and Jennifer Bryan. 2022. readr: Read Rectangular Text Data. https://readr.tidyverse.org.
Wikipedia. 2022. “Decimal separator.” Wikipedia, the Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Decimal%20separator&oldid=1124244000.