Day 18: brio

brio (short for Basic R Input Output) provides functions that do just that: handle basic input and output in R. Its functions always read and write UTF-8 files and provide more explicit control over line endings.

Published

December 18, 2022

String encoding in R (or in any programming language) is far from simple. Kevin Ushey wrote an excellent blog post, String Encoding and R, that:

is an attempt to explore, and answer, the surprisingly difficult question:

How do I write UTF-8 encoded content to a file? (Ushey 2018)

The aim of brio (Hester and Csárdi 2022) is to make that practice easier. brio (an initialism for Basic R Input Output) provides functions that always read and write UTF-8 files¹, and provide more explicit control over line endings.

¹ See Kevin’s blog post to understand why this is a good idea.

In addition to providing consistency and control over encoding and line endings, brio’s primary functions, read_lines() and write_lines(), are also faster than their base R and readr equivalents (see the benchmarks section of the README for data), which is a nice bonus.
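As a rough illustration of that control, here is a minimal sketch of a round trip with write_lines() and read_lines(). The sample text and temporary file names are made up for the example, and it assumes brio is installed; the eol argument and file_line_endings() are used as described in the brio documentation.

```r
library(brio)

# Hypothetical sample text containing non-ASCII characters
text <- c("caf\u00e9", "na\u00efve", "plain ascii")

# Temporary files used only for this sketch
path_lf <- tempfile(fileext = ".txt")
path_crlf <- tempfile(fileext = ".txt")

# Write UTF-8 with the default Unix line endings
write_lines(text, path_lf)

# Write the same text with Windows line endings, chosen explicitly
write_lines(text, path_crlf, eol = "\r\n")

# Check which line endings each file actually uses
endings_lf <- file_line_endings(path_lf)
endings_crlf <- file_line_endings(path_crlf)

# Read the lines back as UTF-8
lines <- read_lines(path_lf)
```

The point of the sketch is that the line ending is an argument you set yourself rather than something inherited from the platform, and that nothing in the calls refers to the session's locale or native encoding.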

Learn more

The brio documentation provides details on how to use its full suite of functions, including drop-in replacements for base readLines() and writeLines().
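For existing code that already calls readLines() and writeLines(), the swap might look like the sketch below. The file and its contents are invented for the example. As far as the brio documentation describes them, these replacements take a file path in the con argument rather than a connection object, so treat that as an assumption to verify against your own code.

```r
# Temporary file used only for this sketch
path <- tempfile(fileext = ".txt")

# Hypothetical content to write
text <- c("first line", "second line")

# Same argument order as base::writeLines(text, con),
# with a file path passed as `con`
brio::writeLines(text, path)

# Same call shape as base::readLines(con)
lines <- brio::readLines(path)
```

Calling the functions with the brio:: prefix, instead of attaching the package, avoids masking the base versions elsewhere in a script.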

To learn more about encodings and character sets, see the two articles Kevin recommends at the end of his post on string encoding and R:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky (Spolsky 2003); and
What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text by David C. Zentgraf (Zentgraf 2015).

References

Hester, Jim, and Gábor Csárdi. 2022. brio: Basic R Input Output. https://brio.r-lib.org.

Spolsky, Joel. 2003. “The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).” Joel on Software. October 8, 2003. https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/.

Ushey, Kevin. 2018. “String Encoding and R.” R and C++. February 21, 2018. https://kevinushey.github.io/blog/2018/02/21/string-encoding-and-r/.

Zentgraf, David C. 2015. “What Every Programmer Absolutely, Positively Needs to Know About Encodings and Character Sets to Work With Text.” Kunststube. April 27, 2015. https://kunststube.net/encoding/.