class: center, middle, inverse, title-slide

# Package Management in R
### Isabella Velásquez, Bill & Melinda Gates Foundation
### Mike Garcia, ProCogia
### May 7th, 2020

---

class: inverse, center, middle

# Upgrading R in the time of coronavirus

---

# R version 4.0.0 is out! 🌲

## What's new?

* `list2DF`
* `sort.list` for non-atomic objects
* New color palettes
* And a lot of other stuff!

--

...

--

and, `stringsAsFactors = FALSE` by default

---

# Chat

How do you feel about upgrading R? Self-identify and share something you may be worried about in the chat:

* 😎 Already upgraded
* 😀 Not worried
* 🙂 Not too worried
* 😕 A little worried
* 😨 Very worried
* 😵 Confused about why anyone is worried

---

# What does it mean to upgrade a major version of R?

<img src="" height="250" style="display: block; margin: auto;" />

* For some - apprehension!
* For others - opportunity!

---

# What's wrong with reinstalling packages?

--

😨 😨 😨

<img src="img/not_found.png" height="250" style="display: block; margin: auto;" />

---

# Upgrade R // Install As You Go

You can "start fresh" with no packages installed when you upgrade R.

<img src="" height="250" style="display: block; margin: auto;" />

* Another option is to use {pacman}, whose `p_load()` installs any packages that are not already installed before loading them.
* For previously written scripts, RStudio shows a helpful message if it detects a package that is not installed.
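For readers who would rather not add a dependency, here is a minimal base-R sketch of the same "install as you go" idea. `install_if_missing()` is a hypothetical helper, not part of {pacman} or any other package. It installs only the packages missing from the current library, then attaches all of them and reports which ones loaded.

```r
# Hypothetical helper (not from any package): a base-R take on the
# "install as you go" idea behind pacman::p_load().
install_if_missing <- function(pkgs, repos = getOption("repos")) {
  # Packages already available in any library on .libPaths()
  installed <- rownames(utils::installed.packages())
  missing <- setdiff(pkgs, installed)
  if (length(missing) > 0) {
    utils::install.packages(missing, repos = repos)
  }
  # Attach every requested package; returns a named logical vector
  vapply(pkgs, function(p) {
    suppressPackageStartupMessages(
      require(p, character.only = TRUE, quietly = TRUE)
    )
  }, logical(1))
}

# Usage: install_if_missing(c("tidyverse", "fs"))
```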
<!-- -->

---

# Upgrade R // Reinstall All Packages

```r
# setup
if (!require(tidyverse)) install.packages("tidyverse")
if (!require(fs)) install.packages("fs")
library(tidyverse)
library(fs)
```

--

```r
# get all installed R versions
if (Sys.info()[["sysname"]] == "Darwin") {
  r_dir <- tibble::tibble(path = fs::dir_ls(fs::path_dir(fs::path_dir(fs::path_dir(.libPaths()[[1]])))))
}
if (Sys.info()[["sysname"]] %in% c("Linux", "Windows")) {
  r_dir <- tibble::tibble(path = fs::dir_ls(fs::path_dir(.libPaths()[[1]])))
}
```

---

```r
# cue music
r_dir <- r_dir %>%
  # drop the "Current" symlink (macOS only)
  dplyr::filter(!(stringr::str_detect(path, "Current"))) %>%
  # parse the version number (e.g. 3.6, 4.0) from each path
  dplyr::rowwise() %>%
  dplyr::mutate(version = as.numeric(stringr::str_extract(path, "[0-9]\\.[0-9]"))) %>%
  dplyr::ungroup() %>%
  # newest and second-newest R versions, formatted as strings ("4.0", "3.6")
  dplyr::mutate(new_r = dplyr::nth(version, -1L),
                old_r = dplyr::nth(version, -2L)) %>%
  dplyr::mutate_at(vars("new_r", "old_r"), ~as.character(formatC(.x, digits = 1L, format = "f"))) %>%
  # keep only the row for the previous R version (compare as "x.y" strings)
  dplyr::filter(formatC(version, digits = 1L, format = "f") == old_r)
```

--

```r
# get new and old R library paths
new_libpath <- .libPaths()
old_libpath <- stringr::str_replace(new_libpath, stringr::fixed(r_dir$new_r), r_dir$old_r)

# get list of old installed R packages, skipping any already installed
# for the new R version (this also skips base packages like stats)
pkg_list <- as.list(setdiff(list.files(old_libpath), rownames(installed.packages())))
```

---

```r
# get new and old R library paths
new_libpath <- .libPaths()
old_libpath <- stringr::str_replace(new_libpath, stringr::fixed(r_dir$new_r), r_dir$old_r)

# get list of old installed R packages, skipping any already installed
# for the new R version (this also skips base packages like stats)
pkg_list <- as.list(setdiff(list.files(old_libpath), rownames(installed.packages())))
```

--

```r
# define install_all() function
install_all <- function(x) {
  print(x)  # show progress
  install.packages(x, quiet = TRUE)
}
```

--

```r
# install all R packages in pkg_list (walk() is called for its side effects)
purrr::walk(pkg_list, install_all)
```

---

# Upgrade R // Reinstall All Packages

## Other Options

<img src="img/script.png" height="250" style="display: block; margin: auto;" />

* Formal packages: {installr}, {yamlpack}...
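If the tidyverse pipeline feels heavy, the underlying question ("what do I still need to reinstall?") can also be sketched in base R. `missing_pkgs()` is a hypothetical helper. It treats every folder with a `DESCRIPTION` file under the old library path as a package and drops anything already installed for the running R version, which also skips base packages such as `stats`.

```r
# Hypothetical helper: list packages found in an old library directory
# that are not yet installed for the R version you are running now.
missing_pkgs <- function(old_lib) {
  # Any folder containing a DESCRIPTION file is treated as a package
  desc_files <- Sys.glob(file.path(old_lib, "*", "DESCRIPTION"))
  old_pkgs <- basename(dirname(desc_files))
  # Drop packages already available (base packages included)
  sort(setdiff(old_pkgs, rownames(utils::installed.packages())))
}
```

A possible use is `install.packages(missing_pkgs(old_libpath))`, with `old_libpath` pointing at the previous version's library.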
---

# Upgrade R // Keep Old Packages

If you want to keep the previous versions of your packages but upgrade R, you can bring over the old versions.

<!-- -->

<!-- -->

.footnote[
Warning image from
]

---

# Think of your Dependencies

<blockquote class="twitter-tweet"><p lang="und" dir="ltr"> <a href=""></a></p>— Andrew Heiss (@andrewheiss) <a href="">July 25, 2018</a></blockquote> <script async src="" charset="utf-8"></script>

---

# Do Not Upgrade R

<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Yeah, I'm just gonna act like <a href="">#rstats</a> 4.0 doesn't exist until I finish these papers. <br>MRAN January snapshot continues! <a href=""></a></p>— Chase Clark (@ChasingMicrobes) <a href="">April 22, 2020</a></blockquote> <script async src="" charset="utf-8"></script>

---

# Do Not Upgrade R

Even if you'd rather wait for R v4.1.0, there are a few things you can do:

* Test it out before you upgrade
* Make it so you can switch between R versions: use RSwitch on macOS by @hrbrmstr

<img src="" height="250" style="display: block; margin: auto;" />

---

# Considerations

* How do you code?
* How easily can you find your non-CRAN packages again?
* How fragile is your code?
* Do you want a 'fresh start'?
* Will your packages work with R 4.0.0?
* What do your dependencies look like?
* How fast is your internet?
* How soon is your deadline?

--

<img src="img/error.png" height="200" style="display: block; margin: auto;" />

---

# Other Options

* packrat (RStudio's older package management solution)
* RStudio Package Manager
* Docker (rocker)
* Conda
* and....

--

* renv

---

class: inverse, center, middle

# Hello!

Mike Garcia

Twitter: @cascadianzeno

Email:

---

class: inverse, center, middle

# Package management in projects

<img src="img/renv-hex.svg" height="300" />

---

# Poll

What is your opinion/familiarity with Python?

* 🐍 What's Python?

Set up `renv`:

```r
renv::init()
```

This creates a lockfile (`renv.lock`) recording all the packages and dependencies detected in the project.
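`renv::init()` finds those dependencies by scanning your code. The function below is a deliberately simplified, hypothetical sketch of that idea, not renv's implementation. It only looks for `library()`, `require()` and `requireNamespace()` calls and `pkg::` prefixes in `.R`/`.Rmd` files, whereas `renv::dependencies()` handles many more cases.

```r
# Hypothetical, simplified dependency scan (not renv's implementation).
find_pkg_deps <- function(path = ".") {
  # .R, .r, .Rmd and .rmd files anywhere under `path`
  files <- list.files(path, pattern = "\\.[Rr](md)?$", recursive = TRUE,
                      full.names = TRUE)
  code <- unlist(lapply(files, readLines, warn = FALSE))
  # library(pkg), require("pkg"), requireNamespace("pkg"), ...
  calls <- regmatches(code, gregexpr(
    "(library|require|requireNamespace)\\(\\s*['\"]?[A-Za-z][A-Za-z0-9.]*", code))
  from_calls <- sub(".*\\(\\s*['\"]?", "", unlist(calls))
  # pkg::fun and pkg:::fun
  ns <- regmatches(code, gregexpr("[A-Za-z][A-Za-z0-9.]*:::?", code))
  from_ns <- sub(":::?$", "", unlist(ns))
  sort(unique(c(from_calls, from_ns)))
}
```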
.footnote[
[1] For our purposes, the R Markdown part will be make-believe. I'll just show the results in the slides
]

---

Let's look at average IMDB ratings over the seasons:

```r
theoffice %>%
  select(season, imdb_rating) %>%
  distinct() %>%
  group_by(season) %>%
  summarize(avg_imdb = mean(imdb_rating))
```

```
## # A tibble: 9 x 2
##   season avg_imdb
##    <int>    <dbl>
## 1      1     8.02
## 2      2     8.53
## 3      3     8.64
## 4      4     8.5
## 5      5     8.65
## 6      6     8.18
## 7      7     8.49
## 8      8     7.55
## 9      9     8.05
```

---

## Make a quick ggplot...

.pull-left[
<!-- -->
]

.pull-right[
Sounds about right to Office fans:

- First few seasons were great
- Slump in middle
- Things got weird at the end
- Slight recovery with final season
]

---

# Share with the world!

- You share code on GitHub
- Eager to build on your groundbreaking results, collaborators fork your repo

---

# How about some flair?

<img src="" height="300" style="display: block; margin: auto;" />

--

After all your hard work on your report that you've now shared with the world, you find out about the brand new `flair` package <sup>1</sup>. This is a perfect way to highlight those tricky code bits to help your co-workers learn R.

.footnote[
[1] Just released on May 4th!
]

---

# Install from CRAN

```r
install.packages("flair")
```

--

Now our lockfile is out of date!

--

That's a simple fix: `renv::snapshot()` will update the lockfile.
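`renv::status()` is renv's own check for whether the lockfile and the project library disagree. As a rough, hypothetical illustration of what "out of date" means, `lockfile_drift()` compares a named vector of recorded versions (standing in for the `Packages` section of `renv.lock`; JSON parsing is left out) with what is installed. It also flags packages your code uses that the lockfile does not record yet, like `flair` right after `install.packages()`.

```r
# Hypothetical helper: report packages whose installed version differs from
# the version recorded in a lockfile, plus used packages with no record.
# `lock` is a named character vector (package -> version), a stand-in for
# the "Packages" section of renv.lock.
lockfile_drift <- function(lock, used = character()) {
  pkgs <- union(names(lock), used)
  # Installed version, or NA if the package is not installed at all
  installed <- vapply(pkgs, function(p) {
    if (nzchar(system.file(package = p))) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1), USE.NAMES = FALSE)
  # Recorded version, or NA if the lockfile has no entry for the package
  recorded <- unname(lock[pkgs])
  out <- data.frame(package = pkgs, recorded = recorded,
                    installed = installed, stringsAsFactors = FALSE)
  # Stale: missing from the lockfile, not installed, or versions differ
  stale <- is.na(out$recorded) | is.na(out$installed) |
    out$recorded != out$installed
  out[stale, , drop = FALSE]
}
```

In a real project, `renv::snapshot()` is what brings the lockfile back in sync.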
--

Next, add the appropriate amount of flair to your code:

```r
decorate("theoffice %>%
  select(season, imdb_rating) %>%
  distinct() %>%
  group_by(season) %>%
  summarize(avg_imdb = mean(imdb_rating))") %>%
  flair('%>%') %>%
  flair('theoffice', background = 'Coral') %>%
  flair('mean', background = 'Aquamarine')
```

---

<img src="img/flair.png" height="300" style="display: block; margin: auto;" />

---

# Deeper dive

`renv::init()` creates:

- a lockfile named `renv.lock`
- a project-local environment with a private library (`renv/library`)
  - separate from the system library
  - populated by crawling the project for dependencies with `renv::dependencies()`
  - packages already installed in your user library are copied over
    - instead of being re-installed from CRAN

<img src="img/project-library.png" height="200" style="display: block; margin: auto;" />

---

# Deeper dive

- `renv::snapshot()`: saves the state of the project library to the lockfile (`renv.lock`)

<img src="img/lockfile.png" width="300" style="display: block; margin: auto auto auto 0;" />

--

- `renv::restore()`: restores the project library to the state recorded in the lockfile, e.g. when a package installation or upgrade goes wrong

---

# Package sources

Compatible with

- CRAN
- GitHub
- Bioconductor
- GitLab
- Bitbucket
- Custom/local packages

The `DESCRIPTION` file of a package is used to infer its source.

---

# Comparison with Packrat<sup>1</sup>:

1. `renv.lock` is a JSON file, so it is easy to use with other tools
2. `renv` doesn't attempt to track package tarballs
3. `renv` doesn't track "stale" packages (installed but not recorded in the lockfile)
4. `renv` uses a global cache shared across all projects maintained with `renv`, reducing disk usage
5. More configurable dependency discovery

.footnote[
[1] Abbreviated from `renv` documentation website
]

---

# Collaboration

As in the `schrute`/`flair` example, collaboration is easiest with a central version control repository (like GitHub).

--

Collaborators can:

1. Run `renv::restore()` (or `renv::init()`, which offers to restore) to install the packages recorded in the `renv.lock` included in the repository
2. Update the lockfile with any packages they install via `renv::snapshot()`
3. Push the updated lockfile to the central repo along with their code

---

# Real world example

--

These slides!

--

We built these slides in `xaringan` using GitHub for collaboration. Due to differences in both packages and installed versions of R, the formatting was getting horribly mangled.

--

So we practiced what we're preaching, made our versions of R consistent and used `renv` to manage the project.

--

...and the slides rendered perfectly.

<img src="" width="300" style="display: block; margin: auto;" />

---

.pull-left[
# Use with Python
]

.pull-right[
<img src="" width="100" style="display: block; margin: auto 0 auto auto;" />
]

If your project uses both R and Python, `renv` can manage both!

--

Compatible with

- `reticulate`
- Virtual environments
- Conda environments

---

# Conclusion

- Learning a new language can motivate you to improve skills in a language you already know

--

- R has an exciting new option for improving reproducibility even more

--

- `renv` has lots of options and flexibility but retains simplicity

--

- Virtual environments offer an alternative to reinstalling packages during upgrades

---

class: inverse, center, middle

# Questions?

## Isabella Velásquez, @ivelasq3

## Mike Garcia, @cascadianzeno

---

# References

## More on `renv`

[renv website](