class: center, middle, inverse, title-slide

# Package Management in R

### Isabella Velásquez, Bill & Melinda Gates Foundation

### Mike Garcia, ProCogia

### May 7th, 2020

---
class: inverse, center, middle

# Hello!

Isabella Velásquez

Twitter: @ivelasq3

Website: ivelasq.rbind.io

---
class: inverse, center, middle

# Upgrading R in the time of coronavirus

<img src="https://media1.tenor.com/images/4abc0a93a957de4c1db543ec3f4834a7/tenor.gif?itemid=13633421" height="500" style="display: block; margin: auto;" />

---
# R version 4.0.0 is out! 🌲

## What’s new?

* `list2DF`
* `sort.list` for non-atomic objects
* New color palettes
* And a lot of other stuff!

<img src="img/cb.jpg" height="250" style="display: block; margin: auto;" />

--

...

--

and the new default, `stringsAsFactors = FALSE`

---
# Chat

How do you feel about upgrading R?

Self-identify and share something you may be worried about in the chat:

* 😎 Already upgraded
* 😀 Not worried
* 🙂 Not too worried
* 😕 A little worried
* 😨 Very worried
* 😵 Confused about why anyone is worried

---
# What does it mean to upgrade a major version of R?

<img src="https://giffiles.alphacoders.com/168/168275.gif" height="250" style="display: block; margin: auto;" />

* For some - apprehension!
* For others - opportunity!

---
# What's wrong with reinstalling packages?

--

😨 😨 😨

<img src="img/not_found.png" height="250" style="display: block; margin: auto;" />

---
# Upgrade R // Install As You Go

You can "start fresh" with no packages installed when you upgrade R.

<img src="https://i.kym-cdn.com/entries/icons/original/000/006/077/so_good.png" height="250" style="display: block; margin: auto;" />

* Another option is {pacman}: its `p_load()` function loads packages and installs any that are not already installed.
* For previously written scripts, RStudio shows a helpful message if it detects a package that is not installed.
![](img/brms.png)<!-- -->

---
# Upgrade R // Reinstall All Packages

```r
# setup
if (!require(tidyverse)) install.packages("tidyverse")
if (!require(fs)) install.packages("fs")
library(tidyverse)
library(fs)
```

--

```r
# get all installed R versions
if (Sys.info()[["sysname"]] == "Darwin") {
  # macOS: .libPaths()[[1]] is .../R.framework/Versions/<x.y>/Resources/library
  r_dir <- tibble::tibble(path = fs::dir_ls(fs::path_dir(fs::path_dir(fs::path_dir(.libPaths()[[1]])))))
}
if (Sys.info()[["sysname"]] %in% c("Linux", "Windows")) {
  # Linux/Windows: assumes the first library is a per-version user library, .../<x.y>
  r_dir <- tibble::tibble(path = fs::dir_ls(fs::path_dir(.libPaths()[[1]])))
}
```

---
```r
# cue music
r_dir <- r_dir %>%
  # extract each R version as a string, e.g. "3.6" or "4.0"
  dplyr::mutate(version = stringr::str_extract(path, "[0-9]+\\.[0-9]+")) %>%
  # drop entries without a version, such as macOS's "Current" alias
  dplyr::filter(!is.na(version)) %>%
  # sort oldest to newest, then record the newest and penultimate versions
  dplyr::arrange(version) %>%
  dplyr::mutate(new_r = dplyr::nth(version, -1L),
                old_r = dplyr::nth(version, -2L)) %>%
  # keep only the row for the previous R version
  dplyr::filter(version == old_r)
```

--

```r
# get new and old R library paths
new_libpath <- .libPaths()
old_libpath <- stringr::str_replace(new_libpath, stringr::fixed(r_dir$new_r), r_dir$old_r)

# get list of old installed R packages
pkg_list <- as.list(list.files(old_libpath))
```

---
```r
# get new and old R library paths
new_libpath <- .libPaths()
old_libpath <- stringr::str_replace(new_libpath, stringr::fixed(r_dir$new_r), r_dir$old_r)

# get list of old installed R packages
pkg_list <- as.list(list.files(old_libpath))
```

--

```r
# define install_all() function: print the package name, then install it
install_all <- function(x) {
  print(x)
  install.packages(x, quiet = TRUE)
}
```

--

```r
# install all R packages in pkg_list
# (install_all() already passes quiet = TRUE to install.packages())
purrr::walk(pkg_list, install_all)
```

---
# Upgrade R // Reinstall All Packages

## Other Options

<img src="img/script.png" height="250" style="display: block; margin: auto;" />

* Formal packages: {installr}, {yamlpack}...
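---
# Upgrade R // Reinstall All Packages

## A base R sketch

If you would rather not depend on the tidyverse for this step, the core idea can be sketched with base R alone. This is a minimal illustration, not the script above: `user_pkgs()` and `pkgs_to_reinstall()` are hypothetical helper names, and `old_lib` stands for the previous version's library path (for example, `old_libpath` from the script). Unlike `list.files()`, `installed.packages()` reports each package's `Priority`, so base and recommended packages, which ship with R, can be skipped, and only packages missing from the current libraries are reinstalled.

```r
# User-installed packages in a library: those with no Priority
# (base and recommended packages ship with R itself)
user_pkgs <- function(lib) {
  ip <- installed.packages(lib.loc = lib)
  unname(ip[is.na(ip[, "Priority"]), "Package"])
}

# Packages present in the old library but not in any current library
pkgs_to_reinstall <- function(old_lib) {
  setdiff(user_pkgs(old_lib), rownames(installed.packages()))
}

# Usage (old_lib is a placeholder for the previous version's library):
# old_lib <- "~/R/x86_64-pc-linux-gnu-library/3.6"
# todo <- pkgs_to_reinstall(old_lib)
# if (length(todo) > 0) install.packages(todo)
```

Passing the whole vector to a single `install.packages()` call also lets R resolve dependencies once, instead of once per package.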
---
# Upgrade R // Keep Old Packages

If you want to keep the previous versions of your packages but upgrade R, you can bring the old versions over.

![](img/donot.png)<!-- -->

![](img/stringasfactors.jpg)<!-- -->

.footnote[
Warning image from https://rstats.wtf/maintaining-r.html
]

---
# Think of your Dependencies

<blockquote class="twitter-tweet"><p lang="und" dir="ltr"> <a href="https://t.co/xtfH4w3SiP">pic.twitter.com/xtfH4w3SiP</a></p>— Andrew Heiss (@andrewheiss) <a href="https://twitter.com/andrewheiss/status/1021944992351186944?ref_src=twsrc%5Etfw">July 25, 2018</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

---
# Do Not Upgrade R

<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Yeah, I'm just gonna act like <a href="https://twitter.com/hashtag/rstats?src=hash&ref_src=twsrc%5Etfw">#rstats</a> 4.0 doesn't exist until I finish these papers. <br>MRAN January snapshot continues! <a href="https://t.co/RrFCR86kcV">pic.twitter.com/RrFCR86kcV</a></p>— Chase Clark (@ChasingMicrobes) <a href="https://twitter.com/ChasingMicrobes/status/1253036167941021696?ref_src=twsrc%5Etfw">April 22, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

---
# Do Not Upgrade R

Even if you’d rather wait for R v4.1.0, there are a few things you can do:

* Test it out before you upgrade
* Make it so you can switch between R versions: use RSwitch on macOS by @hrbrmstr

<img src="https://media0.giphy.com/media/xUA7aUNw61j9Vdzs0U/giphy.gif" height="250" style="display: block; margin: auto;" />

---
# Considerations

* How do you code?
* How easily can you find your non-CRAN packages again?
* How fragile is your code?
* Do you want a 'fresh start'?
* Will your packages work with R 4.0.0?
* What do your dependencies look like?
* How fast is your internet?
* How quick is your deadline?
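--

One way to start answering "Will your packages work with R 4.0.0?" is to check which R version each package was built under: packages built under R 3.x must be reinstalled for R 4.0.0. The sketch below reads the `Built` field of an installed package's `DESCRIPTION` with `packageDescription()` and, as a conservative check, flags any package whose major.minor version differs from the running R. `built_with()`, `current_r()`, and `needs_rebuild()` are hypothetical helper names, and `old_lib` in the usage comment is a placeholder path.

```r
# R version ("major.minor") that an installed package was built under,
# parsed from the Built field of its DESCRIPTION, e.g. "R 3.6.3; ; ..."
built_with <- function(pkg, lib.loc = NULL) {
  built <- packageDescription(pkg, lib.loc = lib.loc, fields = "Built")
  sub("^R ([0-9]+\\.[0-9]+).*$", "\\1", built)
}

# "major.minor" of the running R session, e.g. "4.0"
current_r <- function() {
  paste(R.version$major, sub("\\..*$", "", R.version$minor), sep = ".")
}

# TRUE if the package was built under a different major.minor R version
# (NA if the package can't be found)
needs_rebuild <- function(pkg, lib.loc = NULL) {
  built_with(pkg, lib.loc) != current_r()
}

# Usage, with old_lib a placeholder for the previous version's library:
# pkgs <- rownames(installed.packages(lib.loc = old_lib))
# Filter(function(p) isTRUE(needs_rebuild(p, old_lib)), pkgs)
```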
--

<img src="img/error.png" height="200" style="display: block; margin: auto;" />

---
# Other Options

* packrat (old RStudio package management solution)
* RStudio Package Manager
* Docker (rocker)
* Conda
* and....

--

* renv

---
class: inverse, center, middle

# Hello!

Mike Garcia

Twitter: @cascadianzeno

Email: mike@procogia.com

---
class: inverse, center, middle

# Package management in projects

<img src="img/renv-hex.svg" height="300" />

---
# Poll

What is your opinion of, and familiarity with, Python?

* 🐍 What's Python? I don't like snakes!!
* 👼 I love R and use Python sparingly if at all
* 👿 I love Python and use R sparingly if at all
* 🤷 I'm an equal opportunity programmer and will use any tool to get the job done

---
# A brief history of my programming past:

<img src="img/mike-timeline.png" height="200" style="display: block; margin: auto;" />

--

> A language that doesn't affect the way you think about programming, is not worth knowing.
>
> <footer>--- Alan Perlis</footer>

---
# Some things I've learned from Python:

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Python-logo-notext.svg/1200px-Python-logo-notext.svg.png" width="50" style="display: block; margin: auto auto auto 0;" />

.pull-left[
Virtual environments are a big deal, and package management tools are widely used in the Python community:

- `pipenv`
- `virtualenv`
- `venv`
- `pyvenv`
]

.pull-right[
This is a good thing!

- Better reproducibility
- Isolate project dependencies
- Use different versions of the same package for different projects
]

---
# Reproducible data science with R

<img src="https://www.r-project.org/logo/Rlogo.png" width="50" style="display: block; margin: auto auto auto 0;" />

.pull-left[
### Literate programming:

Strong tradition in the R community:

- Sweave
- knitr
- Markdown
]

.pull-right[
### Package management:

Less of a focus in the community until recently.
- Packrat
]

---
.pull-left[
## Introducing: `renv`
]

.pull-right[
<img src="img/renv-hex.svg" width="100" style="display: block; margin: auto 0 auto auto;" />
]

- Oct 2019: `renv 0.8.0` released on CRAN
- Feb 2020: latest release (`renv 0.9.3`)

The goal is for `renv` to be a stable replacement for `Packrat`.

--

### Philosophy of `renv`

> Any of your existing workflows should just work as they did before

---
# Workflow overview:

- `renv::init()`: Initialize a new project-local environment with a private R library

--

- Business as usual, installing and removing new R packages as needed

--

- `renv::snapshot()`: Save the state of the project library to the lockfile

--

- More business as usual
- `renv::snapshot()`: Save project library again

--

- `renv::restore()`: Revert to the previous state as encoded in the lockfile if your attempts to update packages introduced some new problems.

---
# schrute package

--

The entire transcript from The Office

--

...

--

in Tidy Format!

<img src="https://media3.giphy.com/media/lMVNl6XxTvXgs/200w.webp?cid=ecf05e47ff356c33f2489eefdcfff62be8ab2bca7b24bab2&rid=200w.webp" height="300" style="display: block; margin: auto;" />

---
Available on CRAN:

```r
# install.packages("schrute")
library(schrute)
data(theoffice)
head(theoffice)
```

<table class="table table-striped table-hover table-condensed table-responsive" style="font-size: 7px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> index </th> <th style="text-align:right;"> season </th> <th style="text-align:right;"> episode </th> <th style="text-align:left;"> episode_name </th> <th style="text-align:left;"> director </th> <th style="text-align:left;"> writer </th> <th style="text-align:left;"> character </th> <th style="text-align:left;"> text </th> <th style="text-align:left;"> text_w_direction </th> <th style="text-align:right;"> imdb_rating </th> <th style="text-align:right;"> total_votes </th> <th style="text-align:left;"> air_date </th> </tr> </thead>
<tbody> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Pilot </td> <td style="text-align:left;"> Ken Kwapis </td> <td style="text-align:left;"> Ricky Gervais;Stephen Merchant;Greg Daniels </td> <td style="text-align:left;"> Michael </td> <td style="text-align:left;"> All right Jim. Your quarterlies look very good. How are things at the library? </td> <td style="text-align:left;"> All right Jim. Your quarterlies look very good. How are things at the library? </td> <td style="text-align:right;"> 7.6 </td> <td style="text-align:right;"> 3706 </td> <td style="text-align:left;"> 2005-03-24 </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Pilot </td> <td style="text-align:left;"> Ken Kwapis </td> <td style="text-align:left;"> Ricky Gervais;Stephen Merchant;Greg Daniels </td> <td style="text-align:left;"> Jim </td> <td style="text-align:left;"> Oh, I told you. I couldn't close it. So... </td> <td style="text-align:left;"> Oh, I told you. I couldn't close it. So... </td> <td style="text-align:right;"> 7.6 </td> <td style="text-align:right;"> 3706 </td> <td style="text-align:left;"> 2005-03-24 </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Pilot </td> <td style="text-align:left;"> Ken Kwapis </td> <td style="text-align:left;"> Ricky Gervais;Stephen Merchant;Greg Daniels </td> <td style="text-align:left;"> Michael </td> <td style="text-align:left;"> So you've come to the master for guidance? Is this what you're saying, grasshopper? </td> <td style="text-align:left;"> So you've come to the master for guidance? Is this what you're saying, grasshopper? 
</td> <td style="text-align:right;"> 7.6 </td> <td style="text-align:right;"> 3706 </td> <td style="text-align:left;"> 2005-03-24 </td> </tr> <tr> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Pilot </td> <td style="text-align:left;"> Ken Kwapis </td> <td style="text-align:left;"> Ricky Gervais;Stephen Merchant;Greg Daniels </td> <td style="text-align:left;"> Jim </td> <td style="text-align:left;"> Actually, you called me in here, but yeah. </td> <td style="text-align:left;"> Actually, you called me in here, but yeah. </td> <td style="text-align:right;"> 7.6 </td> <td style="text-align:right;"> 3706 </td> <td style="text-align:left;"> 2005-03-24 </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Pilot </td> <td style="text-align:left;"> Ken Kwapis </td> <td style="text-align:left;"> Ricky Gervais;Stephen Merchant;Greg Daniels </td> <td style="text-align:left;"> Michael </td> <td style="text-align:left;"> All right. Well, let me show you how it's done. </td> <td style="text-align:left;"> All right. Well, let me show you how it's done. </td> <td style="text-align:right;"> 7.6 </td> <td style="text-align:right;"> 3706 </td> <td style="text-align:left;"> 2005-03-24 </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Pilot </td> <td style="text-align:left;"> Ken Kwapis </td> <td style="text-align:left;"> Ricky Gervais;Stephen Merchant;Greg Daniels </td> <td style="text-align:left;"> Michael </td> <td style="text-align:left;"> Yes, I'd like to speak to your office manager, please. Yes, hello. This is Michael Scott. I am the Regional Manager of Dunder Mifflin Paper Products. Just wanted to talk to you manager-a-manger. All right. Done deal. 
Thank you very much, sir. You're a gentleman and a scholar. Oh, I'm sorry. OK. I'm sorry. My mistake. That was a woman I was talking to, so... She had a very low voice. Probably a smoker, so... So that's the way it's done. </td> <td style="text-align:left;"> [on the phone] Yes, I'd like to speak to your office manager, please. Yes, hello. This is Michael Scott. I am the Regional Manager of Dunder Mifflin Paper Products. Just wanted to talk to you manager-a-manger. [quick cut scene] All right. Done deal. Thank you very much, sir. You're a gentleman and a scholar. Oh, I'm sorry. OK. I'm sorry. My mistake. [hangs up] That was a woman I was talking to, so... She had a very low voice. Probably a smoker, so... [Clears throat] So that's the way it's done. </td> <td style="text-align:right;"> 7.6 </td> <td style="text-align:right;"> 3706 </td> <td style="text-align:left;"> 2005-03-24 </td> </tr> </tbody> </table>

---
Being good package managers and reproducible data scientists, let's:

1. Start an R project
2. Start a reproducible report in R Markdown <sup>1</sup>
3. Set up `renv`:

```r
renv::init()
```

This initializes a lockfile (`renv.lock`) recording the packages and dependencies detected in the project.

.footnote[
[1] For our purposes, the R Markdown part will be make-believe. I'll just show the results in the slides.
]

---
Let's look at average IMDb ratings over the seasons:

```r
library(dplyr)

theoffice %>%
  select(season, imdb_rating) %>%
  distinct() %>%
  group_by(season) %>%
  summarize(avg_imdb = mean(imdb_rating))
```

```
## # A tibble: 9 x 2
##   season avg_imdb
##    <int>    <dbl>
## 1      1     8.02
## 2      2     8.53
## 3      3     8.64
## 4      4     8.5 
## 5      5     8.65
## 6      6     8.18
## 7      7     8.49
## 8      8     7.55
## 9      9     8.05
```

---
## Make a quick ggplot...

.pull-left[
![](useR-slides_files/figure-html/unnamed-chunk-27-1.png)<!-- -->
]

.pull-right[
Sounds about right to Office fans:

- First few seasons were great
- Slump in the middle
- Things got weird at the end
- Slight recovery with the final season
]

---
# Share with the world!
- You share code on GitHub
- Eager to build on your groundbreaking results, collaborators fork your repo

---
# How about some flair?

<img src="https://i.imgflip.com/1hzigk.jpg" height="300" style="display: block; margin: auto;" />

--

After all your hard work on the report you've now shared with the world, you find out about the brand new `flair` package <sup>1</sup>. It is a perfect way to highlight tricky bits of code to help your co-workers learn R.

.footnote[
[1] Just released on May 4th!
]

---
# Install from CRAN

```r
install.packages("flair")
```

--

Now our lockfile is out of date!

--

That's a simple fix: `renv::snapshot()` will update the lockfile.

--

Next, revise your code with the appropriate amount of flair:

```r
library(flair)

decorate("theoffice %>%
  select(season, imdb_rating) %>%
  distinct() %>%
  group_by(season) %>%
  summarize(avg_imdb = mean(imdb_rating))") %>%
  flair('%>%') %>%
  flair('theoffice', background = 'Coral') %>%
  flair('mean', background = 'Aquamarine')
```

---
<img src="img/flair.png" height="300" style="display: block; margin: auto;" />

---
# Deeper dive

`renv::init()` creates:

- a lockfile named `renv.lock`
- a project-local environment with a private library (`renv/library`)
    - separate from the system library
    - populated with the dependencies found by crawling the project with `renv::dependencies()`
    - packages are copied from the user library if already installed, instead of being re-installed from CRAN

<img src="img/project-library.png" height="200" style="display: block; margin: auto;" />

---
# Deeper dive

- `renv::snapshot()`: saves the state of the project library to the lockfile (`renv.lock`)

<img src="img/lockfile.png" width="300" style="display: block; margin: auto auto auto 0;" />

--

- `renv::restore()`: restores the project library to the state recorded in the lockfile, e.g. when package installation or update attempts are unsuccessful

---
# Package sources

Compatible with:

- CRAN
- GitHub
- Bioconductor
- GitLab
- Bitbucket
- Custom/local packages

The `DESCRIPTION` file of an installed package is used to infer its source.

---
# Comparison with Packrat<sup>1</sup>:
1. `renv.lock` is a JSON file, so it is easy to use with other tools
2. `renv` doesn't attempt to track package tarballs
3. `renv` doesn't track "stale" packages (installed but not recorded in the lockfile)
4. `renv` uses a global cache shared across all projects maintained with `renv`, reducing disk usage
5. More configurable dependency discovery

.footnote[
[1] Abbreviated from the `renv` documentation website
]

---
# Collaboration

As in the `schrute`/`flair` example, collaboration is easiest with a central version control repository (like GitHub).

--

Collaborators can:

1. Run `renv::restore()` to install the packages recorded in the `renv.lock` included in the repository
2. Update the lockfile with any packages they install via `renv::snapshot()`
3. Push the updated lockfile to the central repo along with their code

---
# Real world example

--

These slides!

--

We built these slides in `xaringan`, using GitHub for collaboration. Because our installed packages and versions of R differed, the formatting was getting horribly mangled.

--

So we practiced what we're preaching: we made our versions of R consistent and used `renv` to manage the project.

--

...and the slides rendered perfectly.

<img src="https://media.giphy.com/media/zcCGBRQshGdt6/giphy.gif" width="300" style="display: block; margin: auto;" />

---
.pull-left[
# Use with Python
]

.pull-right[
<img src="https://rstudio.github.io/reticulate/images/reticulated_python.png" width="100" style="display: block; margin: auto 0 auto auto;" />
]

If your project uses both R and Python, `renv` can manage both!

--

Compatible with:

- `reticulate`
- Virtual environments
- Conda environments

---
# Conclusion

- Learning a new language can motivate you to improve your skills in a language you already know

--

- R has an exciting new option for improving reproducibility even more

--

- `renv` has lots of options and flexibility but retains simplicity

--

- Virtual environments offer an alternative to reinstalling packages during upgrades

---
class: inverse, center, middle

# Questions?
## Isabella Velásquez, @ivelasq3

## Mike Garcia, @cascadianzeno

---
# References

## More on `renv`

[renv website](https://rstudio.github.io/renv/)
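---
# Appendix: crawling for dependencies

`renv::init()` populates the project library by crawling the project with `renv::dependencies()`. As a rough illustration of what "crawling" means, here is a toy, regex-based sketch in base R. It is not how `renv` works internally, and `renv`'s own discovery handles many more cases; `find_deps()` is a hypothetical helper that only recognizes `library()`, `require()`, `requireNamespace()`, and `pkg::fun` patterns in `.R` and `.Rmd` files.

```r
# Toy dependency finder (hypothetical helper, not renv's implementation).
# Scans .R and .Rmd files under `dir` for library(), require(),
# requireNamespace(), and pkg::fun / pkg:::fun usage.
find_deps <- function(dir = ".") {
  files <- list.files(dir, pattern = "\\.[Rr](md)?$",
                      recursive = TRUE, full.names = TRUE)
  code <- as.character(unlist(lapply(files, readLines, warn = FALSE)))

  # library(pkg), require("pkg"), requireNamespace('pkg'), ...
  load_calls <- unlist(regmatches(code, gregexpr(
    "(library|require|requireNamespace)\\(\\s*[\"']?[A-Za-z][A-Za-z0-9.]*", code)))
  from_calls <- sub("^.*\\(\\s*[\"']?", "", load_calls)

  # pkg::fun and pkg:::fun
  ns_refs <- unlist(regmatches(code, gregexpr("[A-Za-z][A-Za-z0-9.]*:::?", code)))
  from_ns <- sub(":::?$", "", ns_refs)

  sort(unique(c(from_calls, from_ns)))
}
```

For example, a project containing `library(dplyr)`, `stringr::str_detect(...)`, and `require('fs')` would yield `"dplyr" "fs" "stringr"`. In a real project, the `Package` column of the data frame returned by `renv::dependencies()` plays the same role.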