From c3c5dd1226008bfec96a7d9c6b01da28962d455b Mon Sep 17 00:00:00 2001 From: Adam Smielewski Date: Fri, 6 Mar 2026 19:49:25 +0100 Subject: [PATCH 1/3] docs(): rework docker docs --- vignettes/docker.Rmd | 187 +++++++++++-------------------------------- 1 file changed, 48 insertions(+), 139 deletions(-) diff --git a/vignettes/docker.Rmd b/vignettes/docker.Rmd index 322624a85..5f40fefef 100644 --- a/vignettes/docker.Rmd +++ b/vignettes/docker.Rmd @@ -22,200 +22,110 @@ While renv can help capture the state of your R library at some point in time, t - The LAPACK / BLAS system(s) in use, - The versions of system libraries installed and in use, -And so on. [Docker](https://www.docker.com/) is a tool that can help solve this problem through the use of **containers**. Very roughly speaking, one can think of a container as a small, self-contained system within which different applications can be run. Using Docker, one can declaratively state how a container should be built (what operating system it should use, and what system software should be installed within), and use that system to run applications. (For more details, please see .) +and so on. [Docker](https://www.docker.com/) is a tool that can help solve this problem through the use of **containers**. Very roughly speaking, one can think of a container as a small, self-contained system within which different applications can be run. Using Docker, one can declaratively state how a container should be built, and then use that system to run applications. For more details, please see . Using Docker and renv together, one can then ensure that both the underlying system, alongside the required R packages, are fixed and constant for a particular application. -The main challenges in using Docker with renv are: - -- Ensuring that the renv cache is visible to Docker containers, and - -- Ensuring that required R package dependencies are available at runtime. 
- This vignette will assume you are already familiar with Docker; if you are not yet familiar with Docker, the [Docker Documentation](https://docs.docker.com/) provides a thorough introduction. To learn more about using Docker to manage R environments, visit [environments.rstudio.com](https://environments.rstudio.com/docker.html). -We'll discuss two strategies for using renv with Docker: - -1. Using renv to install packages when the Docker image is generated; -2. Using renv to install packages when Docker containers are run. - -We'll also explore the pros and cons of each strategy. +We focus here on the most common case: you already have an existing renv project and want to build a Docker image from it. We assume that your project already contains `renv.lock`, `.Rprofile`, `renv/activate.R`, and `renv/settings.json`. +## Containerizing an existing renv project -## Creating Docker images with renv - -With Docker, [Dockerfiles](https://docs.docker.com/engine/reference/builder/) are used to define new images. Dockerfiles can be used to declaratively specify how a Docker image should be created. A Docker image captures the state of a machine at some point in time -- e.g., a Linux operating system after downloading and installing R `r getRversion()[1, 1:2]`. Docker containers can be created using that image as a base, allowing different independent applications to run using the same pre-defined machine state. - -First, you'll need to get renv installed on your Docker image. 
For example, you could install the latest release of renv from CRAN: +For an existing renv project, a good default is to copy the renv metadata first, restore packages, and only then copy the rest of the repository: ```dockerfile -RUN R -e "install.packages('renv', repos = c(CRAN = 'https://cloud.r-project.org'))" -``` - -Alternatively, if you need to use the development version of renv, you could use: - -```dockerfile -RUN R -e "install.packages('renv', repos = 'https://rstudio.r-universe.dev')" -``` - -Next, we'll copy `renv.lock` into the container: +FROM -```dockerfile WORKDIR /project -COPY renv.lock renv.lock -``` - -Now, we `renv::restore()` to install those packages. At this stage, you'll need to decide which of R's library paths you'd like to use for pacakge installation. (See `?.libPaths` for more information.) There are a couple of options available: - - -### Use the default library paths - -This method is appropriate if you'd like these packages to be visible to all R processes launched using this image, and can be done via: - -```dockerfile -RUN R -e "renv::restore()" -``` - -Note that this method may fail if R's default library paths are not on a writable volume in the Docker image. If this is the case, consider one of the alternatives below. - - -### Use the default project library path - -If you want to use renv's default project-local library path, you'll need to initialize the project within the Docker container as an renv project. This can be done with: - -```dockerfile -RUN R -s -e "renv::init(bare = TRUE)" -RUN R -s -e "renv::restore()" -``` - -Or, alternatively, if you already have a project autoloader + settings available -- e.g. 
because you're creating a Docker image from an existing renv project -- you could use: - -```dockerfile RUN mkdir -p renv + +COPY renv.lock renv.lock COPY .Rprofile .Rprofile COPY renv/activate.R renv/activate.R COPY renv/settings.json renv/settings.json -RUN R -s -e "renv::restore()" -``` -Note that in this mode, the installed packages would only be visible to R sessions launched using `/project` as the working directory. This will be the default behavior as long as `WORKDIR` is not changed, but it's important to keep this in mind. +RUN R -s -e "renv::restore()" +COPY . . +``` -### Use a custom library path +This is a good starting point for most projects. The image restore step uses the same project metadata that you already commit to version control, so the container can recreate the project library before the rest of the source tree is copied. -If you'd like to fully customize the library path used, the simplest approach is likely to use the `RENV_PATHS_LIBRARY` environment variable. This mimics the above approach, but customizes the library paths used by renv. For example: +If you need to customize the library path, set `RENV_PATHS_LIBRARY` before calling `renv::restore()`: ```dockerfile -ENV RENV_PATHS_LIBRARY=renv/library -RUN R -s -e "renv::init(bare = TRUE)" +ENV RENV_PATHS_LIBRARY=/project/renv/library RUN R -s -e "renv::restore()" ``` -Alternatively, you could manage the library paths yourself via `.libPaths()` -- see `?.libPaths` in R for more inforamtion. - - -## Speeding up package installations - -The previously-described approaches are useful if you have multiple applications with identical package requirements. In this case, a single image containing this identical package library could serve as the parent image for several containerized applications. - -However, `renv::restore()` is slow -- it needs to download and install packages, which can take some time. 
Thus, some care is required to efficiently make use of the renv cache for projects that require: - -1. Building an image multiple times (e.g., to debug the production application as source code is updated), or - -2. Calling `renv::restore()` each time the container is run. - -The former process can be sped up using multi-stage builds, the latter by dynamically provisioning R Libraries, as described below. +## Caching package installs +If you rebuild the same image repeatedly, caching can make `renv::restore()` much faster. There are three common approaches. -### Multi-stage builds +### Basic Docker layer cache -For projects that require repeatedly building an image, [multi-stage builds](https://docs.docker.com/build/building/multi-stage/) can be used to speed up the build process. With multi-stage builds, multiple FROM statements are used in the Dockerfile and files can be copied across build stages. +The Dockerfile above already uses Docker's normal layer cache. Because `renv::restore()` happens before `COPY . .`, changes to application code do not invalidate the restore layer. Docker only needs to run `renv::restore()` again when the copied renv files change. -This approach can be leveraged to generate more efficient builds by dedicating a first stage build to package synchronization and a second stage build to copying files and executing code that may need to be updated often across builds (e.g., code that needs to be debugged in the container). +### Cache mounts -To implement a two stage build, the following code could be used as part of a Dockerfile. +If you are using BuildKit, you can also mount a cache directory into the build. This allows `renv::restore()` to reuse previously cached packages even when the restore layer itself needs to be rebuilt. 
```dockerfile -FROM AS base +# syntax=docker/dockerfile:1 +FROM -# intialize the project; assuming renv infrastructure available WORKDIR /project RUN mkdir -p renv + COPY renv.lock renv.lock COPY .Rprofile .Rprofile COPY renv/activate.R renv/activate.R -COPY renv/settings.dcf renv/settings.dcf +COPY renv/settings.json renv/settings.json -# change default location of cache to project folder -RUN mkdir renv/.cache -ENV RENV_PATHS_CACHE=renv/.cache +RUN --mount=type=cache,target=/root/.cache/R/renv/cache \ + R -s -e "renv::restore()" -# restore -RUN R -s -e "renv::restore()" +COPY . . ``` -The above code uses `FROM AS ` to name the first stage of the build `base`. Here, `` should be replaced with an appropriate image name. +This cache only helps on the specific machine or builder that created it. It is useful for repeated local builds, but it will not usually carry over to a different machine or a fresh CI runner. + +### Bind-mounted host caches -Subsequently, the code uses approach 2 (described above) to copy the auto-loader to the project directory in the image. It additionally creates the `renv/.cache` directory that is to be used as the renv cache. +If the host machine already has a populated renv cache, you can bind-mount that cache into the build and let `renv::restore()` reuse it. This is especially useful when the host cache is managed outside Docker. -The second stage of the build is defined by adding the following code to the same Dockerfile, below the previous code chunk. +The Dockerfile can mount a host-provided cache context into the default renv cache path for a root-based Linux container: ```dockerfile +# syntax=docker/dockerfile:1 FROM WORKDIR /project -COPY --from=base /project . - -# add commands that need to be debugged below -``` - -Here, `` could be the same as the parent image of `base`, but does not have to be (see [documentation](https://docs.docker.com/build/building/multi-stage/) for more details). 
- -The key line is the `COPY` command, which specifies that the contents of `/project` directory from the `base` image are copied into the `/project` directory of this image. - -Any commands that will change frequently across builds could be included below the `COPY` command. If only this code associated with the second stage build is updated then `renv::restore()` will not be called again at build time. Instead, the layers associated with the `base` image will be loaded from Docker's cache, thereby saving significant time in build process. - -In fact, `renv::restore()` will only be called when the `base` image needs to be rebuilt (e.g., when changes are made to `renv.lock`). Docker's cache system is generally good at understanding the dependencies of images. However, if you find that the `base` image is not updating as expected, it is possible to manually enforce a clean build by including the `--no-cache` option in the call to `docker build`. - - -### Dynamically Provisioning R Libraries with renv - -However, on occasion, one will have multiple applications built from a single base image, but each application will have its own independent R package requirements. In this case, rather than including the package dependencies in the image itself, it would be preferable for each container to provision its own library at runtime, based on that application's `renv.lock` lockfile. - -In effect, this is as simple as ensuring that `renv::restore()` happens at container runtime, rather than image build time. However, on its own, `renv::restore()` is slow -- it needs to download and install packages, which could take prohibitively long if an application needs to be run repeatedly. - -The renv package cache can be used to help ameliorate this issue. When the cache is enabled, whenever renv attempts to install or restore an R package, it first checks to see whether that package is already available within the renv cache. 
If it is, that instance of the package is linked into the project library. Otherwise, the package is first installed into the renv cache, and then that newly-installed copy is linked for use in the project. - -In effect, if the renv cache is available, you should only need to pay the cost of package installation once -- after that, the newly-installed package will be available for re-use across different projects. At the same time, each project's library will remain independent and isolated from one another, so installing a package within one container won't affect another container. - -However, by default, each Docker container will have its own independent filesystem. Ideally, we'd like for *all* containers launched from a particular image to have access to the same renv cache. To accomplish this, we'll have to tell each container to use an renv cache located on a shared mount. +RUN mkdir -p renv -In sum, if we'd like to allow for runtime provisioning of R package dependencies, we will need to ensure the renv cache is located on a shared volume, which is visible to any containers launched. We will accomplish this by: +COPY renv.lock renv.lock +COPY .Rprofile .Rprofile +COPY renv/activate.R renv/activate.R +COPY renv/settings.json renv/settings.json -1. Setting the `RENV_PATHS_CACHE` environment variable, to tell the instance of renv running in each container where the global cache lives; +RUN --mount=type=bind,from=renv-cache,source=.,target=/root/.cache/R/renv/cache \ + R -s -e "renv::restore()" -2. Telling Docker to mount some filesystem location from the host filesystem, at some location (`RENV_PATHS_CACHE_HOST`), to a container-specific location (`RENV_PATHS_CACHE_CONTAINER`). +COPY . . 
+``` -For example, if you had a container running a Shiny application: +You can then provide that cache directory at build time with `docker buildx build`: -```dockerfile -# the location of the renv cache on the host machine -RENV_PATHS_CACHE_HOST=/opt/local/renv/cache - -# where the cache should be mounted in the container -RENV_PATHS_CACHE_CONTAINER=/renv/cache - -# run the container with the host cache mounted in the container -docker run --rm \ - -e "RENV_PATHS_CACHE=${RENV_PATHS_CACHE_CONTAINER}" \ - -v "${RENV_PATHS_CACHE_HOST}:${RENV_PATHS_CACHE_CONTAINER}" \ - -p 14618:14618 \ - R -s -e 'renv::restore(); shiny::runApp(host = "0.0.0.0", port = 14618)' +```sh +docker buildx build \ + --build-context renv-cache=.cache/renv \ + -t . ``` -Note that the invocation above assumes that the project has already been initialized either via calling `renv::init()` or by copying the requisite `renv` project infrastructure. With this, any calls to renv APIs within the created docker container will have access to the mounted cache. The first time you run a container, renv will likely need to populate the cache, and so some time will be spent downloading and installing the required packages. Subsequent runs will be much faster, as renv will be able to reuse the global package cache. +This approach is most useful when `.cache/renv` has already been populated on the host, for example by running `renv::restore()` outside Docker. It is often the preferred approach on ephemeral hosts such as GitHub Actions runners, because the host-side cache directory can be restored with the CI platform's native cache support before the build starts. GitHub Actions and Azure DevOps both provide native cache features that work well for this: [GitHub Actions cache](https://docs.github.com/actions/concepts/workflows-and-actions/dependency-caching) and [Azure DevOps Cache task](https://learn.microsoft.com/azure/devops/pipelines/release/caching?view=azure-devops). 
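The host-side population step can be sketched as follows (an illustration, not part of the patch: it assumes the `.cache/renv` path used in the `docker buildx build` command above, and that it is run from the root of the renv project):

```sh
# Populate the host-side renv cache before the Docker build.
# The .cache/renv path is illustrative; it should match the value
# passed via --build-context renv-cache=... at build time.
mkdir -p .cache/renv
if command -v R >/dev/null 2>&1; then
  # Point renv at the host cache; restored packages land in the cache
  # and can then be reused by the bind-mounted build step.
  RENV_PATHS_CACHE="$PWD/.cache/renv" R -s -e 'renv::restore()' \
    || echo "renv::restore() failed; is this an renv project?"
fi
```

On CI, run this step (or restore the directory from the platform's cache) before invoking the build, so the bind-mounted context is already warm.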
-The primary downside with this approach compared to the image-based approach is that it requires you to modify how containers are created, and requires a bit of extra orchestration in how containers are launched. However, once the renv cache is active, newly-created containers will launch very quickly, and a single image can then be used as a base for a myriad of different containers and applications, each with their own independent package dependencies. +If you want to use a different cache location inside the container, customize the mount target to match your configured renv cache path. ## Handling the renv autoloader @@ -238,4 +148,3 @@ Bootstrapping renv into the project library might be unnecessary for you. If tha ```{sh} R --vanilla -s -e 'renv::restore()' ``` - From 077725fa70a91721b85d9d6e411829aa4f91818d Mon Sep 17 00:00:00 2001 From: Adam Smielewski Date: Sat, 7 Mar 2026 16:14:44 +0100 Subject: [PATCH 2/3] fix(): bug --- vignettes/docker.Rmd | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/vignettes/docker.Rmd b/vignettes/docker.Rmd index 5f40fefef..bc7db72fd 100644 --- a/vignettes/docker.Rmd +++ b/vignettes/docker.Rmd @@ -109,7 +109,7 @@ COPY .Rprofile .Rprofile COPY renv/activate.R renv/activate.R COPY renv/settings.json renv/settings.json -RUN --mount=type=bind,from=renv-cache,source=.,target=/root/.cache/R/renv/cache \ +RUN --mount=type=bind,from=renv-cache,source=.,target=/root/.cache/R/renv/cache,rw \ R -s -e "renv::restore()" COPY . . @@ -123,7 +123,7 @@ docker buildx build \ -t . ``` -This approach is most useful when `.cache/renv` has already been populated on the host, for example by running `renv::restore()` outside Docker. It is often the preferred approach on ephemeral hosts such as GitHub Actions runners, because the host-side cache directory can be restored with the CI platform's native cache support before the build starts. 
GitHub Actions and Azure DevOps both provide native cache features that work well for this: [GitHub Actions cache](https://docs.github.com/actions/concepts/workflows-and-actions/dependency-caching) and [Azure DevOps Cache task](https://learn.microsoft.com/azure/devops/pipelines/release/caching?view=azure-devops). +This approach is most useful when `.cache/renv` has already been populated on the host, for example by running `renv::restore()` outside Docker. Bind mounts are read-only by default, so the example uses `rw` to avoid write failures if `renv::restore()` needs to update the cache during the build. It is often the preferred approach on ephemeral hosts such as GitHub Actions runners, because the host-side cache directory can be restored with the CI platform's native cache support before the build starts. GitHub Actions and Azure DevOps both provide native cache features that work well for this: [GitHub Actions cache](https://docs.github.com/actions/concepts/workflows-and-actions/dependency-caching) and [Azure DevOps Cache task](https://learn.microsoft.com/azure/devops/pipelines/release/caching?view=azure-devops). If you want to use a different cache location inside the container, customize the mount target to match your configured renv cache path. From b812e082ba05acec5bf9769058cd58b5c0b3a520 Mon Sep 17 00:00:00 2001 From: Adam Smielewski Date: Sun, 15 Mar 2026 12:13:17 +0100 Subject: [PATCH 3/3] docs(): clarify Docker cache mount behavior --- vignettes/docker.Rmd | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/vignettes/docker.Rmd b/vignettes/docker.Rmd index bc7db72fd..b92f3b6cc 100644 --- a/vignettes/docker.Rmd +++ b/vignettes/docker.Rmd @@ -83,12 +83,17 @@ COPY .Rprofile .Rprofile COPY renv/activate.R renv/activate.R COPY renv/settings.json renv/settings.json +ENV RENV_CONFIG_CACHE_SYMLINKS=FALSE RUN --mount=type=cache,target=/root/.cache/R/renv/cache \ R -s -e "renv::restore()" COPY . . 
``` +The line `RUN --mount=type=cache,target=/root/.cache/R/renv/cache` tells BuildKit to make a persistent build cache available at renv's cache path for that one `RUN` instruction, so `renv::restore()` can reuse previously downloaded packages; see Docker's [`RUN --mount` documentation](https://docs.docker.com/reference/dockerfile/#run---mount) and [cache mount guide](https://docs.docker.com/build/cache/optimize/). + +Setting `RENV_CONFIG_CACHE_SYMLINKS=FALSE` is important here because the cache mount is not part of the final image. With symlinks enabled, renv could leave the project library pointing at packages in the mounted cache, and those symlinks would be broken once the build step finishes. + This cache only helps on the specific machine or builder that created it. It is useful for repeated local builds, but it will not usually carry over to a different machine or a fresh CI runner. ### Bind-mounted host caches @@ -109,12 +114,15 @@ COPY .Rprofile .Rprofile COPY renv/activate.R renv/activate.R COPY renv/settings.json renv/settings.json +ENV RENV_CONFIG_CACHE_SYMLINKS=FALSE RUN --mount=type=bind,from=renv-cache,source=.,target=/root/.cache/R/renv/cache,rw \ R -s -e "renv::restore()" COPY . . ``` +The line `RUN --mount=type=bind,from=renv-cache,source=.,target=/root/.cache/R/renv/cache,rw` tells BuildKit to mount the named build context `renv-cache` at renv's cache path for that one `RUN` instruction, with temporary write access; see Docker's [`RUN --mount` documentation](https://docs.docker.com/reference/dockerfile/#run---mount) and [named contexts documentation](https://docs.docker.com/build/building/context/#named-contexts). + You can then provide that cache directory at build time with `docker buildx build`: ```sh @@ -123,7 +131,7 @@ docker buildx build \ -t . ``` -This approach is most useful when `.cache/renv` has already been populated on the host, for example by running `renv::restore()` outside Docker. 
Bind mounts are read-only by default, so the example uses `rw` to avoid write failures if `renv::restore()` needs to update the cache during the build. It is often the preferred approach on ephemeral hosts such as GitHub Actions runners, because the host-side cache directory can be restored with the CI platform's native cache support before the build starts. GitHub Actions and Azure DevOps both provide native cache features that work well for this: [GitHub Actions cache](https://docs.github.com/actions/concepts/workflows-and-actions/dependency-caching) and [Azure DevOps Cache task](https://learn.microsoft.com/azure/devops/pipelines/release/caching?view=azure-devops). +This approach is most useful when `.cache/renv` has already been populated on the host, for example by running `renv::restore()` outside Docker. Bind mounts are read-only by default, so the example uses `rw` to avoid write failures if `renv::restore()` needs to update the cache during the build. Even with `rw`, writes to the bind mount are only available for the duration of that `RUN` instruction and are discarded afterwards, so the host-provided cache context is not modified. This helps keep repeated builds reproducible, including when multiple builds run sequentially or in parallel. `RENV_CONFIG_CACHE_SYMLINKS=FALSE` is needed here for the same reason as in the cache-mount example: the mounted cache is available during the build step, but it is not carried into the final image. It is often the preferred approach on ephemeral hosts such as GitHub Actions runners, because the host-side cache directory can be restored with the CI platform's native cache support before the build starts. GitHub Actions and Azure DevOps both provide native cache features that work well for this: [GitHub Actions cache](https://docs.github.com/actions/concepts/workflows-and-actions/dependency-caching) and [Azure DevOps Cache task](https://learn.microsoft.com/azure/devops/pipelines/release/caching?view=azure-devops). 
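After a build that used either caching approach, you can scan the project library for leftover dangling links; this is a sketch, assuming the default `renv/library` location and GNU find's `-xtype l` test for broken symlinks:

```sh
# List dangling symlinks in the project library (GNU find).
# Any output suggests the library still points into a cache that
# was only mounted during the build and is absent from the image.
lib="renv/library"   # assumed default project library path
if [ -d "$lib" ]; then
  find "$lib" -xtype l -print
else
  echo "no project library at $lib"
fi
```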
If you want to use a different cache location inside the container, customize the mount target to match your configured renv cache path.
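As a sketch of that customization (the `/renv/cache` path is illustrative), the environment variable, the symlink setting, and the mount target should all agree:

```dockerfile
# syntax=docker/dockerfile:1
# Illustrative: relocate the renv cache to /renv/cache and keep the
# BuildKit cache mount target aligned with RENV_PATHS_CACHE.
ENV RENV_PATHS_CACHE=/renv/cache
ENV RENV_CONFIG_CACHE_SYMLINKS=FALSE
RUN --mount=type=cache,target=/renv/cache \
    R -s -e "renv::restore()"
```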