Introduction to Reproducible Coding Environments

Welcome

Every third Tuesday of the month
Topics and schedule here: SORTEE Code Club Schedule
Do you have a topic in mind? Write to us!

About Code Club

SORTEE Code Club is an online meeting where people come together to learn, share, and collaborate on coding-related topics in an informal and supportive environment.

SORTEE Resources

Do you know about SORTEE Resources?

SORTEE Library of Code Mistakes

Find a SORTEE Reviewer

GitHub SORTEE

Today’s Agenda

We’ll cover:

What reproducibility actually means
Why reproducibility matters
Using R Projects
Managing package dependencies with renv
Handling different R versions
Some limitations and tips

Disclaimer: today’s Code Club will focus on Reproducibility in R

Introduction to Reproducibility in R

A reproducible coding environment is a computational setup with clearly documented code, data, software versions, and methods allowing exact replication of results.

Original comic from xkcd

The Why

Reproducible coding environments:

Ensure reliability and credibility of research findings
Facilitate collaborative research and transparent scientific communication
Enable efficient troubleshooting and debugging
Save time by reducing redundant efforts to recreate computational setups
Promote best practices in scientific computing and data analysis

The How

There is no single solution or perfect approach to reproducibility.

Instead, there is a collection of strategies that you can combine!

Example

For example, consider different ways of referencing file paths in your R scripts:

# Bad: absolute path
setwd("C:/Users/MyName/Documents/Project/Data")
data <- read.csv("data.csv")

# OK: home-relative path, documented in the README (but fragile: everyone must keep the data in ~/Downloads/Data)
setwd("~/Downloads/Data")
data <- read.csv("data.csv")

# Better: using the here package, but without an R Project the project root is unclear
library(here)
data <- read.csv(here("Data", "data.csv"))

# Best: an R Project (.Rproj file) marks the project root, so here() resolves paths from it
library(here)
data <- read.csv(here("Data", "data.csv"))

The How: R Projects (.Rproj)

An R Project is a way to organize all your analysis files in one easy-to-use place.

R Projects make your life easier because:

Paths are simple(r)
Easy collaboration
Less confusion

How to make an R Project

Spoiler alert!
Check the Code Club calendar for our Git & GitHub session!

Example: R Project

For example:

R Projects are one of the simplest and most effective ways to quickly become more reproducible. When you create an R Project, you get a special .Rproj file that marks the project’s root directory, meaning everything you do in that project stays nicely contained.

Paths are simple(r): You don’t have to worry about absolute paths that break when you move your files. Every file path you use is relative to your project, so it’s easy to share and move around without things breaking.
Easy collaboration: Because all your project’s files are in one clear place, anyone who receives your project can easily set it up on their computer and run your code as you intended.
Less confusion: No more wondering where files are stored or which files belong to your analysis. Everything related to your project stays together and is clearly structured.
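To make the idea of a project root concrete, here is a minimal base R sketch of how a root can be located from the .Rproj file. The helper `find_project_root()` is hypothetical and only illustrates the principle; it is not how the here package is actually implemented, and the demo project lives in a temporary directory.

```r
# Hypothetical helper: walk up from `start` until a folder containing
# a .Rproj file is found. A simplified illustration only.
find_project_root <- function(start = getwd()) {
  current <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    if (length(list.files(current, pattern = "\\.Rproj$")) > 0) {
      return(current)
    }
    parent <- dirname(current)
    if (identical(parent, current)) {
      stop("No .Rproj file found at or above: ", start)
    }
    current <- parent
  }
}

# Demo: build a throwaway project with a Data folder and a .Rproj marker
project <- file.path(tempdir(), "demo-project")
dir.create(file.path(project, "Data"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(project, "demo-project.Rproj"))
write.csv(data.frame(x = 1:3), file.path(project, "Data", "data.csv"),
          row.names = FALSE)

# Start from a subfolder: the root is still found, so paths stay portable
root <- find_project_root(file.path(project, "Data"))
data <- read.csv(file.path(root, "Data", "data.csv"))
```

Because every path is built from the detected root, the same script keeps working when the whole project folder is moved or shared.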

An R Project alone is not enough

Even though an R Project helps organize your analysis into one coherent location…

… You still need a structured directory layout to organize your files inside this project to get the most out of it.

Organise your Files!

Not this!

But this! NCEAS Learning Hub’s coreR Course
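As a rough sketch of what a structured layout could look like, the snippet below creates a small folder skeleton. The folder names are assumptions chosen for illustration, not the exact layout from the NCEAS course, and the project is created in a temporary directory so nothing on your machine is touched.

```r
# Hypothetical folder names; adapt them to your own project.
project <- file.path(tempdir(), "my-analysis")
folders <- c("data/raw", "data/processed", "scripts", "outputs/figures", "docs")

# Create each folder (and any missing parent folders)
for (folder in folders) {
  dir.create(file.path(project, folder), recursive = TRUE, showWarnings = FALSE)
}

# A top-level README documents what lives where
writeLines("Project overview and folder guide", file.path(project, "README.md"))

# Inspect the resulting structure
list.dirs(project, full.names = FALSE, recursive = TRUE)
```

Keeping raw data separate from processed data is one common convention: it makes it harder to overwrite the original files by accident.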

Understanding Packages📦 and Libraries📘

Packages📦 are collections of functions and compiled code that extend R’s functionality

Packages📦 reside within a library📘, a directory on your computer where R stores installed packages
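If you are curious where your own library lives, base R can tell you. This short sketch uses only functions that ship with R; the stats package is used as the example because it is installed with every R installation.

```r
# Library directories that R searches for installed packages
lib_dirs <- .libPaths()
print(lib_dirs)

# Installed version of one package
stats_version <- packageVersion("stats")
print(stats_version)

# Folder that the package is installed in
stats_dir <- find.package("stats")
print(stats_dir)
```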

Managing package versions matters for:

Version control
Transparency
Open projects

The How: Managing Packages with `renv`

renv helps make your R projects reproducible by managing your package dependencies.

It records which packages (and their versions) you’re using in your project.
It helps you avoid package version conflicts across different projects.
It ensures that your project setup can be easily replicated later.

Using renv: you, starting the project

renv::init(): create an isolated library for your project
renv::snapshot(): save the exact versions of the packages you’re using into an renv.lock file

This lockfile acts like a recipe that describes the environment your project needs to run.
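The two steps can be bundled as in the sketch below. The wrapper `start_renv_project()` and its `project` argument are hypothetical names used only for illustration; the renv calls inside are the ones named above. It assumes renv is installed and is defined here without being run.

```r
# Hypothetical helper: the function name and `project` argument are illustrative.
start_renv_project <- function(project = getwd()) {
  if (!requireNamespace("renv", quietly = TRUE)) {
    stop("renv is not installed; run install.packages(\"renv\") first.")
  }
  # Create the project-specific library and an initial renv.lock
  renv::init(project = project, restart = FALSE)
  # Record the package versions currently in use; run snapshot() again
  # whenever packages are installed or updated
  renv::snapshot(project = project, prompt = FALSE)
  invisible(file.path(project, "renv.lock"))
}

# Not run here: call once from the project root, e.g. start_renv_project()
```

In day-to-day work you would usually just type renv::init() and renv::snapshot() in the console; renv::status() reports whether the library and the lockfile are in sync.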

Using renv: someone else, with your project

renv::restore()

Yes, that’s it.

Limitations

renv doesn’t handle:

R itself (the lockfile records the R version, but renv does not install or switch R versions)
System dependencies (compilers, external libraries)
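One lightweight way to document what renv leaves out is to save the session details next to your results. The sketch below uses only base R; the output file name and the minimum version "4.0.0" are assumptions made for illustration.

```r
# R version of the current session
r_version <- getRversion()
print(r_version)

# Save the R version, operating system, and loaded packages to a text file
# (hypothetical file name; written to a temporary directory here)
info_file <- file.path(tempdir(), "session-info.txt")
writeLines(capture.output(sessionInfo()), info_file)

# Optional guard: warn when the session is older than the version
# the analysis was developed with (hypothetical minimum)
required_version <- "4.0.0"
if (r_version < required_version) {
  warning("This analysis was developed with R >= ", required_version)
}
```

In a real project you would write the file into the project folder (for example, a docs folder) so it travels with the code.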

Managing R Versions

Sometimes it’s not just about the packages — the version of R itself can matter too.

That’s where Rig comes in.

What is Rig?

Rig is a lightweight tool that helps you manage multiple versions of R on the same machine.

It works on Windows, macOS, and Linux.
You can easily install new R versions.
You can switch between R versions with a simple command.

Why Use Rig?

Makes it easy to test code across R versions.
Simplifies keeping old projects running on older setups.
Plays well with tools like renv, completing the reproducibility setup.

🧩 renv manages your packages
🛠️ Rig manages your R versions

Wrapping Up

Improving reproducibility:

Organize projects clearly
Manage dependencies carefully (renv)
Use version management (Rig)
Document everything!

Resources

R-Ladies - Personal R Administration
Introduction to renv
What they forgot to teach you about R
renv challenges
Reproducible Environments - Posit
The Turing Way - Reproducible Environments
Rig, the R installation Manager
Groundhog, an alternative to renv
Enough targets to Write a Thesis