Introduction to Reproducible Coding Environment
Welcome
Today we’ll chat about reproducible coding environments in R. We’ll cover what reproducibility means, why it matters, and look at practical ways to make your research and data analysis easier to repeat and share.
About Code Club
SORTEE Code Club is an online meeting where people come together to learn, share, and collaborate on coding-related topics in an informal and supportive environment
SORTEE Resources
Do you know about SORTEE Resources ?
Today’s Agenda
We’ll cover:
What reproducibility actually means
Why reproducibility matters
Using R Projects
Managing package dependencies with renv
Handling different R versions
Some limitations and tips
Disclaimer : today’s Code Club will focus on Reproducibility in R
Introduction to Reproducibility in R
A reproducible coding environment is a computational setup with clearly documented code, data, software versions, and methods allowing exact replication of results.
Original comic from xkcd
The Why
Reproducible coding environments:
Ensure reliability and credibility of research findings
Facilitate collaborative research and transparent scientific communication
Enable efficient troubleshooting and debugging
Save time by reducing redundant efforts to recreate computational setups
Promote best practices in scientific computing and data analysis
So, why are we talking about reproducible coding environments? Why is a reproducible coding environment important?
The How
There is no single solution or perfect approach to reproducibility Instead, there are a collection of various best strategies that you can use!
Example
For example, consider different ways of referencing file paths in your R scripts:
# Bad: absolute path
setwd ("C:/Users/MyName/Documents/Project/Data" )
data <- read.csv ("data.csv" )
# OK: relative path extensively documented in README (but fragile!)
setwd ("~/Downloads/Data" )
data <- read.csv ("data.csv" )
# Better: using here package, but without RProject the structure is unclear
library (here)
data <- read.csv (here ("Data" , "data.csv" ))
# Best: explicit use of an R project (.Rproj file) ensures 'here()' always resolves correctly
library (here)
data <- read.csv (here ("Data" , "data.csv" ))
It’s important to begin implementing reproducible practices incrementally, building up complexity and robustness over time rather than aiming for perfection from the start.
The How: .RProject
R Projects is a way to organize all your analysis files into one easy-to-use place.
R Projects make your life easier because:
Paths are simple(r)
Easy collaboration
Less confusion
How to make an .RProject
Example : .RProject
For example:
R Projects is one of the simplest and most effective ways to quickly become more reproducible. When you create an R Project, you get a special .Rproj
file that marks the project’s root directory, meaning everything you do in that project stays nicely contained. - Paths are simple(r): You don’t have to worry about absolute paths that break when you move your files. Every file path you use is relative to your project, so it’s easy to share and move around without things breaking. - Easy collaboration: Because all your project’s files are in one clear place, anyone who receives your project can easily set it up on their computer and run your code exactly as you intended. - Less confusion: No more wondering where files are stored or which files belong to your analysis. Everything related to your project stays together and is clearly structured.
.RProject is not enough alone
Even though an R Project helps organize your analysis into one coherent location…
… You still need a structured directory layout to organize your file inside this project to get the most out of it
Organise your Files!
Not this!
But this!
Having an R project is a good start, but you still need to organise your files in a meaningful way, you can’t just throw everything in the same cauldron.
Understanding Packages📦 and Libraries📘
Collections of functions and compiled code that extend R’s functionality
Packages📦 reside within a library📘 , a directory on your computer where R stores installed packages
Managing package versions matters for:
Version control
Transparency
Open projects
The How: Managing Packages with renv
renv
helps make your R projects reproducible by managing your package dependencies
It records which packages (and their versions) you’re using in your project.
It helps you avoid package version conflicts across different projects.
It ensures that your project setup can be easily replicated later.
renv stands for “R environment,” and it’s a package designed to keep your R project’s packages consistent and reproducible. Think of it like taking a “snapshot” of exactly which R packages you’re using in your project, including their exact versions.
Why does this matter? Because packages in R are constantly being updated, and sometimes updates change the way packages behave. Without controlling package versions, your code might run perfectly today but fail or give different results a few months down the line.
Using renv: you, starting the project
renv::init()
: create an isolated library for your project
renv::snapshot()
: save exact versions of packages you’re using into a renv.lock
file
This lockfile acts like a recipe that describes the environment your project needs to run.
When you’re working on a new project and want to make it reproducible, here’s what you do: - renv::init()
— this creates a private library just for your project. That means any packages you install won’t interfere with other projects. - As you install packages and write code, use renv::snapshot()
to save the current state of your project’s library. This creates a file called renv.lock
, which records exactly which packages you’re using, their versions, and where they came from.
Using renv: someone else, with your project
Let’s say someone else wants to run your code. They don’t need to guess what packages or versions you used — they just run:
The command renv::restore() reads your renv.lock
file and automatically installs all the required packages in the correct versions. It recreates the same package setup you used.
renv greatly simplifies reproducing the exact environment and package setup used in your project
Limitations
renv
doesn’t handle:
R itself (versions)
System dependencies (compilers, external libraries)
However, no single tool handles everything. One of the limitations most R users forget abut renv is that renv does not handle R versions. So you could have a perfectly documented, saved project, but it might break because over time a new R version has been implemented. So you need complementary tools.
Managing R Versions
Sometimes it’s not just about the packages — the version of R itself can matter too.
That’s where Rig comes in.
Small changes between versions of R can affect how packages behave or whether certain functions are even available.
What is Rig?
Rig is a lightweight tool that helps you manage multiple versions of R on the same machine.
It works on Windows, macOS, and Linux .
You can easily install new R versions.
You can switch between R versions with a simple command.
Why Use Rig?
Makes it easy to test code across R versions.
Simplifies keeping old projects running on older setups.
Plays well with tools like renv
, completing the reproducibility setup.
🧩 renv manages your packages 🛠️ Rig manages your R versions
rig supports long-term reproducibility. If someone opens your project years later, they can install the same version of R and restore your setup.
Wrapping Up
Improving reproducibility:
Organize projects clearly
Manage dependencies carefully (renv
)
Use version management (Rig)
Document everything!