Walking the line between reproducibility and efficiency in R Markdown: Three methods

As technology and research methods advance, data sets tend to grow larger and methods more exhaustive. Consequently, analyses take longer to run. This poses a challenge when the results are to be presented using R Markdown: one has to balance reproducibility against efficiency. On the one hand, it is desirable to keep the R Markdown document as self-contained as possible, so that those who later examine the document can easily test and edit the code. On the other hand, a document that is very slow to knit or very long is inefficient to work with. The context of the task will determine how time-consuming and long the code in an Rmd file can be. For instance, one could decide that knitting may take up to 15 minutes and that each code chunk may span up to 30 lines.

Several methods can be combined within the same document to accommodate different types of code. Three methods are presented below, ordered from easier-to-reproduce to easier-to-knit.

  1. For fast- and concise-enough code: Provide the original code in the Rmd file. The code is run as the document is knitted. Example:

     nrow(myData)
  2. For fast-enough but very long code: Store the code in a separate script and source it in the Rmd file. The code is run as the document is knitted. Example:

     source('analysis/model_diagnostics.R')
  3. For very slow and/or long code: Store the code in a separate script and run it before knitting the Rmd file, so that the output from the code (e.g., a model, a plot) is saved to disk and can be read into the Rmd (see the sketch after this list). Example:

     model_1 <- readRDS('results/model_1.rds')
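
As a minimal sketch of how the third method fits together, the separate script might look as follows. The file paths (`analysis/model_1.R`, `results/model_1.rds`), the data set and the model are hypothetical placeholders; the key point is that the script saves its time-consuming output with `saveRDS()`, so that the Rmd file only has to read it back.

    # analysis/model_1.R  (hypothetical companion script; run before knitting)
    # Fit a time-consuming model and save the result to disk.

    myData <- read.csv('data/myData.csv')   # placeholder data file

    # Placeholder for a slow model fit
    model_1 <- lm(outcome ~ predictor, data = myData)

    # Save the fitted model so the Rmd file can load it with readRDS()
    saveRDS(model_1, 'results/model_1.rds')

In the Rmd file, the chunk shown in the third example above then loads `results/model_1.rds` in a fraction of the time the model took to fit.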

Importantly, even the third method keeps the code reproducible. It just requires a bit of additional documentation to ensure that the end user can also find the script in which the result was produced (e.g., ‘analysis/model_1.R’).
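
One way to provide that documentation, sketched below, is to display the slow code in the Rmd without running it, using the chunk option `eval=FALSE`, right next to the chunk that reads the saved result. The chunk labels, the model code and the file paths are illustrative.

    ```{r fit-model, eval=FALSE}
    # Shown for reference only: this code lives in 'analysis/model_1.R'
    # and was run before knitting.
    model_1 <- lm(outcome ~ predictor, data = myData)
    saveRDS(model_1, 'results/model_1.rds')
    ```

    ```{r load-model}
    # Read in the saved result produced by 'analysis/model_1.R'
    model_1 <- readRDS('results/model_1.rds')
    ```

This way, readers can see exactly how the result was produced and where to find the script, while knitting remains fast.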
