FAIR standards for the creation of research materials, with examples
In the fast-paced world of scientific research, establishing minimum standards for the creation of research materials is essential. Whether it’s stimuli, custom software for data collection, or scripts for statistical analysis, the quality and transparency of these materials significantly impact the reproducibility and credibility of research. This blog post explores the importance of adhering to FAIR (Findable, Accessible, Interoperable, Reusable) principles, and offers practical examples for researchers, with a focus on the cognitive sciences.
Notwithstanding the need for speed in most scientific projects, what should be the minimum acceptable standards in the creation of research materials such as stimuli, custom software for data collection (e.g., experiments in jsPsych, OpenSesame or PsychoPy), or scripts for statistical analysis?
The answer to this question depends on the field of research, the purpose and duration of the project, and many other contextual factors. So, to narrow down the scope and arrive at a general answer, let’s suppose we asked a researcher in the cognitive sciences (e.g., a linguist, a psychologist or a neuroscientist) who values open science. Perhaps such a researcher would be satisfied with a method for creating materials that allows the creators, as well as their collaborators and any other stakeholders (e.g., fellow scientists working in the same field), to explore, understand, reproduce, modify, and reuse the materials once they are completed and thereafter. Let’s review some of the tools that can help meet these standards.
FAIRness
The FAIR Guiding Principles for scientific data management and stewardship exhaustively describe a protocol for making materials Findable, Accessible, Interoperable and Reusable. These terms cover the five allowances listed above, along with other important aspects.
Let’s look at some instantiations of the FAIR principles.
Reproducibility
It is convenient to allow others, and our future selves, to reproduce the materials throughout their preparation and at any time thereafter. For this purpose, R can be used to record in scripts as many as possible of the steps followed during the preparation of the materials. Far from being only software for data analysis, R allows the preparation of text, images, audio, etc. Humans err, by definition. That can be counted on. Conveniently, recording in scripts the steps followed over weeks or months of preparation allows us to offload part of the documentation effort. It’s a way of video-recording, as it were, all the additions, subtractions, replacements, transformations and calculations performed on the raw materials to create the final materials.
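As a minimal sketch of what recording the steps in a script can look like, the following R code prepares a small set of word stimuli. The word list, the concreteness values, the threshold and the file name are all invented for illustration, and do not come from any real project.

```r
# Hypothetical example: every name and value below is illustrative.

# Raw materials: a small candidate word list (invented for this sketch)
raw_words <- data.frame(
  word = c("table", "cloud", "justice", "river", "theory", "spoon"),
  concreteness = c(4.9, 4.5, 1.5, 4.9, 1.6, 4.9),
  stringsAsFactors = FALSE
)

# Step 1 (subtraction): keep concrete words only (threshold is an assumption)
stimuli <- raw_words[raw_words$concreteness >= 4, ]

# Step 2 (transformation): convert words to upper case for presentation
stimuli$word <- toupper(stimuli$word)

# Step 3 (calculation): record word length as a covariate
stimuli$n_letters <- nchar(stimuli$word)

# Step 4: fix the random seed so that the shuffled order is reproducible
set.seed(123)
stimuli <- stimuli[sample(nrow(stimuli)), ]

# Step 5: save the final materials (a temporary folder is used here)
output_file <- file.path(tempdir(), "stimuli.csv")
write.csv(stimuli, output_file, row.names = FALSE)
```

Because each step is written down, rerunning the script regenerates the same final file from the same raw materials, and any later change to a step is visible in the script itself.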
Generous documentation
Under the curse of knowledge, the creators of research materials may believe that their materials are self-explanatory. Often they are more obscure than they think. To allow any other stakeholders, including their future selves, to exercise the five allowances listed above (i.e., explore, understand, reproduce, modify, and reuse the materials), the preparation process and the end materials should be documented in enough detail. This can be done using README.txt files throughout the project. Using the .txt format/extension is recommended because other formats, such as Microsoft Word, may not be (fully) available on some computers. To exemplify the format and the content of readme files, below is an excerpt from a longitudinal study on which I’ve been working.
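Separately from that excerpt, a README.txt file can also be started from a skeleton written by a short R script. The sketch below is hypothetical: the headings and the bracketed placeholders are assumptions about what a readme might contain, not the structure used in the study mentioned above.

```r
# Hypothetical README skeleton; every heading and description is a placeholder.
readme_lines <- c(
  "README",
  "",
  "Purpose of this folder:",
  "  [one or two sentences]",
  "",
  "Contents:",
  "  [file or folder name]: [what it contains and how it was created]",
  "",
  "How to reproduce the materials:",
  "  [scripts to run, in order]",
  "",
  "Contact:",
  "  [name and email address]"
)

# Written to a temporary folder here; in a project, the file would sit in
# the folder that it documents.
readme_path <- file.path(tempdir(), "README.txt")
writeLines(readme_lines, readme_path)
```

Starting every folder from the same skeleton is one way of keeping readme files consistent across a project, although the right sections will vary from one project to another.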
Open-source software
Where possible, open-source software should be used. Open-source software is free, and hence more accessible. Open-source software can be classified along various dimensions, such as the size of the user base. The more users, the greater the support, because the core developers have more resources, and the users will often help each other in public forums such as Stack Exchange. For instance, a programming language such as R boasts millions of users worldwide who can count on support in general forums and in R-specific forums such as the Posit Community.
Other software has a smaller community. For instance, open-source tools for psychological research (e.g., OpenSesame, PsychoPy) have far smaller communities than R. Yet, these tools too can count on substantial support. For the more basic uses, most of the way has already been paved, and the existing documentation suffices. For more advanced uses, the smaller size of the community can become more obvious, as one needs to spend more time looking for solutions.
Regardless of the size of the community, all else being equal, open-source software is the right choice to ensure access to one’s work for all (potential) stakeholders in the future. The other option, proprietary software, entails dependence on the services of a private company.
Tidiness and parsimony in computer code
Code scripts should be as tidy and parsimonious as possible. For instance, to prevent overly long scripts that would impair the comprehension of the materials, it is useful to break down large projects into nested scripts, and source (i.e., run) the smaller scripts in the larger scripts.
# Compose all stimuli for Sessions 2, 3, 4 and 6
# Create participant-specific parameters
source('stimulus_preparation/participant_parameters.R')
# Frame base images
source('stimulus_preparation/base_images.R')
# Session 2
source('stimulus_preparation/Session 2/Session2_compile_all_stimuli.R')
# Session 3
source('stimulus_preparation/Session 3/Session3_compile_all_stimuli.R')
# Session 4
source('stimulus_preparation/Session 4/Session4_compile_all_stimuli.R')
# Session 6
source('stimulus_preparation/Session 6/Session6_compile_all_stimuli.R')
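To see the mechanism in isolation, the following self-contained sketch creates two small scripts in a temporary folder and runs them from a larger script. The folder, the file names and the variables are hypothetical. The check with file.exists() is an optional safeguard added for this sketch, not part of the script above.

```r
# Hypothetical, self-contained illustration of nested scripts.
project_dir <- file.path(tempdir(), "nested_scripts_demo")
dir.create(project_dir, showWarnings = FALSE)

# Smaller script 1: defines a parameter
writeLines("n_trials <- 40", file.path(project_dir, "parameters.R"))

# Smaller script 2: uses the parameter defined by script 1
writeLines("trial_ids <- seq_len(n_trials)", file.path(project_dir, "trials.R"))

# Larger script: run the smaller scripts in order, stopping early with an
# informative message if one of them is missing
scripts <- file.path(project_dir, c("parameters.R", "trials.R"))
for (script in scripts) {
  if (!file.exists(script)) stop("Script not found: ", script)
  source(script)
}

# Objects created by the smaller scripts are now available
length(trial_ids)
```

Since the second script depends on an object created by the first, the order in which the scripts are sourced matters, which is one reason to keep that order explicit in a single larger script.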
Tidiness and parsimony in project directories
A directory tree is useful to display all the folders in a project. The tree can be produced in the RStudio ‘Terminal’ tab using the following one-line command, which requires a Unix-like shell (e.g., Bash) because it relies on the find and sed utilities.
find . -type d | sed -e "s/[^-][^\/]*\// |/g" -e "s/|\([^ ]\)/| - \1/"
The output will look like the following (excerpt from https://osf.io/gt5uf/wiki).
.
| - bayesian_priors
| | - plots
| - semanticpriming
| | - analysis_with_visualsimilarity
| | | - model_diagnostics
| | | | - results
| | | | - plots
| | | - results
| | | - plots
| | | - correlations
| | | | - plots
| | - frequentist_bayesian_plots
| | | - plots
| | - frequentist_analysis
| | | - model_diagnostics
| | | | - results
| | | | - plots
| | | - lexical_covariates_selection
| | | | - results
| | | | - plots
| | | - results
| | | - plots
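Where a Unix-like shell is not available, a similar tree could be approximated in R itself. The following is a rough sketch under that assumption, using a hypothetical project created in a temporary folder. It relies only on base R functions (list.dirs, strsplit, strrep), and it is an alternative to the command above rather than a translation of it.

```r
# Hypothetical project created in a temporary folder for illustration
project_dir <- file.path(tempdir(), "tree_demo")
dir.create(file.path(project_dir, "analysis", "plots"),
           recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project_dir, "analysis", "results"),
           recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project_dir, "stimuli"), showWarnings = FALSE)

# List all folders relative to the project root
dirs <- sort(list.dirs(project_dir, full.names = FALSE, recursive = TRUE))
dirs <- dirs[dirs != ""]  # drop the entry that stands for the root itself

# Depth of each folder = number of path components
depth <- lengths(strsplit(dirs, "/", fixed = TRUE))

# One line per folder, indented according to its depth
tree <- paste0(strrep("| ", depth - 1), "| - ", basename(dirs))
writeLines(c(".", tree))
```

Running the same code with project_dir set to the root of a real project would print the tree for that project.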
Conclusion
Adhering to best practices—including the FAIR principles—in the creation of research materials enhances transparency, accessibility and reproducibility in scientific research. These standards facilitate researchers’ work beyond the short term, and increase the reliability of scientific work, thus contributing to the best use of resources.
Comments in code scripts
It is helpful for our future selves, for our collaborators, and for any other stakeholders associated with a project—which includes any fellow researchers worldwide—to include comments in code scripts. These comments should introduce the purpose of the script at the top, and the purpose of various components of the code. Some excerpts are shown below as examples.
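As a hypothetical illustration of this practice (not one of the excerpts from the project), the short script below states its purpose at the top and then introduces each component. The script name, the data and the 200 ms threshold are invented for the sketch.

```r
# Hypothetical script: compute_trial_durations.R
#
# Purpose: compute the duration of each trial from onset and offset
# timestamps, and flag trials that are implausibly short.
# Input: a data frame with one row per trial (simulated below).
# Output: the same data frame with two added columns.

# Simulated input (in a real project, this would be read from a file)
trials <- data.frame(
  trial = 1:4,
  onset_ms = c(0, 1500, 3100, 4800),
  offset_ms = c(1200, 2900, 3150, 6000)
)

# Duration of each trial in milliseconds
trials$duration_ms <- trials$offset_ms - trials$onset_ms

# Flag trials shorter than the minimum duration (the threshold is an
# assumption made for this sketch)
min_duration_ms <- 200
trials$too_short <- trials$duration_ms < min_duration_ms
```

The header comment tells a reader what the script is for before they read any code, and the shorter comments explain why each step exists, which the code alone would not reveal.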