FAIR standards for the creation of research materials, with examples

2023 research methods, R

In the fast-paced world of scientific research, establishing minimum standards for the creation of research materials is essential. Whether it’s stimuli, custom software for data collection, or scripts for statistical analysis, the quality and transparency of these materials significantly impact the reproducibility and credibility of research. This blog post explores the importance of adhering to FAIR (Findable, Accessible, Interoperable, Reusable) principles, and offers practical examples for researchers, with a focus on the cognitive sciences.

Notwithstanding the need for speed in most scientific projects, what should be the minimum acceptable standards in the creation of research materials such as stimuli, custom software for data collection (e.g., experiments in jsPsych, OpenSesame or PsychoPy), or scripts for statistical analysis?

The answer to this question is contingent upon the field of research, the purpose and the duration of the project, and many other contextual factors. So, to narrow down the scope and arrive at a general answer, let’s suppose we asked a researcher in the cognitive sciences (e.g., a linguist, a psychologist or a neuroscientist) who values open science. Perhaps such a researcher would be satisfied with a method for the creation of materials that allows the creators of the materials, as well as their collaborators and any other stakeholders (e.g., any fellow scientists working in the same field), to explore, understand, reproduce, modify, and reuse the materials following their completion and thereafter. Let’s review some of the tools that can help fulfil these standards.

FAIRness

The FAIR Guiding Principles for scientific data management and stewardship exhaustively describe a protocol for making materials Findable, Accessible, Interoperable and Reusable. These terms cover the five allowances listed above, along with other important aspects.

Let’s look at some instantiations of the FAIR principles.

Reproducibility

It is convenient to allow others, and our future selves, to reproduce the materials throughout their preparation and at any time thereafter. For this purpose, R can be used to record in scripts as many as possible of the steps followed throughout the preparation of the materials. Far from being only software for data analysis, R allows the preparation of texts, images, audio, etc. Humans err, by definition. That can be counted on. Conveniently, recording the steps followed during weeks or months of preparation allows us to offload part of the documentation efforts. It’s a way of video-recording, as it were, all the additions, subtractions, replacements, transformations and calculations performed with the raw materials, for the creation of the final materials.
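
As a minimal sketch of this idea, the following R script records every step from a raw word list to a final stimulus file. The words, the column names and the output file name are hypothetical, chosen only for illustration.

```r
# Build a small stimulus list from raw materials, recording every step.
# All words, column names and file names below are hypothetical.

# Fix the random seed so that the shuffled order can be reproduced exactly
set.seed(123)

# Raw materials: candidate words and their condition
raw_words <- data.frame(
  word = c('table', 'Chair', 'lamp ', 'table', 'window', 'door'),
  condition = c('A', 'B', 'A', 'A', 'B', 'B'),
  stringsAsFactors = FALSE
)

# Step 1: normalise the spelling (trim spaces, lower case)
stimuli <- raw_words
stimuli$word <- tolower(trimws(stimuli$word))

# Step 2: remove repeated words
stimuli <- stimuli[!duplicated(stimuli$word), ]

# Step 3: randomise the presentation order
stimuli <- stimuli[sample(nrow(stimuli)), ]

# Step 4: number the trials
stimuli$trial <- seq_len(nrow(stimuli))
rownames(stimuli) <- NULL

# Step 5: save the final materials (here, in a temporary folder)
output_file <- file.path(tempdir(), 'stimuli.csv')
write.csv(stimuli, output_file, row.names = FALSE)
```

Because the seed is fixed and every transformation is written down, running the script again should yield the same file, and any later change to the materials leaves a trace in the script.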

Generous documentation

Under the curse of knowledge, the creators of research materials may believe that their materials are self-explanatory. Often they are more obscure than they think. To allow any other stakeholders, including their future selves, to exercise the five allowances listed above (i.e., explore, understand, reproduce, modify, and reuse the materials), the preparation process and the end materials should be documented in enough detail. This can be done using README.txt files throughout the project. Using the .txt format/extension is recommended because other formats, such as Microsoft Word, may not be (fully) available on some computers. To exemplify the format and the content of readme files, below is an excerpt from a longitudinal study on which I’ve been working.


-- Post-training test --

In Sessions 2, 3, 4 and 6, if the test is failed in the first attempt, the training and the test are 
repeated (following González Alonso et al., 2020). In such cases, the result is shown at the end 
of the second attempt. The session advances if the accuracy achieved in the second attempt exceeds 
80%, whereas the session stops if the accuracy is lower. In the latter situation, an 'End of session' 
message is presented, flanked by two orange circles, and followed by an acknowledgement for the 
participant. Once the participant has read this screen, the experimenter quits the session by 
pressing 'ESC' and then 'Q'.


== Stimuli ==

The stimulus lists are described in the R functions that were used to create the stimuli, as well as
in the 'list' column in the stimulus files.


== Participant-specific parameters for lab-based sessions ==

Each participant was assigned certain parameters in advance, including the mini-language, the order 
of the resting-state parts, and the stimulus lists. The code that was used to create this assignment 
is available in the 'stimulus_preparation' folder. 

Due to the pre-assignment of the parameters, there is a fixed set of participant IDs that can be 
used in OpenSesame. These identification numbers range between 1 and 144. If an ID outside of this 
range is used, the OpenSesame session does not run.


== General procedure for lab-based sessions ==

At the beginning of lab-based Sessions 2, 3, 4 and 6, the experimenter will first signal the lab is 
busy using a light or a sign. Next, they will ascertain what participant and what session applies. 
This is done using the session logbook that is shared among all session conductors. This session 
logbook is instantly updated online using a cloud service, such as OneDrive. 

Next, the experimenter starts OpenSesame by opening the program directly (not by opening the 
session-specific file), and then opens the appropriate session within OpenSesame. This procedure 
helps prevent the opening of a standalone Python window, the closing of which would result in the 
closing of OpenSesame. Next, the experimenter opens BrainVision Recorder.

When the participant arrives in the lab, they are informed that they can use the toilet outside. 
The participant is also offered some water. 

Next, the size of their head is measured, and an appropriate cap is tried on the participant’s 
head. Next, the cap is placed on a dummy head, and the electrodes are attached to the cap. At 
that point, to prevent signal interference, the participant is kindly asked to either put their 
mobile devices (phone, tablet, smartwatch) in flight mode, or to leave them outside of the booth. 

Next, to protect the participant's clothes from any drops of gel, a towel is placed on their 
upper back, covering shoulders and upper torso. Both ends of the towel are clipped together at 
the front using two or three clothes pegs. Next, the cap is fitted on the participant's head. To 
prevent the cap from being pulled back during the session, the splitter box is attached to the 
towel on the participant's back, right below their head. Next, measures are taken to adjust the 
position of the cap evenly, first from the nasion to the inion, and then from the tip of an ear 
to the other ear. 

Next, the experimenter returns to OpenSesame and runs the session in full screen by clicking on 
the full green triangle at the top left. Then, a file explorer window opens, in which the 
experimenter must assign a subject number consistent with the session logbook, and must select 
the destination folder for the logfile. The destination folder is called 'logfiles'. Any prompts 
to overwrite a logfile must normally be refused, or considered carefully, due to the risk of 
losing data from previous sessions.

In the first screen, the experimenter can disable some of the tasks. This option can be used if a 
session has ended abruptly, in which case the session can be resumed from a near checkpoint. In 
such a case, the experimenter must first note this incident in their logbook, and rename the log 
file that was produced on the first run, by appending '_first_run' to the name. This prevents 
overwriting the file on the second run. Next, they must open a new session, enter the same 
participant ID, and select the appropriate part from which to begin. This part must be the part 
immediately following the last part that was completed in full. For instance, if a session ended
abruptly during the experiment, the beginning selected on the second run would be the experiment. 
Once the session has finished completely, the first log file and the second log file must be 
safely merged into a single file, keeping only the fully completed tasks.

In the first instructional screen, participants are asked to refrain from asking any questions 
unless it is necessary, so that all participants can receive the same instructions.

At the beginning of the resting-state part in Session 2, and at the beginning of the Experiment 
part, instructions are presented on the screen that ask participants to stay as still as possible 
during the following task. The screen contains an orange-coloured square with the letters 'i.s.r', 
which remind the experimenter to check the impedance and the signal, and finally to begin recording 
the EEG signal. If the impedance of any electrodes is poor, the experimenter may enter the booth 
to lower the impedance of the electrodes affected. Otherwise, after validating the signal and the 
impedance, the experimenter can begin the recording in BrainVision, and press the letter 'C' twice 
in the stimulus computer. At that point, a green circle will appear, along with instructions for 
the participant. 

Similarly, at the end of the eyes-closed resting-state measurement (which is five minutes long), 
the experimenter must intervene when they see the screen with the orange stripes, by knocking on 
the door to let the participant open their eyes. 

Furthermore, at the end of the resting-state part and at the end of the Experiment part, a screen
with a crossed-out R appears to remind the experimenter to stop recording the EEG. 

Notice that the above-mentioned stages, characterised by screens with orange stripes, require the 
experimenter's intervention. The experimenter must allow the participant to read any text on these 
screens. Next, the experimenter must press the letter 'C' twice to let the session continue. This 
protocol provides the experimenter with control when necessary. The experimenter should be aware 
of the use of the letter 'C' at these points, as the requirement is not signalled on the screen 
to prevent participants from pressing the letter themselves. 

During the experiment, it is important to monitor the EEG signal. If it ever becomes very noisy, 
the experimenter must wait until the next break and ask the participant to stop, so that the signal 
can be verified. If the noise in the signal is due to the participant's movement, they should be 
asked again to please stay as still as possible. If the noise is due to an increase in the 
impedance of some electrodes, the impedance of those electrodes should be checked.

The experiment in Session 2 contains breaks every 40 trials, whereas the experiments in subsequent 
sessions contain breaks every 50 trials. During these breaks, the number of the current trial 
appears in grey on the bottom right corner of the screen.

If a session ends abruptly during the experiment, but there is not enough time to restart the 
session from the experiment, then the data should be uploaded to the repository. 


== Definition of items in OpenSesame (only for programming purposes, not for in-session use) ==

  -- Each major part of the session is contained in a sequence item that is named in capital 
     letters (e.g., 'PRETRAINING', 'TRAINING', 'TEST', 'EXPERIMENT').

  -- 'continue_space': allows proceeding to the following screen after pressing the space bar, 
     which should be done by the participant. In most cases, two presses are required, as 
     detailed on the screen.

  -- 'continue_c': allows proceeding to the following screen after pressing the letter 'C', 
     which should be done by the experimenter. In most cases, two presses are required, as 
     detailed on the screen.


== Variables in the OpenSesame log files ==

In the log files produced by OpenSesame, each part of the session (e.g., Test, Experiment) is 
identified in the variable 'session_part'. The names of the response variables are 'response',
'response_time' and 'correct'. Item-specific response variables follow the formats of 
'response_[item_name]', 'response_time_[item_name]' and 'correct_[item_name]' 
(see https://osdoc.cogsci.nl/3.3/manual/variables/#response-variables).

The output is verbose and requires preprocessing of the data. For instance, the last response 
in each loop may appear twice in the output, due to the processing of the response. These 
duplicates can--and must--be cleaned up by discarding the rows that have the same trial number
as the preceding row.


== EEG triggers ==

Triggers are sent to the EEG recorder throughout the experiment. The system for sending 
triggers is set up in OpenSesame, within the inline script 'EEG_trigger_setup'.

The key to the triggers is provided below.

  0: reset trigger port in BrainVision Recorder. This trigger is integrated in the 
     trigger-sending function.

  -- Resting-state EEG part --

    10: beginning of eyes-open resting-state EEG

    11: end of eyes-open resting-state EEG

    12: beginning of eyes-closed resting-state EEG

    13: end of eyes-closed resting-state EEG

  -- Experiment part --

    5: fixation mark

    -- ID of each target sentence (only applicable to target trials) --

        110--253: triggers ranging between 110 and 253, time-locked to the onset of the 
          word of interest in each trial.
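
The readme excerpt above notes that the last response in each loop may appear twice in the OpenSesame log files, and that these duplicates must be discarded. A minimal sketch of that clean-up in R is shown next. The column name 'trial_number' and the toy data are assumptions, whereas 'session_part' and 'response' are named in the readme. Requiring that the session part also matches is an added precaution that the readme does not state.

```r
# Remove rows that have the same trial number as the preceding row.
# The column name 'trial_number' and the data below are hypothetical.

log_data <- data.frame(
  session_part = c('Test', 'Test', 'Test', 'Test', 'Experiment', 'Experiment'),
  trial_number = c(1, 2, 3, 3, 1, 2),
  response = c('f', 'j', 'j', 'j', 'f', 'f'),
  stringsAsFactors = FALSE
)

n <- nrow(log_data)

# Compare every row (from the second one onwards) with the preceding row
same_trial <- log_data$trial_number[-1] == log_data$trial_number[-n]
same_part <- log_data$session_part[-1] == log_data$session_part[-n]

# The first row has no preceding row, so it is never a duplicate
is_duplicate <- c(FALSE, same_trial & same_part)

# Keep only the rows that are not duplicates
clean_log <- log_data[!is_duplicate, ]
rownames(clean_log) <- NULL
```

In this toy example, the fourth row repeats trial 3 of the Test part and is dropped, leaving five rows.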

Comments in code scripts

It is helpful for our future selves, for our collaborators, and for any other stakeholders associated with a project—which includes any fellow researchers worldwide—to include comments in code scripts. These comments should introduce the purpose of the script at the top, and the purpose of various components of the code. Some excerpts are shown below as examples.
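
By way of illustration, here is a hypothetical sketch of a commented script. It is not taken from the project. It only borrows the range of participant IDs (1 to 144) and the three pre-assigned parameters mentioned in the readme above, while the number and the names of the levels are assumptions. The purpose of the script is stated at the top, and each component is introduced by its own comment.

```r
# Assign participant-specific parameters in advance of the lab sessions.
# Each participant ID (1 to 144) receives a mini-language, an order of the 
# resting-state parts and a stimulus list.
# NOTE: the number and the names of the levels below are hypothetical.

# Number of participants
n_participants <- 144

# Levels of each parameter (hypothetical)
mini_languages <- c('mini_language_1', 'mini_language_2')
resting_state_orders <- c('eyes_open_first', 'eyes_closed_first')
stimulus_lists <- c('list_1', 'list_2', 'list_3')

# Cross all levels to obtain every possible combination (2 x 2 x 3 = 12)
combinations <- expand.grid(
  mini_language = mini_languages,
  resting_state_order = resting_state_orders,
  stimulus_list = stimulus_lists,
  stringsAsFactors = FALSE
)

# Cycle through the combinations until every participant has one, so that 
# each combination is used equally often (144 / 12 = 12 times)
row_index <- rep(seq_len(nrow(combinations)), length.out = n_participants)

# Bind the participant IDs to their assigned combination
participant_parameters <- data.frame(
  participant_ID = seq_len(n_participants),
  combinations[row_index, ],
  row.names = NULL,
  stringsAsFactors = FALSE
)
```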

Open-source software

Where possible, open-source software should be used. Open-source software is typically free of charge, and hence more accessible. Open-source software can be classified along various dimensions, such as the size of the user base. The more users, the greater the support, because the core developers have more resources, and the users will often help each other in public forums such as Stack Exchange. For instance, a programming language such as R boasts millions of users worldwide who can count on support in public forums and in R-specific forums such as the Posit Community.

Other communities are not as large. For instance, open-source programs for psychological research (e.g., OpenSesame, PsychoPy) have far smaller communities than R. Yet these programs too can count on substantial support. For the more basic uses, most of the way has already been paved, and the existing documentation suffices. For more advanced uses, the smaller size of the community can become more obvious, as one needs to spend more time looking for solutions.

Regardless of the size of the community, all else being equal, open-source software is the right choice to ensure access to one’s work for all (potential) stakeholders in the future. The other option, proprietary software, entails dependence on the services of a private company.

Tidiness and parsimony in computer code

Code scripts should be as tidy and parsimonious as possible. For instance, to prevent overly long scripts that would impair the comprehension of the materials, it is useful to break down large projects into nested scripts, and source (i.e., run) the smaller scripts in the larger scripts.

# Compose all stimuli for Sessions 2, 3, 4 and 6

# Create participant-specific parameters
source('stimulus_preparation/participant_parameters.R')

# Frame base images
source('stimulus_preparation/base_images.R')

# Session 2
source('stimulus_preparation/Session 2/Session2_compile_all_stimuli.R')

# Session 3
source('stimulus_preparation/Session 3/Session3_compile_all_stimuli.R')

# Session 4
source('stimulus_preparation/Session 4/Session4_compile_all_stimuli.R')

# Session 6
source('stimulus_preparation/Session 6/Session6_compile_all_stimuli.R')
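
The mechanics of nested scripts can be tried out in isolation with the following self-contained sketch. It writes two tiny scripts to a temporary folder and then sources them in order from a master script. The file names and their contents are hypothetical.

```r
# Self-contained demonstration of nested scripts.
# The file names and their contents are hypothetical.

# Create a temporary project folder
project_dir <- file.path(tempdir(), 'nested_scripts_demo')
dir.create(project_dir, showWarnings = FALSE)

# Write two small scripts, each of which performs a single step
writeLines("base_words <- c('door', 'lamp', 'table')",
           file.path(project_dir, 'step1_base_words.R'))
writeLines("final_words <- toupper(base_words)",
           file.path(project_dir, 'step2_final_words.R'))

# Master script: run the smaller scripts in order. With local = TRUE, the 
# objects are created in the environment from which source() is called.
for (script in c('step1_base_words.R', 'step2_final_words.R')) {
  source(file.path(project_dir, script), local = TRUE)
}

# The object created by the second script is now available
final_words
```

The second script depends on an object created by the first, which is why the order of the source() calls matters.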

Tidiness and parsimony in project directories

A directory tree is useful to display all the folders in a project. The tree can be produced in the RStudio ‘Terminal’ console using the following one-line command.

find . -type d | sed -e "s/[^-][^\/]*\//  |/g" -e "s/|\([^ ]\)/| - \1/"

The output will look like the following (excerpt from https://osf.io/gt5uf/wiki).

.
  | - bayesian_priors
  |  | - plots
  | - semanticpriming
  |  | - analysis_with_visualsimilarity
  |  |  | - model_diagnostics
  |  |  |  | - results
  |  |  |  | - plots
  |  |  | - results
  |  |  | - plots
  |  |  | - correlations
  |  |  |  | - plots
  |  | - frequentist_bayesian_plots
  |  |  | - plots
  |  | - frequentist_analysis
  |  |  | - model_diagnostics
  |  |  |  | - results
  |  |  |  | - plots
  |  |  | - lexical_covariates_selection
  |  |  |  | - results
  |  |  |  | - plots
  |  |  | - results
  |  |  | - plots
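
On systems where find and sed are not available, a similar tree can be approximated within R. The function below is a rough sketch using base R only. The function name and the demo folders are hypothetical, and the layout merely imitates the output shown above.

```r
# Print a directory tree using base R only (hypothetical helper function)
print_directory_tree <- function(root = '.') {
  
  # List all subfolders relative to the root, dropping the root itself
  folders <- list.dirs(root, full.names = FALSE, recursive = TRUE)
  folders <- folders[folders != '']
  
  # Sort so that each folder is directly followed by its subfolders. The 
  # path separator is replaced with a low-ranking character beforehand, so 
  # that names containing characters such as '-' do not break the order.
  sorting_key <- gsub('/', '\001', folders, fixed = TRUE)
  folders <- folders[order(sorting_key, method = 'radix')]
  
  # Depth of each folder, i.e., the number of components in its path
  depth <- lengths(strsplit(folders, '/', fixed = TRUE))
  
  # Indent each folder name according to its depth
  tree <- c('.', paste0(strrep('  |', depth), ' - ', basename(folders)))
  cat(tree, sep = '\n')
  invisible(tree)
}

# Usage with a few demo folders created in a temporary location
demo_root <- file.path(tempdir(), 'tree_demo')
unlink(demo_root, recursive = TRUE)
dir.create(file.path(demo_root, 'analysis', 'plots'), recursive = TRUE)
dir.create(file.path(demo_root, 'analysis-old'))
tree <- print_directory_tree(demo_root)
```

To display the tree of a project, the function can be called with the path to the project folder, or with no arguments from within the working directory.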

Conclusion

Adhering to best practices—including the FAIR principles—in the creation of research materials enhances transparency, accessibility and reproducibility in scientific research. These standards facilitate researchers’ work beyond the short term, and increase the reliability of scientific work, thus contributing to the best use of resources.
