research methods | Pablo Bernabeu

Making research materials Findable, Accessible, Interoperable and Reusable

To err is human, but when it comes to creating research materials, mistakes can be reduced by sharing more of our work and by using some helpful tools. For instance, we can make our research materials FAIRer—that is, more Findable, Accessible, …

R functions for checking and fixing vmrk files from BrainVision

Electroencephalography (EEG) has become a cornerstone for understanding the intricate workings of the human brain in the field of neuroscience. However, EEG software and hardware come with their own set of constraints, particularly in the management of markers, also known as triggers. This article aims to shed light on these limitations and future prospects of marker management in EEG studies, while also introducing R functions that can help deal with vmrk files from BrainVision.

rscopus_plus: An extension of the rscopus package

Sometimes it’s useful to do a bibliometric analysis. To this end, the rscopus_plus functions (Bernabeu, 2024) extend the R package rscopus (Muschelli, 2022) to administer the search quota and to enable specific searches and comparisons. scopus_search_plus runs rscopus::scopus_search as many times as necessary based on the number of results and the search quota. scopus_search_DOIs gets DOIs from scopus_search_plus, which can then be imported into a reference manager, such as Zotero, to create a list of references.

A session logbook for a longitudinal study using conditional formatting in Excel

Longitudinal studies consist of several sessions, and often involve session session conductors. To facilitate the planning, registration and tracking of sessions, a session logbook becomes even more necessary than usual. To this end, an Excel workbook with conditional formatting can help automatise some formats and visualise the progress. Below is an example that is available on OneDrive. To fully access this workbook, it may be downloaded via File > Save as > Download a copy.

Motivating a preregistration (especially in experimental linguistics)

The best argument to motivate a preregistration may be that it doesn’t take any extra time. It just requires frontloading an important portion of the work. As a reward, the paper will receive greater trust from the reviewers and the readers at large. Preregistration is not perfect, but is a lesser evil that reduces the misuse of statistical analysis in science.

Learning how to use Zotero

Is it worth learning how to use a reference management system such as Zotero? Maybe. The hours you invest in learning how to use Zotero (approx. 10 hours) are likely to pay off, as they will save you a lot of time that you would otherwise spend formatting, revising and correcting references. In addition, this skill would become part of your skill set. A great guide Free, online webinars in which you could participate and ask questions

FAQs on mixed-effects models

I am dealing with nested data, and I remember from an article by Clark (1973) that nested should be analysed using special models. I’ve looked into mixed-effects models, and I’ve reached a structure with random intercepts by subjects and by items. Is this fine? In early days, researchers would aggregate the data across these repeated measures to prevent the violation of the assumption of independence of observations, which is one of the most important assumptions in statistics.

FAIR standards for the creation of research materials, with examples

In the fast-paced world of scientific research, establishing minimum standards for the creation of research materials is essential. Whether it's stimuli, custom software for data collection, or scripts for statistical analysis, the quality and transparency of these materials significantly impact the reproducibility and credibility of research. This blog post explores the importance of adhering to FAIR (Findable, Accessible, Interoperable, Reusable) principles, and offers practical examples for researchers, with a focus on the cognitive sciences.

Preprocessing the Norwegian Web as Corpus (NoWaC) in R

The present script can be used to preprocess data from a frequency list of the Norwegian as Web Corpus (NoWaC; Guevara, 2010). Before using the script, the frequency list should be downloaded from this URL. The list is described as ‘frequency list sorted primary alphabetic and secondary by frequency within each character’, and this is the direct URL. The download requires signing in to an institutional network. Last, the downloaded file should be unzipped.

ggplotting power curves from the simr package

The R package ‘simr’ has greatly facilitated power analysis for mixed-effects models using Monte Carlo simulation (which involves running hundreds or thousands of tests under slight variations of the data). The powerCurve function is used to estimate the statistical power for various sample sizes in one go. Since the tests are run serially, they can take a VERY long time; approximately, the time it takes to run the model supplied once (say, a few hours) times the number of simulations (nsim, which should be higher than 200), and times the number of different sample sizes examined.

How to visually assess the convergence of a mixed-effects model by plotting various optimizers

To assess whether convergence warnings render the results invalid, or on the contrary, the results can be deemed valid in spite of the warnings, Bates et al. (2023) suggest refitting models affected by convergence warnings with a variety of optimizers. The authors argue that, if the different optimizers produce practically-equivalent results, the results are valid. The allFit function from the ‘lme4’ package allows the refitting of models using a number of optimizers.

Assigning participant-specific parameters automatically in OpenSesame

OpenSesame offers options to counterbalance properties of the stimulus across participants. However, in cases of more involved assignments of session parameters across participants, it becomes necessary to write a bit of Python code in an inline script, which should be placed at the top of the timeline. In such a script, the participant-specific parameters are loaded in from a csv file. Below is a minimal example of the csv file.

Specifying version number in OSF download links

In the preparation of projects, files are often downloaded from OSF. It is good to document the URL addresses that were used for the downloads. These URLs can be provided in a code script (see example) or in a README file. Better yet, it’s possible to specify the version of each file in the URL. This specification helps reduce the possibility of inaccuracies later, should any files be modified afterwards.

Covariates are necessary to validate the variables of interest and to prevent bogus theories

The need for covariates—or nuisance variables—in statistical analyses is twofold. The first reason is purely statistical and the second reason is academic. First, the use of covariates is often necessary when the variable(s) of interest in a study may be connected to, and affected by, some satellite variables (Bottini et al., 2022; Elze et al., 2017; Sassenhagen & Alday, 2016). This complex scenario is the most common one due to the multivariate, dynamic, interactive nature of the real world.

Brief Clarifications, Open Questions: Commentary on Liu et al. (2018)

Liu et al. (2018) present a study that implements the conceptual modality switch (CMS) paradigm, which has been used to investigate the modality-specific nature of conceptual representations (Pecher et al., 2003). Liu et al.‘s experiment uses event-related potentials (ERPs; similarly, see Bernabeu et al., 2017; Collins et al., 2011; Hald et al., 2011, 2013). In the design of the switch conditions, the experiment implements a corpus analysis to distinguish between purely-embodied modality switches and switches that are more liable to linguistic bootstrapping (also see Bernabeu et al.

Web application for the simulation of experimental data

This open-source, R-based web application is suitable for educational or research purposes in experimental sciences. It allows the **creation of varied data sets with specified structures, such as between-group or within-participant variables, that …

Modality exclusivity norms for 747 properties and concepts in Dutch: A replication of English

This study is a cross-linguistic, conceptual replication of Lynott and Connell’s (2009, 2013) modality exclusivity norms. The properties and concepts tested therein were translated into Dutch, and independently rated and analyzed (Bernabeu, 2018).