R | Pablo Bernabeu

rscopus_plus: An extension of the rscopus package

Sometimes it’s useful to do a bibliometric analysis. To this end, the rscopus_plus functions (Bernabeu, 2024) extend the R package rscopus (Muschelli, 2022) to administer the search quota and to enable specific searches and comparisons. scopus_search_plus runs rscopus::scopus_search as many times as necessary based on the number of results and the search quota. scopus_search_DOIs gets DOIs from scopus_search_plus, which can then be imported into a reference manager, such as Zotero, to create a list of references.

FAQs on mixed-effects models

I am dealing with nested data, and I remember from an article by Clark (1973) that nested should be analysed using special models. I’ve looked into mixed-effects models, and I’ve reached a structure with random intercepts by subjects and by items. Is this fine? In early days, researchers would aggregate the data across these repeated measures to prevent the violation of the assumption of independence of observations, which is one of the most important assumptions in statistics.

FAIR standards for the creation of research materials, with examples

In the fast-paced world of scientific research, establishing minimum standards for the creation of research materials is essential. Whether it's stimuli, custom software for data collection, or scripts for statistical analysis, the quality and transparency of these materials significantly impact the reproducibility and credibility of research. This blog post explores the importance of adhering to FAIR (Findable, Accessible, Interoperable, Reusable) principles, and offers practical examples for researchers, with a focus on the cognitive sciences.

Preprocessing the Norwegian Web as Corpus (NoWaC) in R

The present script can be used to preprocess data from a frequency list of the Norwegian as Web Corpus (NoWaC; Guevara, 2010). Before using the script, the frequency list should be downloaded from this URL. The list is described as ‘frequency list sorted primary alphabetic and secondary by frequency within each character’, and this is the direct URL. The download requires signing in to an institutional network. Last, the downloaded file should be unzipped.

A new function to plot convergence diagnostics from lme4::allFit()

When a model has struggled to find enough information in the data to account for every predictor---especially for every random effect---, convergence warnings appear (Brauer & Curtin, 2018; Singmann & Kellen, 2019). In this article, I review the issue of convergence before presenting a new plotting function in R that facilitates the visualisation of the fixed effects fitted by different optimization algorithms (also dubbed optimizers).

Cannot open plots created with `brms::mcmc_plot` due to lack of `discrete_range` function

I would like to ask for advice regarding some plots that were created using brms::mcmc_plot(), and cannot be opened in R now. The plots were created last year using brms 2.17.0, and were saved in RDS objects. The problem I have is that I cannot open the plots in R now because I get an error related to a missing function. I would be very grateful if someone could please advise me if they can think of a possible reason or solution.

A table of results for Bayesian mixed-effects models: Grouping variables and specifying random slopes

Here I share the format applied to tables presenting the results of Bayesian models in Bernabeu (2022). The sample table presents a mixed-effects model that was fitted using the R package 'brms' (Bürkner et al., 2022).

A table of results for frequentist mixed-effects models: Grouping variables and specifying random slopes

Here I share the format applied to tables presenting the results of frequentist models in Bernabeu (2022). The sample table presents a mixed-effects model that was fitted using the R package 'lmerTest' (Kuznetsova et al., 2022).

Plotting two-way interactions from mixed-effects models using alias variables

Whereas the direction of main effects can be interpreted from the sign of the estimate, the interpretation of interaction effects often requires plots. This task is facilitated by the R package sjPlot (Lüdecke, 2022). In Bernabeu (2022), the sjPlot function called plot_model served as the basis for the creation of some custom functions. One of these functions is alias_interaction_plot, which allows the plotting of interactions between a continuous variable and a categorical variable.

Plotting two-way interactions from mixed-effects models using ten or six bins

Whereas the direction of main effects can be interpreted from the sign of the estimate, the interpretation of interaction effects often requires plots. This task is facilitated by the R package sjPlot (Lüdecke, 2022). In Bernabeu (2022), the sjPlot function called plot_model served as the basis for the creation of some custom functions. Two of these functions are deciles_interaction_plot and sextiles_interaction_plot. These functions allow the plotting of interactions between two continuous variables.

Bayesian workflow: Prior determination, predictive checks and sensitivity analyses

This post presents a run-through of a Bayesian workflow in R. The content is *closely* based on Bernabeu (2022), which was in turn based on lots of other references, also cited here.

Language and vision in conceptual processing: Multilevel analysis and statistical power

Research has suggested that conceptual processing depends on both language-based and vision-based information. We tested this interplay at three levels of the experimental structure: individuals, words and tasks. To this end, we drew on three …

Language and sensorimotor simulation in conceptual processing: Multilevel analysis and statistical power

Research has suggested that conceptual processing depends on both language-based and sensorimotor information. In this thesis, I investigate the nature of these systems and their interplay at three levels of the experimental structure---namely, …

Walking the line between reproducibility and efficiency in R Markdown: Three methods

As technology and research methods advance, the data sets tend to be larger and the methods more exhaustive. Consequently, the analyses take longer to run. This poses a challenge when the results are to be presented using R Markdown. One has to balance reproducibility and efficiency. On the one hand, it is desirable to keep the R Markdown document as self-contained as possible, so that those who may later examine the document can easily test and edit the code.

Tackling knitting errors in R Markdown

When knitting an R Markdown document after the first time, errors may sometimes appear. Three tips are recommended below. 1. Close PDF reader window When the document is knitted through the ‘Knit’ button, a PDF reader window opens to present the result. Closing this window can help resolve errors. 2. Delete service files Every time the Rmd is knitted, some service files are created. Some of these files have the ‘.

Preregistration: The interplay between linguistic and embodied systems in conceptual processing

This preregistration outlines a study that will investigate the dynamic nature of conceptual processing by examining the interplay between linguistic distributional systems—comprising word co-occurrence and word association—and embodied systems—comprising sensorimotor and emotional information. A set of confirmatory research questions are addressed using data from the Calgary Semantic Decision project, along with additional measures for the stimuli corresponding to distributional language statistics, embodied information, and individual differences in vocabulary size.

WebVTT caption transcription app

This open-source, R-based web application allows the conversion of video captions (subtitles) from the Web Video Text Tracks (WebVTT) Format into plain texts. For this purpose, users upload a WebVTT file with the extension of 'vtt' or 'txt'. …

Mixed-effects models in R and a new tool for data simulation

In this talk, I will look over the rationale for LMEMs, and demonstrate how to fit them in R (Brauer & Curtin, 2018; Luke, 2017). Challenges will also be covered. For instance, when using the widely-accepted 'maximal' approach, based on fitting all possible random effects for each fixed effect, models sometimes fail to find a solution, or 'convergence'. Advice for the problem of nonconvergence will be demonstrated, based on the progressive lightening of the random effects structure (Singman & Kellen, 2017; for an alternative approach, especially with small samples, see Matuschek et al., 2017). At the end, on a different note, I will present a web application that facilitates data simulation for research and teaching (Bernabeu & Lynott, 2020).

Reproducibilidad en torno a una aplicación web

Las aplicaciones web nos ayudan a facilitar el uso de nuestro trabajo, ya que no requieren programación para utilizarlas. Crear estas aplicaciones en R, mediante paquetes como "shiny" o "flexdashboard", ofrece múltiples ventajas. Entre ellas destaca la reproducibilidad, tal como veremos en torno a una aplicación para la simulación de datos (https://github.com/pablobernabeu/Experimental-data-simulation).

Collaboration while using R Markdown

In a highly recommendable presentation available on Youtube, Michael Frank walks us through R Markdown. Below, I loosely summarise and partly elaborate on Frank's advice regarding collaboration among colleagues, some of whom may not be used to R Markdown (see relevant time point in Frank's presentation). The first way is using GitHub, which has a great version control system, and even allows the rendering of Markdown text, if the file is given the extension ‘.

R Markdown amidst Madison parks

This document is part of teaching materials created for the workshop 'Open data and reproducibility v2.1: R Markdown, dashboards and Binder', delivered at the CarpentryCon 2020 conference. The purpose of this specific document is to practise R Markdown, including basic features such as Markdown markup and code chunks, along with more special features such as cross-references for figures, tables, code chunks, etc. Since this conference was originally going to take place in Madison, let's look at some open data from the City of Madison.

Web application for the simulation of experimental data

This open-source, R-based web application is suitable for educational or research purposes in experimental sciences. It allows the **creation of varied data sets with specified structures, such as between-group or within-participant variables, that …

Data dashboard: Butterfly species richness in Los Angeles

Dashboard with open data from a study by Prudic et al. (2018), that compares citizen science with traditional methods in butterfly sampling. Coding tasks included long-transforming, merging, and as ever, wrangling with a table.

Data is present: Workshops and datathons

This project offers free activities to learn and practise reproducible data presentation. Pablo Bernabeu organises these events in the context of a Software Sustainability Institute Fellowship. Programming languages such as R and Python offer free, powerful resources for data processing, visualisation and analysis. Experience in these programs is highly valued in data-intensive disciplines. Original data has become a public good in many research fields thanks to cultural and technological advances. On the internet, we can find innumerable data sets from sources such as scientific journals and repositories (e.g., OSF), local and national governments, non-governmental organisations (e.g., data.world), etc. Activities comprise free workshops and datathons.

Event-related potentials: Why and how I used them

Event-related potentials (ERPs) offer a unique insight in the study of human cognition. Let's look at their reason-to-be for the purposes of research, and how they are defined and processed. Most of this content is based on my master's thesis, which I could fortunately conduct at the Max Planck Institute for Psycholinguistics (see thesis or conference paper). Electroencephalography The brain produces electrical activity all the time, which can be measured via electrodes on the scalp—a method known as electroencephalography (EEG).

Dutch modality exclusivity norms

This app presents linguistic data over several tabs. The code combines the great front-end of Flexdashboard—based on R Markdown and yielding an unmatched user interface—, with the great back-end of Shiny—allowing users to download sections of data they select, in various formats. The hardest nuts to crack included modifying the rows/columns orientation without affecting the functionality of tables. A cool, recent finding was the reactable package. A nice feature, allowed by Flexdashboard, was the use of quite different formats in different tabs.

Naive principal component analysis in R

Principal Component Analysis (PCA) is a technique used to find the core components that underlie different variables. It comes in very useful whenever doubts arise about the true origin of three or more variables. There are two main methods for performing a PCA: naive or less naive. In the naive method, you first check some conditions in your data which will determine the essentials of the analysis. In the less-naive method, you set those yourself based on whatever prior information or purposes you had. The 'naive' approach is characterized by a first stage that checks whether the PCA should actually be performed with your current variables, or if some should be removed. The variables that are accepted are taken to a second stage which identifies the number of principal components that seem to underlie your set of variables.

At Greg, 8 am

The single dependent variable, RT, was accompanied by other variables which could be analyzed as independent variables. These included Group, Trial Number, and a within-subjects Condition. What had to be done first off, in order to take the usual table? The trials!

Modality switch effects emerge early and increase throughout conceptual processing

We tested whether conceptual processing is modality-specific by tracking the time course of the Conceptual Modality Switch effect. Forty-six participants verified the relation between property words and concept words. The conceptual modality of consecutive trials was manipulated in order to produce an Auditory-to-visual switch condition, a Haptic-to-visual switch condition, and a Visual-to-visual, no-switch condition. Event-Related Potentials (ERPs) were time-locked to the onset of the first word (property) in the target trials so as to measure the effect online and to avoid a within-trial confound. A switch effect was found, characterized by more negative ERP amplitudes for modality switches than no-switches. It proved significant in four typical time windows from 160 to 750 milliseconds post word onset, with greater strength in the Slow group, in posterior brain regions, and in the N400 window. The earliest switch effect was located in the language brain region, whereas later it was more prominent in the visual region. In the N400 and Late Positive windows, the Quick group presented the effect especially in the language region, whereas the Slow had it rather in the visual region. These results suggest that contextual factors such as time resources modulate the engagement of linguistic and embodied systems in conceptual processing.

The case for data dashboards: First steps in R Shiny

Dashboards for data visualisation, such as R Shiny and Tableau, allow the interactive exploration of data by means of drop-down lists and checkboxes, with no coding required from the final users. These web applications run on internet browsers, allowing for three viewing modes, catered to both analysts and the public at large: (1) private viewing (useful during analysis), (2) selective sharing (used within work groups), and (3) internet publication. Among the available platforms, R Shiny and Tableau stand out due to being relatively accessible to new users. Apps serve a broad variety of purposes. In science and beyond, these apps allow us to go the extra mile in sharing data. Alongside files and code shared in repositories, we can present the data in a website, in the form of plots or tables. This facilitates the public exploration of each section of the data (groups, participants, trials...) to anyone interested, and allows researchers to account for their proceeding in the analysis.

The Louisiana-Minnesota-Texas crisis across media and time: A big data exercise

Racism has long been ingrained in human societies. Ancient Greek Aristotle already claimed that non-Greeks were slaves by nature, as they easily submitted to despotic government (Reilly, Kaufman, & Bodino, 2002). This study focuses on racism in the United States, which extends from the foundation of the country, when black people were generally born into slavery, and were at any rate regarded as an inferior people. US racism stands out globally for two reasons. First, the country has played a hegemonic part in the World since soon after its foundation. Second, the US is regarded as the most advanced society technology-wise, as it sets the minutes for the technology sector worldwide. In spite of these advantages, the country has long suffered the plague of widespread racism. Indeed, the abolition of slavery in the mid-nineteenth century did not grant equal citizen rights to the black population. Over time, the black population started to confront this situation. Especially the mid-nineteenth century saw large uprisings and a patent division of different societal sectors, as reflected in literary works such as Ellison’s 'Invisible Man' (1952). Inequality and confrontation about racism has extended to date, and the costs thereof have been large in terms of lives and otherwise (Feagin, 2004).