statistics | Pablo Bernabeu

How to visually assess the convergence of a mixed-effects model by plotting various optimizers

To assess whether convergence warnings invalidate the results or whether the results can be deemed valid in spite of the warnings, Bates et al. (2023) suggest refitting the affected models with a variety of optimizers. The authors argue that, if the different optimizers produce practically equivalent results, the results are valid. The allFit function from the ‘lme4’ package refits a model using a number of optimizers.
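The comparison across optimizers can be sketched in a few lines. The sketch below is in Python rather than R so that it runs with no packages installed, and it is not the output of allFit. The estimates are hypothetical numbers keyed by optimizer names that lme4 offers. The tolerance is also an assumption, since what counts as 'practically equivalent' depends on the scale of each effect.

```python
# Hypothetical fixed-effect estimates obtained by refitting one model with
# several optimizers. The numbers are illustrative, not real model output.
estimates = {
    "bobyqa": {"(Intercept)": 251.41, "Days": 10.47},
    "Nelder_Mead": {"(Intercept)": 251.41, "Days": 10.47},
    "nlminbwrap": {"(Intercept)": 251.40, "Days": 10.47},
}


def estimate_ranges(estimates):
    """Return, per fixed effect, the largest minus the smallest estimate."""
    terms = next(iter(estimates.values())).keys()
    return {
        term: max(fit[term] for fit in estimates.values())
        - min(fit[term] for fit in estimates.values())
        for term in terms
    }


def practically_equivalent(estimates, tolerance):
    """True if every fixed effect varies by no more than the tolerance.

    The tolerance is an assumed threshold chosen by the analyst.
    """
    return all(r <= tolerance for r in estimate_ranges(estimates).values())


ranges = estimate_ranges(estimates)
print(ranges)
print(practically_equivalent(estimates, tolerance=0.05))
```

A single absolute tolerance is the simplest choice. With predictors on very different scales, a tolerance relative to each estimate or its standard error may be more sensible.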

A new function to plot convergence diagnostics from lme4::allFit()

When a model has struggled to find enough information in the data to account for every predictor, especially for every random effect, convergence warnings appear (Brauer & Curtin, 2018; Singmann & Kellen, 2019). In this article, I review the issue of convergence before presenting a new plotting function in R that facilitates the visualisation of the fixed effects fitted by different optimization algorithms (also dubbed optimizers).

Covariates are necessary to validate the variables of interest and to prevent bogus theories

The need for covariates—or nuisance variables—in statistical analyses is twofold. The first reason is purely statistical and the second reason is academic. First, the use of covariates is often necessary when the variable(s) of interest in a study may be connected to, and affected by, some satellite variables (Bottini et al., 2022; Elze et al., 2017; Sassenhagen & Alday, 2016). This complex scenario is the most common one due to the multivariate, dynamic, interactive nature of the real world.

Cannot open plots created with brms::mcmc_plot due to lack of `discrete_range` function

I would like to ask for advice regarding some plots that were created using brms::mcmc_plot(), and cannot be opened in R now. The plots were created last year using brms 2.17.0, and were saved in RDS objects. The problem I have is that I cannot open the plots in R now because I get an error related to a missing function. I would be very grateful if someone could please advise me if they can think of a possible reason or solution.

A table of results for Bayesian mixed-effects models: Grouping variables and specifying random slopes

Here I share the format applied to tables presenting the results of Bayesian models in Bernabeu (2022). The sample table presents a mixed-effects model that was fitted using the R package 'brms' (Bürkner et al., 2022).

A table of results for frequentist mixed-effects models: Grouping variables and specifying random slopes

Here I share the format applied to tables presenting the results of frequentist models in Bernabeu (2022). The sample table presents a mixed-effects model that was fitted using the R package 'lmerTest' (Kuznetsova et al., 2022).

Plotting two-way interactions from mixed-effects models using alias variables

Whereas the direction of main effects can be interpreted from the sign of the estimate, the interpretation of interaction effects often requires plots. This task is facilitated by the R package sjPlot (Lüdecke, 2022). In Bernabeu (2022), the sjPlot function called plot_model served as the basis for the creation of some custom functions. One of these functions is alias_interaction_plot, which allows the plotting of interactions between a continuous variable and a categorical variable.

Plotting two-way interactions from mixed-effects models using ten or six bins

Whereas the direction of main effects can be interpreted from the sign of the estimate, the interpretation of interaction effects often requires plots. This task is facilitated by the R package sjPlot (Lüdecke, 2022). In Bernabeu (2022), the sjPlot function called plot_model served as the basis for the creation of some custom functions. Two of these functions are deciles_interaction_plot and sextiles_interaction_plot. These functions allow the plotting of interactions between two continuous variables.
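Binning a continuous variable into ten or six groups can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the implementation of deciles_interaction_plot or sextiles_interaction_plot. The function name assign_bins and the example predictor are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles


def assign_bins(values, n_bins):
    """Label each value with a bin from 1 to n_bins, using quantile cut points.

    n_bins=10 yields deciles and n_bins=6 yields sextiles.
    """
    # quantiles() returns the n_bins - 1 cut points between the bins
    cut_points = quantiles(values, n=n_bins, method="inclusive")
    # The number of cut points at or below a value gives its zero-based bin
    return [bisect_right(cut_points, v) + 1 for v in values]


# Hypothetical continuous predictor with evenly spread values
predictor = list(range(1, 61))
sextiles = assign_bins(predictor, 6)
deciles = assign_bins(list(range(1, 101)), 10)
```

Once one of the two continuous variables is binned, the effect of the other variable can be plotted separately within each bin, which makes the shape of the interaction visible.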

Why can't we be friends? Plotting frequentist (lmerTest) and Bayesian (brms) mixed-effects models

Frequentist and Bayesian statistics are sometimes regarded as fundamentally different philosophies. Indeed, can both qualify as philosophies or is one of them just a pointless ritual? Is frequentist statistics only about $p$ values? Are frequentist estimates diametrically opposed to Bayesian posterior distributions? Are confidence intervals and credible intervals irreconcilable? Will R crash if lmerTest and brms are simultaneously loaded?

Linguistic and embodied systems in conceptual processing: Variation across individuals and items

The first study (Bernabeu et al., 2021) will merge existing datasets (Lynott et al., 2020; Pexman et al., 2017; Pexman & Yap, 2018; Wingfield & Connell, 2019). The second study will collect novel data to investigate questions such as the unique roles of vocabulary size, sensorimotor experience and attentional control.

Brief Clarifications, Open Questions: Commentary on Liu et al. (2018)

Liu et al. (2018) present a study that implements the conceptual modality switch (CMS) paradigm, which has been used to investigate the modality-specific nature of conceptual representations (Pecher et al., 2003). Liu et al.’s experiment uses event-related potentials (ERPs; similarly, see Bernabeu et al., 2017; Collins et al., 2011; Hald et al., 2011, 2013). In the design of the switch conditions, the experiment implements a corpus analysis to distinguish between purely-embodied modality switches and switches that are more liable to linguistic bootstrapping (also see Bernabeu et al.

Preregistration: The interplay between linguistic and embodied systems in conceptual processing

This preregistration outlines a study that will investigate the dynamic nature of conceptual processing by examining the interplay between linguistic distributional systems—comprising word co-occurrence and word association—and embodied systems—comprising sensorimotor and emotional information. A set of confirmatory research questions are addressed using data from the Calgary Semantic Decision project, along with additional measures for the stimuli corresponding to distributional language statistics, embodied information, and individual differences in vocabulary size.

Mixed-effects models in R, and a new tool for data simulation

In this talk, I will look over the rationale for LMEMs, and demonstrate how to fit them in R (Brauer & Curtin, 2018; Luke, 2017). Challenges will also be covered. For instance, when using the widely-accepted 'maximal' approach, based on fitting all possible random effects for each fixed effect, models sometimes fail to find a solution, or 'convergence'. Advice for the problem of nonconvergence will be demonstrated, based on the progressive lightening of the random effects structure (Singmann & Kellen, 2017; for an alternative approach, especially with small samples, see Matuschek et al., 2017). At the end, on a different note, I will present a web application that facilitates data simulation for research and teaching (Bernabeu & Lynott, 2020).

Reproducibility around a web application

Web applications make our work easier to use, since they require no programming from the user. Creating these applications in R, using packages such as "shiny" or "flexdashboard", offers multiple advantages. Chief among them is reproducibility, as we will see in the case of an application for data simulation (https://github.com/pablobernabeu/Experimental-data-simulation).

Web application for the simulation of experimental data

This open-source, R-based web application is suitable for educational or research purposes in experimental sciences. It allows the creation of varied data sets with specified structures, such as between-group or within-participant variables, that …

Data is present: Workshops and datathons

This project offers free activities to learn and practise reproducible data presentation. Pablo Bernabeu organises these events in the context of a Software Sustainability Institute Fellowship. Programming languages such as R and Python offer free, powerful resources for data processing, visualisation and analysis. Experience in these programs is highly valued in data-intensive disciplines. Original data has become a public good in many research fields thanks to cultural and technological advances. On the internet, we can find innumerable data sets from sources such as scientific journals and repositories (e.g., OSF), local and national governments, non-governmental organisations (e.g., data.world), etc. Activities comprise free workshops and datathons.

Event-related potentials: Why and how I used them

Event-related potentials (ERPs) offer a unique insight in the study of human cognition. Let's look at their reason-to-be for the purposes of research, and how they are defined and processed. Most of this content is based on my master's thesis, which I was fortunate to conduct at the Max Planck Institute for Psycholinguistics (see thesis or conference paper). Electroencephalography: The brain produces electrical activity all the time, which can be measured via electrodes on the scalp, a method known as electroencephalography (EEG).

Dutch modality exclusivity norms for 336 properties and 411 concepts

Part of the toolkit of language researchers is formed of stimuli that have been rated on various dimensions. The current study presents modality exclusivity norms for 336 properties and 411 concepts in Dutch. Forty-two respondents rated the auditory, …

Naive principal component analysis in R

Principal Component Analysis (PCA) is a technique used to find the core components that underlie different variables. It comes in very useful whenever doubts arise about the true origin of three or more variables. There are two main methods for performing a PCA: naive or less naive. In the naive method, you first check some conditions in your data which will determine the essentials of the analysis. In the less-naive method, you set those yourself based on whatever prior information or purposes you had. The 'naive' approach is characterized by a first stage that checks whether the PCA should actually be performed with your current variables, or if some should be removed. The variables that are accepted are taken to a second stage which identifies the number of principal components that seem to underlie your set of variables.

Web application: Dutch modality exclusivity norms

This app presents linguistic data over several tabs. The code combines the great front-end of Flexdashboard (based on R Markdown and yielding an unmatched user interface) with the great back-end of Shiny, which allows users to download sections of data they select, in various formats. The hardest nuts to crack included modifying the rows/columns orientation without affecting the functionality of tables. A cool, recent finding was the reactable package. A nice feature, allowed by Flexdashboard, was the use of quite different formats in different tabs.

Modality switch effects emerge early and increase throughout conceptual processing: Evidence from ERPs

We tested whether conceptual processing is modality-specific by tracking the time course of the Conceptual Modality Switch effect. Forty-six participants verified the relation between property words and concept words. The conceptual modality of …

At Greg, 8 am

The single dependent variable, RT, was accompanied by other variables which could be analyzed as independent variables. These included Group, Trial Number, and a within-subjects Condition. What had to be done first off, in order to take the usual table? The trials!

Modality switch effects emerge early and increase throughout conceptual processing: Evidence from ERPs

Research has extensively investigated whether conceptual processing is modality-specific—that is, whether meaning is processed to a large extent on the basis of perceptual and motor affordances (Barsalou, 2016). This possibility challenges long-established theories. It suggests a strong link between physical experience and language which is not borne out of the paradigmatic arbitrariness of words (see Lockwood, Dingemanse, & Hagoort, 2016). Modality-specificity also clashes with models of language that have no link to sensory and motor systems (Barsalou, 2016).