Job: Part-time research assistant in experimental research

We are seeking to appoint a part-time research assistant to help us recruit participants and conduct an experiment. In the current project, led by Jorge González Alonso and funded by the Research Council of Norway, we investigate language learning and the neurophysiological basis of multilingualism. To this end, we are conducting an electroencephalography (EEG) experiment. Your work as a research assistant will be mentored and supervised primarily by Pablo Bernabeu, and secondarily by the head of our project and the directors of our lab.

rscopus_plus: An extension of the rscopus package

Sometimes it’s useful to do a bibliometric analysis. To this end, the rscopus_plus functions (Bernabeu, 2024) extend the R package rscopus (Muschelli, 2022) to administer the search quota and to enable specific searches and comparisons. scopus_search_plus runs rscopus::scopus_search as many times as necessary based on the number of results and the search quota. scopus_search_DOIs gets DOIs from scopus_search_plus, which can then be imported into a reference manager, such as Zotero, to create a list of references.

How to end trial after timeout in jsPsych

I would like to ask for advice regarding a custom plugin for a serial reaction time task that was created by @vekteo and is available in Gorilla, where the code can be edited and tested. By default, trials are self-paced, but I need them to time out after 2,000 ms. I am struggling to achieve this, and would be very grateful for any advice.

A session logbook for a longitudinal study using conditional formatting in Excel

Longitudinal studies consist of several sessions, and often involve several session conductors. To facilitate the planning, registration and tracking of sessions, a session logbook becomes even more necessary than usual. To this end, an Excel workbook with conditional formatting can help automatise some formats and visualise the progress. Below is an example that is available on OneDrive. To fully access this workbook, it may be downloaded via File > Save as > Download a copy.

Motivating a preregistration (especially in experimental linguistics)

The best argument to motivate a preregistration may be that it doesn’t take any extra time. It just requires frontloading an important portion of the work. As a reward, the paper will receive greater trust from the reviewers and the readers at large. Preregistration is not perfect, but is a lesser evil that reduces the misuse of statistical analysis in science.

Do you speak one or more Scandinavian languages and English, but no other languages? Participate in an EEG experiment

By participating in our experiment and doing some simple tasks on a computer, you can contribute to research and earn 250 kr per hour (gift card). EEG is completely painless. The experiment takes place in Tromsø, at UiT The Arctic University of Norway. We are looking for participants with the following characteristics: ☑ Aged 18–45; ☑ Speaks Norwegian as a first language and English fluently. Apart from these languages, participants may also speak Swedish and Danish, but no other languages (beyond a few words);

Learning how to use Zotero

Is it worth learning how to use a reference management system such as Zotero? Maybe. The hours you invest in learning how to use Zotero (approx. 10 hours) are likely to pay off, as they will save you a lot of time that you would otherwise spend formatting, revising and correcting references. In addition, this skill would become part of your skill set. Resources: a great guide; free, online webinars in which you could participate and ask questions.

FAQs on mixed-effects models

I am dealing with nested data, and I remember from an article by Clark (1973) that nested data should be analysed using special models. I’ve looked into mixed-effects models, and I’ve reached a structure with random intercepts by subjects and by items. Is this fine? In early days, researchers would aggregate the data across these repeated measures to prevent the violation of the assumption of independence of observations, which is one of the most important assumptions in statistics.

FAIR standards for the creation of research materials, with examples

In the fast-paced world of scientific research, establishing minimum standards for the creation of research materials is essential. Whether it's stimuli, custom software for data collection, or scripts for statistical analysis, the quality and transparency of these materials significantly impact the reproducibility and credibility of research. This blog post explores the importance of adhering to FAIR (Findable, Accessible, Interoperable, Reusable) principles, and offers practical examples for researchers, with a focus on the cognitive sciences.

Two-second delay after logger in OpenSesame

The result shows a varying delay of around 2 seconds on average. It would be very helpful for us if we could cut down this delay, as it adds up. To try to achieve this, I reduced the number of variables logged, from the default 363 to 34 important variables. Unfortunately, this change did not result in a reduction of the delay.

Preprocessing the Norwegian Web as Corpus (NoWaC) in R

The present script can be used to pre-process data from a frequency list of the Norwegian Web as Corpus (NoWaC; Guevara, 2010). Before using the script, the frequency list should be downloaded from this URL. The list is described as ‘frequency list sorted primary alphabetic and secondary by frequency within each character’, and this is the direct URL. The download requires signing in to an institutional network. Last, the downloaded file should be unzipped.

An inline script for OpenSesame to send EEG triggers via serial port

The OpenSesame user base is skyrocketing but—of course—remains small in comparison to many other user bases that we are used to. Therefore, when developing an experiment in OpenSesame, there are still many opportunities to break the mould. When you need to do something beyond the standard operating procedure, it may take longer to find suitable resources than it takes when a more widespread tool is used. So, why would you still want to use OpenSesame?

How to correctly encode triggers in Python and send them to BrainVision through serial port (useful for OpenSesame and PsychoPy)

I'm sending the triggers in a binary format because Python requires this. For instance, to send the trigger 1, I run the code serialport.write(b'1'). I have succeeded in sending triggers in this way. However, I encounter two problems. First, the triggers are converted in a way I cannot entirely decipher. For instance, when I run the code serialport.write(b'1'), the trigger displayed in BrainVision Recorder is S 49, not S 1 as I would hope (please see Appendix below). Second, I cannot send two triggers with the same code one after the other. For instance, if I run serialport.write(b'1'), a trigger appears in BrainVision Recorder, but if I run the same afterwards (no matter how many times), no trigger appears. I tried to solve these problems by opening the parallel port in addition to the serial port, but the problems persist.
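The first problem follows from how Python byte strings work: `b'1'` is the ASCII character "1", whose numeric value is 49, whereas `bytes([1])` is the raw byte with value 1. A minimal sketch of this distinction is given below. The names `encode_trigger`, `send_trigger` and `FakePort` are hypothetical, and `port` stands for any object with a `write` method, such as an open pyserial connection. Writing a zero byte after each trigger is only an assumption about why a repeated code may go undetected, not something confirmed by the post.

```python
def encode_trigger(code: int) -> bytes:
    """Return the single raw byte whose numeric value is `code` (0-255)."""
    if not 0 <= code <= 255:
        raise ValueError("a one-byte trigger must be between 0 and 255")
    return bytes([code])


def send_trigger(port, code: int, reset: bool = True) -> None:
    """Write the trigger byte to `port`, optionally followed by a zero byte.

    `port` is any object with a `write(bytes)` method (hypothetical here).
    The reset to zero is an assumption about how a repeated code becomes
    detectable; a short pause before the reset may also be needed.
    """
    port.write(encode_trigger(code))
    if reset:
        port.write(encode_trigger(0))


class FakePort:
    """Stand-in for a serial port that records what was written."""

    def __init__(self):
        self.written = []

    def write(self, data: bytes) -> int:
        self.written.append(data)
        return len(data)


# b'1' holds the ASCII character "1", so its byte value is 49
ascii_value = b'1'[0]

# bytes([1]) holds the raw value 1
raw_value = encode_trigger(1)[0]
```

With a real connection, `send_trigger(serialport, 1)` would take the place of `serialport.write(b'1')`.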

ggplotting power curves from the 'simr' package

The R package ‘simr’ has greatly facilitated power analysis for mixed-effects models using Monte Carlo simulation (which involves running hundreds or thousands of tests under slight variations of the data). The powerCurve function is used to estimate the statistical power for various sample sizes in one go. Since the tests are run serially, they can take a VERY long time; approximately, the time it takes to run the model supplied once (say, a few hours) times the number of simulations (nsim, which should be higher than 200), and times the number of different sample sizes examined.
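The running time described above is a simple product, which can be sketched as follows. The figures used (2 hours per model fit, 300 simulations, 5 sample sizes) are hypothetical, and the function name `estimated_hours` is made up for illustration.

```python
def estimated_hours(hours_per_fit: float, nsim: int, n_sample_sizes: int) -> float:
    """Rough serial running time: one fit x simulations x sample sizes."""
    return hours_per_fit * nsim * n_sample_sizes


# Hypothetical figures: 2 hours per fit, 300 simulations, 5 sample sizes
total_hours = estimated_hours(2.0, 300, 5)
total_days = total_hours / 24
```

Under these assumed figures, the serial run would take 3,000 hours, or 125 days, which illustrates why the time adds up so quickly.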

How to break down colour variable in sjPlot::plot_model into equally-sized bins

Whereas the direction of main effects can be interpreted from the sign of the estimate, the interpretation of interaction effects often requires plots. This task is facilitated by the R package sjPlot. For instance, using the plot_model function, I plotted the interaction between two continuous variables.
    library(lme4)
    #> Loading required package: Matrix
    library(sjPlot)
    #> Learn more about sjPlot with 'browseVignettes("sjPlot")'.
    library(ggplot2)
    theme_set(theme_sjplot())
    # Create data partially based on code by Ben Bolker
    # from https://stackoverflow.

How to map more informative values onto fill argument of sjPlot::plot_model

Whereas the direction of main effects can be interpreted from the sign of the estimate, the interpretation of interaction effects often requires plots. This task is facilitated by the R package sjPlot. For instance, using the plot_model function, I plotted the interaction between a continuous variable and a categorical variable. The categorical variable was passed to the fill argument of plot_model.
    library(lme4)
    #> Loading required package: Matrix
    library(sjPlot)
    #> Install package "strengejacke" from GitHub (`devtools::install_github("strengejacke/strengejacke")`) to load all sj-packages at once!

How to visually assess the convergence of a mixed-effects model by plotting various optimizers

To assess whether convergence warnings render the results invalid, or on the contrary, the results can be deemed valid in spite of the warnings, Bates et al. (2023) suggest refitting models affected by convergence warnings with a variety of optimizers. The authors argue that, if the different optimizers produce practically-equivalent results, the results are valid. The allFit function from the ‘lme4’ package allows the refitting of models using a number of optimizers.

Intermixing stimuli from two loops randomly in OpenSesame

I’m developing a slightly tricky design in OpenSesame (a Python-based experiment builder). My stimuli comprise two kinds of sentences that contain different elements, and different numbers of elements. These sentences must be presented word by word. Furthermore, I need to attach triggers to some words in the first kind of sentences but not in the second kind. Last, these kinds of sentences must be intermixed within a block (or a sequence) of trials, because the first kind are targets and the second kind are fillers.
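The logic of this design can be sketched in plain Python, independently of OpenSesame's own loop items. The function names, the example sentences, the trigger code and the trigger positions below are all hypothetical. The sketch only shows how both kinds of sentences can be shuffled together while triggers are restricted to selected words of the targets.

```python
import random


def build_block(targets, fillers, seed=None):
    """Return targets and fillers shuffled together, each tagged with its kind."""
    trials = [{"kind": "target", "words": s.split()} for s in targets]
    trials += [{"kind": "filler", "words": s.split()} for s in fillers]
    rng = random.Random(seed)  # a seed makes the order reproducible
    rng.shuffle(trials)
    return trials


def words_with_triggers(trial, trigger_positions, trigger_code):
    """Yield (word, trigger) pairs; the trigger is None when none is due."""
    for i, word in enumerate(trial["words"]):
        due = trial["kind"] == "target" and i in trigger_positions
        yield word, (trigger_code if due else None)


# Hypothetical stimuli: one target and one filler sentence
block = build_block(["the dog barked"], ["it rained"], seed=1)
```

Each trial in `block` can then be presented word by word by iterating over `words_with_triggers(trial, {1}, 5)`, which attaches the hypothetical code 5 to the second word of targets only.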

Simultaneously sampling from two variables in jsPsych

I am using jsPsych to create an experiment and I am struggling to sample from two variables simultaneously. Specifically, in each trial, I would like to present a primeWord and a targetWord by randomly sampling each of them from its own variable. I have looked into several resources—such as sampling without replacement, custom sampling and position indices—but to no avail. I’m a beginner at this, so it’s possible that one of these resources was relevant (especially the last one, I think).

Table joins with conditional "fuzzy" string matching in R

Here’s an example of fuzzy-matching strings in R that I shared on StackOverflow. In stringdist_join, the max_dist argument is used to constrain the degree of fuzziness.
    library(fuzzyjoin)
    library(dplyr)
    #>
    #> Attaching package: 'dplyr'
    #> The following objects are masked from 'package:stats':
    #>
    #>     filter, lag
    #> The following objects are masked from 'package:base':
    #>
    #>     intersect, setdiff, setequal, union
    library(knitr)
    small_tab = data.frame(Food.Name = c('Corn', 'Squash', 'Peppers'), Food.Code = c(NA, NA, NA))
    large_tab = data.
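To illustrate what a maximum distance does, here is a small Python sketch of the same idea. It is not the fuzzyjoin implementation: it uses a plain Levenshtein distance, compares lowercased strings as a design choice, and the contents of `large` are hypothetical because the original table is cut off above.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character edits needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def fuzzy_join(left, right, max_dist=1):
    """Pair every left string with the right strings within `max_dist` edits."""
    return [(l, r) for l in left for r in right
            if levenshtein(l.lower(), r.lower()) <= max_dist]


small = ["Corn", "Squash", "Peppers"]
large = ["Corn", "Squash", "Pepper", "Cheese"]  # hypothetical entries
matches = fuzzy_join(small, large, max_dist=1)
```

With `max_dist=1`, 'Peppers' is paired with 'Pepper' because they differ by one character, whereas 'Cheese' is left unmatched.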

A new function to plot convergence diagnostics from lme4::allFit()

When a model has struggled to find enough information in the data to account for every predictor (especially for every random effect), convergence warnings appear (Brauer & Curtin, 2018; Singmann & Kellen, 2019). In this article, I review the issue of convergence before presenting a new plotting function in R that facilitates the visualisation of the fixed effects fitted by different optimization algorithms (also dubbed optimizers).

Assigning participant-specific parameters automatically in OpenSesame

OpenSesame offers options to counterbalance properties of the stimulus across participants. However, in cases of more involved assignments of session parameters across participants, it becomes necessary to write a bit of Python code in an inline script, which should be placed at the top of the timeline. In such a script, the participant-specific parameters are loaded in from a csv file. Below is a minimal example of the csv file.
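The lookup at the heart of such a script can be sketched with Python's standard csv module. The column names (`participant`, `language`, `list`), the values and the function name `load_parameters` are hypothetical, and the file is simulated with an in-memory string so that the example is self-contained. In an actual inline script, the participant number would come from the experiment, and the file would be opened from disk.

```python
import csv
import io

# Hypothetical file contents; the column names are assumptions
PARAMETERS_CSV = """participant,language,list
1,A,1
2,B,2
3,A,2
"""


def load_parameters(csv_file, participant):
    """Return the row for `participant` as a dict, or raise if it is absent."""
    for row in csv.DictReader(csv_file):
        if int(row["participant"]) == participant:
            return row
    raise KeyError(f"no parameters found for participant {participant}")


# Look up the parameters of (hypothetical) participant 2
params = load_parameters(io.StringIO(PARAMETERS_CSV), participant=2)
```

Note that `csv.DictReader` returns every value as a string, so numeric parameters need converting before use.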

Pronominal object clitics in preverbal position are a hard nut to crack for Google Translate

Some Romance languages allow the movement of pronominal object clitics to the preverbal position (Hanson & Carlson, 2014; Labotka et al., 2023). That is, instead of saying La maestra lo ha detto (Italian) ‘The teacher has said it’, it is possible to say Lo ha detto la maestra ‘It has said the teacher’. The latter is a marked phrasing that increases the attention to the subject of the sentence. Furthermore, when the clitic is in preverbal position, the degree of focus on the subject is also dependent on the context.

Specifying version number in OSF download links

In the preparation of projects, files are often downloaded from OSF. It is good to document the URL addresses that were used for the downloads. These URLs can be provided in a code script (see example) or in a README file. Better yet, it’s possible to specify the version of each file in the URL. This specification helps reduce the possibility of inaccuracies later, should any files be modified afterwards.
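As a sketch of the idea, the helper below appends a version number to a download link. The `?version=` query parameter and the URL pattern are assumptions about OSF's link format that should be checked against a real file. The file identifier `abc12` and the function name `osf_download_url` are made-up placeholders.

```python
from urllib.parse import urlencode


def osf_download_url(file_id: str, version: int) -> str:
    """Build a download link pinned to one version (assumed URL pattern)."""
    query = urlencode({"version": version})
    return f"https://osf.io/{file_id}/download?{query}"


# 'abc12' is a placeholder, not a real OSF file identifier
url = osf_download_url("abc12", 3)
```

Documenting links of this form next to the code that downloads the files keeps a record of exactly which version was used.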

Covariates are necessary to validate the variables of interest and to prevent bogus theories

The need for covariates—or nuisance variables—in statistical analyses is twofold. The first reason is purely statistical and the second reason is academic. First, the use of covariates is often necessary when the variable(s) of interest in a study may be connected to, and affected by, some satellite variables (Bottini et al., 2022; Elze et al., 2017; Sassenhagen & Alday, 2016). This complex scenario is the most common one due to the multivariate, dynamic, interactive nature of the real world.

Cannot open plots created with brms::mcmc_plot due to lack of `discrete_range` function

I would like to ask for advice regarding some plots that were created using brms::mcmc_plot(), and cannot be opened in R now. The plots were created last year using brms 2.17.0, and were saved in RDS objects. The problem I have is that I cannot open the plots in R now because I get an error related to a missing function. I would be very grateful if someone could please advise me if they can think of a possible reason or solution.

A table of results for Bayesian mixed-effects models: Grouping variables and specifying random slopes

Here I share the format applied to tables presenting the results of Bayesian models in Bernabeu (2022). The sample table presents a mixed-effects model that was fitted using the R package 'brms' (Bürkner et al., 2022).

A table of results for frequentist mixed-effects models: Grouping variables and specifying random slopes

Here I share the format applied to tables presenting the results of frequentist models in Bernabeu (2022). The sample table presents a mixed-effects model that was fitted using the R package 'lmerTest' (Kuznetsova et al., 2022).

Plotting two-way interactions from mixed-effects models using alias variables

Whereas the direction of main effects can be interpreted from the sign of the estimate, the interpretation of interaction effects often requires plots. This task is facilitated by the R package sjPlot (Lüdecke, 2022). In Bernabeu (2022), the sjPlot function called plot_model served as the basis for the creation of some custom functions. One of these functions is alias_interaction_plot, which allows the plotting of interactions between a continuous variable and a categorical variable.

Plotting two-way interactions from mixed-effects models using ten or six bins

Whereas the direction of main effects can be interpreted from the sign of the estimate, the interpretation of interaction effects often requires plots. This task is facilitated by the R package sjPlot (Lüdecke, 2022). In Bernabeu (2022), the sjPlot function called plot_model served as the basis for the creation of some custom functions. Two of these functions are deciles_interaction_plot and sextiles_interaction_plot. These functions allow the plotting of interactions between two continuous variables.

Why can't we be friends? Plotting frequentist (lmerTest) and Bayesian (brms) mixed-effects models

Frequentist and Bayesian statistics are sometimes regarded as fundamentally different philosophies. Indeed, can both qualify as philosophies or is one of them just a pointless ritual? Is frequentist statistics only about $p$ values? Are frequentist estimates diametrically opposed to Bayesian posterior distributions? Are confidence intervals and credible intervals irreconcilable? Will R crash if lmerTest and brms are simultaneously loaded?

Bayesian workflow: Prior determination, predictive checks and sensitivity analyses

This post presents a run-through of a Bayesian workflow in R. The content is *closely* based on Bernabeu (2022), which was in turn based on lots of other references, also cited here.

Avoiding (R) Markdown knitting errors using knit_deleting_service_files()

The function knit_deleting_service_files() helps avoid (R) Markdown knitting errors caused by files and folders remaining from previous knittings (e.g., manuscript.tex, ZHJhZnQtYXBhLlJtZA==.Rmd, manuscript.synctex.gz). The only obligatory argument for this function is the name of a .Rmd or .md file. The optional argument is a path to a directory containing this file. The function first offers to delete potential service files and folders in the directory. A confirmation is required in the console (see screenshot below). Next, the document is knitted. Last, the function offers to delete potential service files and folders again.

Walking the line between reproducibility and efficiency in R Markdown: Three methods

As technology and research methods advance, the data sets tend to be larger and the methods more exhaustive. Consequently, the analyses take longer to run. This poses a challenge when the results are to be presented using R Markdown. One has to balance reproducibility and efficiency. On the one hand, it is desirable to keep the R Markdown document as self-contained as possible, so that those who may later examine the document can easily test and edit the code.

Tackling knitting errors in R Markdown

When knitting an R Markdown document after the first time, errors may sometimes appear. Three tips are recommended below.
1. Close PDF reader window: When the document is knitted through the ‘Knit’ button, a PDF reader window opens to present the result. Closing this window can help resolve errors.
2. Delete service files: Every time the Rmd is knitted, some service files are created. Some of these files have the ‘.

Parallelizing simr::powerCurve() in R

The powerCurve function from the R package ‘simr’ (Green & MacLeod, 2016) can incur very long running times when the method used for the calculation of p values is Kenward-Roger or Satterthwaite (see Luke, 2017). Here I suggest three ways of cutting down this time. Where possible, use a high-performance (or high-end) computing cluster. This removes the need to use personal computers for these long jobs. In case you’re using the fixed() parameter of the powerCurve function, and calculating the power for different effects, run these at the same time (‘in parallel’) on different machines, rather than one after another.

Brief Clarifications, Open Questions: Commentary on Liu et al. (2018)

Liu et al. (2018) present a study that implements the conceptual modality switch (CMS) paradigm, which has been used to investigate the modality-specific nature of conceptual representations (Pecher et al., 2003). Liu et al.’s experiment uses event-related potentials (ERPs; similarly, see Bernabeu et al., 2017; Collins et al., 2011; Hald et al., 2011, 2013). In the design of the switch conditions, the experiment implements a corpus analysis to distinguish between purely-embodied modality switches and switches that are more liable to linguistic bootstrapping (also see Bernabeu et al.

Collaboration while using R Markdown

In a highly recommendable presentation available on YouTube, Michael Frank walks us through R Markdown. Below, I loosely summarise and partly elaborate on Frank's advice regarding collaboration among colleagues, some of whom may not be used to R Markdown (see relevant time point in Frank's presentation). The first way is using GitHub, which has a great version control system, and even allows the rendering of Markdown text, if the file is given the extension ‘.

Notes about punctuation in formal writing

When writing formal pieces, some pitfalls in the punctuation are easy to avoid once you know them. Punctuation marks such as the comma, the semi-colon, the colon and the period are useful for organising phrases and clauses, facilitating the reading, and disambiguating. However, these marks are also liable to underuse, as in the case of run-on sentences; misuse, as in the comma splice; and overuse, as often happens with the Oxford comma.

Stray meetings in Microsoft Teams

Unwanted, stranded meetings, overlapping with a general one in a channel, can occur when people click on the Meet (now)/📷 button, instead of clicking on the same Join button in the chat field. This may especially happen to those who reach the channel first, or who cannot see the Join button in the chat field because this field has been taken up by messages.

R Markdown amidst Madison parks

This document is part of teaching materials created for the workshop 'Open data and reproducibility v2.1: R Markdown, dashboards and Binder', delivered at the CarpentryCon 2020 conference. The purpose of this specific document is to practise R Markdown, including basic features such as Markdown markup and code chunks, along with more special features such as cross-references for figures, tables, code chunks, etc. Since this conference was originally going to take place in Madison, let's look at some open data from the City of Madison.

What's in a fluke? The problem of trust and distrust

The label 'fluke' may in principle be skewed by the eye of the beholder, the mind of the perceiver and the availability or lack of data.

How to engage Research Group Leaders in sustainable software practices

There is an increasing number of training courses introducing early career researchers to sustainable software practices but relatively little aimed at Research Group Leaders and Principal Investigators. Expecting group leaders to personally acquire such skills through training such as a two-day Carpentries workshop is unrealistic, as these require a significant time investment and are less directly applicable in the role of research director. In addition, many group leaders would not consider their group as outputting software, or are less aware of the full range of benefits that sustainable practice brings and will thus be less likely to signpost such training to their team members. Even where they do identify benefits, they may have concerns about releasing group software or may feel overwhelmed by the potential scale of the task, especially with respect to legacy projects.

Incentives for good research software practices

Software is increasingly becoming recognised as fundamental to research. In a 2014 survey of UK researchers undertaken by the Institute, 7 out of 10 researchers supported the view that it would be impossible to conduct research without software. As software continues to underpin more research activities, we must engage a variety of stakeholders to incentivise the uptake of best practice in software development to ensure the quality of research software keeps pace with the research it supports.

Data is present: Workshops and datathons

This project offers free activities to learn and practise reproducible data presentation. Pablo Bernabeu organises these events in the context of a Software Sustainability Institute Fellowship. Programming languages such as R and Python offer free, powerful resources for data processing, visualisation and analysis. Experience in these programs is highly valued in data-intensive disciplines. Original data has become a public good in many research fields thanks to cultural and technological advances. On the internet, we can find innumerable data sets from sources such as scientific journals and repositories (e.g., OSF), local and national governments, non-governmental organisations (e.g., data.world), etc. Activities comprise free workshops and datathons.

Event-related potentials: Why and how I used them

Event-related potentials (ERPs) offer a unique insight in the study of human cognition. Let's look at their reason-to-be for the purposes of research, and how they are defined and processed. Most of this content is based on my master's thesis, which I could fortunately conduct at the Max Planck Institute for Psycholinguistics (see thesis or conference paper).
Electroencephalography
The brain produces electrical activity all the time, which can be measured via electrodes on the scalp—a method known as electroencephalography (EEG).

Naive principal component analysis in R

Principal Component Analysis (PCA) is a technique used to find the core components that underlie different variables. It comes in very useful whenever doubts arise about the true origin of three or more variables. There are two main methods for performing a PCA: naive or less naive. In the naive method, you first check some conditions in your data which will determine the essentials of the analysis. In the less-naive method, you set those yourself based on whatever prior information or purposes you had. The 'naive' approach is characterized by a first stage that checks whether the PCA should actually be performed with your current variables, or if some should be removed. The variables that are accepted are taken to a second stage which identifies the number of principal components that seem to underlie your set of variables.

Review of the Landscape Model of reading: Composition, dynamics and application

Throughout the 1990s, two opposing theories were used to explain how people understand texts, later bridged by the Landscape Model of reading (van den Broek, Young, Tzeng, & Linderholm, 1999). A review is offered below, including a schematic representation of the Landscape Model.
Memory-based view
The memory-based view presented reading as an autonomous, unconscious, effortless process. Readers were purported to achieve an understanding of a text as a whole by combining the concepts, and implications readily afforded, in the text with their own background knowledge (Myers & O’Brien, 1998; O’Brien & Myers, 1999).

At Greg, 8 am

The single dependent variable, RT, was accompanied by other variables which could be analyzed as independent variables. These included Group, Trial Number, and a within-subjects Condition. What had to be done first off, in order to take the usual table? The trials!

Modality switch effects emerge early and increase throughout conceptual processing: Evidence from ERPs

Research has extensively investigated whether conceptual processing is modality-specific—that is, whether meaning is processed to a large extent on the basis of perceptual and motor affordances (Barsalou, 2016). This possibility challenges long-established theories. It suggests a strong link between physical experience and language which is not borne out of the paradigmatic arbitrariness of words (see Lockwood, Dingemanse, & Hagoort, 2016). Modality-specificity also clashes with models of language that have no link to sensory and motor systems (Barsalou, 2016).

The case for data dashboards: First steps in R Shiny

Dashboards for data visualisation, such as R Shiny and Tableau, allow the interactive exploration of data by means of drop-down lists and checkboxes, with no coding required from the final users. These web applications run on internet browsers, allowing for three viewing modes, catered to both analysts and the public at large: (1) private viewing (useful during analysis), (2) selective sharing (used within work groups), and (3) internet publication. Among the available platforms, R Shiny and Tableau stand out due to being relatively accessible to new users. Apps serve a broad variety of purposes. In science and beyond, these apps allow us to go the extra mile in sharing data. Alongside files and code shared in repositories, we can present the data in a website, in the form of plots or tables. This facilitates the public exploration of each section of the data (groups, participants, trials...) to anyone interested, and allows researchers to account for their proceeding in the analysis.

EEG error: datasets missing channels

Most of the recordings are perfectly fine, but a few present a big error. Out of 64 original electrodes, only two appear. These are the right mastoid (RM) and the left eye sensor (LEOG). Both are bipolar electrodes. RM is to be re-referenced to the online reference electrode, while LEOG is to be re-referenced to the right eye electrode.

Modality exclusivity norms for 747 properties and concepts in Dutch: A replication of English

This study is a cross-linguistic, conceptual replication of Lynott and Connell’s (2009, 2013) modality exclusivity norms. The properties and concepts tested therein were translated into Dutch, and independently rated and analyzed (Bernabeu, 2018).

Conceptual modality switch effect measured at first word?

Traditionally, the second word presented (whether noun or adjective) has been the point of measure, both for RTs and ERPs. Yet, could it be better to measure at the first word?