R

You shall know a word by the company it keeps — so choose your prompts wisely

Post

In 1957, the linguist J. R. Firth observed that 'you shall know a word by the company it keeps'. That principle, that words occurring in similar contexts tend to have similar meanings, is the foundation on which generative language models were built, from early Latent Semantic Analysis to today's trillion-parameter Transformers. This post traces the lineage with three interactive LSA-to-PCA visualisations in R (Reuters newswire, State of the Union addresses and IMDB reviews), showing where simple co-occurrence models succeed, where they fail and why scale alone turned a modest insight into the technology behind ChatGPT. It then examines why LLMs are optimised for fluency rather than truth (hallucinations are a structural consequence, not a bug to be patched) and argues that careful prompt engineering is the best tool we have for steering a fundamentally heuristic machine.
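As a rough illustration of the co-occurrence idea, the sketch below runs a minimal LSA in base R: it builds a term-document count matrix from four made-up sentences, reduces it with a truncated SVD, and compares terms by cosine similarity. The toy corpus, the choice of two dimensions and the helper name `cosine_sim` are assumptions made for this example, not taken from the post.

```r
# Toy corpus (made up for illustration): two newswire-like and two review-like documents
docs <- c(
  d1 = "oil prices rise as crude supply falls",
  d2 = "crude oil supply cut lifts prices",
  d3 = "the film has a great cast and plot",
  d4 = "great plot but the film cast disappoints"
)

# Tokenise and build a term-document count matrix (rows = terms, columns = documents)
tokens <- strsplit(tolower(docs), "[^a-z]+")
vocab <- sort(unique(unlist(tokens)))
tdm <- sapply(tokens, function(tok) table(factor(tok, levels = vocab)))
storage.mode(tdm) <- "double"

# Truncated SVD: keep the k largest singular values (k = 2 is an arbitrary choice here)
k <- 2
s <- svd(tdm)
term_vectors <- s$u[, 1:k] %*% diag(s$d[1:k])
rownames(term_vectors) <- vocab

# Hypothetical helper: cosine similarity between two terms in the reduced space
cosine_sim <- function(a, b) {
  va <- term_vectors[a, ]
  vb <- term_vectors[b, ]
  sum(va * vb) / sqrt(sum(va^2) * sum(vb^2))
}

sim_oil_crude <- cosine_sim("oil", "crude")  # terms that share documents: close to 1
sim_oil_film  <- cosine_sim("oil", "film")   # terms that never share a document: close to 0
```

In this toy corpus the two groups of documents share no words, so the separation is exact. Real corpora are far noisier, which is where such simple models start to struggle.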

R functions for checking and fixing vmrk files from BrainVision

Post

Electroencephalography (EEG) is a cornerstone of neuroscience research on how the human brain works. However, EEG software and hardware come with constraints, particularly in the management of markers, also known as triggers. This article discusses these limitations and the future prospects of marker management in EEG studies, and introduces R functions for checking and fixing vmrk files from BrainVision.
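To give a sense of what checking a vmrk file can involve, here is a minimal base-R sketch. It assumes the usual BrainVision marker layout (`Mk<number>=<type>,<description>,<position>,<size>,<channel>`). The marker lines are made up, and `parse_vmrk_markers` is a hypothetical helper, not one of the functions from the post. The two checks shown (consecutive numbering and non-decreasing positions) are only examples of sanity checks one might run.

```r
# Made-up marker lines in the BrainVision .vmrk layout (illustrative only)
vmrk_lines <- c(
  "[Marker Infos]",
  "Mk1=New Segment,,1,1,0,20200101120000000000",
  "Mk2=Stimulus,S  1,1500,1,0",
  "Mk3=Stimulus,S  2,2750,1,0",
  "Mk4=Response,R  1,3100,1,0"
)

# Hypothetical helper: parse the marker lines into a data frame
parse_vmrk_markers <- function(lines) {
  marker_lines <- grep("^Mk[0-9]+=", lines, value = TRUE)
  number <- as.integer(sub("^Mk([0-9]+)=.*$", "\\1", marker_lines))
  fields <- strsplit(sub("^Mk[0-9]+=", "", marker_lines), ",", fixed = TRUE)
  data.frame(
    number      = number,
    type        = vapply(fields, `[`, character(1), 1),
    description = vapply(fields, `[`, character(1), 2),
    position    = as.integer(vapply(fields, `[`, character(1), 3)),
    stringsAsFactors = FALSE
  )
}

markers <- parse_vmrk_markers(vmrk_lines)

# Example checks: markers numbered 1..n, and positions never going backwards
numbering_ok <- identical(markers$number, seq_len(nrow(markers)))
positions_ok <- !is.unsorted(markers$position)

# Count markers per type, e.g. to compare against the expected number of trials
marker_counts <- table(markers$type)
```

With a real recording, the lines would come from `readLines()` on the vmrk file instead of the inline vector used here.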

rscopus_plus: An extension of the rscopus package

Post

Extension of the rscopus R package with functions that manage search quotas, retrieve DOIs for reference managers, search for additional DOIs, compare publication counts across topics, and visualize bibliometric comparisons over time.

FAQs on mixed-effects models

Post

Frequently asked questions about mixed-effects models, covering the necessity of random slopes, appropriate p-value calculation methods, parallelization limitations, convergence issues, and optimizer selection.

FAIR standards for the creation of research materials, with examples

Post

Establishing minimum standards for the creation of research materials is essential in scientific research. Whether the materials are stimuli, custom software for data collection, or scripts for statistical analysis, their quality and transparency significantly affect the reproducibility and credibility of research. This blog post explores the importance of adhering to the FAIR (Findable, Accessible, Interoperable, Reusable) principles and offers practical examples for researchers, with a focus on the cognitive sciences.

Preprocessing the Norwegian Web as Corpus (NoWaC) in R

Post

An R script for preprocessing frequency list data from the Norwegian Web as Corpus (NoWaC), including instructions for downloading and preparing the corpus data.

ggplotting power curves from the simr package

Post

A custom R function to create ggplot2 visualizations of power curves generated by the simr package's powerCurve function for mixed-effects models.

How to discretise the colour variable in sjPlot::plot_model into equally-sized intervals

Post

Whereas the direction of main effects can be interpreted from the sign of the estimate, the interpretation of interaction effects often requires plots. This task is facilitated by the R package sjPlot. For instance, using the plot_model function, I plotted the interaction between two continuous variables.

    library(lme4)
    #> Loading required package: Matrix
    library(sjPlot)
    #> Learn more about sjPlot with 'browseVignettes("sjPlot")'.
    library(ggplot2)
    theme_set(theme_sjplot())
    # Create data partially based on code by Ben Bolker
    # from https://stackoverflow.
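The binning step behind this kind of plot can be sketched in base R, independently of sjPlot. The example below is only an illustration under assumptions: the data are simulated, the number of bins is arbitrary, and 'equally-sized' is shown in two possible readings (equal width and equal count), since the excerpt does not say which one the post uses.

```r
set.seed(1)
x <- runif(200, min = 0, max = 10)  # simulated continuous moderator
n_bins <- 4                         # arbitrary number of intervals

# Reading 1: intervals of equal width across the observed range
width_breaks <- seq(min(x), max(x), length.out = n_bins + 1)
x_equal_width <- cut(x, breaks = width_breaks, include.lowest = TRUE)

# Reading 2: intervals holding (roughly) equal numbers of observations
count_breaks <- quantile(x, probs = seq(0, 1, length.out = n_bins + 1))
x_equal_count <- cut(x, breaks = count_breaks, include.lowest = TRUE)

table(x_equal_width)
table(x_equal_count)
```

Either factor could then serve as a discrete grouping variable in a plot. Equal-width bins keep the scale easy to read, whereas equal-count bins avoid sparsely populated groups.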

How to map more informative values onto fill argument of sjPlot::plot_model

Post

Whereas the direction of main effects can be interpreted from the sign of the estimate, the interpretation of interaction effects often requires plots. This task is facilitated by the R package sjPlot. For instance, using the plot_model function, I plotted the interaction between a continuous variable and a categorical variable. The categorical variable was passed to the fill argument of plot_model.

    library(lme4)
    #> Loading required package: Matrix
    library(sjPlot)
    #> Install package "strengejacke" from GitHub (`devtools::install_github("strengejacke/strengejacke")`) to load all sj-packages at once!

How to visually assess the convergence of a mixed-effects model by plotting various optimizers

Post

A custom R function to create ggplot2 visualizations of fixed effects from models refitted with multiple optimizers using lme4's allFit function, enabling visual assessment of convergence validity in mixed-effects models.