In 1957, linguist J. R. Firth observed that 'you shall know a word by the company it keeps'. That principle — words that co-occur share meaning — is the foundation on which all of generative AI was built, from early Latent Semantic Analysis to today's trillion-parameter Transformers. This post traces the lineage with three interactive LSA-to-PCA visualisations in R (Reuters newswire, State of the Union addresses and IMDB reviews), showing where simple co-occurrence models succeed, where they fail and why scale alone turned a modest insight into the technology behind ChatGPT. It then examines why LLMs are optimised for fluency rather than truth — hallucinations are a structural consequence, not a bug to be patched — and argues that careful prompt engineering is the best tool we have for steering a fundamentally heuristic machine.
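A minimal sketch of the LSA step in base R, assuming a tiny hand-made term-document matrix. The terms and counts are invented for illustration and are not taken from the Reuters, State of the Union or IMDB data. LSA takes a truncated singular value decomposition (SVD) of the matrix, so terms that occur in the same documents end up close together in the reduced space.

```r
# Toy term-document matrix with invented counts: rows = terms, columns = documents
tdm <- matrix(
  c(2, 1, 0, 0,   # oil
    1, 2, 0, 0,   # crude
    0, 0, 2, 1,   # vote
    0, 0, 1, 2,   # senate
    1, 1, 1, 1),  # said
  nrow = 5, byrow = TRUE,
  dimnames = list(c("oil", "crude", "vote", "senate", "said"),
                  paste0("doc", 1:4))
)

# Truncated SVD: keep the k largest singular values (k >= 2 here, so diag() builds a k x k matrix)
k <- 2
s <- svd(tdm)
term_vecs <- s$u[, 1:k] %*% diag(s$d[1:k])
rownames(term_vecs) <- rownames(tdm)

# Cosine similarity between two vectors
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

raw_sim <- cosine(tdm["oil", ], tdm["crude", ])              # 0.8 on raw counts
lsa_sim <- cosine(term_vecs["oil", ], term_vecs["crude", ])  # approximately 1 after LSA
lsa_far <- cosine(term_vecs["oil", ], term_vecs["vote", ])   # approximately 0: the terms never co-occur
```

Because oil and crude load on the same documents, the reduced space pulls their vectors together, while oil and vote stay orthogonal. On real corpora the same decomposition is applied to thousands of terms and documents.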
Electroencephalography (EEG) has become a cornerstone of neuroscience for understanding the workings of the human brain. However, EEG hardware and software come with their own constraints, particularly in the management of markers, also known as triggers. This article sheds light on these limitations and on future prospects for marker management in EEG studies, and introduces R functions for handling BrainVision marker (.vmrk) files.
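A minimal base-R sketch of reading marker entries, assuming the usual BrainVision layout in which each entry of the `[Marker Infos]` section has the form `Mk<n>=<type>,<description>,<position>,<size>,<channel>[,<date>]`. The helper `parse_vmrk_markers()` and the example lines are hypothetical; they are not the functions introduced in the article.

```r
# Hypothetical example content of a .vmrk file; for a real file use readLines("path/to/file.vmrk")
vmrk_lines <- c(
  "Brain Vision Data Exchange Marker File, Version 1.0",
  "",
  "[Common Infos]",
  "DataFile=example.eeg",
  "",
  "[Marker Infos]",
  "Mk1=New Segment,,1,1,0,20240101120000000000",
  "Mk2=Stimulus,S  1,5000,1,0",
  "Mk3=Stimulus,S  2,7500,1,0",
  "Mk4=Response,R  1,7900,1,0"
)

# Hypothetical helper: turn "Mk<n>=..." lines into a data frame.
# Assumes descriptions contain no commas.
parse_vmrk_markers <- function(lines) {
  mk <- grep("^Mk[0-9]+=", lines, value = TRUE)
  fields <- strsplit(sub("^Mk[0-9]+=", "", mk), ",", fixed = TRUE)
  field <- function(i) vapply(fields, function(f) f[i], character(1))
  data.frame(
    number      = as.integer(sub("^Mk([0-9]+)=.*$", "\\1", mk)),
    type        = field(1),
    description = field(2),
    position    = as.integer(field(3)),   # in data points
    size        = as.integer(field(4)),
    channel     = as.integer(field(5)),   # 0 = all channels
    stringsAsFactors = FALSE
  )
}

markers <- parse_vmrk_markers(vmrk_lines)

# Count stimulus and response codes, e.g. to check trigger totals per condition
table(markers$description[markers$type != "New Segment"])
```

Once markers are in a data frame, checks such as missing or duplicated triggers become ordinary data-frame operations.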
Extension of the rscopus R package with functions that manage search quotas, retrieve DOIs for reference managers, search for additional DOIs, compare publication counts across topics, and visualize bibliometric comparisons over time.
Frequently asked questions about mixed-effects models, covering the necessity of random slopes, appropriate p-value calculation methods, parallelization limitations, convergence issues, and optimizer selection.
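As a minimal illustration of optimizer selection, one can refit the same model with a different optimizer through `lmerControl()` and compare the estimates. Close agreement suggests the fit is not an artifact of one optimizer. The sketch assumes the lme4 package and uses its built-in `sleepstudy` data; the model is an example, not one taken from the FAQ.

```r
library(lme4)

# Example model: random intercepts and slopes for Days by Subject
fm_default <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Same model, explicitly requesting the bobyqa optimizer
fm_bobyqa <- update(fm_default, control = lmerControl(optimizer = "bobyqa"))

# Side-by-side fixed effects from the two fits
comparison <- cbind(default = fixef(fm_default), bobyqa = fixef(fm_bobyqa))
print(comparison)

# Largest absolute difference across fixed effects
max_diff <- max(abs(comparison[, "default"] - comparison[, "bobyqa"]))

# A singular fit (e.g. a variance estimated at zero) is another warning sign
isSingular(fm_bobyqa)
```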
Establishing minimum standards for the creation of research materials is essential in scientific research. Whether the materials are stimuli, custom software for data collection, or scripts for statistical analysis, their quality and transparency directly affect the reproducibility and credibility of the research. This blog post explores the importance of adhering to the FAIR principles (Findable, Accessible, Interoperable, Reusable) and offers practical examples for researchers, with a focus on the cognitive sciences.
An R script for preprocessing frequency list data from the Norwegian Web as Corpus (NoWaC), including instructions for downloading and preparing the corpus data.
Whereas the direction of main effects can be interpreted from the sign of the estimate, the interpretation of interaction effects often requires plots. This task is facilitated by the R package sjPlot. For instance, using the plot_model function, I plotted the interaction between two continuous variables.
library(lme4)
#> Loading required package: Matrix
library(sjPlot)
#> Learn more about sjPlot with 'browseVignettes("sjPlot")'.
library(ggplot2)
theme_set(theme_sjplot())
# Create data partially based on code by Ben Bolker
# from https://stackoverflow.
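The logic behind such an interaction plot can be sketched without sjPlot. The example below is a swapped-in base-R illustration that uses simulated data and a plain `lm()` rather than the mixed model from the post. It fits an interaction between two continuous predictors, then draws the predicted values of y across x1 at three values of x2 (mean and ±1 SD, one common choice of moderator values). Variable names and effect sizes are hypothetical.

```r
# Simulated data with a true x1-by-x2 interaction (hypothetical effect sizes)
set.seed(1)
n   <- 500
x1  <- rnorm(n)
x2  <- rnorm(n)
y   <- 1 + 0.5 * x1 + 0.3 * x2 + 0.8 * x1 * x2 + rnorm(n)
dat <- data.frame(y, x1, x2)

fit <- lm(y ~ x1 * x2, data = dat)

# Moderator values: mean - 1 SD, mean, mean + 1 SD of x2
x2_values <- mean(dat$x2) + c(-1, 0, 1) * sd(dat$x2)

# Simple slope of x1 at each x2 value: b_x1 + b_x1:x2 * x2
b <- coef(fit)
simple_slopes <- setNames(b[["x1"]] + b[["x1:x2"]] * x2_values,
                          c("-1 SD", "mean", "+1 SD"))
print(round(simple_slopes, 2))

# Predicted y across x1 (rows) for each x2 value (columns)
x1_grid <- seq(min(dat$x1), max(dat$x1), length.out = 50)
pred <- sapply(x2_values, function(v)
  predict(fit, newdata = data.frame(x1 = x1_grid, x2 = v)))

# One line per x2 value; diverging slopes indicate the interaction
matplot(x1_grid, pred, type = "l", lty = 1:3, col = "black",
        xlab = "x1", ylab = "Predicted y")
legend("topleft", legend = names(simple_slopes), lty = 1:3, title = "x2")
```

The sign of the interaction coefficient alone says the x1 slope grows with x2; the simple slopes and the fanning lines show by how much.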
Whereas the direction of main effects can be interpreted from the sign of the estimate, the interpretation of interaction effects often requires plots. This task is facilitated by the R package sjPlot. For instance, using the plot_model function, I plotted the interaction between a continuous variable and a categorical variable. The categorical variable was passed to the fill argument of plot_model.
library(lme4)
#> Loading required package: Matrix
library(sjPlot)
#> Install package "strengejacke" from GitHub (`devtools::install_github("strengejacke/strengejacke")`) to load all sj-packages at once!
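A comparable base-R sketch for a continuous-by-categorical interaction, again swapped in for illustration with simulated data and `lm()` rather than the post's mixed model. With the default treatment coding, the slope of x in each group is the reference-group slope plus that group's interaction coefficient, and the plot draws one predicted line per group. Group labels and slopes are hypothetical.

```r
# Simulated data: three groups with different (hypothetical) slopes for x
set.seed(2)
group <- factor(rep(c("A", "B", "C"), each = 200))
x     <- rnorm(length(group))
true_slopes <- c(0.2, 0.8, -0.4)   # slopes for A, B, C
y <- 1 + true_slopes[as.integer(group)] * x + rnorm(length(group))
dat <- data.frame(y, x, group)

fit <- lm(y ~ x * group, data = dat)

# Group-specific slopes: reference (A) slope plus each interaction term
b <- coef(fit)
slopes <- c(A = b[["x"]],
            B = b[["x"]] + b[["x:groupB"]],
            C = b[["x"]] + b[["x:groupC"]])
print(round(slopes, 2))

# Predicted y across x (rows) for each group (columns)
x_grid <- seq(min(dat$x), max(dat$x), length.out = 50)
pred <- sapply(levels(dat$group), function(g)
  predict(fit, newdata = data.frame(
    x = x_grid, group = factor(g, levels = levels(dat$group)))))

matplot(x_grid, pred, type = "l", lty = 1, col = 1:3,
        xlab = "x", ylab = "Predicted y")
legend("topleft", legend = levels(dat$group), col = 1:3, lty = 1, title = "group")
```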
A custom R function to create ggplot2 visualizations of fixed effects from models refitted with multiple optimizers using lme4's allFit function, enabling visual assessment of convergence validity in mixed-effects models.
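A minimal sketch of the idea, assuming lme4 and ggplot2 and using lme4's built-in `sleepstudy` data; it is not the custom function from the post. It relies on `summary()` of an `allFit` object returning a matrix of fixed effects with one row per optimizer. The matrix is reshaped to long format and each term's estimates are plotted by optimizer, so a term whose estimates scatter across optimizers stands out.

```r
library(lme4)
library(ggplot2)

# Example model on lme4's built-in sleepstudy data
fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Refit with every optimizer allFit() can use on this system
# (some optimizers need extra packages such as dfoptim or optimx)
fits <- allFit(fm)

# Fixed effects: one row per optimizer, one column per term
fx <- summary(fits)$fixef

# Reshape to long format (as.vector() reads the matrix column by column)
fx_long <- data.frame(
  optimizer = rep(rownames(fx), times = ncol(fx)),
  term      = rep(colnames(fx), each = nrow(fx)),
  estimate  = as.vector(fx)
)

# Range of each fixed effect across optimizers; near zero suggests a trustworthy fit
spread <- apply(fx, 2, function(col) diff(range(col)))

p <- ggplot(fx_long, aes(x = estimate, y = optimizer)) +
  geom_point() +
  facet_wrap(~ term, scales = "free_x") +
  labs(x = "Fixed-effect estimate", y = NULL)
print(p)
```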