R

You shall know a word by the company it keeps — so choose your prompts wisely

Post

In 1957, linguist J. R. Firth observed that 'you shall know a word by the company it keeps'. That principle, that words occurring in similar contexts tend to have similar meanings, is the foundation on which all of generative AI was built, from early Latent Semantic Analysis to today's trillion-parameter Transformers. This post traces the lineage with three interactive LSA-to-PCA visualisations in R (Reuters newswire, State of the Union addresses and IMDB reviews), showing where simple co-occurrence models succeed, where they fail and why scale alone turned a modest insight into the technology behind ChatGPT. It then examines why LLMs are optimised for fluency rather than truth (hallucinations are a structural consequence, not a bug to be patched) and argues that careful prompt engineering is the best tool we have for steering a fundamentally heuristic machine.

Scaling systematic reviews: A solo researcher's workflow with Gemini

Presentation

Conducting systematic literature reviews is traditionally a laborious, manual process involving the extraction of distinct data points from hundreds of academic papers, but this presentation introduces a structured, multi-tool workflow using the …

rscopus_plus: An extension of the rscopus package

Post

Extension of the rscopus R package with functions that manage search quotas, retrieve DOIs for reference managers, search for additional DOIs, compare publication counts across topics, and visualize bibliometric comparisons over time.

FAQs on mixed-effects models

Post

Frequently asked questions about mixed-effects models, covering the necessity of random slopes, appropriate p-value calculation methods, parallelization limitations, convergence issues, and optimizer selection.

FAIR standards for the creation of research materials, with examples

Post

In the fast-paced world of scientific research, establishing minimum standards for the creation of research materials is essential. Whether it's stimuli, custom software for data collection, or scripts for statistical analysis, the quality and transparency of these materials significantly impact the reproducibility and credibility of research. This blog post explores the importance of adhering to FAIR (Findable, Accessible, Interoperable, Reusable) principles, and offers practical examples for researchers, with a focus on the cognitive sciences.

Preprocessing the Norwegian Web as Corpus (NoWaC) in R

Post

An R script for preprocessing frequency list data from the Norwegian Web as Corpus (NoWaC), including instructions for downloading and preparing the corpus data.

A new function to plot convergence diagnostics from lme4::allFit()

Post

When a model struggles to find enough information in the data to account for every predictor, especially for every random effect, convergence warnings appear (Brauer & Curtin, 2018; Singmann & Kellen, 2019). In this article, I review the issue of convergence before presenting a new plotting function in R that facilitates the visualisation of the fixed effects fitted by different optimization algorithms (also dubbed optimizers).

Cannot open plots created with brms::mcmc_plot due to lack of discrete_range function

Post

I am seeking advice about some plots that were created using brms::mcmc_plot() and can no longer be opened in R. The plots were created last year using brms 2.17.0 and saved as RDS objects. Trying to open them now produces an error about a missing function. I would be grateful for any suggestions about a possible cause or solution.

A table of results for Bayesian mixed-effects models: Grouping variables and specifying random slopes

Post

Here I share the format applied to tables presenting the results of Bayesian models in Bernabeu (2022). The sample table presents a mixed-effects model that was fitted using the R package 'brms' (Bürkner et al., 2022).

A table of results for frequentist mixed-effects models: Grouping variables and specifying random slopes

Post

Here I share the format applied to tables presenting the results of frequentist models in Bernabeu (2022). The sample table presents a mixed-effects model that was fitted using the R package 'lmerTest' (Kuznetsova et al., 2022).