General methods

The analytical method was broadly similar across the three studies. Below, we present the commonalities in the statistical analysis and in the power analysis. Several R packages from the ‘tidyverse’ (Wickham et al., 2019) were used.


Several covariates—or nuisance variables—were included in each study to allow a rigorous analysis of the effects of interest (Sassenhagen & Alday, 2016). Unlike the effects of interest, these covariates were not critical to our research question (i.e., the interplay between language-based and vision-based information). They comprised participant-specific variables (e.g., attentional control), lexical variables (e.g., word frequency) and word concreteness. The covariates are distinguished from the effects of interest in the results table(s) in each study. The three kinds of covariates included were as follows.

Participant-specific covariates were measures akin to general cognition, included because some studies have found that the effect of vocabulary size is moderated by general cognition variables such as processing speed (Ratcliff et al., 2010; Yap et al., 2012). Similarly, research has evidenced the role of attentional control (Hutchison et al., 2014; Yap et al., 2017), and authors have noted the desirability of including such covariates in models (James et al., 2018; Pexman & Yap, 2018). Therefore, where available, we included an individual measure of ‘general cognition’ in the analyses. These measures were available in the first two studies, and they indexed task performance abilities distinct from vocabulary knowledge. We refer to them by their more specific names in each study.7 In Study 2.1, the measure used was attentional control (Hutchison et al., 2013). In Study 2.2, it was information uptake (Pexman & Yap, 2018). In Study 2.3, no such covariate was used, as none was available in the data set of Balota et al. (2007).

Lexical covariates were selected in every study from the same five variables, which had been used as covariates in Wingfield and Connell (2022b; also see Petilli et al., 2021). They comprised: number of letters (i.e., orthographic length), word frequency, number of syllables (the latter two from Balota et al., 2007), orthographic Levenshtein distance (Yarkoni et al., 2008) and phonological Levenshtein distance (Suárez et al., 2011; Yap & Balota, 2009). A selection among these candidates was necessary because some of them were highly intercorrelated—i.e., \(r\) > .70 (Dormann et al., 2013; Harrison et al., 2018). The correlations and the selection models are available in Appendix A.

Word concreteness was included due to the pervasive effect of this variable across lexical and semantic tasks (Brysbaert et al., 2014; Connell & Lynott, 2012; Pexman & Yap, 2018), and due to the sizable correlations (\(r\) > .30) between word concreteness and some other predictors, such as visual strength (see correlation figures in each study). Furthermore, the role of word concreteness has been contested, with some research suggesting that its effect stems from perceptual simulation (Connell & Lynott, 2012) versus other research suggesting that the effect is amodal (Bottini et al., 2021). In passing, we will bring our results to bear on the role of word concreteness.

Data preprocessing and statistical analysis

In the three studies, the statistical analysis was designed to investigate the contribution of each effect of interest. The following preprocessing steps were applied. First, incorrect responses were removed. Second, nonword trials were removed (only necessary in Studies 2.1 and 2.3). Third, responses that were too fast or too slow were removed, applying the same thresholds as in each of the original studies. That is, in Study 2.1, we removed responses faster than 200 ms or slower than 3,000 ms (Hutchison et al., 2013). In Study 2.2, we removed responses faster than 250 ms or slower than 3,000 ms (Pexman & Yap, 2018). In Study 2.3, we removed responses faster than 200 ms or slower than 4,000 ms (Balota et al., 2007). Next, the dependent variable—response time (RT)—was \(z\)-scored around each participant’s mean to curb the influence of each participant’s baseline speed (Balota et al., 2007; Kumar et al., 2020; Lim et al., 2020; Pexman et al., 2017; Pexman & Yap, 2018; Yap et al., 2012, 2017). This was important because the size of experimental effects is known to increase with longer RTs (Faust et al., 1999). Next, binary predictors were recoded into continuous variables (Brauer & Curtin, 2018). Specifically, participants’ gender was recoded as follows: Female = 0.5, X = 0, Male = -0.5. The SOAs in Study 2.1 were recoded as follows: 200 ms = -0.5, 1,200 ms = 0.5. Next, the data sets were trimmed by removing rows that lacked values on any variable, and by removing RTs that were more than 3 standard deviations (SD) away from the mean (M). The nesting factors applied in the trimming are specified in each study. Finally, all predictors were \(z\)-scored, resulting in M ≈ 0 and SD ≈ 1 (the values are not exact because the variables were not normally distributed).
More specifically, between-item predictors—i.e., word-level variables (e.g., language-based information) and task-level variables (e.g., SOA)—were \(z\)-scored around each participant’s own mean (Brauer & Curtin, 2018).
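The per-participant \(z\)-scoring and the 3 SD trimming can be sketched as follows. This is a minimal illustration in Python with toy data; the actual preprocessing was performed in R, and the function names here are hypothetical.

```python
# Minimal sketch of two preprocessing steps: per-participant z-scoring
# of RTs and removal of values more than 3 SD from the mean.
# Toy illustration only; the analyses themselves were run in R.
from statistics import mean, stdev

def z_score_by_participant(rts_by_participant):
    """Standardise each participant's RTs around their own mean and SD,
    curbing the influence of each participant's baseline speed."""
    z_scored = {}
    for participant, rts in rts_by_participant.items():
        m, sd = mean(rts), stdev(rts)
        z_scored[participant] = [(rt - m) / sd for rt in rts]
    return z_scored

def trim_outliers(values, n_sd=3):
    """Drop values more than n_sd standard deviations from the mean."""
    m, sd = mean(values), stdev(values)
    return [v for v in values if abs(v - m) <= n_sd * sd]
```

In the studies themselves, the trimming was applied within nesting factors (specified in each study) rather than across the whole data set at once.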

Random effects

With regard to random effects, participants and stimuli were crossed in the three studies: each participant was presented with a subset of the stimuli, and, conversely, each word was presented to a subset of participants. Therefore, linear mixed-effects models were implemented. These models included a maximal random-effects structure, with by-participant and by-item random intercepts, and the appropriate random slopes for all effects of interest (Barr et al., 2013; Brauer & Curtin, 2018; Singmann & Kellen, 2019). Random effects—especially random slopes—constrain the model by absorbing their share of the variance, which thereby becomes unavailable to the fixed effects. In the semantic priming study, the items were prime–target pairs, whereas in the semantic decision and lexical decision studies, the items were individual words. In the case of interactions, random slopes were included only when the interacting variables varied within the same unit (Brauer & Curtin, 2018)—e.g., an interaction of two variables varying within participants (only present in Study 2.1). Where required due to convergence warnings, random slopes for covariates were removed, following Remedy 11 of Brauer and Curtin (2018). In this regard, whereas Brauer and Curtin (2018) contemplate removing random slopes for covariates only when the covariates do not interact with any effects of interest, we removed random slopes for covariates even when they interacted with effects of interest, because these interactions were themselves covariates.

To avoid an inflation of the Type I error rate—i.e., false positives—the random slopes for the effects of interest (as indicated in each study) were never removed (see Table 17 in Brauer & Curtin, 2018; for an example of this approach, see Diaz et al., 2021). This approach arguably provides better protection against false positives (Barr et al., 2013; Brauer & Curtin, 2018; Singmann & Kellen, 2019) than the practice of removing random slopes when they do not significantly improve the fit (Baayen et al., 2008; Bates et al., 2015; e.g., Bernabeu et al., 2017; Pexman & Yap, 2018; but also see Matuschek et al., 2017).

Frequentist analysis

\(P\) values were calculated using the Kenward-Roger approximation for degrees of freedom (Luke, 2017) in the R package ‘lmerTest’, Version 3.1-3 (Kuznetsova et al., 2017). The latter package in turn used ‘lme4’, Version 1.1-26 (Bates et al., 2015; Bates et al., 2021). To facilitate the convergence of the models, the maximum number of iterations was set to 1 million. Diagnostics regarding convergence and normality are provided in Appendix B. Those effects that are non-significant or very small are best interpreted by considering the confidence intervals and the credible intervals (Cumming, 2014).

The R package ‘GGally’ (Schloerke et al., 2021) was used to create correlation plots, whereas the package ‘sjPlot’ (Lüdecke, 2021) was used for interaction plots.

Bayesian analysis

A Bayesian analysis was performed to complement the estimates that had been obtained in the frequentist analysis. Whereas the goal of the frequentist analysis had been hypothesis testing, for which \(p\) values were used, the goal of the Bayesian analysis was parameter estimation. Accordingly, we estimated the posterior distribution of every effect, without calculating Bayes factors (for other examples of the same estimation approach, see Milek et al., 2018; Pregla et al., 2021; Rodríguez-Ferreiro et al., 2020; for comparisons between estimation and hypothesis testing, see Cumming, 2014; Kruschke & Liddell, 2018; Rouder et al., 2018; Schmalz et al., 2021; Tendeiro & Kiers, 2019, in press; van Ravenzwaaij & Wagenmakers, 2021). In the estimation approach, the estimates are interpreted by considering the position of their credible intervals in relation to the expected effect size. That is, the closer an interval is to an effect size of 0, the smaller the effect of that predictor. For instance, an interval that is symmetrically centred on 0 indicates a very small effect, whereas an interval that does not include 0 at all indicates a far larger effect.

This analysis served two purposes: first, to ascertain the interpretation of the smaller effects—which were identified as unreliable in the power analyses—and second, to complement the estimates obtained in the frequentist analysis. The latter purpose was pertinent because the frequentist models presented convergence warnings—although a previous study found that frequentist and Bayesian estimates were similar despite convergence warnings in the frequentist analysis (Rodríguez-Ferreiro et al., 2020). The complementary analysis was also pertinent because the frequentist models presented residual errors that deviated from normality—although mixed-effects models are fairly robust to such deviations (Knief & Forstmeier, 2021; Schielzeth et al., 2020). Given these precedents, we expected to find broadly similar estimates in the frequentist and the Bayesian analyses. Across studies, each frequentist model has a Bayesian counterpart, with the exception of the secondary analysis performed in Study 2.1 (semantic priming) that included vision-based similarity as a predictor. The R package ‘brms’, Version 2.17.0, was used for the Bayesian analysis (Bürkner, 2018; Bürkner et al., 2022).


The priors were established by inspecting the effect sizes obtained in previous studies as well as the effect sizes obtained in our frequentist analyses of the present data (reported in Studies 2.1, 2.2 and 2.3 below). In the first regard, the previous studies that were considered were selected because the experimental paradigms, variables and analytical procedures they had used were similar to those used in our current studies. Specifically, regarding paradigms, we sought studies that implemented: (I) semantic priming with a lexical decision task—as in Study 2.1—, (II) semantic decision—as in Study 2.2—, or (III) lexical decision—as in Study 2.3. Regarding analytical procedures, we sought studies in which both the dependent and the independent variables were \(z\)-scored. We found two studies that broadly matched these criteria: Lim et al. (2020) (see Table 5 therein) and Pexman and Yap (2018) (see Tables 6 and 7 therein). Out of these studies, Pexman and Yap (2018) contained the variables that were most similar to ours, which included vocabulary size (labelled ‘NAART’) and word frequency.

Based on both these studies and on the frequentist analyses reported below, a range of effect sizes was identified that spanned between β = -0.30 and β = 0.30. This range was centred on 0 as the variables were \(z\)-scored. The bounds of this range were determined by the largest effects, which appeared in Pexman and Yap (2018). Pexman and Yap conducted a semantic decision study, and split the data set into abstract and concrete words. The two largest effects they found were—first—a word concreteness effect in the concrete-words analysis of β = -0.41, and—second—a word concreteness effect in the abstract-words analysis of β = 0.20. Unlike Pexman and Yap, we did not split the data set into abstract and concrete words, but analysed these sets together. Therefore, we averaged the absolute values of these two effects (≈ 0.30), obtaining a range between β = -0.30 and β = 0.30.
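The derivation of the bounds reduces to simple arithmetic, sketched below (Python used purely as a calculator; the values are those reported above).

```python
# The two largest effects in Pexman and Yap (2018): word concreteness in
# the concrete-words analysis (-0.41) and in the abstract-words analysis (0.20).
largest_effects = [-0.41, 0.20]

# Averaging the absolute values gives the half-width of the range:
# 0.305, reported rounded as 0.30.
half_width = sum(abs(b) for b in largest_effects) / len(largest_effects)
effect_size_range = (-round(half_width, 1), round(half_width, 1))
```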

In the results of Lim et al. (2020) and Pexman and Yap (2018), and in our frequentist results, some effects consistently presented a negative polarity (i.e., leading to shorter response times), whereas some other effects were consistently positive. We incorporated the direction of effects into the priors only in cases of large effects that had presented a consistent direction (either positive or negative) in previous studies and in our frequentist analyses in the present studies. These criteria were met by the following variables: word frequency—with a negative direction, as higher word frequency leads to shorter RTs (Brysbaert et al., 2016; Brysbaert et al., 2018; Lim et al., 2020; Mendes & Undorf, 2021; Pexman & Yap, 2018)—, number of letters and number of syllables—both with positive directions (Barton et al., 2014; Beyersmann et al., 2020; Pexman & Yap, 2018)—, and orthographic Levenshtein distance—with a positive direction (Cerni et al., 2016; Dijkstra et al., 2019; Kim et al., 2018; Yarkoni et al., 2008). We did not incorporate information about the direction of the word concreteness effect, as this effect can follow different directions in abstract and concrete words (Brysbaert et al., 2014; Pexman & Yap, 2018), and we analysed both sets of words together. In conclusion, the four predictors that had directional priors were covariates. All the other predictors had priors centred on 0. Last, as a methodological matter, it is noteworthy that most of the psycholinguistic studies applying Bayesian analysis have not incorporated any directional information in priors (e.g., Pregla et al., 2021; Rodríguez-Ferreiro et al., 2020; Stone et al., 2020; cf. Stone et al., 2021).

Prior distributions and prior predictive checks

The choice of priors can influence the results in consequential ways. To assess the extent of this influence, prior sensitivity analyses have been recommended. These analyses are performed by comparing the effect of more and less strict priors—or, in other words, priors varying in their degree of informativeness. The degree of variation is adjusted through the standard deviation, and the means are not varied (Lee & Wagenmakers, 2014; Schoot et al., 2021; Stone et al., 2020).

In this way, we compared the results obtained using ‘informative’ priors (SD = 0.1), ‘weakly-informative’ priors (SD = 0.2) and ‘diffuse’ priors (SD = 0.3). These standard deviations were chosen so that around 95% of values in the informative priors would fall within our initial range of effect sizes that spanned from -0.30 to 0.30. All priors are illustrated in Figure 1. These priors resembled others from previous psycholinguistic studies (Pregla et al., 2021; Stone et al., 2020; Stone et al., 2021). For instance, Stone et al. (2020) used the following priors: \(Normal\)(0, 0.1), \(Normal\)(0, 0.3) and \(Normal\)(0, 1). The range of standard deviations we used—i.e., 0.1, 0.2 and 0.3—was narrower than those of previous studies because our dependent variable and our predictors were \(z\)-scored, resulting in small estimates and small SDs (see Lim et al., 2020; Pexman & Yap, 2018). These priors were placed on the fixed effects and on the standard deviations of the random effects. For the correlations among the random effects, an LKJ(2) prior was used (Lewandowski et al., 2009). This is a ‘regularising’ prior, as it assumes that high correlations among random effects are rare (also used in Rodríguez-Ferreiro et al., 2020; Stone et al., 2020; Stone et al., 2021; Vasishth, Nicenboim, et al., 2018).
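As an illustration of the informativeness of these priors, the central 95% interval of each \(Normal\)(0, SD) prior can be computed directly (a sketch, using the three SDs reported above):

```python
# Central 95% interval of a Normal(0, sd) prior: +/- 1.96 * sd.
Z_95 = 1.959964  # standard-normal quantile for a central 95% interval

prior_sds = (0.1, 0.2, 0.3)  # informative, weakly-informative, diffuse
intervals = {sd: (-Z_95 * sd, Z_95 * sd) for sd in prior_sds}

# For the informative prior (SD = 0.1), the interval is about [-0.20, 0.20],
# which lies within the plausible range of effect sizes, [-0.30, 0.30].
```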


source(file.path(getwd(), 'bayesian_priors/bayesian_priors.R'), local = TRUE)  # file.path(getwd(), ...) circumvents illegal characters in the file path

Figure 1: Priors used in the three studies. The green vertical rectangle shows the range of plausible effect sizes based on previous studies and on our frequentist analyses. In the informative priors, around 95% of the values fall within the range.

The adequacy of each of these priors was assessed by performing prior predictive checks, in which we compared the observed data to the predictions of the model (Schoot et al., 2021). Furthermore, in these checks we also tested the adequacy of two model-wide distributions: the traditional Gaussian distribution (default in most analyses) and an exponentially modified Gaussian—dubbed ‘ex-Gaussian’—distribution (Matzke & Wagenmakers, 2009). The ex-Gaussian distribution was considered because the residual errors of the frequentist models were not normally distributed (Lo & Andrews, 2015), and because this distribution was found to be more appropriate than the Gaussian one in a previous, related study (see supplementary materials of Rodríguez-Ferreiro et al., 2020). The ex-Gaussian distribution had an identity link function, which preserves the interpretability of the coefficients, as opposed to a transformation applied directly to the dependent variable (Lo & Andrews, 2015). The results of these prior predictive checks revealed that the priors were adequate, and that the ex-Gaussian distribution was more appropriate than the Gaussian one (see Appendix C), converging with Rodríguez-Ferreiro et al. (2020). Therefore, the ex-Gaussian distribution was used in the final models.
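The ex-Gaussian distribution is the sum of a Gaussian and an independent exponential component; the latter produces the right skew typical of RT data. A minimal simulation in Python (toy parameters, not those of the fitted models):

```python
import random

def rexgaussian(n, mu, sigma, tau, seed=1):
    """Simulate n ex-Gaussian values: Gaussian(mu, sigma) plus an
    exponential component with mean tau."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) + rng.expovariate(1 / tau) for _ in range(n)]

# The mean of the distribution is mu + tau; the exponential component
# shifts the mean above the median (positive skew).
rts = rexgaussian(100_000, mu=600, sigma=50, tau=150)
```

Because the models used an identity link, the coefficients remain on the response scale, unlike with a transformation of the dependent variable (Lo & Andrews, 2015).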

Prior sensitivity analysis

In the main analysis, the informative, weakly-informative and diffuse priors were used in separate models. In other words, in each model, all priors had the same degree of informativeness (as done in Pregla et al., 2021; Rodríguez-Ferreiro et al., 2020; Stone et al., 2020; Stone et al., 2021). In this way, a prior sensitivity analysis was performed to acknowledge the likely influence of the priors on the posterior distributions—that is, on the results (Lee & Wagenmakers, 2014; Schoot et al., 2021; Stone et al., 2020).

Posterior distributions

Posterior predictive checks were performed to assess the consistency between the observed data and new data predicted by the posterior distributions (Schoot et al., 2021). These checks are available in Appendix C.
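The logic of a posterior predictive check can be sketched as follows: replicated data sets are simulated from posterior draws, and a statistic of the observed data is compared to its distribution across replications. This is a hypothetical illustration assuming a simple Gaussian model (in practice, ‘brms’ provides such checks via its pp_check() function):

```python
import random

def posterior_predictive_check(observed, posterior_draws, statistic, seed=0):
    """Return the proportion of replicated data sets whose statistic is at
    least as large as the observed one. Values near 0 or 1 flag misfit;
    values near 0.5 indicate consistency with the observed data.
    Assumes a simple Gaussian model with posterior draws of (mu, sigma)."""
    rng = random.Random(seed)
    observed_stat = statistic(observed)
    replicated = [
        statistic([rng.gauss(mu, sigma) for _ in observed])
        for mu, sigma in posterior_draws
    ]
    return sum(s >= observed_stat for s in replicated) / len(replicated)
```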


When convergence was not reached in a model, as indicated by \(\widehat R\) > 1.01 (Schoot et al., 2021; Vehtari et al., 2021), the number of iterations was increased and the random slopes for covariates were removed (Brauer & Curtin, 2018). The resulting random effects in these models were largely the same as those present in the frequentist models. The only exception concerned the models of the lexical decision study. In the frequentist model for that study, the random slopes for covariates were removed due to convergence warnings, whereas in the Bayesian analysis these random slopes did not have to be removed, as the models converged thanks to the large number of iterations run. In the lexical decision study, it was possible to run more iterations than in the other two studies because the lexical decision data set had fewer observations, resulting in faster model fitting.

The Bayesian models in the semantic decision study could not be made to converge, and the final results of these models were not valid. Therefore, those estimates are not shown in the main text, but are available in Appendix E.

Statistical power analysis

Power curves based on Monte Carlo simulations were computed for most of the effects of interest using the R package ‘simr’, Version 1.0.5 (Green & MacLeod, 2016). Obtaining power curves for a range of effects in each study allows for a comprehensive assessment of the plausibility of the power estimated for each effect.

In each study, the item-level sample size—i.e., the number of words—was not modified. Therefore, to plan the sample size for future studies, these results must be considered under the assumptions that the future study would apply a statistical method similar to ours—namely, a mixed-effects model with random intercepts and slopes—, and that the analysis would encompass at least as many stimuli as the corresponding study (numbers detailed in each study below). \(P\) values were calculated using the Satterthwaite approximation for degrees of freedom (Luke, 2017).

Monte Carlo simulations consist of running the statistical model a large number of times, under slight, random variations of the dependent variable (Green & MacLeod, 2016; for a comparable approach, see Loken & Gelman, 2017). The power to detect each effect of interest is calculated by dividing the number of times that the effect is significant by the total number of simulations run. For instance, if an effect is significant on 85 simulations out of 100, the power for that effect is 85% (Kumle et al., 2021). The sample sizes tested in the semantic priming study ranged from 50 to 800 participants, whereas those tested in the semantic decision and lexical decision studies ranged from 50 to 2,000 participants. These sample sizes were unequally spaced to limit the computational requirements. They comprised the following: 50, 100, 200, 300, 400, 500, 600, 700, 800, 1,200, 1,600 and 2,000 participants.8 The variance of the results decreases as more simulations are run. In each of our three studies, 200 simulations (as in Brysbaert & Stevens, 2018) were run for each effect of interest and for each sample size under consideration. Thus, for a power curve examining the power for an effect across 12 sample sizes, 2,400 simulations were run.
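The power computation itself reduces to counting significant simulations. Below is a toy illustration using a simple two-group z-test in place of ‘simr’’s mixed-effects machinery (all names and parameters are hypothetical):

```python
import random
from math import sqrt
from statistics import mean, stdev

def simulate_power(effect_size, n_per_group, n_sims=200, seed=42):
    """Estimate power as the proportion of simulations in which the effect
    is significant at alpha = .05 (two-group z-test approximation)."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_sims):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(effect_size, 1) for _ in range(n_per_group)]
        se = sqrt(stdev(a) ** 2 / n_per_group + stdev(b) ** 2 / n_per_group)
        if abs((mean(b) - mean(a)) / se) > 1.96:
            significant += 1
    return significant / n_sims
```

For instance, with 200 simulations, an effect that is significant in 170 of them has an estimated power of 85%; repeating this over increasing sample sizes yields the points of a power curve.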

Power analyses require setting an effect size for each effect. It is often difficult to determine the effect size, as the relevant prior research is usually limited in scope and subject to bias (Albers & Lakens, 2018; Gelman & Carlin, 2014; Kumle et al., 2021). In some power analyses, the original effect sizes from previous studies have been adopted without any modification (e.g., Pacini & Barnard, 2021; Villalonga et al., 2021). In contrast, some authors have opted to reduce the previous effect sizes to account for two intervening factors. First, publication bias and insufficient statistical power cause published effect sizes to be inflated (Brysbaert, 2019; Loken & Gelman, 2017; Open Science Collaboration, 2015; Vasishth, Mertzen, et al., 2018; Vasishth & Gelman, 2021). Second, over the course of the research, a variety of circumstances could create differences between the planned study and the studies that were used in the power analysis. Some of these differences could be foreseeable—for instance, if they are due to a limitation in the literature available for the power analysis—, whereas other differences might be unforeseeable and could go unnoticed (Barsalou, 2019; Noah et al., 2018). Reducing the effect size in the power analysis leads to an increase of the sample size of the planned study (Brysbaert & Stevens, 2018; Green & MacLeod, 2016; Hoenig & Heisey, 2001). The reduced effect size—sometimes dubbed the smallest effect size of interest—is often set with a degree of arbitrariness. In previous studies, Fleur et al. (2020) applied a reduction of 1/8 (i.e., 12.5%), whereas Kumle et al. (2021) applied a 15% reduction. In the present study, a reduction of 20% was applied to every effect in the power analysis. Compared with the power analyses reviewed in this paragraph, the present reduction leads to a more conservative estimate of required sample sizes.
However, after considering the precedents of small samples and publication bias reviewed above, a 20% reduction is arguably a reasonable safeguard. Indeed, a posteriori, the results of our power analyses suggested that the 20% reduction had not been excessive, as some of the effects examined were detectable with small sample sizes.

Both the primary analysis and the power analysis were performed in R (R Core Team, 2021). Version 4.0.2 was used for the frequentist analysis, Version 4.1.0 was used for the Bayesian analysis, and Version 4.1.2 was used for fast operations such as data preprocessing and plotting. Given the complexity of these analyses, all the statistical and the power analyses were run on the High-End Computing facility at Lancaster University.9


Albers, C., & Lakens, D. (2018). When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias. Journal of Experimental Social Psychology, 74, 187–195.
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412.
Balota, D. A., Yap, M. J., Hutchison, K. A., Cortese, M. J., Kessler, B., Loftis, B., Neely, J. H., Nelson, D. L., Simpson, G. B., & Treiman, R. (2007). The English Lexicon Project. Behavior Research Methods, 39, 445–459.
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278.
Barsalou, L. W. (2019). Establishing generalizable mechanisms. Psychological Inquiry, 30(4), 220–230.
Barton, J. J. S., Hanif, H. M., Eklinder Björnström, L., & Hills, C. (2014). The word-length effect in reading: A review. Cognitive Neuropsychology, 31(5-6), 378–412.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67, 1–48.
Bates, D., Maechler, M., Bolker, B., Walker, S., Christensen, R. H. B., Singmann, H., Dai, B., Scheipl, F., Grothendieck, G., Green, P., Fox, J., Brauer, A., & Krivitsky, P. N. (2021). Package ‘lme4’. CRAN.
Bernabeu, P., Willems, R. M., & Louwerse, M. M. (2017). Modality switch effects emerge early and increase throughout conceptual processing: Evidence from ERPs. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. J. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (pp. 1629–1634). Cognitive Science Society.
Beyersmann, E., Grainger, J., & Taft, M. (2020). Evidence for embedded word length effects in complex nonwords. Language, Cognition and Neuroscience, 35(2), 235–245.
Bottini, R., Morucci, P., D’Urso, A., Collignon, O., & Crepaldi, D. (2021). The concreteness advantage in lexical decision does not depend on perceptual simulations. Journal of Experimental Psychology: General.
Brauer, M., & Curtin, J. J. (2018). Linear mixed-effects models and the analysis of nonindependent data: A unified framework to analyze categorical and continuous independent variables that vary within-subjects and/or within-items. Psychological Methods, 23(3), 389–411.
Brysbaert, M. (2019). How many participants do we have to include in properly powered experiments? A tutorial of power analysis with reference tables. Journal of Cognition, 2(1), 16.
Brysbaert, M., Mandera, P., & Keuleers, E. (2018). The word frequency effect in word processing: An updated review. Current Directions in Psychological Science, 27(1), 45–50.
Brysbaert, M., & Stevens, M. (2018). Power analysis and effect size in mixed effects models: A tutorial. Journal of Cognition, 1(1), 9.
Brysbaert, M., Stevens, M., Mandera, P., & Keuleers, E. (2016). The impact of word prevalence on lexical decision times: Evidence from the Dutch Lexicon Project 2. Journal of Experimental Psychology: Human Perception and Performance, 42(3), 441–458.
Brysbaert, M., Warriner, A. B., & Kuperman, V. (2014). Concreteness ratings for 40 thousand generally known English word lemmas. Behavior Research Methods, 46, 904–911.
Bürkner, P.-C. (2018). Advanced Bayesian multilevel modeling with the R package brms. The R Journal, 10(1), 395–411.
Bürkner, P.-C., Gabry, J., Weber, S., Johnson, A., Modrak, M., Badr, H. S., Weber, F., Ben-Shachar, M. S., & Rabel, H. (2022). Package ‘brms’. CRAN.
Cerni, T., Velay, J.-L., Alario, F.-X., Vaugoyeau, M., & Longcamp, M. (2016). Motor expertise for typing impacts lexical decision performance. Trends in Neuroscience and Education, 5(3), 130–138.
Connell, L., & Lynott, D. (2012). Strength of perceptual experience predicts word processing performance better than concreteness or imageability. Cognition, 125(3), 452–465.
Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25(1), 7–29.
Diaz, M. T., Karimi, H., Troutman, S. B. W., Gertel, V. H., Cosgrove, A. L., & Zhang, H. (2021). Neural sensitivity to phonological characteristics is stable across the lifespan. NeuroImage, 225, 117511.
Dijkstra, T., Wahl, A., Buytenhuijs, F., Halem, N. V., Al-Jibouri, Z., Korte, M. D., & Rekké, S. (2019). Multilink: A computational model for bilingual word recognition and word translation. Bilingualism: Language and Cognition, 22(4), 657–679.
Dormann, C. F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G., Marquéz, J. R. G., Gruber, B., Lafourcade, B., Leitão, P. J., Münkemüller, T., McClean, C., Osborne, P. E., Reineking, B., Schröder, B., Skidmore, A. K., Zurell, D., & Lautenbach, S. (2013). Collinearity: A review of methods to deal with it and a simulation study evaluating their performance. Ecography, 36(1), 27–46.
Faust, M. E., Balota, D. A., Spieler, D. H., & Ferraro, F. R. (1999). Individual differences in information-processing rate and amount: Implications for group differences in response latency. Psychological Bulletin, 125, 777–799.
Fleur, D. S., Flecken, M., Rommers, J., & Nieuwland, M. S. (2020). Definitely saw it coming? The dual nature of the pre-nominal prediction effect. Cognition, 204, 104335.
Gelman, A., & Carlin, J. (2014). Beyond power calculations: Assessing type S (sign) and type M (magnitude) errors. Perspectives on Psychological Science, 9(6), 641–651.
Green, P., & MacLeod, C. J. (2016). SIMR: An R package for power analysis of generalized linear mixed models by simulation. Methods in Ecology and Evolution, 7(4), 493–498.
Harrison, X. A., Donaldson, L., Correa-Cano, M. E., Evans, J., Fisher, D. N., Goodwin, C., Robinson, B. S., Hodgson, D. J., & Inger, R. (2018). A brief introduction to mixed effects modelling and multi-model inference in ecology. PeerJ, 6, e4794.
Hoenig, J. M., & Heisey, D. M. (2001). The abuse of power: The pervasive fallacy of power calculations for data analysis. The American Statistician, 55(1), 19–24.
Hutchison, K. A., Balota, D. A., Neely, J. H., Cortese, M. J., Cohen-Shikora, E. R., Tse, C.-S., Yap, M. J., Bengson, J. J., Niemeyer, D., & Buchanan, E. (2013). The semantic priming project. Behavior Research Methods, 45, 1099–1114.
Hutchison, K. A., Heap, S. J., Neely, J. H., & Thomas, M. A. (2014). Attentional control and asymmetric associative priming. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40(3), 844–856.
James, A. N., Fraundorf, S. H., Lee, E. K., & Watson, D. G. (2018). Individual differences in syntactic processing: Is there evidence for reader-text interactions? Journal of Memory and Language, 102, 155–181.
Kim, M., Crossley, S. A., & Skalicky, S. (2018). Effects of lexical features, textual properties, and individual differences on word processing times during second language reading comprehension. Reading and Writing, 31(5), 1155–1180.
Knief, U., & Forstmeier, W. (2021). Violating the normality assumption may be the lesser of two evils. Behavior Research Methods.
Kruschke, J. K., & Liddell, T. M. (2018). The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective. Psychonomic Bulletin & Review, 25(1), 178–206.
Kumar, A. A., Balota, D. A., & Steyvers, M. (2020). Distant connectivity and multiple-step priming in large-scale semantic networks. Journal of Experimental Psychology: Learning, Memory, and Cognition, 46(12), 2261–2276.
Kumle, L., Võ, M. L.-H., & Draschkow, D. (2021). Estimating power in (generalized) linear mixed models: An open introduction and tutorial in R. Behavior Research Methods.
Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13), 1–26.
Lee, M. D., & Wagenmakers, E.-J. (2014). Bayesian cognitive modeling: A practical course. Cambridge University Press.
Lewandowski, D., Kurowicka, D., & Joe, H. (2009). Generating random correlation matrices based on vines and extended onion method. Journal of Multivariate Analysis, 100(9), 1989–2001.
Lim, R. Y., Yap, M. J., & Tse, C.-S. (2020). Individual differences in Cantonese Chinese word recognition: Insights from the Chinese Lexicon Project. Quarterly Journal of Experimental Psychology, 73(4), 504–518.
Lo, S., & Andrews, S. (2015). To transform or not to transform: Using generalized linear mixed models to analyse reaction time data. Frontiers in Psychology, 6, 1171.
Loken, E., & Gelman, A. (2017). Measurement error and the replication crisis. Science, 355(6325), 584–585.
Lüdecke, D. (2021). sjPlot: Data visualization for statistics in social science.
Luke, S. G. (2017). Evaluating significance in linear mixed-effects models in R. Behavior Research Methods, 49(4), 1494–1502.
Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., & Bates, D. (2017). Balancing Type I error and power in linear mixed models. Journal of Memory and Language, 94, 305–315.
Matzke, D., & Wagenmakers, E.-J. (2009). Psychological interpretation of the ex-Gaussian and shifted Wald parameters: A diffusion model analysis. Psychonomic Bulletin & Review, 16(5), 798–817.
Mendes, P. S., & Undorf, M. (2021). On the pervasive effect of word frequency in metamemory. Quarterly Journal of Experimental Psychology, 17470218211053329.
Milek, A., Butler, E. A., Tackman, A. M., Kaplan, D. M., Raison, C. L., Sbarra, D. A., Vazire, S., & Mehl, M. R. (2018). “Eavesdropping on happiness” revisited: A pooled, multisample replication of the association between life satisfaction and observed daily conversation quantity and quality. Psychological Science, 29(9), 1451–1462.
Noah, T., Schul, Y., & Mayo, R. (2018). When both the original study and its failed replication are correct: Feeling observed eliminates the facial-feedback effect. Journal of Personality and Social Psychology, 114, 657–664.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.
Pacini, A. M., & Barnard, P. J. (2021). Exocentric coding of the mapping between valence and regions of space: Implications for embodied cognition. Acta Psychologica, 214, 103264.
Petilli, M. A., Günther, F., Vergallito, A., Ciapparelli, M., & Marelli, M. (2021). Data-driven computational models reveal perceptual simulation in word processing. Journal of Memory and Language, 117, 104194.
Pexman, P. M., Heard, A., Lloyd, E., & Yap, M. J. (2017). The Calgary semantic decision project: Concrete/abstract decision data for 10,000 English words. Behavior Research Methods, 49(2), 407–417.
Pexman, P. M., & Yap, M. J. (2018). Individual differences in semantic processing: Insights from the Calgary semantic decision project. Journal of Experimental Psychology: Learning, Memory, and Cognition, 44(7), 1091–1112.
Pregla, D., Lissón, P., Vasishth, S., Burchert, F., & Stadie, N. (2021). Variability in sentence comprehension in aphasia in German. Brain and Language, 222, 105008.
R Core Team. (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing.
Ratcliff, R., Thapar, A., & McKoon, G. (2010). Individual differences, aging, and IQ in two-choice tasks. Cognitive Psychology, 60, 127–157.
Rodríguez-Ferreiro, J., Aguilera, M., & Davies, R. (2020). Semantic priming and schizotypal personality: Reassessing the link between thought disorder and enhanced spreading of semantic activation. PeerJ, 8, e9511.
Rouder, J. N., Haaf, J. M., & Vandekerckhove, J. (2018). Bayesian inference for psychology, part IV: Parameter estimation and Bayes factors. Psychonomic Bulletin & Review, 25(1), 102–113.
Sassenhagen, J., & Alday, P. M. (2016). A common misapplication of statistical inference: Nuisance control with null-hypothesis significance tests. Brain and Language, 162, 42–45.
Schielzeth, H., Dingemanse, N. J., Nakagawa, S., Westneat, D. F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N. A., Garamszegi, L. Z., & Araya‐Ajoy, Y. G. (2020). Robustness of linear mixed‐effects models to violations of distributional assumptions. Methods in Ecology and Evolution, 11(9), 1141–1152.
Schloerke, B., Cook, D., Larmarange, J., Briatte, F., Marbach, M., Thoen, E., Elberg, A., & Crowley, J. (2021). GGally: Extension to ’ggplot2’.
Schmalz, X., Biurrun Manresa, J., & Zhang, L. (2021). What is a Bayes factor? Psychological Methods.
Schoot, R. van de, Depaoli, S., Gelman, A., King, R., Kramer, B., Märtens, K., Tadesse, M. G., Vannucci, M., Willemsen, J., & Yau, C. (2021). Bayesian statistics and modelling. Nature Reviews Methods Primers, 1, 3.
Singmann, H., & Kellen, D. (2019). An introduction to mixed models for experimental psychology. In D. H. Spieler & E. Schumacher (Eds.), New methods in cognitive psychology (pp. 4–31). Psychology Press.
Stone, K., Malsburg, T. von der, & Vasishth, S. (2020). The effect of decay and lexical uncertainty on processing long-distance dependencies in reading. PeerJ, 8, e10438.
Stone, K., Veríssimo, J., Schad, D. J., Oltrogge, E., Vasishth, S., & Lago, S. (2021). The interaction of grammatically distinct agreement dependencies in predictive processing. Language, Cognition and Neuroscience, 36(9), 1159–1179.
Suárez, L., Tan, S. H., Yap, M. J., & Goh, W. D. (2011). Observing neighborhood effects without neighbors. Psychonomic Bulletin & Review, 18(3), 605–611.
Tendeiro, J. N., & Kiers, H. A. L. (2019). A review of issues about null hypothesis Bayesian testing. Psychological Methods, 24(6), 774–795.
Tendeiro, J. N., & Kiers, H. A. L. (in press). On the white, the black, and the many shades of gray in between: Our reply to van Ravenzwaaij and Wagenmakers (2021). Psychological Methods.
van Ravenzwaaij, D., & Wagenmakers, E.-J. (2021). Advantages masquerading as “issues” in Bayesian hypothesis testing: A commentary on Tendeiro and Kiers (2019). Psychological Methods.
Vasishth, S., & Gelman, A. (2021). How to embrace variation and accept uncertainty in linguistic and psycholinguistic data analysis. Linguistics, 59(5), 1311–1342.
Vasishth, S., Mertzen, D., Jäger, L. A., & Gelman, A. (2018). The statistical significance filter leads to overoptimistic expectations of replicability. Journal of Memory and Language, 103, 151–175.
Vasishth, S., Nicenboim, B., Beckman, M. E., Li, F., & Kong, E. J. (2018). Bayesian data analysis in the phonetic sciences: A tutorial introduction. Journal of Phonetics, 71, 147–161.
Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P.-C. (2021). Rank-normalization, folding, and localization: An improved R-hat for assessing convergence of MCMC. Bayesian Analysis, 16(2), 667–718.
Villalonga, M. B., Sussman, R. F., & Sekuler, R. (2021). Perceptual timing precision with vibrotactile, auditory, and multisensory stimuli. Attention, Perception, & Psychophysics, 83(5), 2267–2280.
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686.
Wingfield, C., & Connell, L. (2022b). Understanding the role of linguistic distributional knowledge in cognition. Language, Cognition and Neuroscience, 1–51.
Yap, M. J., & Balota, D. A. (2009). Visual word recognition of multisyllabic words. Journal of Memory and Language, 60(4), 502–529.
Yap, M. J., Balota, D. A., Sibley, D. E., & Ratcliff, R. (2012). Individual differences in visual word recognition: Insights from the English Lexicon Project. Journal of Experimental Psychology: Human Perception and Performance, 38(1), 53–79.
Yap, M. J., Hutchison, K. A., & Tan, L. C. (2017). Individual differences in semantic priming performance: Insights from the semantic priming project. In M. N. Jones (Ed.), Frontiers of cognitive psychology. Big data in cognitive science (pp. 203–226). Routledge/Taylor & Francis Group.
Yarkoni, T., Balota, D., & Yap, M. J. (2008). Moving beyond Coltheart’s N: A new measure of orthographic similarity. Psychonomic Bulletin & Review, 15(5), 971–979.

  1. The general cognition measures could also be dubbed general or fluid intelligence, but we consider ‘cognition’ the more appropriate term in the present context.↩︎

  2. For the semantic priming study, the analyses for the remaining sample sizes, up to 2,000 participants, have not yet finished running. Once complete, the results will be reported in this manuscript.↩︎

  3. Information about this facility is available online. Even though analysis jobs were run in parallel, some of the statistical analyses took four months to complete. Specifically, the final model alone took one month to run, delayed for three reasons: the limited availability of machines, occasional cancellations of jobs to allow maintenance work on the machines, and lack of convergence of the models. Furthermore, the power analysis for the semantic priming study took six months, of which two were spent running, with delays due to the limited availability of machines and occasional cancellations of jobs.↩︎

Pablo Bernabeu, 2022. Licence: CC BY 4.0.

Online book created using the R package bookdown.