Covariates are necessary to validate the variables of interest and to prevent bogus theories

2023 research methods, statistics

The need for covariates—or nuisance variables—in statistical analyses is twofold. The first reason is purely statistical and the second reason is academic.

First, the use of covariates is often necessary when the variable(s) of interest in a study may be connected to, and affected by, some satellite variables (Bottini et al., 2022; Elze et al., 2017; Sassenhagen & Alday, 2016). Such entanglement is the norm rather than the exception, owing to the multivariate, dynamic, interactive nature of the real world.
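The statistical point can be illustrated with a small simulation. The sketch below is not taken from any of the cited studies: the variable names (`satellite`, `interest`, `outcome`), the effect sizes and the sample size are all assumptions chosen for illustration. A satellite variable drives both the variable of interest and the outcome, while the variable of interest has no direct effect at all.

```python
import numpy as np

def ols_coefficients(predictors, outcome):
    """Return least-squares coefficients; position 0 is the intercept."""
    design = np.column_stack([np.ones(len(outcome))] + list(predictors))
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs

rng = np.random.default_rng(2023)
n = 5000  # hypothetical sample size

# Assumed data-generating process: the satellite variable affects both
# the variable of interest and the outcome. The variable of interest
# has no direct effect on the outcome.
satellite = rng.normal(size=n)
interest = 0.8 * satellite + rng.normal(size=n)
outcome = 0.5 * satellite + rng.normal(size=n)

# Model 1: outcome ~ interest (satellite variable ignored)
unadjusted = ols_coefficients([interest], outcome)[1]

# Model 2: outcome ~ interest + satellite (satellite as covariate)
adjusted = ols_coefficients([interest, satellite], outcome)[1]

print(f"Effect of interest without covariate: {unadjusted:.3f}")
print(f"Effect of interest with covariate:    {adjusted:.3f}")
```

Under these assumptions, the first model attributes a clear effect to the variable of interest, whereas the second model, which includes the satellite variable as a covariate, should estimate that effect at close to zero.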

Second, the use of covariates is often necessary to prevent the development of bogus, redundant theories. Academics are strongly rewarded for developing theories. As we know, wherever there are strong rewards, there are serious risks. An academic could—consciously or not—produce a theory that is too closely related to an existing theory. So closely related are these theories that the second version might not warrant a name of its own. In such a scenario, covariates are useful and indeed necessary to vet the unique nature of the second version. That is, the first and the second version must be tested in the same model, and the variables corresponding to the first version can be construed as covariates. This allows both the developers of the theories and the readers to compare the effects corresponding to each version of the theory, and to assess the degree of separation between them.
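A minimal sketch of such a vetting test follows. Everything in it is hypothetical: `theory_a` stands for the variable of the existing theory, `theory_b` for the variable of the newer theory, and the data are simulated so that the newer variable is merely a noisy copy of the older one.

```python
import numpy as np

def ols_coefficients(predictors, outcome):
    """Return least-squares coefficients; position 0 is the intercept."""
    design = np.column_stack([np.ones(len(outcome))] + list(predictors))
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs

rng = np.random.default_rng(42)
n = 5000  # hypothetical sample size

# Assumed scenario: the newer theory's variable is a noisy copy of the
# existing theory's variable, and only the latter drives the outcome.
theory_a = rng.normal(size=n)
theory_b = theory_a + 0.3 * rng.normal(size=n)
outcome = 0.6 * theory_a + rng.normal(size=n)

# Degree of overlap between the two variables
overlap = np.corrcoef(theory_a, theory_b)[0, 1]

# Tested alone, the newer variable appears to have an effect.
b_alone = ols_coefficients([theory_b], outcome)[1]

# Tested in the same model, with the existing theory's variable
# entered as a covariate, the two effects can be compared directly.
_, a_joint, b_joint = ols_coefficients([theory_a, theory_b], outcome)

print(f"Correlation between the two variables: {overlap:.3f}")
print(f"Newer theory alone:                    {b_alone:.3f}")
print(f"Existing theory, joint model:          {a_joint:.3f}")
print(f"Newer theory, joint model:             {b_joint:.3f}")
```

In this simulated case, the newer variable looks convincing on its own but should retain little or no effect once the existing variable is in the model. With real data, a newer variable that keeps a substantial effect in the joint model would support its claim to a name of its own. Note that strongly correlated predictors inflate the uncertainty of both estimates, so the comparison calls for an adequate sample size.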

The misuse of covariates (Stefan & Schönbrodt, 2023), however frequent and harmful, is entirely separate from their correct use, in the same way that a stethoscope can be used for good or for bad purposes. It would be poorly informed and misleading to conflate the correct and the incorrect uses, or to reject covariates altogether because of the incorrect uses.

In conclusion, the effects of interest in correlational/observational studies can be subject to mediation and moderation by satellite variables. These variables cannot be manipulated in correlational/observational studies, but they can—and often should—be included as covariates in the statistical models, to ward off spurious results and to vet similar theories.

References

Bottini, R., Morucci, P., D’Urso, A., Collignon, O., & Crepaldi, D. (2022). The concreteness advantage in lexical decision does not depend on perceptual simulations. Journal of Experimental Psychology: General, 151(3), 731–738. https://doi.org/10.1037/xge0001090

Elze, M. C., Gregson, J., Baber, U., Williamson, E., Sartori, S., Mehran, R., Nichols, M., Stone, G. W., & Pocock, S. J. (2017). Comparison of propensity score methods and covariate adjustment: Evaluation in 4 cardiovascular studies. Journal of the American College of Cardiology, 69(3), 345–357. https://doi.org/10.1016/j.jacc.2016.10.060

Sassenhagen, J., & Alday, P. M. (2016). A common misapplication of statistical inference: Nuisance control with null-hypothesis significance tests. Brain and Language, 162, 42–45. https://doi.org/10.1016/j.bandl.2016.08.001

Stefan, A. M., & Schönbrodt, F. D. (2023). Big little lies: A compendium and simulation of p-hacking strategies. Royal Society Open Science, 10(2), 220346. https://doi.org/10.1098/rsos.220346
