Software Sustainability Institute Fellowship

Mixed-effects models in R and a new tool for data simulation

Presentation

In this talk, I will review the rationale for linear mixed-effects models (LMEMs) and demonstrate how to fit them in R (Brauer & Curtin, 2018; Luke, 2017). I will also cover some challenges. For instance, under the widely accepted 'maximal' approach, which fits all possible random effects for each fixed effect, models sometimes fail to converge, that is, to find a solution. I will demonstrate a remedy for nonconvergence based on progressively simplifying the random effects structure (Singmann & Kellen, 2017; for an alternative approach, especially with small samples, see Matuschek et al., 2017). Finally, on a different note, I will present a web application that facilitates data simulation for research and teaching (Bernabeu & Lynott, 2020).
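
As a minimal sketch of the workflow described above, the following R code fits a maximal model and then two progressively simpler random effects structures. It assumes the lme4 package and uses its bundled sleepstudy data set purely for illustration; neither the data set nor the model names (m_max, m_nocorr, m_int) come from the talk itself.

```r
library(lme4)

# Illustrative data bundled with lme4: reaction times for
# 18 subjects measured on 10 days (not data from the talk)
data(sleepstudy, package = "lme4")

# Maximal structure for this design: by-subject random intercepts
# and random slopes for Days, including their correlation
m_max <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

# Inspect any convergence warnings stored by lme4 (NULL or empty if none)
# and check whether the fit is singular
m_max@optinfo$conv$lme4$messages
isSingular(m_max)

# If the maximal model did not converge, a first simplification is to
# drop the correlation between random intercepts and slopes
m_nocorr <- lmer(Reaction ~ Days + (1 + Days || Subject), data = sleepstudy)

# A further simplification keeps by-subject random intercepts only
m_int <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Compare the nested models (anova refits them with maximum likelihood)
anova(m_int, m_nocorr, m_max)
```

In this data set the maximal model would normally be expected to converge, so the simpler models are shown only to illustrate the order in which terms might be removed. In practice, each simplification should be weighed against the design of the study.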

Reproducibility around a web application

Presentation

Web applications make our work easier to use, as they require no programming from the user. Creating these applications in R, with packages such as "shiny" or "flexdashboard", offers several advantages. Chief among them is reproducibility, as we will see with an application for data simulation (https://github.com/pablobernabeu/Experimental-data-simulation).

Web application for the simulation of experimental data

Application / dashboard

Open-source, R-based web application for creating varied experimental data sets with customisable structures, including between-group and within-participant variables that can be categorical or continuous.
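
To illustrate the kind of data structure described above, here is a minimal base R sketch that simulates a data set with one between-group variable and one within-participant variable. It is not the application's own code: the variable names, group sizes and effect sizes are all hypothetical, chosen only to show the structure.

```r
set.seed(123)  # arbitrary seed so that the simulation is reproducible

# Hypothetical design parameters
n_per_group <- 20
groups      <- c("control", "treatment")  # between-group variable
conditions  <- c("A", "B")                # within-participant variable

# One row per participant: each participant belongs to a single group
# and has an individual baseline (a continuous, participant-level value)
participants <- data.frame(
  participant = seq_len(n_per_group * length(groups)),
  group       = rep(groups, each = n_per_group)
)
participants$baseline <- rnorm(nrow(participants), mean = 0, sd = 30)

# Cross every participant with every condition. With no shared column
# names, merge() returns the Cartesian product of the two data frames
sim <- merge(participants, data.frame(condition = conditions))

# Simulated response time: intercept + hypothetical group effect +
# hypothetical condition effect + participant baseline + trial noise
sim$rt <- 500 +
  ifelse(sim$group == "treatment", -20, 0) +
  ifelse(sim$condition == "B", 15, 0) +
  sim$baseline +
  rnorm(nrow(sim), mean = 0, sd = 40)

# Check the design: 20 participants in each group-by-condition cell
table(sim$group, sim$condition)
```

Because every participant appears in both conditions but in only one group, the resulting data frame has the mixed between/within structure that the description refers to.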

Data dashboard: Butterfly species richness in Los Angeles

Application / dashboard

Dashboard with open data from a study by Prudic et al. (2018), which compares citizen science with traditional methods in butterfly sampling. Coding tasks included reshaping data to long format, merging and, as ever, wrangling with a table.

Data is present: Workshops and datathons

Post

This project offers free activities, namely workshops and datathons, to learn and practise reproducible data presentation. Pablo Bernabeu organises these events in the context of a Software Sustainability Institute Fellowship. Programming languages such as R and Python offer free, powerful resources for data processing, visualisation and analysis. Experience with these languages is highly valued in data-intensive disciplines. Original data has become a public good in many research fields thanks to cultural and technological advances. On the internet, we can find innumerable data sets from sources such as scientific journals and repositories (e.g., OSF), local and national governments, non-governmental organisations (e.g., data.world), etc.