Getting started
This guide works through a short analysis with depictr, from a first look at the
data to a fitted model and its diagnostics. Every function returns a
plotnine object, so anything shown here can be refined
further with the usual + syntax.
Install
pip install depictr # core: plotnine, pandas, numpy, matplotlib, scipy
pip install "depictr[all]" # plus the optional computation back-ends (quoted so shells such as zsh do not expand the brackets)
The exploratory, theme and accessibility tools work with the core install. The
model, classification and survival plots delegate their computation to
statsmodels, scikit-learn and lifelines respectively, each an optional extra
(depictr[models], depictr[classification], depictr[survival]).
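Before calling the model, classification or survival plots, it can help to confirm which optional back-ends are importable. The sketch below uses only the standard library. The helper name available_backends is hypothetical and not part of depictr, and the mapping assumes the usual import names of the three packages (scikit-learn is imported as sklearn).

```python
from importlib.util import find_spec

# Extra name (from the install section) -> import name of the package it pulls in.
# Assumption: the usual import names; note that scikit-learn imports as "sklearn".
BACKENDS = {
    "models": "statsmodels",
    "classification": "sklearn",
    "survival": "lifelines",
}


def available_backends():
    """Return {extra_name: True/False} for each optional back-end.

    Hypothetical helper, not part of depictr: it only checks whether the
    top-level package can be found, without importing it.
    """
    return {extra: find_spec(module) is not None
            for extra, module in BACKENDS.items()}


if __name__ == "__main__":
    for extra, present in available_backends().items():
        status = "installed" if present else f'missing: pip install "depictr[{extra}]"'
        print(f"{extra:15s} {status}")
```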
The idea
depictr gives the whole workflow one theme, one colourblind-safe palette (the Okabe-Ito set) and one calling convention. Where a specialist package already computes a quantity well, depictr hands the work to it and redraws the result under the shared theme, so a ROC curve, a coefficient plot and a survival curve all share the same visual language.
import depictr as dp
Each call returns a plotnine object. In a Jupyter notebook it renders when
displayed; in a script, call .show() on it, or save it with
dp.save_plot(p, "fig.png"), where p is the returned plot.
A first look at the data
depictr ships a few reproducibly simulated datasets. Here is a lexical-decision experiment with reaction times in two priming conditions.
ld = dp.lexical_decision()
dp.explore_distribution(ld, "RT", group="condition", kind="density",
                        legend_inside=True)
The legend sits inside the panel, in the corner the distribution leaves empty. For a wider survey, a correlation heatmap and a missing-data map give a quick overview:
wb = dp.wellbeing_survey()
dp.correlation_heatmap(wb)
dp.missingness_map(wb)
Fitting and reading a model
depictr does not fit models; it reads a model you have fitted, or a tidy table of estimates. Fit an ordinary least-squares model with statsmodels, then read it from several angles.
import statsmodels.formula.api as smf
cy = dp.crop_yield()
# Q() quotes "yield" because it is a Python keyword.
model = smf.ols('Q("yield") ~ fertiliser + rainfall + soil_ph + treatment',
                cy).fit()
dp.coefficient_plot(model, title="Drivers of crop yield")
dp.effects_plot(model, "fertiliser")
dp.residual_diagnostics_plot(model)
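The Q() wrapper matters because formula terms are evaluated as Python expressions, so a column named after a keyword, or one that is not a valid identifier, cannot appear bare. A minimal sketch of building such a formula string programmatically follows. quote_term and build_formula are hypothetical helpers, not part of depictr or statsmodels, and the quoting rule is a simplifying assumption (it does not handle names that themselves contain double quotes).

```python
import keyword


def quote_term(name):
    """Wrap a column name in Q("...") when it cannot be used bare in a formula."""
    if keyword.iskeyword(name) or not name.isidentifier():
        return f'Q("{name}")'
    return name


def build_formula(response, predictors):
    """Join a response and its predictors into a 'y ~ a + b' formula string."""
    rhs = " + ".join(quote_term(p) for p in predictors)
    return f"{quote_term(response)} ~ {rhs}"


formula = build_formula("yield", ["fertiliser", "rainfall", "soil_ph", "treatment"])
print(formula)  # Q("yield") ~ fertiliser + rainfall + soil_ph + treatment
```

The resulting string has the same form as the formula passed to smf.ols above.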
coefficient_plot also accepts a plain data frame of estimates (with columns
term, estimate, conf_low, conf_high), so estimates from any source – a
Bayesian fit, a bootstrap, a table copied from a paper – plot the same way.
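As an illustration of the tidy-table route, the sketch below bootstraps the mean of two small samples and collects the results under the four column names listed above. It uses only the standard library. The data are invented, bootstrap_row is a hypothetical helper, and the percentile interval is one choice among several.

```python
import random
import statistics


def bootstrap_row(term, values, n_boot=2000, level=0.95, seed=0):
    """Return one row of estimates: the sample mean and a percentile interval."""
    rng = random.Random(seed)  # fixed seed so the table is reproducible
    # Mean of each resample (drawn with replacement), sorted for percentiles.
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_boot)
    )
    alpha = (1.0 - level) / 2.0
    lo_idx = int(alpha * n_boot)
    hi_idx = int((1.0 - alpha) * n_boot) - 1
    return {
        "term": term,
        "estimate": statistics.fmean(values),
        "conf_low": means[lo_idx],
        "conf_high": means[hi_idx],
    }


# Invented example data, for illustration only.
samples = {
    "control": [4.1, 3.8, 4.4, 4.0, 3.9, 4.3, 4.2],
    "treated": [4.9, 5.2, 4.7, 5.1, 5.0, 4.8, 5.3],
}
rows = [bootstrap_row(term, values) for term, values in samples.items()]
for row in rows:
    print(row)
```

Passing rows to pandas.DataFrame gives a data frame with exactly those four columns, which is the shape the text above says coefficient_plot accepts.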
Survival and classification
These families delegate to lifelines and scikit-learn. Kaplan-Meier curves with a log-rank test and a number-at-risk table are one call, and a ROC curve is another:
ct = dp.clinical_trial()
dp.survival_plot(ct["time"], ct["event"], group=ct["arm"],
                 risk_table=True, legend_inside=True)
dp.roc_curve_plot(ct["adverse_event"], ct["biomarker"])
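For readers new to the method, the quantity behind a Kaplan-Meier curve is the product-limit estimate: at each distinct event time, the running survival probability is multiplied by (1 - events / number at risk). The pure-Python sketch below is for illustration only. It is not how depictr or lifelines compute the curve, and kaplan_meier is a hypothetical name.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : observed durations
    events : 1 if the event occurred at that time, 0 if censored
    Returns a list of (time, number_at_risk, survival) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects that share this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, at_risk, survival))
        # Events and censored subjects both leave the risk set.
        at_risk -= leaving
    return curve


# Four subjects; one of the two subjects observed at time 2 is censored.
for t, n, s in kaplan_meier([1, 2, 2, 3], [1, 1, 0, 1]):
    print(f"t={t}  at risk={n}  S(t)={s:.3f}")
```

The number-at-risk column here is the same quantity a risk table reports beneath the curve.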
Accessibility, checked rather than asserted
The default palette is the Okabe-Ito set, and that choice is verified rather than assumed. A Machado-2009 simulator and a CIE-Lab distance test report how far apart the palette’s colours stay under each form of colour-vision deficiency.
dp.palette_safety()
# {'min_delta_e': ..., 'safe': True, 'by_condition': {...}, ...}
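To make the CIE-Lab distance part concrete, the sketch below converts the eight Okabe-Ito colours from sRGB to CIE-Lab (D65 white point) and reports the smallest pairwise distance. It covers the unsimulated colours only: the Machado-2009 deficiency simulation is left out, the use of the CIE76 formula (plain Euclidean distance in Lab) is an assumption about which distance is meant, and none of the function names come from depictr.

```python
from itertools import combinations
from math import dist

# Okabe-Ito palette as sRGB hex codes.
OKABE_ITO = {
    "black": "#000000",
    "orange": "#E69F00",
    "sky blue": "#56B4E9",
    "bluish green": "#009E73",
    "yellow": "#F0E442",
    "blue": "#0072B2",
    "vermillion": "#D55E00",
    "reddish purple": "#CC79A7",
}

D65 = (0.95047, 1.0, 1.08883)  # reference white


def hex_to_lab(code):
    """Convert an sRGB hex colour to CIE-Lab under D65."""
    rgb = [int(code[i:i + 2], 16) / 255 for i in (1, 3, 5)]
    # Undo the sRGB transfer curve to get linear light.
    r, g, b = [c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
               for c in rgb]
    xyz = (
        0.4124564 * r + 0.3575761 * g + 0.1804375 * b,
        0.2126729 * r + 0.7151522 * g + 0.0721750 * b,
        0.0193339 * r + 0.1191920 * g + 0.9503041 * b,
    )

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = (f(v / w) for v, w in zip(xyz, D65))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def min_delta_e(palette):
    """Smallest pairwise CIE76 distance, with the names of the closest pair."""
    labs = {name: hex_to_lab(code) for name, code in palette.items()}
    return min(
        (dist(labs[a], labs[b]), a, b) for a, b in combinations(labs, 2)
    )


distance, first, second = min_delta_e(OKABE_ITO)
print(f"closest pair: {first} / {second}, delta E = {distance:.1f}")
```

A fuller check in the spirit of the text would first transform each colour with a deficiency simulation and then repeat the distance calculation once per condition.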
Extending and composing
Because every function returns a plotnine object, the grammar-of-graphics extensions apply:
from plotnine import labs
dp.roc_curve_plot(ct["adverse_event"], ct["biomarker"]) + labs(title="Adverse event")
To place several plots in one figure, use arrange_plots:
dp.arrange_plots(dp.qq_plot(model), dp.influence_plot(model), ncol=2)
Where next
The gallery renders a worked example from every family.
The API reference documents each function and its options.