scopusflow¶

scopusflow is a reproducible workflow layer over pybliometrics for Scopus searches. It is the Python twin of the R package scopusflow and follows the same design.

Status

This is an early release (0.1.1), covered by an offline test suite. The retrieval and abstract drivers are thin layers over pybliometrics, so a short trial run against your installed version is worth doing before a large live harvest.

Why this exists¶

pybliometrics is the mature way to reach the Scopus API from Python. It wraps around ten endpoints and handles the HTTP, cursor pagination, weekly-quota rotation and per-query caching. What it does not provide is a workflow on top of that plumbing, such as a declarative search plan, a single record schema that holds across query types, a resumable harvest with checkpoints, or DOI change-tracking between runs. scopusflow fills that gap, and depends on pybliometrics rather than re-implementing the plumbing it already does well.

	pybliometrics	scopusflow
Reach the API (search, retrieval, quota, cursor, cache)	yes	delegates
Declarative, reproducible search plan	no	yes
One stable record schema across query types	no	yes
Resumable, checkpointed harvest of a plan	no	yes
DOI extraction and change-tracking between runs	no	yes
Annual publication trends without downloading records	no	yes
Topic-trend comparison with stability bands	no	yes
Batch abstract retrieval, resilient per id	no	yes
Trend and top-source/author plots	no	yes
Export to reference managers (BibTeX, RIS)	no	yes

The other Python options are not live alternatives. elsapy was archived as read-only in January 2025, and pyscopus has had no release since 2018.

Install¶

pip install scopusflow          # add [plot] for figures, [app] for the code-free app

A Scopus API key configured for pybliometrics, in its standard ~/.config/pybliometrics.cfg, is needed only for the steps that contact the API.

A first search¶

import scopusflow as sf

q = sf.scopus_query("graphene", "supercapacitor", field="TITLE-ABS-KEY")
plan = sf.SearchPlan(q, years=range(2010, 2023), partition="year")

records = sf.fetch_plan(plan, cache_dir="harvest", resume=True)

sf.top(records, by="source")
dois = sf.extract_dois(records)

later = sf.fetch_plan(plan, cache_dir="harvest2")
sf.diff_dois(old=records, new=later)

trend = sf.scopus_trend(q, years=range(2010, 2023))
sf.plot_trend(trend)

The guides give worked walk-throughs of each part of the workflow, from designing a query to comparing topics and exporting the result. The reference documents the full API.

If scopusflow contributes to published work, please cite it.