Building a reference set

A retrieval becomes useful once it leaves the package, whether as a reading list in a reference manager or as input to a writing project. This guide covers that export end of the workflow: it takes a set of records from the API, reduces it to a clean DOI list, and then renders it in the two interchange formats that reference managers read, BibTeX and RIS.

Fetch the record set

Everything here starts from a frame of records. You build one by running a SearchPlan through fetch_plan, which drives pybliometrics one cell at a time and returns a single frame with the stable RECORD_COLUMNS schema. This step contacts the Scopus API, so it needs a working API key configured for pybliometrics. Run it once and keep the frame in hand for the rest of the guide.

import scopusflow as sf

q = sf.scopus_query("graphene", "supercapacitor", field="TITLE-ABS-KEY")
plan = sf.SearchPlan(q, years=range(2018, 2023), partition="year")

records = sf.fetch_plan(plan, cache_dir="graphene-harvest")
records.shape

The output below comes from a small synthetic record set, so the guide runs without a key. The result is an ordinary pandas DataFrame, so it drops straight into any analysis you already have. The columns are the same whatever the query was, which is what lets the DOI and export helpers that follow rely on them.

out(records[["title", "year", "doi"]].head())

title	year	doi
A study of graphene supercapacitors, part 1	2018	10.1000/demo.2018.000
A study of graphene supercapacitors, part 2	2018	10.1000/demo.2018.001
A study of graphene supercapacitors, part 3	2018	10.1000/demo.2018.002
A study of graphene supercapacitors, part 4	2018	10.1000/demo.2018.003
A study of graphene supercapacitors, part 5	2019	10.1000/demo.2019.000
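
To make the idea of harvesting "one cell at a time" concrete, here is a minimal sketch of what a year partition could look like as plain query strings. It is not scopusflow's implementation: the helper year_cells, the base query string, and the use of a PUBYEAR clause per cell are all assumptions made for illustration.

```python
def year_cells(base_query, years):
    """Return one (year, query) pair per publication year."""
    return [(year, f"({base_query}) AND PUBYEAR = {year}") for year in years]


# Hypothetical base query; how scopus_query combines its terms is assumed here.
base = "TITLE-ABS-KEY(graphene AND supercapacitor)"

# range(2018, 2023) covers 2018 through 2022, so this yields five cells.
for year, query in year_cells(base, range(2018, 2023)):
    print(year, query)
```

Splitting by year keeps each request small and makes a partial harvest easy to resume, which is presumably why the plan above passes partition="year".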

A clean, deduplicated DOI list

Reference managers such as Zotero import most reliably from DOIs. extract_dois pulls them from the doi column, strips any resolver prefix or doi: label, and removes duplicates compared case-insensitively, so the same article imports once even when its DOI was stored with a https://doi.org/ prefix or in a different case.

dois = sf.extract_dois(records)
out(dois[:5])

['10.1000/demo.2018.000', '10.1000/demo.2018.001', '10.1000/demo.2018.002', '10.1000/demo.2018.003', '10.1000/demo.2019.000']

The function works offline because it only reads the frame you already hold. It returns a plain Python list rather than writing a file, so you stay in control of where anything lands. Pass dedupe=False if you want to keep every occurrence, for instance to count how often a DOI recurs across cells. The synthetic set contains no repeated DOIs, so the two counts below are equal.

all_dois = sf.extract_dois(records, dedupe=False)
out((len(all_dois), len(dois)))

(20, 20)
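
The cleaning rules described in this section can be sketched with the standard library alone. This is an illustration of the technique, not the code behind extract_dois: the names clean_doi and unique_dois are hypothetical, and the exact set of prefixes the real function strips is assumed.

```python
import re

# Resolver prefixes and "doi:" labels to strip. Which prefixes a given
# library recognises is an assumption in this sketch.
_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def clean_doi(raw):
    """Strip whitespace and any leading resolver prefix or doi: label."""
    return _PREFIX.sub("", raw.strip())


def unique_dois(values, dedupe=True):
    """Clean each DOI; optionally drop case-insensitive repeats, keeping the first."""
    seen = set()
    result = []
    for raw in values:
        if not raw:  # skip missing values such as None or ""
            continue
        doi = clean_doi(raw)
        if not doi:  # nothing left after cleaning
            continue
        key = doi.lower()  # DOIs are compared case-insensitively
        if dedupe and key in seen:
            continue
        seen.add(key)
        result.append(doi)
    return result


sample = [
    "10.1000/demo.2018.000",
    "https://doi.org/10.1000/DEMO.2018.000",  # same article, prefixed and upper-cased
    "doi:10.1000/demo.2018.001",
    None,
]
print(unique_dois(sample))                     # two distinct DOIs
print(len(unique_dois(sample, dedupe=False)))  # 3: every non-missing occurrence
```

Keeping the first spelling seen, rather than lower-casing the result, leaves the DOI as it was stored while still collapsing repeats.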

Writing the list out is then a one-liner with the standard library, which keeps the package free of any implicit filesystem writes.

from pathlib import Path

Path("reference-set.txt").write_text("\n".join(dois), encoding="utf-8")

Render to BibTeX and RIS

A DOI list is enough for an import-by-identifier, but a full record carries more. to_bibtex and to_ris render the set in the two formats that reference managers read, so a search moves straight into Zotero, EndNote, Mendeley or a LaTeX bibliography. Each record becomes one entry with its authors split out, and both functions are pure and offline, returning a string rather than touching disk.

ris = sf.to_ris(records)
out(ris[:320])

TY  - JOUR
TI  - A study of graphene supercapacitors, part 1
AU  - Lee J.
AU  - Park S.
PY  - 2018
JO  - Nano Letters
DO  - 10.1000/demo.2018.000
N1  - Scopus ID: 2018000
ER  - 

TY  - JOUR
TI  - A study of graphene supercapacitors, part 2
AU  - Park S.
AU  - Kim H.
PY  - 2018
JO  - Advanced Materials
DO  - 10.1000/dem
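
The layout of one RIS entry, as seen in the output above, is simple enough to sketch by hand. The function record_to_ris and the dictionary keys it reads are hypothetical and are not scopusflow's API; only the tags (TY, TI, AU, PY, JO, DO, N1, ER) are taken from the output shown.

```python
def record_to_ris(record):
    """Render one record dict as an RIS journal-article entry."""
    lines = ["TY  - JOUR", f"TI  - {record['title']}"]
    # One AU line per author, in the order given.
    lines += [f"AU  - {author}" for author in record["authors"]]
    lines += [
        f"PY  - {record['year']}",
        f"JO  - {record['journal']}",
        f"DO  - {record['doi']}",
        f"N1  - Scopus ID: {record['scopus_id']}",
        "ER  - ",  # end-of-record tag closes the entry
    ]
    return "\n".join(lines)


# Field names here are assumptions; the values mirror the first demo record.
sample_record = {
    "title": "A study of graphene supercapacitors, part 1",
    "authors": ["Lee J.", "Park S."],
    "year": 2018,
    "journal": "Nano Letters",
    "doi": "10.1000/demo.2018.000",
    "scopus_id": "2018000",
}
print(record_to_ris(sample_record))
```

Each tag is two characters followed by two spaces, a hyphen and a space, and repeating the AU tag is how RIS carries multiple authors.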

BibTeX works the same way, one @article entry per record. The citation keys are built from the first author's surname and the year, and made unique within the export so that biber does not reject duplicates.

bibtex = sf.to_bibtex(records)
out(bibtex[:320])

@article{lee2018,
  author = {Lee J. and Park S.},
  title = {A study of graphene supercapacitors, part 1},
  journal = {Nano Letters},
  year = {2018},
  doi = {10.1000/demo.2018.000},
  note = {Scopus ID: 2018000},
}

@article{park2018,
  author = {Park S. and Kim H.},
  title = {A study of graphene supercapacitors,
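
One way to build citation keys of the kind shown above is sketched here. The source only states that keys combine the first author's surname with the year and are unique within the export, so the letter-suffix scheme and the names base_key and unique_keys are assumptions, not the behaviour of to_bibtex.

```python
import re
from collections import Counter


def base_key(first_author, year):
    """Lower-cased surname of the first author plus the year, e.g. 'lee2018'."""
    # Assumes names formatted as "Surname I."; multi-word surnames would
    # need more careful handling than taking the first token.
    surname = first_author.split()[0].lower()
    surname = re.sub(r"[^a-z0-9]", "", surname)
    return f"{surname}{year}"


def unique_keys(pairs):
    """Leave the first use of a key bare; suffix later repeats with a, b, c, ..."""
    seen = Counter()
    keys = []
    for author, year in pairs:
        key = base_key(author, year)
        repeats = seen[key]
        seen[key] += 1
        if repeats:
            # Letters suffice for up to 26 repeats of one key in this sketch.
            key = f"{key}{chr(ord('a') + repeats - 1)}"
        keys.append(key)
    return keys


pairs = [("Lee J.", 2018), ("Park S.", 2018), ("Lee J.", 2018)]
print(unique_keys(pairs))
```

Because every base key ends in the year's digits, a suffixed key such as lee2018a cannot collide with another record's unsuffixed key.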

Save the export

Because both functions hand back a string, writing the whole set is again a plain file write, in whichever format your downstream tool expects. A .bib file feeds a LaTeX bibliography directly, while a .ris file imports into most reference managers.

from pathlib import Path

Path("reference-set.bib").write_text(sf.to_bibtex(records), encoding="utf-8")
Path("reference-set.ris").write_text(sf.to_ris(records), encoding="utf-8")

From there the search is portable. Open the .ris file from Zotero or EndNote to bring the whole set into your library, or point a LaTeX project at the .bib file and cite by the keys it contains. The DOI list remains a lighter alternative when you only need to seed an import by identifier and would rather let the reference manager fetch the metadata itself.
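
Before importing, it can be worth confirming that the exported text holds as many entries as the frame has rows. The sketch below counts entries in RIS and BibTeX strings with the standard library. Both helper names are hypothetical, and the sample strings stand in for the real output of to_ris and to_bibtex.

```python
import re


def count_ris_entries(text):
    """Count RIS records by their end-of-record (ER) tags."""
    return len(re.findall(r"^ER  -", text, flags=re.MULTILINE))


def count_bibtex_entries(text):
    """Count lines that open a BibTeX entry, such as '@article{'."""
    # Note: this would also count @comment or @string blocks if present.
    return len(re.findall(r"^@\w+\{", text, flags=re.MULTILINE))


# Stand-in strings with two entries each.
ris_text = "TY  - JOUR\nTI  - First\nER  - \n\nTY  - JOUR\nTI  - Second\nER  - \n"
bib_text = "@article{lee2018,\n  title = {First},\n}\n\n@article{park2018,\n  title = {Second},\n}\n"

print(count_ris_entries(ris_text), count_bibtex_entries(bib_text))
```

With a real export, comparing either count against len(records) is a cheap check that no record was dropped on the way to disk.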