Comparing topics

A bibliometric study often asks not how large a literature is but how its internal emphasis shifts over time. Within deep-learning research, is the share of work that also concerns medical imaging growing faster than the share about computer vision? compare_topics answers that question by counting, and plot_comparison shows the answer. Because compare_topics contacts the Scopus API, the call is shown here but not run. The plots are drawn instead from a frame of the same shape built offline, so the rest of the guide runs without a key.

What the comparison measures

For each year and each comparison term, the function counts the records matching the reference topic combined with that term, then expresses that count as a percentage of the records matching the reference alone. A value of 30% for computer vision in 2020 means that 30% of the deep-learning records that year also mention computer vision. The reference is the denominator, so it sits at 100% by construction and is not drawn.

Running compare_topics makes one count request per term per year, so it needs a configured Scopus key (through pybliometrics) and counts against your quota. Keep the term and year counts modest.

import scopusflow as sf

cmp = sf.compare_topics(
    reference_query="deep learning",
    comparison_terms=[
        "computer vision",
        "natural language processing",
        "medical imaging",
        "drug discovery",
    ],
    years=range(2013, 2022),
    field="TITLE-ABS-KEY",
)

The field argument wraps every term in the same field tag, the way scopus_query does, so each side of the AND searches the title, abstract and keywords.
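As a rough sketch of what one request per term per year amounts to, the snippet below builds the search strings by hand. build_count_query is a hypothetical helper written for this illustration, not part of scopusflow, and the exact strings the package sends may differ. The field tag and PUBYEAR clause follow ordinary Scopus advanced-search syntax.

```python
from itertools import product


def build_count_query(reference, term, year, field="TITLE-ABS-KEY"):
    """Hypothetical helper (not a scopusflow function): one count query."""
    # Both sides of the AND are wrapped in the same field tag.
    return f'{field}("{reference}") AND {field}("{term}") AND PUBYEAR = {year}'


comparison_terms = [
    "computer vision",
    "natural language processing",
    "medical imaging",
    "drug discovery",
]
years = range(2013, 2022)

# One query per (term, year) pair, as described in the text.
queries = [
    build_count_query("deep learning", term, year)
    for term, year in product(comparison_terms, years)
]

print(len(queries))  # 4 terms x 9 years = 36
print(queries[0])
```

Four terms over nine years already means 36 comparison requests, before any requests for the reference counts themselves, which is why modest term and year lists are worth the restraint.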

The shape of the result

The result is a tidy pandas frame with one row per topic and year, carrying the stable COMPARISON_COLUMNS schema. We build one here with the same columns so the rest of the guide runs offline. The reference set grows across the period, which the uncertainty band will later reflect.

import numpy as np
import pandas as pd
import scopusflow as sf

years = list(range(2013, 2022))
ref_n = np.linspace(400, 1600, len(years)).round().astype(int)
counts = {
    "computer vision": np.linspace(140, 720, len(years)).round().astype(int),
    "natural language processing": np.linspace(90, 540, len(years)).round().astype(int),
    "medical imaging": np.linspace(15, 260, len(years)).round().astype(int),
    "drug discovery": np.linspace(8, 170, len(years)).round().astype(int),
}

rows = []
for year, n in zip(years, ref_n):
    rows.append({
        "query": "deep learning", "query_type": "reference",
        "abridged_query": "deep learning", "year": year, "n": int(n),
        "reference_n": int(n), "comparison_percentage": 100.0,
        "average_comparison_percentage": 100.0,
    })
for topic, series in counts.items():
    avg = 100.0 * series.sum() / ref_n.sum()
    for year, n, ref in zip(years, series, ref_n):
        rows.append({
            "query": topic, "query_type": "comparison",
            "abridged_query": topic, "year": year, "n": int(n),
            "reference_n": int(ref),
            "comparison_percentage": 100.0 * n / ref,
            "average_comparison_percentage": avg,
        })

cmp = pd.DataFrame(rows, columns=sf.compare.COMPARISON_COLUMNS)
out(cmp.head())
query          query_type  abridged_query  year  n     reference_n  comparison_percentage  average_comparison_percentage
deep learning  reference   deep learning   2013  400   400          100.0                  100.0
deep learning  reference   deep learning   2014  550   550          100.0                  100.0
deep learning  reference   deep learning   2015  700   700          100.0                  100.0
deep learning  reference   deep learning   2016  850   850          100.0                  100.0
deep learning  reference   deep learning   2017  1000  1000         100.0                  100.0

The comparison_percentage column is the per-year share, and average_comparison_percentage is the same ratio computed over the whole period, which is what orders the topics in the plot. A year in which the reference has no records has no defined share, so compare_topics records it as a missing value rather than a misleading zero.
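The two columns can be reproduced by hand. The sketch below is a plain-Python illustration, not the package's own code: share_percentage is a name chosen here, and the drug discovery counts are the rounded values from the offline frame built above.

```python
import math


def share_percentage(n, reference_n):
    """Per-year share of the reference, or NaN when the reference is empty."""
    if reference_n == 0:
        # No denominator, so the share is undefined rather than zero.
        return math.nan
    return 100.0 * n / reference_n


# Synthetic counts from the offline frame (drug discovery vs. the reference).
drug_n = [8, 28, 48, 69, 89, 109, 130, 150, 170]
ref_n = [400, 550, 700, 850, 1000, 1150, 1300, 1450, 1600]

per_year = [share_percentage(n, ref) for n, ref in zip(drug_n, ref_n)]

# The period average pools the counts first, then divides once.
period_average = share_percentage(sum(drug_n), sum(ref_n))

print(per_year[0])             # 2.0
print(period_average)          # 8.9
print(share_percentage(0, 0))  # nan
```

Pooling the counts weights each year by the size of its reference set, so the period average is not the same thing as the plain mean of the per-year shares.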

A first plot

With the optional plot extra installed, plot_comparison draws each comparison topic as a line and returns the matplotlib Axes for any further adjustment.

ax = sf.plot_comparison(cmp)
show()

The chart uses a colour-blind-safe palette and, because there are only a few topics, labels the lines directly so the reader need not match colours to a legend. Each label carries the topic's total record count. The shaded band around each line is a Wilson stability range, wide in the early years when the reference set is small and the share would move easily, and narrower as the literature grows. Because Scopus returns exact counts rather than a sample, the band is illustrative rather than a confidence interval, a point the caption on the figure makes plain.
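For readers who want to see why the band narrows, here is a minimal sketch of the standard Wilson score interval for a proportion. It assumes z = 1.96 (a nominal 95% level); the level and implementation that plot_comparison uses are not stated here and may differ. The counts are illustrative: the same 30% share on a small and a large reference set.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Standard Wilson score interval for a proportion (z is an assumption)."""
    if total == 0:
        return (math.nan, math.nan)
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return (centre - half, centre + half)


# Same 30% share, reference set of 400 versus 1600 records.
small_lo, small_hi = wilson_interval(120, 400)
large_lo, large_hi = wilson_interval(480, 1600)

small_width = small_hi - small_lo
large_width = large_hi - large_lo

print(f"n=400:  {100 * small_lo:.1f}% to {100 * small_hi:.1f}%")
print(f"n=1600: {100 * large_lo:.1f}% to {100 * large_hi:.1f}%")
```

Quadrupling the reference set roughly halves the width of the range at a fixed share, which is the narrowing visible toward the right of the chart.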

Drawing the eye to one topic

When one topic is the focus of a figure, highlight draws it in an accent colour and greys the rest, which keeps the context visible without letting it compete. The named topic must be one of the comparison topics in the frame.

ax = sf.plot_comparison(cmp, highlight="medical imaging")
show()

Only the highlighted topic keeps its band, so the eye settles on the one line that matters while the others recede.

Adjusting the labels and the band

For a cleaner look, counts_in_legend=False drops the count suffix from each label and interval=False removes the band.

ax = sf.plot_comparison(cmp, counts_in_legend=False, interval=False)
show()

The return value is an ordinary matplotlib Axes, so a different style, a saved file or any further tweak is one method call away, for instance ax.figure.savefig("topics.png", dpi=200).

Reading the result as a table

Sometimes the numbers matter more than the picture. Because the output is a pandas frame, the usual tools apply. Here are the comparison topics ranked by their average share.

comp = cmp[cmp["query_type"] == "comparison"]
ranked = (comp[["abridged_query", "average_comparison_percentage"]]
          .drop_duplicates()
          .sort_values("average_comparison_percentage", ascending=False))
out(ranked)
abridged_query               average_comparison_percentage
computer vision              43.000000
natural language processing  31.500000
medical imaging              13.755556
drug discovery               8.900000
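The question that opened the guide, whether medical imaging is gaining share faster than computer vision, can be checked with a few lines of arithmetic. The sketch below uses only the first-year and last-year counts from the offline frame, so the figures are synthetic and say nothing about the real Scopus literature. The variable names are chosen for this illustration.

```python
# (count, reference count) in 2013 and 2021, taken from the synthetic frame.
endpoints = {
    "computer vision": {"first": (140, 400), "last": (720, 1600)},
    "medical imaging": {"first": (15, 400), "last": (260, 1600)},
}

change_in_points = {}
for topic, span in endpoints.items():
    first_n, first_ref = span["first"]
    last_n, last_ref = span["last"]
    first_share = 100.0 * first_n / first_ref
    last_share = 100.0 * last_n / last_ref
    # Difference in percentage points between the last and first year.
    change_in_points[topic] = last_share - first_share

print(change_in_points["computer vision"])  # 10.0
print(change_in_points["medical imaging"])  # 12.5
```

In this constructed example medical imaging gains 12.5 percentage points against 10 for computer vision, even though computer vision holds the larger share throughout.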