Retrieve
Size a search cheaply first, then run a plan as a resumable, checkpointed harvest, and finally pull fuller records for specific ids.
scopus_count
scopus_count(query, years=None, field=None, view='STANDARD', **kwargs)
Return how many records the (optionally year-filtered) query matches.
A single cheap request that does not download the records, so it is the right way to size a search before committing quota to a harvest.
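A minimal sketch of the "size first" step. It assumes only what is stated above: that the count function takes a query and returns a record count. `fits_budget`, `fake_count` and the budget figure are hypothetical names and values used for illustration; `fake_count` stands in for `scopus_count` so the example runs offline.

```python
from typing import Any, Callable


def fits_budget(
    count_fn: Callable[..., int],
    query: str,
    budget: int,
    **count_kwargs: Any,
) -> tuple[int, bool]:
    """Issue one count request and report whether the result fits the budget.

    count_fn is expected to behave like scopus_count: it takes the query
    (plus optional keyword arguments) and returns the number of matches.
    """
    n_records = int(count_fn(query, **count_kwargs))
    return n_records, n_records <= budget


def fake_count(query: str, **kwargs: Any) -> int:
    # Hypothetical stand-in for scopus_count; always reports 1234 matches.
    return 1234


n_records, ok = fits_budget(fake_count, "TITLE-ABS-KEY(graphene)", budget=5000)
print(n_records, ok)
```

In real use, `scopus_count` would be passed as `count_fn`, and a harvest would be started only when the second value is true.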
Source code in src/scopusflow/count.py
fetch_plan
fetch_plan(plan, cache_dir=None, resume=True, format='parquet', should_stop=None, **kwargs)
Run every cell of plan and return one normalised DataFrame.
With cache_dir set, each cell is written to disk as it completes, so an
interrupted or quota-limited run resumes without re-fetching finished cells.
format selects the checkpoint format ("parquet" or "csv"); parquet
silently falls back to CSV when no parquet engine is installed. Pass a
zero-argument should_stop callable to allow co-operative cancellation: it
is checked before each cell and the harvest stops (returning what it has) when
it returns True. Per-cell progress is emitted on the "scopusflow"
logger.
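The cancellation and logging behaviour described above can be sketched with the standard library alone. `stop_after` and `run_cells` are hypothetical helpers: `run_cells` is a stand-in loop that mimics the "checked before each cell" contract and is not scopusflow's implementation. The log level at which progress is emitted is also an assumption.

```python
import logging
import threading
import time
from typing import Any, Callable, Iterable, Optional

# Surface per-cell progress from the "scopusflow" logger.
# Assumption: progress messages are emitted at INFO or above.
logger = logging.getLogger("scopusflow")
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler())


def stop_after(seconds: float) -> Callable[[], bool]:
    """Build a zero-argument callable that returns True once a deadline passes."""
    deadline = time.monotonic() + seconds

    def should_stop() -> bool:
        return time.monotonic() >= deadline

    return should_stop


def run_cells(
    cells: Iterable[Any],
    fetch_cell: Callable[[Any], Any],
    should_stop: Optional[Callable[[], bool]] = None,
) -> list:
    """Stand-in harvest loop: check should_stop before each cell, keep partial results."""
    results = []
    for cell in cells:
        if should_stop is not None and should_stop():
            break  # stop co-operatively, returning what has been collected
        results.append(fetch_cell(cell))
    return results


# An Event's is_set method is already a zero-argument callable, so another
# thread (or, here, the fetch function itself) can request cancellation.
cancel = threading.Event()


def fetch_cell(cell: int) -> int:
    if cell == 2:
        cancel.set()  # simulate a cancel request arriving during cell 2
    return cell * 10


partial = run_cells([1, 2, 3, 4], fetch_cell, should_stop=cancel.is_set)
print(partial)
```

With the real function, the same callables would be passed as `should_stop=stop_after(600)` or `should_stop=cancel.is_set`. Combined with `cache_dir`, a later call can then resume from the cells already written to disk.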
Source code in src/scopusflow/fetch.py
scopus_abstract
scopus_abstract(ids, by='doi', view='META', **kwargs)
Retrieve abstracts for one or many ids; a failure on one id does not abort the rest.
by selects the lookup type ("doi", "eid" or "scopus_id"). Any id that
fails is warned about and yields an all-NA row that still records the id.
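The per-id resilience described above follows a common pattern, sketched here with plain dictionaries. This is an illustration, not the library's code: `lookup_all`, `fake_lookup`, the `FIELDS` column names and the use of `None` as the missing-value marker are all assumptions.

```python
import warnings
from typing import Any, Callable, Iterable, Union

# Hypothetical output columns; the real function's columns may differ.
FIELDS = ("title", "abstract")


def lookup_all(
    ids: Union[str, Iterable[str]],
    lookup_one: Callable[[str], dict],
) -> list:
    """Look up each id independently; a failed id yields an all-missing row."""
    if isinstance(ids, str):
        ids = [ids]  # accept a single id as well as many
    rows = []
    for id_ in ids:
        try:
            record = dict(lookup_one(id_))
        except Exception as exc:
            # Warn and continue instead of aborting the whole batch.
            warnings.warn(f"lookup failed for {id_!r}: {exc}")
            record = {field: None for field in FIELDS}
        record["id"] = id_  # the id is recorded even when the lookup failed
        rows.append(record)
    return rows


def fake_lookup(id_: str) -> dict:
    # Hypothetical stand-in for a single remote lookup.
    if id_ == "bad":
        raise KeyError("not found")
    return {"title": f"Title for {id_}", "abstract": "..."}


rows = lookup_all(["10.1000/example", "bad"], fake_lookup)
print(rows)
```

Keeping the failed id in the output makes it straightforward to filter for the missing rows and retry only those ids.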
Source code in src/scopusflow/abstract.py