
Designing precise queries across disciplines
A retrieval is only as good as its query. This article shows how to
compose correct, field-tagged ‘Scopus’ queries with
scopus_query() rather than pasting fragments by hand, where
a missing bracket or a mistyped tag quietly returns the wrong records.
Query building is pure string construction, so it runs offline; each
query is shown as the literal string it produces.
Field tags decide where to look
A field tag restricts a query to part of a record.
scopus_field_tags() lists the common ones.
scopus_field_tags()
#> # A tibble: 12 × 2
#>    tag                searches
#>    <chr>              <chr>
#>  1 TITLE              Words in the document title
#>  2 TITLE-ABS-KEY      Title, abstract and keywords
#>  3 TITLE-ABS-KEY-AUTH Title, abstract, keywords and author names
#>  4 ABS                Abstract text
#>  5 KEY                Indexed and author keywords
#>  6 AUTH               Author names
#>  7 AUTHKEY            Author-supplied keywords
#>  8 AFFIL              Affiliation, any part
#>  9 AFFILORG           Affiliation organisation name
#> 10 SRCTITLE           Source (publication) title
#> 11 DOI                Digital Object Identifier
#> 12 ALL                All available fields
The most generally useful tag is TITLE-ABS-KEY, which
searches the title, abstract and keywords together, broad enough to
catch a topic without the noise of a full-text match.
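At the string level, a field tag is a prefix and a pair of parentheses around the term. The base-R sketch below shows that wrapping step together with a check against a list of known tags, the kind of guard that stops a mistyped tag from ever reaching the API. `wrap_in_tag()` and `known_tags` are hypothetical names used for illustration only; they are not part of the package, and scopus_query() may work differently internally.

```r
# Hypothetical helper, for illustration only; not the package's implementation.
known_tags <- c("TITLE", "TITLE-ABS-KEY", "TITLE-ABS-KEY-AUTH", "ABS", "KEY",
                "AUTH", "AUTHKEY", "AFFIL", "AFFILORG", "SRCTITLE", "DOI", "ALL")

wrap_in_tag <- function(term, tag) {
  # Reject unknown tags up front instead of building a query that would
  # quietly match the wrong records.
  if (!tag %in% known_tags) {
    stop("Unknown field tag: ", tag, call. = FALSE)
  }
  sprintf("%s(%s)", tag, term)
}

wrap_in_tag("CRISPR", "TITLE-ABS-KEY")
#> [1] "TITLE-ABS-KEY(CRISPR)"
```

A misspelling such as `wrap_in_tag("CRISPR", "TITEL")` stops with an error rather than producing a string.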
One term, many disciplines
The same builder serves any field. Each call below returns the exact query string that would be sent to ‘Scopus’.
scopus_query("CRISPR", .field = "TITLE-ABS-KEY") # molecular biology
#> [1] "TITLE-ABS-KEY(CRISPR)"
scopus_query("gravitational waves", .field = "TITLE-ABS-KEY") # physics
#> [1] "TITLE-ABS-KEY(gravitational waves)"
scopus_query("microplastics", .field = "TITLE-ABS-KEY") # environmental science
#> [1] "TITLE-ABS-KEY(microplastics)"
scopus_query("blockchain", .field = "TITLE-ABS-KEY") # computer science
#> [1] "TITLE-ABS-KEY(blockchain)"
scopus_query("digital humanities", .field = "AUTHKEY") # humanities
#> [1] "AUTHKEY(digital humanities)"
The last example uses AUTHKEY, the author-supplied
keywords, which isolates work that self-identifies with a field and so
cuts incidental mentions.
Combining terms with boolean operators
Passing several terms joins them with a boolean operator. The default is
AND; OR and AND NOT are available through .op. When .field is given,
each term is tagged separately before joining.
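Joining can be pictured as two steps: optionally wrap each term in the tag, then paste the pieces together with the operator. The following is a minimal base-R sketch of that idea, assuming nothing about the package's internals; `join_terms()` is a hypothetical name, not a package function.

```r
# Hypothetical sketch of the joining step; the package may differ internally.
join_terms <- function(terms, field = NULL, op = c("AND", "OR", "AND NOT")) {
  op <- match.arg(op)  # errors on anything other than the three operators
  if (!is.null(field)) {
    # Tag each term separately so the operator sits between tagged clauses.
    terms <- sprintf("%s(%s)", field, terms)
  }
  paste(terms, collapse = paste0(" ", op, " "))
}

join_terms(c("graphene", "battery"), field = "TITLE-ABS-KEY")
#> [1] "TITLE-ABS-KEY(graphene) AND TITLE-ABS-KEY(battery)"
join_terms(c("color", "colour"), op = "OR")
#> [1] "color OR colour"
```

Restricting the operator with `match.arg()` means a typo such as "ORR" fails immediately instead of ending up inside the query string.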
# Two concepts that must co-occur (materials science).
scopus_query("perovskite", "solar cell", .field = "TITLE-ABS-KEY")
#> [1] "TITLE-ABS-KEY(perovskite) AND TITLE-ABS-KEY(solar cell)"
# Spelling variants, either of which will do (economics).
scopus_query("behavioral economics", "behavioural economics", .op = "OR")
#> [1] "behavioral economics OR behavioural economics"
# A family of related tools (molecular biology).
scopus_query("CRISPR", "Cas9", "Cas12", .op = "OR")
#> [1] "CRISPR OR Cas9 OR Cas12"
From a query to a plan
A composed query drops straight into the rest of the workflow. Here it anchors a year-partitioned plan, which is meant to keep each cell under the API’s 5000-record ceiling.
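Partitioning by year means the same query is issued once per year, so each request covers a smaller slice of the result set. A rough base-R sketch of that expansion is shown first; `partition_by_year()` is a hypothetical helper written for illustration, and it does not reproduce the full structure that scopus_plan() builds.

```r
# Illustrative only: expand one query into one cell per publication year.
# `partition_by_year()` is a hypothetical name, not a package function.
partition_by_year <- function(query, years) {
  data.frame(
    cell = seq_along(years),
    query = query,            # recycled: every cell carries the same query
    year = as.integer(years),
    stringsAsFactors = FALSE
  )
}

cells <- partition_by_year("TITLE-ABS-KEY(gut microbiome)", 2015:2017)
nrow(cells)
#> [1] 3
cells$year
#> [1] 2015 2016 2017
```

The package's own calls follow.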
q <- scopus_query("gut microbiome", "immunology", .field = "TITLE-ABS-KEY")
q
#> [1] "TITLE-ABS-KEY(gut microbiome) AND TITLE-ABS-KEY(immunology)"
plan <- scopus_plan(q, years = 2015:2022, partition = "year")
plan
#> <scopus_plan> (8 cells, view "STANDARD", partition "year")
#> # A tibble: 8 × 6
#>    cell query                                        date   year view  page_size
#> * <int> <chr>                                        <chr> <int> <chr>     <int>
#> 1     1 TITLE-ABS-KEY(gut microbiome) AND TITLE-ABS… 2015   2015 STAN…       200
#> 2     2 TITLE-ABS-KEY(gut microbiome) AND TITLE-ABS… 2016   2016 STAN…       200
#> 3     3 TITLE-ABS-KEY(gut microbiome) AND TITLE-ABS… 2017   2017 STAN…       200
#> 4     4 TITLE-ABS-KEY(gut microbiome) AND TITLE-ABS… 2018   2018 STAN…       200
#> 5     5 TITLE-ABS-KEY(gut microbiome) AND TITLE-ABS… 2019   2019 STAN…       200
#> 6     6 TITLE-ABS-KEY(gut microbiome) AND TITLE-ABS… 2020   2020 STAN…       200
#> 7     7 TITLE-ABS-KEY(gut microbiome) AND TITLE-ABS… 2021   2021 STAN…       200
#> 8     8 TITLE-ABS-KEY(gut microbiome) AND TITLE-ABS… 2022   2022 STAN…       200
The plan is now ready to size and run. Both steps contact the API.
scopus_count(q, years = 2015:2022)
records <- scopus_fetch_plan(plan)
Searching by affiliation
Field tags reach beyond topics. AFFILORG searches the
affiliation organisation name, which turns a query into an
institution-level view of output.
scopus_query("Max Planck", .field = "AFFILORG")
#> [1] "AFFILORG(Max Planck)"
When a term is empty
The builder validates its input, so a stray empty term is caught early rather than producing a malformed query.
tryCatch(
  scopus_query("graphene", ""),
  scopus_error_bad_input = function(e) conditionMessage(e)
)
#> [1] "`...` must be one or more non-empty character terms."
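For readers curious how a check like this can raise an error that `tryCatch()` matches by class, here is a minimal base-R sketch. Only the condition class name and the message text are taken from the output above; `check_terms()` is a hypothetical function, and the package's actual validation may be implemented quite differently.

```r
# Hypothetical validator, for illustration only.
check_terms <- function(...) {
  terms <- list(...)
  is_valid <- function(x) {
    is.character(x) && length(x) == 1 && !is.na(x) && nzchar(x)
  }
  ok <- length(terms) > 0 && all(vapply(terms, is_valid, logical(1)))
  if (!ok) {
    # Build a classed condition so callers can handle this case specifically.
    cond <- structure(
      list(
        message = "`...` must be one or more non-empty character terms.",
        call = NULL
      ),
      class = c("scopus_error_bad_input", "error", "condition")
    )
    stop(cond)
  }
  invisible(unlist(terms))
}

tryCatch(
  check_terms("graphene", ""),
  scopus_error_bad_input = function(e) "caught bad input"
)
#> [1] "caught bad input"
```

Because the condition also inherits from "error", a generic `error = ` handler would catch it too; the specific class simply lets calling code tell bad input apart from other failures.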