
A retrieval is only as good as its query. This article shows how to compose correct, field-tagged ‘Scopus’ queries with scopus_query() instead of pasting fragments together by hand, where a missing bracket or a mistyped tag can silently return the wrong records. Almost everything here is string construction and runs offline. The exception is the pair of calls that size and fetch a plan, which contact the API. Each query is shown as the literal string it produces.

Field tags decide where to look

A field tag restricts a query to part of a record. scopus_field_tags() lists the common ones.

scopus_field_tags()
#> # A tibble: 12 × 2
#>    tag                searches                                  
#>    <chr>              <chr>                                     
#>  1 TITLE              Words in the document title               
#>  2 TITLE-ABS-KEY      Title, abstract and keywords              
#>  3 TITLE-ABS-KEY-AUTH Title, abstract, keywords and author names
#>  4 ABS                Abstract text                             
#>  5 KEY                Indexed and author keywords               
#>  6 AUTH               Author names                              
#>  7 AUTHKEY            Author-supplied keywords                  
#>  8 AFFIL              Affiliation, any part                     
#>  9 AFFILORG           Affiliation organisation name             
#> 10 SRCTITLE           Source (publication) title                
#> 11 DOI                Digital Object Identifier                 
#> 12 ALL                All available fields

The most generally useful tag is TITLE-ABS-KEY. It searches the title, abstract and keywords together, which is broad enough to catch a topic without the noise of searching every field with ALL.
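A mistyped tag is one of the errors the builder is meant to prevent. The sketch below shows the idea in base R with a hypothetical `tag_term()` helper, which is not part of the package. It checks the tag against the twelve tags listed above (hard-coded here as an assumption) and only then wraps the term in `TAG(term)` form.

```r
# Hypothetical helper, not part of the package: wrap a term in a field tag,
# refusing any tag that is not in the list printed by scopus_field_tags().
known_tags <- c(
  "TITLE", "TITLE-ABS-KEY", "TITLE-ABS-KEY-AUTH", "ABS", "KEY", "AUTH",
  "AUTHKEY", "AFFIL", "AFFILORG", "SRCTITLE", "DOI", "ALL"
)

tag_term <- function(term, field) {
  # Fail loudly on an unknown tag instead of sending a query that matches nothing useful
  if (!field %in% known_tags) {
    stop("Unknown field tag: ", field, call. = FALSE)
  }
  paste0(field, "(", term, ")")
}

tag_term("CRISPR", "TITLE-ABS-KEY")
#> [1] "TITLE-ABS-KEY(CRISPR)"
tryCatch(tag_term("CRISPR", "TITLE-ABS"), error = function(e) conditionMessage(e))
#> [1] "Unknown field tag: TITLE-ABS"
```

In practice the list could come from `scopus_field_tags()$tag` rather than being typed out, since the tibble above has a `tag` column.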

One term, many disciplines

The same builder serves any discipline. Each call below returns the exact query string that would be sent to ‘Scopus’.

scopus_query("CRISPR", .field = "TITLE-ABS-KEY")              # molecular biology
#> [1] "TITLE-ABS-KEY(CRISPR)"
scopus_query("gravitational waves", .field = "TITLE-ABS-KEY") # physics
#> [1] "TITLE-ABS-KEY(gravitational waves)"
scopus_query("microplastics", .field = "TITLE-ABS-KEY")       # environmental science
#> [1] "TITLE-ABS-KEY(microplastics)"
scopus_query("blockchain", .field = "TITLE-ABS-KEY")          # computer science
#> [1] "TITLE-ABS-KEY(blockchain)"
scopus_query("digital humanities", .field = "AUTHKEY")        # humanities
#> [1] "AUTHKEY(digital humanities)"

The last example uses AUTHKEY, the author-supplied keywords. Restricting the search to these isolates work whose authors identify it with the field, and so cuts out papers that mention the phrase only in passing.
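To judge how much the narrower tag trims, one can build the broad and narrow versions of the same phrase side by side and then size each one separately. Building them needs only base R's vectorised `paste0()`. The variable names here are illustrative.

```r
# Illustrative sketch: the same phrase under a broad and a narrow field tag.
term <- "digital humanities"
tags <- c("TITLE-ABS-KEY", "AUTHKEY")

# paste0() is vectorised over `tags`, giving one query string per tag
paste0(tags, "(", term, ")")
#> [1] "TITLE-ABS-KEY(digital humanities)" "AUTHKEY(digital humanities)"
```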

Combining terms with boolean operators

Passing several terms joins them with a boolean operator. The default is AND; OR and AND NOT are available through .op.

# Two concepts that must co-occur (materials science).
scopus_query("perovskite", "solar cell", .field = "TITLE-ABS-KEY")
#> [1] "TITLE-ABS-KEY(perovskite) AND TITLE-ABS-KEY(solar cell)"

# Spelling variants, either of which will do (economics).
scopus_query("behavioral economics", "behavioural economics", .op = "OR")
#> [1] "behavioral economics OR behavioural economics"

# A family of related tools (molecular biology).
scopus_query("CRISPR", "Cas9", "Cas12", .op = "OR")
#> [1] "CRISPR OR Cas9 OR Cas12"
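The joining rule can be sketched in base R. The hypothetical `join_terms()` below is not the package's implementation. It optionally wraps each term in a field tag, then collapses the terms with the chosen operator, and it reproduces the strings printed above for the materials-science and molecular-biology examples.

```r
# Hypothetical sketch of joining terms; not the package's implementation.
join_terms <- function(..., field = NULL, op = c("AND", "OR", "AND NOT")) {
  op <- match.arg(op)          # restrict to the three operators; AND by default
  terms <- c(...)
  if (!is.null(field)) {
    terms <- paste0(field, "(", terms, ")")  # tag each term individually
  }
  paste(terms, collapse = paste0(" ", op, " "))
}

join_terms("perovskite", "solar cell", field = "TITLE-ABS-KEY")
#> [1] "TITLE-ABS-KEY(perovskite) AND TITLE-ABS-KEY(solar cell)"
join_terms("CRISPR", "Cas9", "Cas12", op = "OR")
#> [1] "CRISPR OR Cas9 OR Cas12"
```

Using `match.arg()` limits `op` to the three operators and makes AND the default, mirroring the behaviour described for `.op`.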

From a query to a plan

A composed query drops straight into the rest of the workflow. Here it anchors a year-partitioned plan, which splits the retrieval into one cell per year so that each cell is more likely to stay under the API’s 5000-record ceiling.

q <- scopus_query("gut microbiome", "immunology", .field = "TITLE-ABS-KEY")
q
#> [1] "TITLE-ABS-KEY(gut microbiome) AND TITLE-ABS-KEY(immunology)"
plan <- scopus_plan(q, years = 2015:2022, partition = "year")
plan
#> <scopus_plan> (8 cells, view "STANDARD", partition "year")
#> # A tibble: 8 × 6
#>    cell query                                        date   year view  page_size
#> * <int> <chr>                                        <chr> <int> <chr>     <int>
#> 1     1 TITLE-ABS-KEY(gut microbiome) AND TITLE-ABS… 2015   2015 STAN…       200
#> 2     2 TITLE-ABS-KEY(gut microbiome) AND TITLE-ABS… 2016   2016 STAN…       200
#> 3     3 TITLE-ABS-KEY(gut microbiome) AND TITLE-ABS… 2017   2017 STAN…       200
#> 4     4 TITLE-ABS-KEY(gut microbiome) AND TITLE-ABS… 2018   2018 STAN…       200
#> 5     5 TITLE-ABS-KEY(gut microbiome) AND TITLE-ABS… 2019   2019 STAN…       200
#> 6     6 TITLE-ABS-KEY(gut microbiome) AND TITLE-ABS… 2020   2020 STAN…       200
#> 7     7 TITLE-ABS-KEY(gut microbiome) AND TITLE-ABS… 2021   2021 STAN…       200
#> 8     8 TITLE-ABS-KEY(gut microbiome) AND TITLE-ABS… 2022   2022 STAN…       200
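Conceptually, a year partition turns one query into a table of cells that share the query and differ only in their date. The sketch below builds such a table with a hypothetical `year_cells()` helper. It assumes, without reference to the package internals, that each cell's `date` value is what restricts its request; the Scopus Search API does accept a `date` parameter that takes a year or a year range.

```r
# Hypothetical sketch of a year-partitioned plan: one row ("cell") per year,
# each reusing the same query and differing only in its date value.
year_cells <- function(query, years) {
  data.frame(
    cell  = seq_along(years),
    query = query,                 # recycled to one copy per cell
    date  = as.character(years),   # the per-cell restriction
    year  = as.integer(years)
  )
}

cells <- year_cells("TITLE-ABS-KEY(gut microbiome) AND TITLE-ABS-KEY(immunology)", 2015:2022)
nrow(cells)
#> [1] 8
cells$date[1:3]
#> [1] "2015" "2016" "2017"
```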

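Sizing the plan and running it both contact the API, so the two calls below need network access and are shown without output.

# Size the query across the plan's years (contacts the API).
scopus_count(q, years = 2015:2022)
# Fetch every cell of the plan (contacts the API).
records <- scopus_fetch_plan(plan)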
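Sizing first matters because each cell has to fit under the 5000-record ceiling. With the plan's page size of 200, a full cell takes 5000 / 200 = 25 requests. The hypothetical `requests_needed()` below makes that arithmetic explicit and, as an assumed policy rather than anything the package is known to do, refuses a cell over the ceiling so that it can be partitioned further.

```r
# Hypothetical helper: requests needed to page through one cell, given its
# record count, the plan's page size (200) and the 5000-record ceiling.
requests_needed <- function(n_records, page_size = 200, ceiling_n = 5000) {
  if (n_records > ceiling_n) {
    # Assumed policy: an oversized cell should be split, not truncated
    stop("Cell has ", n_records, " records; partition it further.", call. = FALSE)
  }
  ceiling(n_records / page_size)
}

requests_needed(4999)
#> [1] 25
requests_needed(1)
#> [1] 1
```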

Searching by affiliation

Field tags reach beyond topics. AFFILORG searches the organisation name in a record's affiliations, which turns a query into an institution-level view of research output.

scopus_query("Max Planck", .field = "AFFILORG")
#> [1] "AFFILORG(Max Planck)"
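An affiliation clause is most useful when combined with a topic. Whether scopus_query() accepts a different `.field` for each term is not shown here, so the sketch below builds the two clauses separately and joins them with AND in base R. The same two clauses could equally come from two scopus_query() calls, one with `.field = "TITLE-ABS-KEY"` and one with `.field = "AFFILORG"`.

```r
# Illustrative sketch: restrict a topic query to one institution by ANDing
# two differently tagged clauses.
topic       <- paste0("TITLE-ABS-KEY", "(", "graphene", ")")
institution <- paste0("AFFILORG", "(", "Max Planck", ")")

paste(topic, institution, sep = " AND ")
#> [1] "TITLE-ABS-KEY(graphene) AND AFFILORG(Max Planck)"
```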

When a term is empty

The builder validates its input, so a stray empty term is caught early rather than producing a malformed query.

tryCatch(
  scopus_query("graphene", ""),
  scopus_error_bad_input = function(e) conditionMessage(e)
)
#> [1] "`...` must be one or more non-empty character terms."
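A check of this kind can be written in base R. The hypothetical `check_terms()` below, which is not the package's code, raises a classed condition with `errorCondition()` so that callers can catch exactly that class with `tryCatch()`, as the example above does with `scopus_error_bad_input`. The class name `demo_error_bad_input` is invented for the sketch.

```r
# Hypothetical sketch of input validation with a classed error condition.
check_terms <- function(...) {
  terms <- c(...)
  # Reject: no terms, non-character input, NA, or empty strings
  if (length(terms) == 0 || !is.character(terms) || anyNA(terms) || any(!nzchar(terms))) {
    stop(errorCondition(
      "`...` must be one or more non-empty character terms.",
      class = "demo_error_bad_input"
    ))
  }
  invisible(terms)
}

tryCatch(
  check_terms("graphene", ""),
  demo_error_bad_input = function(e) conditionMessage(e)
)
#> [1] "`...` must be one or more non-empty character terms."
```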