The Data Feed changefiles show all the changes in the Unpaywall database over time. They are available to subscribers to the Unpaywall Data Feed. Files use the same schema as the REST API and database snapshot. The list of changefiles is also available from a JSON endpoint for programmatic access.

Each changefile has a timestamp in its filename that tells you the most recent update it contains.

  • Weekly: changed_dois_with_versions_YYYY-MM-DDThhmmss_to_YYYY-MM-DDThhmmss.jsonl.gz
  • Daily: changed_dois_with_versions_YYYY-MM-DDThhmmss.jsonl.gz
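As a minimal sketch (Python, assuming the two patterns above are exact), a changefile name can be parsed into its kind and timestamps. For weekly files it assumes the second (`_to_`) timestamp is the one marking the most recent update; `parse_changefile_name` and `ChangefileName` are illustrative names, not part of any Unpaywall client.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# One timestamp in the YYYY-MM-DDThhmmss form used in changefile names.
_TS = r"(\d{4}-\d{2}-\d{2}T\d{6})"
_WEEKLY = re.compile(rf"^changed_dois_with_versions_{_TS}_to_{_TS}\.jsonl\.gz$")
_DAILY = re.compile(rf"^changed_dois_with_versions_{_TS}\.jsonl\.gz$")


class ChangefileName(NamedTuple):
    kind: str                  # "weekly" or "daily"
    start: Optional[datetime]  # first timestamp of a weekly range; None for daily files
    end: datetime              # assumed to be the most recent update in the file


def _ts(s: str) -> datetime:
    """Parse a filename timestamp such as 2021-05-02T080001."""
    return datetime.strptime(s, "%Y-%m-%dT%H%M%S")


def parse_changefile_name(name: str) -> ChangefileName:
    """Classify a changefile name and extract its timestamps."""
    m = _WEEKLY.match(name)
    if m:
        return ChangefileName("weekly", _ts(m.group(1)), _ts(m.group(2)))
    m = _DAILY.match(name)
    if m:
        return ChangefileName("daily", None, _ts(m.group(1)))
    raise ValueError(f"unrecognized changefile name: {name!r}")
```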

You can use the snapshot and changefiles together to keep your copy up to date by following these steps:

  1. Download and import the current daily snapshot.
  2. Determine the last update timestamp in the snapshot.
    • If you've already imported the snapshot, this is the maximum value of the updated field among all rows.
    • The timestamp in the snapshot file name is an upper bound on the max-updated value: unpaywall_snapshot_2021-08-04T083001.jsonl.gz
    • The file name is provided in the Content-Disposition header if you download it with a utility like curl: Content-Disposition: attachment; filename="unpaywall_snapshot_2021-08-04T083001.jsonl.gz"
  3. Download all changefiles, starting with the most recent file whose last-updated timestamp is before that of the snapshot. If the available changefiles are:
      1. changed_dois_with_versions_2021-05-01T080001.jsonl.gz
      2. changed_dois_with_versions_2021-05-02T080001.jsonl.gz
      3. changed_dois_with_versions_2021-05-03T080001.jsonl.gz
    and the snapshot timestamp is 2021-05-02T120000, get changefiles 2 and 3 in that order.
  4. Import each changefile by reading it line by line, overwriting or updating the previous record for that row's DOI.
  5. Continue to import changefiles as above, as they are released.
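The changefile selection in step 3 can be sketched as follows. This is a hedged illustration rather than an official client: `file_ts` and `changefiles_to_apply` are hypothetical helpers, and the code assumes the last YYYY-MM-DDThhmmss stamp in a filename is the one that matters (true for daily files, assumed for weekly ones).

```python
import re
from datetime import datetime
from typing import List

# Matches one YYYY-MM-DDThhmmss stamp anywhere in a file name.
_STAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{6}")


def file_ts(filename: str) -> datetime:
    """Return the last timestamp in a snapshot or changefile name (hypothetical helper)."""
    stamps = _STAMP.findall(filename)
    if not stamps:
        raise ValueError(f"no timestamp in {filename!r}")
    return datetime.strptime(stamps[-1], "%Y-%m-%dT%H%M%S")


def changefiles_to_apply(filenames: List[str], snapshot_ts: datetime) -> List[str]:
    """Return the changefiles to import after a snapshot, oldest first.

    Starts at the newest file whose timestamp is before the snapshot's,
    since it may hold updates the snapshot missed, and keeps every later file.
    If no file predates the snapshot, all files are returned.
    """
    ordered = sorted(filenames, key=file_ts)
    start = 0
    for i, name in enumerate(ordered):
        if file_ts(name) < snapshot_ts:
            start = i  # keep advancing to the newest file before the snapshot
    return ordered[start:]
```

Applied to the three example files in step 3 with a snapshot timestamp of 2021-05-02T120000, this returns files 2 and 3 in that order. The same `file_ts` helper reads the timestamp from a snapshot name such as unpaywall_snapshot_2021-08-04T083001.jsonl.gz.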

JSON Endpoint

GET api.unpaywall.org/feed/changefiles?api_key=YOUR_API_KEY&interval=INTERVAL
Description: Provides a JSON object containing a list of changefile attributes and URLs.
Accepts
  • api_key (string, required): Your API key, issued when you subscribe to the data feed.
  • interval (string, optional): Which set of changefiles to list. Options:
    • "week" (default): Files produced weekly on Thursdays, containing all records updated in the previous 9 days. Each file overlaps with the previous week's to account for variability in the time to produce the file and updates that occur while generating it.
    • "day": Files produced every day, containing all records updated since the last file was generated. No overlap is needed because each export stores the updated timestamp of each record, for use by the next export process.

The difference in overlap behavior is an implementation detail that shouldn't affect your import process; each row in a changefile should overwrite the corresponding record in the dataset regardless.
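A minimal sketch of that overwrite rule, assuming each changefile line is a complete Unpaywall record with a `doi` field, and using an in-memory dict as a stand-in for your real database (both assumptions; `apply_lines` and `apply_changefile` are hypothetical names). Because each line replaces the whole record for its DOI, re-reading an overlapping or repeated file, with files applied oldest first, leaves the same final state.

```python
import gzip
import json
from typing import Dict, Iterable


def apply_lines(lines: Iterable[str], records: Dict[str, dict]) -> int:
    """Overwrite records keyed by DOI with each JSON line; return lines applied."""
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        row = json.loads(line)
        records[row["doi"]] = row  # whole-record overwrite, so overlap is harmless
        count += 1
    return count


def apply_changefile(path: str, records: Dict[str, dict]) -> int:
    """Stream a .jsonl.gz changefile line by line into `records`."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return apply_lines(fh, records)
```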

Returns

An object containing an array of changefiles:

{
  "list": [
    {
      "date": string (YYYY-MM-DD),
      "filename": string,
      "filetype": string ("jsonl" or "csv"),
      "last_modified": string (YYYY-MM-DDThh:mm:ss),
      "lines": integer,
      "size": integer,
      "url": string
    },
    …
  ]
}
                            
Example: https://api.unpaywall.org/feed/changefiles?api_key=YOUR_API_KEY&interval=day
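A hedged sketch of calling this endpoint with only the Python standard library and ordering the result for import. `changefiles_url`, `fetch_changefiles` and `jsonl_changefiles` are hypothetical names; the only query parameters used are the documented `api_key` and `interval`, the 60-second timeout is an arbitrary choice, and sorting assumes `last_modified` is in the YYYY-MM-DDThh:mm:ss form shown above.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime
from typing import List

BASE_URL = "https://api.unpaywall.org/feed/changefiles"


def changefiles_url(api_key: str, interval: str = "week") -> str:
    """Build the request URL from the documented api_key and interval parameters."""
    if interval not in ("week", "day"):
        raise ValueError("interval must be 'week' or 'day'")
    query = urllib.parse.urlencode({"api_key": api_key, "interval": interval})
    return f"{BASE_URL}?{query}"


def fetch_changefiles(api_key: str, interval: str = "week") -> dict:
    """GET the changefile list; urlopen raises urllib.error.HTTPError on an error status."""
    with urllib.request.urlopen(changefiles_url(api_key, interval), timeout=60) as resp:
        return json.load(resp)


def jsonl_changefiles(response: dict) -> List[dict]:
    """Keep only JSONL entries, sorted oldest first by last_modified."""
    entries = [e for e in response["list"] if e.get("filetype") == "jsonl"]
    entries.sort(key=lambda e: datetime.fromisoformat(e["last_modified"]))
    return entries
```

For instance, `jsonl_changefiles(fetch_changefiles("YOUR_API_KEY", "day"))` would list the daily files in the oldest-first order the import steps expect; each entry's `url` field is what you would then download.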