Workflow:Web Archiving Capture Assessment Response Processing Workflow

Input: Visual curatorial assessments of web archive captures.
Output: Jira tickets, QA on web archives, emails.
Organisation: Library of Congress

Workflow Description

Capture Assessment Response Processes

Purpose, Context and Content

Currently at the Library of Congress (LC), the systems we use for web archives work do not speak to each other. These include Digiboard for curatorial workflows; Confluence and Jira for Web Archiving Team (WAT) documentation and work processes; OpenWayback and Pywb for web archives replay; and, for capture, Digiboard for curating, a vendor for at-scale crawling, other ticketing systems for submitting seed lists, Heritrix and browser-based crawlers for the capture itself, and the resulting crawl reports, storage, and so on. To bridge this gap, we designed a process through which curatorial staff, digital technicians, acquisitions specialists, and other designated individuals can review captures of given sites and send structured responses to the WAT. That process is called Capture Assessment. For our purposes, Quality Assurance (QA) is the technical process carried out by the WAT and the Library's crawl vendor to iteratively adjust crawl parameters and other variables in order to improve capture quality over time. Capture Assessment response processing, the subject of this workflow, is the process through which the WAT reviews capture assessments and completes QA as needed.


This workflow is relatively new and experimental in our program as of 6/2/2023. It replaced a previous workflow in which a single yes/no question, "Is this a good capture?", was asked in the Wayback banner, and no notification was sent to the Web Archiving Team when curatorial staff submitted an answer. Although the responses were recorded in a Digiboard module, that module was not a reliable tool and did not always yield detailed or usable results.

Further Information