Workflow:Quality Assurance: Iterative Seed Issue Decision Tree

From COPTR
Revision as of 19:21, 15 June 2023 by Meghly (talk | contribs) (→‎Purpose, Context and Content)
Quality Assurance: Iterative Seed Issue Decision Tree
Status:Production
Tools:
Input:Web Archives visual replay and crawl report data
Output:Adjustments to seed URLs and scopes; the results of a future crawl; documentation
Organisation:Library of Congress

Workflow Description

Quality Assurance: Seed Issue Decision Tree


Purpose, Context and Content

This decision tree outlines seed-by-seed quality assurance. The process begins with data from Heritrix crawl reports and iterates on input, crawler, or other variables until either the captures improve or a seed URL is deemed non-archivable. At our organization, this workflow is conducted entirely by the Web Archiving Team, the technical team that facilitates the contracted crawling, the use of our curatorial workflow tool Digiboard, and the ingest of and access to the web archives (the latter in conjunction with our Office of the Chief Information Officer, or OCIO).
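The iterate-until-resolved shape of this workflow can be sketched in a few lines of Python. This is only an illustration of the loop described above, not the team's actual tooling: the names `SeedReview`, `review_seed`, and `capture_is_acceptable`, the outcome labels, and the example adjustment strings are all hypothetical, and in practice the acceptability check is a human judgment based on crawl report data and visual replay rather than a function call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SeedReview:
    """Record of one seed's QA pass (hypothetical structure)."""
    seed_url: str
    applied: List[str] = field(default_factory=list)  # adjustments tried so far
    outcome: str = "pending"


def review_seed(
    seed_url: str,
    capture_is_acceptable: Callable[[str, List[str]], bool],
    adjustments: List[str],
) -> SeedReview:
    """Try adjustments one at a time until the capture is acceptable.

    `capture_is_acceptable` stands in for a reviewer's judgment after a
    re-crawl; it is an assumption made for this sketch.
    """
    review = SeedReview(seed_url)

    # Nothing to fix if the existing capture already passes review.
    if capture_is_acceptable(seed_url, review.applied):
        review.outcome = "acceptable"
        return review

    # Iterate on input, crawler, or other variables, checking after each change.
    for adjustment in adjustments:
        review.applied.append(adjustment)
        if capture_is_acceptable(seed_url, review.applied):
            review.outcome = "improved"
            return review

    # Every available adjustment was tried without an acceptable capture.
    review.outcome = "non-archivable"
    return review
```

For example, calling `review_seed` with a list such as `["change seed URL", "widen scope", "adjust crawler settings"]` would stop at the first adjustment after which the check passes, and the `applied` list then doubles as documentation of what was tried for that seed.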

Evaluation/Review

This workflow, or some version of it, has been in place in our program for a long time. It is labor intensive and cannot be completed on 100% of the materials going into or coming out of the crawls. That said, close attention to detail may yield results, even when QA cannot be completed on everything all the time.

Further Information