Workflow:Quality Assurance: Iterative Seed Issue Decision Tree

Quality Assurance: Iterative Seed Issue Decision Tree
Input: Web Archives visual replay and crawl report data
Output: Adjustments to seed URLs and scopes; the results of a future crawl; documentation
Organisation: Library of Congress

Workflow Description

Quality Assurance: Seed Issue Decision Tree

Purpose, Context and Content

This decision tree outlines seed-by-seed quality assurance. It begins with data from Heritrix crawl reports and iterates on input, crawler, or other variables until either the captures improve or a seed URL is deemed non-archivable. At our organization, this workflow is conducted entirely by the Web Archiving Team, the technical team that facilitates the contracted crawling, the use of our curatorial workflow tool Digiboard, and the ingest of and access to the web archives (the latter in conjunction with our Office of the Chief Information Officer, or OCIO).
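The iteration described above can be sketched roughly in Python. This is only an illustration of the loop's shape, not the team's actual tooling: the names `SeedReview`, `review_seed`, and `capture_is_acceptable`, the outcome labels, and the adjustment labels are all hypothetical, and in practice the acceptability check is a human judgment based on visual replay and crawl report data.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SeedReview:
    """Record of one seed's QA pass (hypothetical structure)."""
    seed_url: str
    history: List[str] = field(default_factory=list)  # adjustments tried, in order
    outcome: str = "pending"


def review_seed(
    seed_url: str,
    adjustments: List[str],
    capture_is_acceptable: Callable[[str, Optional[str]], bool],
) -> SeedReview:
    """Try adjustments one at a time until the capture is acceptable.

    `capture_is_acceptable` stands in for a reviewer's judgment after
    a re-crawl; it is an assumption made for this sketch.
    """
    review = SeedReview(seed_url)

    # First check the existing capture, with no adjustment applied.
    if capture_is_acceptable(seed_url, None):
        review.outcome = "acceptable"
        return review

    # Iterate on input, crawler, or other variables.
    for adjustment in adjustments:
        review.history.append(adjustment)
        if capture_is_acceptable(seed_url, adjustment):
            review.outcome = "improved"
            return review

    # Every adjustment was tried without success.
    review.outcome = "non-archivable"
    return review
```

A reviewer-supplied function decides acceptability, so the same loop works whether the check is manual or partly automated. The `history` list doubles as the documentation output of the workflow, since it records which adjustments were tried for each seed.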


Some version of this workflow has been in place in our program for a long time. It is labor intensive and cannot be completed on 100% of the materials going into the crawls or 100% of the materials coming out of them. That said, close attention to detail may yield results even when QA cannot cover everything all the time.

Further Information