Workflow:Quality Assurance: Iterative Seed Issue Decision Tree
Purpose, Context and Content
This decision tree outlines seed-by-seed quality assurance. It begins with data from Heritrix crawl reports and iterates on input, crawler, or other variables until either the captures improve or the seed URL is deemed non-archivable. At our organization, this workflow is conducted entirely by the Web Archiving Team, the technical team that facilitates the contracted crawling, the use of our curatorial workflow tool Digiboard, and ingest of and access to the web archives (the latter in conjunction with our Office of the Chief Information Officer, or OCIO).
Some version of this workflow has been in place in our program for a long time. It is labor intensive and cannot be completed on 100% of the material going into the crawls or coming out of them. That said, close attention to individual seeds can improve their captures, even though QA cannot cover everything all the time.