Workflow:Quality Assurance: Iterative Seed Issue Decision Tree

From COPTR
Revision as of 20:49, 2 June 2023 by Meghly

Quality Assurance: Iterative Seed Issue Decision Tree
Status: Production
Tools: Heritrix, Webrecorder, OpenWayback, Pywb, OutbackCDX, CDX
Input: Web archive visual replay and crawl report data
Output: Adjustments to seed URLs and scopes; the results of a future crawl; documentation
Organisation: Library of Congress

Workflow Description

Quality Assurance: Seed Issue Decision Tree


Purpose, Context and Content

This decision tree outlines a seed-by-seed quality assurance (QA) process. It starts from Heritrix crawl report data. It then iterates on the input (such as the seed URL or its scope), the crawler settings, or other variables until either the captures improve or the seed URL is judged non-archivable.
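The iterative loop can be sketched in Python. This is a minimal illustration under stated assumptions, not the Library of Congress's actual procedure. It assumes the crawl report is Heritrix's seeds report, with whitespace-separated `[code] [status] [seed] [redirect]` columns; check this layout against your Heritrix version. The names `SeedResult`, `parse_seeds_report`, `needs_qa`, and `iterate_on_seed` are hypothetical. Each adjustment stands in for a manual step (revising the seed URL, adjusting scope, or changing crawler settings) followed by a re-crawl and a replay review. It returns `True` if the captures improved.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class SeedResult:
    seed: str
    status_code: int
    crawled: bool


def parse_seeds_report(lines: Iterable[str]) -> list[SeedResult]:
    """Parse seed rows, ASSUMING the layout '[code] [status] [seed] [redirect]'."""
    results = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and the bracketed header row
        parts = line.split()
        if len(parts) < 3:
            continue  # malformed row; leave it for manual inspection
        try:
            code = int(parts[0])  # negative codes indicate fetch failures
        except ValueError:
            continue
        results.append(SeedResult(parts[2], code, parts[1] == "CRAWLED"))
    return results


def needs_qa(result: SeedResult) -> bool:
    """Conservative triage: anything not crawled or not 2xx gets a manual look."""
    return (not result.crawled) or not (200 <= result.status_code < 300)


# Hypothetical: each adjustment is a (label, callable) pair. The callable stands
# in for "apply the change, re-crawl, review replay" and returns True if the
# captures improved.
Adjustment = tuple[str, Callable[[str], bool]]


def iterate_on_seed(seed: str, adjustments: list[Adjustment]) -> str:
    """Try adjustments in order until one improves captures; otherwise give up."""
    for label, try_adjustment in adjustments:
        if try_adjustment(seed):
            return f"improved: {label}"
    return "non-archivable"


if __name__ == "__main__":
    report = [
        "[code] [status] [seed] [redirect]",
        "200 CRAWLED https://example.gov/",
        "404 CRAWLED https://example.org/missing",
        "-1 NOTCRAWLED https://example.net/",
    ]
    # Illustrative outcomes only: pretend revising the URL fixes the 404 seed.
    adjustments: list[Adjustment] = [
        ("revise seed URL", lambda s: s.endswith("/missing")),
        ("adjust scope", lambda s: False),
        ("change crawler settings", lambda s: False),
    ]
    for result in parse_seeds_report(report):
        if needs_qa(result):
            print(result.seed, "->", iterate_on_seed(result.seed, adjustments))
```

In practice, each adjustment callable would wrap a person's judgment after reviewing replay. The boolean return only records that judgment. Running out of adjustments corresponds to the "non-archivable" outcome of the decision tree.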

Evaluation/Review

Some version of this workflow has long been in place in our program. It is labor-intensive and cannot be applied to every seed going into the crawls or to every capture coming out of them. Even so, close attention to detail can improve results, even when QA cannot cover everything.

Further Information