Workflow:Web Archiving Quality Assurance Lifecycle
Workflow Description
Purpose, Context and Content
This workflow illustrates the life cycle of the quality assurance (QA) process in place in the Web Archiving Program at the Library of Congress as of March 25th, 2023. The workflow is meant to be vendor-agnostic, but it assumes cloud-based web archiving services and cloud-based transfer. It is designed for a large-scale operation and for an iterative crawling environment in which adjustments to seed URLs, scopes, SURTs, regular expressions, etc. are made from crawl to crawl, rather than missing elements being patched in. A mix of open source and non-open source technologies is in play. The QA itself does not rely on a single technology, but it does require web archive crawl and replay technology.