Workflow:Web Archiving Quality Assurance Lifecycle
(Created page with "{{Infobox COW |status=Production |tools=Heritrix, AWS, Pywb, OpenWayback, CDX |input=Input: Seed URLs, SURTs, Exclude lists |output=WARCs, CDX files, Curatorial Data |organisa...") |
Latest revision as of 20:20, 2 June 2023
Workflow Description
Figure (QA-Life-Cycle-20230525.jpeg): Quality Assurance Life Cycle at Library of Congress as of March 25, 2023.
Purpose, Context and Content
This workflow illustrates the life cycle of the quality assurance (QA) process in place in the Web Archiving Program at the Library of Congress as of March 25th, 2023. The workflow is meant to be vendor-agnostic, but assumes cloud-based web archiving services and cloud-based transfer. It is designed for a large-scale operation and for an iterative crawling environment in which adjustments to seed URLs, scopes, SURTs, regex, etc. are made from crawl to crawl, rather than missing elements being patched in. A mix of open source and non-open source technologies is in play; the QA itself does not rely on a single technology, but it does require web archive crawl and replay technology.