<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-GB">
	<id>https://coptr.digipres.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Meghly</id>
	<title>COPTR - User contributions [en-gb]</title>
	<link rel="self" type="application/atom+xml" href="https://coptr.digipres.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Meghly"/>
	<link rel="alternate" type="text/html" href="https://coptr.digipres.org/Special:Contributions/Meghly"/>
	<updated>2026-04-07T05:33:48Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.35.14</generator>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=Workflow:Quality_Assurance:_Iterative_Seed_Issue_Decision_Tree&amp;diff=6100</id>
		<title>Workflow:Quality Assurance: Iterative Seed Issue Decision Tree</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=Workflow:Quality_Assurance:_Iterative_Seed_Issue_Decision_Tree&amp;diff=6100"/>
		<updated>2023-06-15T19:21:02Z</updated>

		<summary type="html">&lt;p&gt;Meghly: /* Purpose, Context and Content */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox COW&lt;br /&gt;
|status=Production&lt;br /&gt;
|tools=Heritrix, Webrecorder, OpenWayback, Pywb, OutbackCDX, CDX&lt;br /&gt;
|input=Web Archives visual replay and crawl report data&lt;br /&gt;
|output=Adjustments to seed URLs and scopes; the results of a future crawl; documentation&lt;br /&gt;
|organisation=Library of Congress&lt;br /&gt;
|organisationurl=https://www.loc.gov&lt;br /&gt;
}}&lt;br /&gt;
==Workflow Description==&lt;br /&gt;
&amp;lt;!-- To add an image of your workflow, open the &amp;quot;Upload File&amp;quot; link on the left in a new browser tab and follow on screen instructions, then return to this page and add the name of your uploaded image to the line below - replacing &amp;quot;workflow.png&amp;quot; with the name of your file. Replace the text &amp;quot;Textual description&amp;quot; with a short description of your image. Filenames are case sensitive! If you don't want to add a workflow diagram or other image, delete the line below  --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Seed-Issue-Decision-Tree-20230525-ScreenSized.jpeg|Quality Assurance: Seed Issue Decision Tree]]&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Describe your workflow here with an overview of the different steps or processes involved--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Purpose, Context and Content==&lt;br /&gt;
This decision tree outlines seed-by-seed quality assurance: it begins with data from Heritrix crawl reports and iterates on input, crawler, or other variables until either the captures improve or a seed URL is deemed non-archivable. At our organization, this workflow is conducted entirely by the Web Archiving Team, the technical team that facilitates the contracted crawling, the use of our curatorial workflow tool Digiboard, and the ingest of and access to the web archives (the latter in conjunction with our Office of the Chief Information Officer, or OCIO).&lt;br /&gt;
&lt;br /&gt;
==Evaluation/Review==&lt;br /&gt;
&amp;lt;!-- How effective was the workflow? Was it replaced with a better workflow? Did it work well with some content but not others? What is the current status of the workflow? Does it relate to another workflow already described on the wiki? Link, explain and elaborate --&amp;gt;&lt;br /&gt;
This workflow, or some version of it, has been in place for a long time in our program. It is labor-intensive and cannot be completed on 100% of the materials going into or coming out of the crawls. That said, close attention to detail may yield results, even if QA cannot be completed on everything all the time.&lt;br /&gt;
&lt;br /&gt;
==Further Information==&lt;br /&gt;
&amp;lt;!-- Provide any further information or links to additional documentation here --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Add four tildes below (&amp;quot;~~~~&amp;quot;) to create an automatic signature, including your wiki username. Ensure your user page (click on your username to create it) includes an up to date contact email address so that people can contact you if they want to discuss your workflow --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Note that your workflow will be marked with a CC3.0 licence --&amp;gt;&lt;/div&gt;</summary>
		<author><name>Meghly</name></author>
	</entry>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=Workflow:Quality_Assurance:_Iterative_Seed_Issue_Decision_Tree&amp;diff=6097</id>
		<title>Workflow:Quality Assurance: Iterative Seed Issue Decision Tree</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=Workflow:Quality_Assurance:_Iterative_Seed_Issue_Decision_Tree&amp;diff=6097"/>
		<updated>2023-06-02T20:49:26Z</updated>

		<summary type="html">&lt;p&gt;Meghly: Created page with &amp;quot;{{Infobox COW |status=Production |tools=Heritrix, Webrecorder, OpenWayback, Pywb, OutbackCDX, CDX |input=Web Archives visual replay and crawl report data |output=Adjustments t...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox COW&lt;br /&gt;
|status=Production&lt;br /&gt;
|tools=Heritrix, Webrecorder, OpenWayback, Pywb, OutbackCDX, CDX&lt;br /&gt;
|input=Web Archives visual replay and crawl report data&lt;br /&gt;
|output=Adjustments to seed URLs and scopes; the results of a future crawl; documentation&lt;br /&gt;
|organisation=Library of Congress&lt;br /&gt;
|organisationurl=https://www.loc.gov&lt;br /&gt;
}}&lt;br /&gt;
==Workflow Description==&lt;br /&gt;
&amp;lt;!-- To add an image of your workflow, open the &amp;quot;Upload File&amp;quot; link on the left in a new browser tab and follow on screen instructions, then return to this page and add the name of your uploaded image to the line below - replacing &amp;quot;workflow.png&amp;quot; with the name of your file. Replace the text &amp;quot;Textual description&amp;quot; with a short description of your image. Filenames are case sensitive! If you don't want to add a workflow diagram or other image, delete the line below  --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Seed-Issue-Decision-Tree-20230525-ScreenSized.jpeg|Quality Assurance: Seed Issue Decision Tree]]&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Describe your workflow here with an overview of the different steps or processes involved--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Purpose, Context and Content==&lt;br /&gt;
This decision tree outlines seed-by-seed quality assurance: it begins with data from Heritrix crawl reports and iterates on input, crawler, or other variables until either the captures improve or a seed URL is deemed non-archivable.&lt;br /&gt;
&lt;br /&gt;
==Evaluation/Review==&lt;br /&gt;
&amp;lt;!-- How effective was the workflow? Was it replaced with a better workflow? Did it work well with some content but not others? What is the current status of the workflow? Does it relate to another workflow already described on the wiki? Link, explain and elaborate --&amp;gt;&lt;br /&gt;
This workflow, or some version of it, has been in place for a long time in our program. It is labor-intensive and cannot be completed on 100% of the materials going into or coming out of the crawls. That said, close attention to detail may yield results, even if QA cannot be completed on everything all the time.&lt;br /&gt;
&lt;br /&gt;
==Further Information==&lt;br /&gt;
&amp;lt;!-- Provide any further information or links to additional documentation here --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Add four tildes below (&amp;quot;~~~~&amp;quot;) to create an automatic signature, including your wiki username. Ensure your user page (click on your username to create it) includes an up to date contact email address so that people can contact you if they want to discuss your workflow --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Note that your workflow will be marked with a CC3.0 licence --&amp;gt;&lt;/div&gt;</summary>
		<author><name>Meghly</name></author>
	</entry>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=File:Seed-Issue-Decision-Tree-20230525-ScreenSized.jpeg&amp;diff=6096</id>
		<title>File:Seed-Issue-Decision-Tree-20230525-ScreenSized.jpeg</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=File:Seed-Issue-Decision-Tree-20230525-ScreenSized.jpeg&amp;diff=6096"/>
		<updated>2023-06-02T20:43:19Z</updated>

		<summary type="html">&lt;p&gt;Meghly: Decision Tree illustration of dealing with seed issues in Web Archiving.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Summary ==&lt;br /&gt;
Decision Tree illustration of dealing with seed issues in Web Archiving.&lt;/div&gt;</summary>
		<author><name>Meghly</name></author>
	</entry>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=Workflow:Web_Archiving_Capture_Assessment_Response_Processing_Workflow&amp;diff=6095</id>
		<title>Workflow:Web Archiving Capture Assessment Response Processing Workflow</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=Workflow:Web_Archiving_Capture_Assessment_Response_Processing_Workflow&amp;diff=6095"/>
		<updated>2023-06-02T20:40:57Z</updated>

		<summary type="html">&lt;p&gt;Meghly: Created page with &amp;quot;{{Infobox COW |status=Experimental |tools=Jira, Confluence, email |input=Visual curatorial assessments of web archives captures. |output=Jira tickets, QA on web archives, emai...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox COW&lt;br /&gt;
|status=Experimental&lt;br /&gt;
|tools=Jira, Confluence, email&lt;br /&gt;
|input=Visual curatorial assessments of web archives captures.&lt;br /&gt;
|output=Jira tickets, QA on web archives, emails.&lt;br /&gt;
|organisation=Library of Congress&lt;br /&gt;
|organisationurl=https://loc.gov&lt;br /&gt;
}}&lt;br /&gt;
==Workflow Description==&lt;br /&gt;
&amp;lt;!-- To add an image of your workflow, open the &amp;quot;Upload File&amp;quot; link on the left in a new browser tab and follow on screen instructions, then return to this page and add the name of your uploaded image to the line below - replacing &amp;quot;workflow.png&amp;quot; with the name of your file. Replace the text &amp;quot;Textual description&amp;quot; with a short description of your image. Filenames are case sensitive! If you don't want to add a workflow diagram or other image, delete the line below  --&amp;gt;&lt;br /&gt;
[[File:Capture-Assessment-Processing-20230525.jpeg|Capture Assessment Response Processes]]&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Describe your workflow here with an overview of the different steps or processes involved--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Purpose, Context and Content==&lt;br /&gt;
&amp;lt;!-- Describe what your workflow is for - i.e. what it is designed to achieve, what the organisational context of the workflow is, and what content it is designed to work with --&amp;gt;&lt;br /&gt;
Currently at the LC, our systems do not speak to each other. These include the web archives curatorial workflow tool (Digiboard), Web Archiving Team (WAT) documentation and work processes (Confluence/Jira), web archives replay (OpenWayback/Pywb), and capture (Digiboard for curating, a vendor for at-scale crawling, other ticketing systems to submit seed lists, Heritrix and browser-based crawlers for capture, and subsequent crawl reports, storage, etc.). A process called Capture Assessment was therefore designed so that curatorial staff, digital technicians, acquisitions specialists, and other designated individuals can review captures of given sites and send structured responses to the WAT. For our purposes, Quality Assurance (QA) is the technical process completed by the WAT and the Library crawl vendor to iteratively adjust crawl parameters and other variables in order to improve capture quality over time. Capture Assessment response processing, featured in this workflow, is the process through which the WAT reviews capture assessments and completes QA as needed.&lt;br /&gt;
&lt;br /&gt;
==Evaluation/Review==&lt;br /&gt;
&amp;lt;!-- How effective was the workflow? Was it replaced with a better workflow? Did it work well with some content but not others? What is the current status of the workflow? Does it relate to another workflow already described on the wiki? Link, explain and elaborate --&amp;gt;&lt;br /&gt;
This workflow is relatively new and experimental in our program as of June 2, 2023. It replaced a previous workflow in which a single yes/no question, &amp;quot;is this a good capture?&amp;quot;, was asked in the banner of Wayback, and no notification went to the Web Archiving Team when curatorial staff submitted an answer. Although the responses were recorded in a module in Digiboard, that module was not a reliable tool and did not always yield detailed or usable results.&lt;br /&gt;
&lt;br /&gt;
==Further Information==&lt;br /&gt;
&amp;lt;!-- Provide any further information or links to additional documentation here --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Add four tildes below (&amp;quot;~~~~&amp;quot;) to create an automatic signature, including your wiki username. Ensure your user page (click on your username to create it) includes an up to date contact email address so that people can contact you if they want to discuss your workflow --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Note that your workflow will be marked with a CC3.0 licence --&amp;gt;&lt;/div&gt;</summary>
		<author><name>Meghly</name></author>
	</entry>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=File:Capture-Assessment-Processing-20230525.jpeg&amp;diff=6094</id>
		<title>File:Capture-Assessment-Processing-20230525.jpeg</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=File:Capture-Assessment-Processing-20230525.jpeg&amp;diff=6094"/>
		<updated>2023-06-02T20:40:28Z</updated>

		<summary type="html">&lt;p&gt;Meghly: Meghly uploaded a new version of File:Capture-Assessment-Processing-20230525.jpeg&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Summary ==&lt;br /&gt;
WAT Scrum/administrative/Quality Assurance processes for curatorial capture assessments.&lt;/div&gt;</summary>
		<author><name>Meghly</name></author>
	</entry>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=File:Capture-Assessment-Processing-20230525.jpeg&amp;diff=6093</id>
		<title>File:Capture-Assessment-Processing-20230525.jpeg</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=File:Capture-Assessment-Processing-20230525.jpeg&amp;diff=6093"/>
		<updated>2023-06-02T20:26:44Z</updated>

		<summary type="html">&lt;p&gt;Meghly: WAT Scrum/administrative/Quality Assurance processes for curatorial capture assessments.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Summary ==&lt;br /&gt;
WAT Scrum/administrative/Quality Assurance processes for curatorial capture assessments.&lt;/div&gt;</summary>
		<author><name>Meghly</name></author>
	</entry>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=Workflow:Web_Archiving_Quality_Assurance_Lifecycle&amp;diff=6092</id>
		<title>Workflow:Web Archiving Quality Assurance Lifecycle</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=Workflow:Web_Archiving_Quality_Assurance_Lifecycle&amp;diff=6092"/>
		<updated>2023-06-02T20:20:15Z</updated>

		<summary type="html">&lt;p&gt;Meghly: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox COW&lt;br /&gt;
|status=Production&lt;br /&gt;
|tools=Heritrix, AWS, Pywb, OpenWayback, CDX&lt;br /&gt;
|input=Seed URLs, SURTs, Exclude lists&lt;br /&gt;
|output=WARCs, CDX files, Curatorial Data&lt;br /&gt;
|organisation=Library of Congress&lt;br /&gt;
|organisationurl=https://www.loc.gov/&lt;br /&gt;
}}&lt;br /&gt;
==Workflow Description==&lt;br /&gt;
&amp;lt;!-- To add an image of your workflow, open the &amp;quot;Upload File&amp;quot; link on the left in a new browser tab and follow on screen instructions, then return to this page and add the name of your uploaded image to the line below - replacing &amp;quot;workflow.png&amp;quot; with the name of your file. Replace the text &amp;quot;Textual description&amp;quot; with a short description of your image. Filenames are case sensitive! If you don't want to add a workflow diagram or other image, delete the line below  --&amp;gt;&lt;br /&gt;
[[File:QA-Life-Cycle-20230525.jpeg|Quality Assurance Life Cycle at Library of Congress as of March 25, 2023.]]&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Describe your workflow here with an overview of the different steps or processes involved--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Purpose, Context and Content==&lt;br /&gt;
&amp;lt;!-- Describe what your workflow is for - i.e. what it is designed to achieve, what the organisational context of the workflow is, and what content it is designed to work with --&amp;gt;&lt;br /&gt;
This workflow illustrates the life cycle of the quality assurance (QA) process in place in the Web Archiving Program at the Library of Congress as of March 25th, 2023. The workflow is meant to be vendor-agnostic, but assumes cloud-based web archiving services and cloud-based transfer. It is designed for a large-scale operation in an iterative crawling environment, in which adjustments to seed URLs, scopes, SURTs, regex, etc. are made from crawl to crawl rather than by patching in missing elements. A mix of open source and non-open source technologies is in play; the QA itself does not rely on a single technology, but does require web archives crawl and replay technology.&lt;br /&gt;
&lt;br /&gt;
==Evaluation/Review==&lt;br /&gt;
&amp;lt;!-- How effective was the workflow? Was it replaced with a better workflow? Did it work well with some content but not others? What is the current status of the workflow? Does it relate to another workflow already described on the wiki? Link, explain and elaborate --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Further Information==&lt;br /&gt;
&amp;lt;!-- Provide any further information or links to additional documentation here --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Add four tildes below (&amp;quot;~~~~&amp;quot;) to create an automatic signature, including your wiki username. Ensure your user page (click on your username to create it) includes an up to date contact email address so that people can contact you if they want to discuss your workflow --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Note that your workflow will be marked with a CC3.0 licence --&amp;gt;&lt;/div&gt;</summary>
		<author><name>Meghly</name></author>
	</entry>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=File:QA-Life-Cycle-20230525.jpeg&amp;diff=6091</id>
		<title>File:QA-Life-Cycle-20230525.jpeg</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=File:QA-Life-Cycle-20230525.jpeg&amp;diff=6091"/>
		<updated>2023-06-02T20:19:38Z</updated>

		<summary type="html">&lt;p&gt;Meghly: Meghly uploaded a new version of File:QA-Life-Cycle-20230525.jpeg&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Summary ==&lt;br /&gt;
Quality Assurance Life Cycle at Library of Congress as of March 25, 2023.&lt;/div&gt;</summary>
		<author><name>Meghly</name></author>
	</entry>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=Workflow:Web_Archiving_Quality_Assurance_Lifecycle&amp;diff=6090</id>
		<title>Workflow:Web Archiving Quality Assurance Lifecycle</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=Workflow:Web_Archiving_Quality_Assurance_Lifecycle&amp;diff=6090"/>
		<updated>2023-06-02T20:16:50Z</updated>

		<summary type="html">&lt;p&gt;Meghly: /* Workflow Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox COW&lt;br /&gt;
|status=Production&lt;br /&gt;
|tools=Heritrix, AWS, Pywb, OpenWayback, CDX&lt;br /&gt;
|input=Seed URLs, SURTs, Exclude lists&lt;br /&gt;
|output=WARCs, CDX files, Curatorial Data&lt;br /&gt;
|organisation=Library of Congress&lt;br /&gt;
|organisationurl=https://www.loc.gov/&lt;br /&gt;
}}&lt;br /&gt;
==Workflow Description==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- To add an image of your workflow, open the &amp;quot;Upload File&amp;quot; link on the left in a new browser tab and follow on screen instructions, then return to this page and add the name of your uploaded image to the line below - replacing &amp;quot;workflow.png&amp;quot; with the name of your file. Replace the text &amp;quot;Textual description&amp;quot; with a short description of your image. Filenames are case sensitive! If you don't want to add a workflow diagram or other image, delete the line below  --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:QA-Life-Cycle-20230525.jpeg|Quality Assurance Life Cycle at Library of Congress as of March 25, 2023.]]&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Describe your workflow here with an overview of the different steps or processes involved--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Purpose, Context and Content==&lt;br /&gt;
&amp;lt;!-- Describe what your workflow is for - i.e. what it is designed to achieve, what the organisational context of the workflow is, and what content it is designed to work with --&amp;gt;&lt;br /&gt;
This workflow illustrates the life cycle of the quality assurance (QA) process in place in the Web Archiving Program at the Library of Congress as of March 25th, 2023. The workflow is meant to be vendor-agnostic, but assumes cloud-based web archiving services and cloud-based transfer. It is designed for a large-scale operation in an iterative crawling environment, in which adjustments to seed URLs, scopes, SURTs, regex, etc. are made from crawl to crawl rather than by patching in missing elements. A mix of open source and non-open source technologies is in play; the QA itself does not rely on a single technology, but does require web archives crawl and replay technology.&lt;br /&gt;
&lt;br /&gt;
==Evaluation/Review==&lt;br /&gt;
&amp;lt;!-- How effective was the workflow? Was it replaced with a better workflow? Did it work well with some content but not others? What is the current status of the workflow? Does it relate to another workflow already described on the wiki? Link, explain and elaborate --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Further Information==&lt;br /&gt;
&amp;lt;!-- Provide any further information or links to additional documentation here --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Add four tildes below (&amp;quot;~~~~&amp;quot;) to create an automatic signature, including your wiki username. Ensure your user page (click on your username to create it) includes an up to date contact email address so that people can contact you if they want to discuss your workflow --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Note that your workflow will be marked with a CC3.0 licence --&amp;gt;&lt;/div&gt;</summary>
		<author><name>Meghly</name></author>
	</entry>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=Workflow:Web_Archiving_Quality_Assurance_Lifecycle&amp;diff=6089</id>
		<title>Workflow:Web Archiving Quality Assurance Lifecycle</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=Workflow:Web_Archiving_Quality_Assurance_Lifecycle&amp;diff=6089"/>
		<updated>2023-06-02T20:16:27Z</updated>

		<summary type="html">&lt;p&gt;Meghly: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox COW&lt;br /&gt;
|status=Production&lt;br /&gt;
|tools=Heritrix, AWS, Pywb, OpenWayback, CDX&lt;br /&gt;
|input=Seed URLs, SURTs, Exclude lists&lt;br /&gt;
|output=WARCs, CDX files, Curatorial Data&lt;br /&gt;
|organisation=Library of Congress&lt;br /&gt;
|organisationurl=https://www.loc.gov/&lt;br /&gt;
}}&lt;br /&gt;
==Workflow Description==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- To add an image of your workflow, open the &amp;quot;Upload File&amp;quot; link on the left in a new browser tab and follow on screen instructions, then return to this page and add the name of your uploaded image to the line below - replacing &amp;quot;workflow.png&amp;quot; with the name of your file. Replace the text &amp;quot;Textual description&amp;quot; with a short description of your image. Filenames are case sensitive! If you don't want to add a workflow diagram or other image, delete the line below  --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:577px-QA-Life-Cycle-20230525.jpeg|Quality Assurance Life Cycle at Library of Congress as of March 25, 2023.]]&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Describe your workflow here with an overview of the different steps or processes involved--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Purpose, Context and Content==&lt;br /&gt;
&amp;lt;!-- Describe what your workflow is for - i.e. what it is designed to achieve, what the organisational context of the workflow is, and what content it is designed to work with --&amp;gt;&lt;br /&gt;
This workflow illustrates the life cycle of the quality assurance (QA) process in place in the Web Archiving Program at the Library of Congress as of March 25th, 2023. The workflow is meant to be vendor-agnostic, but assumes cloud-based web archiving services and cloud-based transfer. It is designed for a large-scale operation in an iterative crawling environment, in which adjustments to seed URLs, scopes, SURTs, regex, etc. are made from crawl to crawl rather than by patching in missing elements. A mix of open source and non-open source technologies is in play; the QA itself does not rely on a single technology, but does require web archives crawl and replay technology.&lt;br /&gt;
&lt;br /&gt;
==Evaluation/Review==&lt;br /&gt;
&amp;lt;!-- How effective was the workflow? Was it replaced with a better workflow? Did it work well with some content but not others? What is the current status of the workflow? Does it relate to another workflow already described on the wiki? Link, explain and elaborate --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Further Information==&lt;br /&gt;
&amp;lt;!-- Provide any further information or links to additional documentation here --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Add four tildes below (&amp;quot;~~~~&amp;quot;) to create an automatic signature, including your wiki username. Ensure your user page (click on your username to create it) includes an up to date contact email address so that people can contact you if they want to discuss your workflow --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Note that your workflow will be marked with a CC3.0 licence --&amp;gt;&lt;/div&gt;</summary>
		<author><name>Meghly</name></author>
	</entry>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=Workflow:Web_Archiving_Quality_Assurance_Lifecycle&amp;diff=6088</id>
		<title>Workflow:Web Archiving Quality Assurance Lifecycle</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=Workflow:Web_Archiving_Quality_Assurance_Lifecycle&amp;diff=6088"/>
		<updated>2023-06-02T20:16:01Z</updated>

		<summary type="html">&lt;p&gt;Meghly: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox COW&lt;br /&gt;
|status=Production&lt;br /&gt;
|tools=Heritrix, AWS, Pywb, OpenWayback, CDX&lt;br /&gt;
|input=Seed URLs, SURTs, Exclude lists&lt;br /&gt;
|output=WARCs, CDX files, Curatorial Data&lt;br /&gt;
|organisation=Library of Congress&lt;br /&gt;
|organisationurl=https://www.loc.gov/&lt;br /&gt;
}}&lt;br /&gt;
==Workflow Description==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- To add an image of your workflow, open the &amp;quot;Upload File&amp;quot; link on the left in a new browser tab and follow on screen instructions, then return to this page and add the name of your uploaded image to the line below - replacing &amp;quot;workflow.png&amp;quot; with the name of your file. Replace the text &amp;quot;Textual description&amp;quot; with a short description of your image. Filenames are case sensitive! If you don't want to add a workflow diagram or other image, delete the line below  --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:QA-Life-Cycle-20230525.jpeg/577px-QA-Life-Cycle-20230525.jpeg|Quality Assurance Life Cycle at Library of Congress as of March 25, 2023.]]&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Describe your workflow here with an overview of the different steps or processes involved--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Purpose, Context and Content==&lt;br /&gt;
&amp;lt;!-- Describe what your workflow is for - i.e. what it is designed to achieve, what the organisational context of the workflow is, and what content it is designed to work with --&amp;gt;&lt;br /&gt;
This workflow illustrates the life cycle of the quality assurance (QA) process in place in the Web Archiving Program at the Library of Congress as of March 25th, 2023. The workflow is meant to be vendor-agnostic, but assumes cloud-based web archiving services and cloud-based transfer. It is designed for a large-scale operation in an iterative crawling environment, in which adjustments to seed URLs, scopes, SURTs, regex, etc. are made from crawl to crawl rather than by patching in missing elements. A mix of open source and non-open source technologies is in play; the QA itself does not rely on a single technology, but does require web archives crawl and replay technology.&lt;br /&gt;
&lt;br /&gt;
==Evaluation/Review==&lt;br /&gt;
&amp;lt;!-- How effective was the workflow? Was it replaced with a better workflow? Did it work well with some content but not others? What is the current status of the workflow? Does it relate to another workflow already described on the wiki? Link, explain and elaborate --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Further Information==&lt;br /&gt;
&amp;lt;!-- Provide any further information or links to additional documentation here --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Add four tildes below (&amp;quot;~~~~&amp;quot;) to create an automatic signature, including your wiki username. Ensure your user page (click on your username to create it) includes an up to date contact email address so that people can contact you if they want to discuss your workflow --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Note that your workflow will be marked with a CC3.0 licence --&amp;gt;&lt;/div&gt;</summary>
		<author><name>Meghly</name></author>
	</entry>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=Workflow:Web_Archiving_Quality_Assurance_Lifecycle&amp;diff=6087</id>
		<title>Workflow:Web Archiving Quality Assurance Lifecycle</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=Workflow:Web_Archiving_Quality_Assurance_Lifecycle&amp;diff=6087"/>
		<updated>2023-06-02T20:15:02Z</updated>

		<summary type="html">&lt;p&gt;Meghly: Created page with &amp;quot;{{Infobox COW |status=Production |tools=Heritrix, AWS, Pywb, OpenWayback, CDX |input=Input: Seed URLs, SURTs, Exclude lists |output=WARCs, CDX files, Curatorial Data |organisa...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox COW&lt;br /&gt;
|status=Production&lt;br /&gt;
|tools=Heritrix, AWS, Pywb, OpenWayback, CDX&lt;br /&gt;
|input=Seed URLs, SURTs, Exclude lists&lt;br /&gt;
|output=WARCs, CDX files, Curatorial Data&lt;br /&gt;
|organisation=Library of Congress&lt;br /&gt;
|organisationurl=https://www.loc.gov/&lt;br /&gt;
}}&lt;br /&gt;
==Workflow Description==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- To add an image of your workflow, open the &amp;quot;Upload File&amp;quot; link on the left in a new browser tab and follow on screen instructions, then return to this page and add the name of your uploaded image to the line below - replacing &amp;quot;workflow.png&amp;quot; with the name of your file. Replace the text &amp;quot;Textual description&amp;quot; with a short description of your image. Filenames are case sensitive! If you don't want to add a workflow diagram or other image, delete the line below  --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:QA-Life-Cycle-20230525.jpeg|Quality Assurance Life Cycle at Library of Congress as of March 25, 2023.]]&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Describe your workflow here with an overview of the different steps or processes involved--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Purpose, Context and Content==&lt;br /&gt;
&amp;lt;!-- Describe what your workflow is for - i.e. what it is designed to achieve, what the organisational context of the workflow is, and what content it is designed to work with --&amp;gt;&lt;br /&gt;
This workflow illustrates the life cycle of the quality assurance (QA) process in place in the Web Archiving Program at the Library of Congress as of March 25th, 2023. The workflow is meant to be vendor-agnostic, but assumes cloud-based web archiving services and cloud-based transfer. It is designed for a large-scale operation in an iterative crawling environment, in which adjustments to seed URLs, scopes, SURTs, regex, etc. are made from crawl to crawl rather than by patching in missing elements. A mix of open source and non-open source technologies is in play; the QA itself does not rely on a single technology, but does require web archives crawl and replay technology.&lt;br /&gt;
&lt;br /&gt;
==Evaluation/Review==&lt;br /&gt;
&amp;lt;!-- How effective was the workflow? Was it replaced with a better workflow? Did it work well with some content but not others? What is the current status of the workflow? Does it relate to another workflow already described on the wiki? Link, explain and elaborate --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Further Information==&lt;br /&gt;
&amp;lt;!-- Provide any further information or links to additional documentation here --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Add four tildes below (&amp;quot;~~~~&amp;quot;) to create an automatic signature, including your wiki username. Ensure your user page (click on your username to create it) includes an up to date contact email address so that people can contact you if they want to discuss your workflow --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Note that your workflow will be marked with a CC3.0 licence --&amp;gt;&lt;/div&gt;</summary>
		<author><name>Meghly</name></author>
	</entry>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=File:QA-Life-Cycle-20230525.jpeg&amp;diff=6086</id>
		<title>File:QA-Life-Cycle-20230525.jpeg</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=File:QA-Life-Cycle-20230525.jpeg&amp;diff=6086"/>
		<updated>2023-06-02T20:06:30Z</updated>

		<summary type="html">&lt;p&gt;Meghly: Meghly uploaded a new version of File:QA-Life-Cycle-20230525.jpeg&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Summary ==&lt;br /&gt;
Quality Assurance Life Cycle at Library of Congress as of March 25, 2023.&lt;/div&gt;</summary>
		<author><name>Meghly</name></author>
	</entry>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=File:QA-Life-Cycle-20230525.jpeg&amp;diff=6085</id>
		<title>File:QA-Life-Cycle-20230525.jpeg</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=File:QA-Life-Cycle-20230525.jpeg&amp;diff=6085"/>
		<updated>2023-06-02T19:52:41Z</updated>

		<summary type="html">&lt;p&gt;Meghly: Quality Assurance Life Cycle at Library of Congress as of March, 25 2023.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Summary ==&lt;br /&gt;
Quality Assurance Life Cycle at Library of Congress as of March 25, 2023.&lt;/div&gt;</summary>
		<author><name>Meghly</name></author>
	</entry>
</feed>