Editing Workflow:Web Archiving Quality Assurance (QA) Workflow
Latest revision | Your text | ||
Line 1: | Line 1: | ||
{{Infobox COW | {{Infobox COW | ||
|status=Production | |status=Production | ||
− | |tools=Heritrix, Cathode, Browsertrix, | + | |tools=Heritrix, Cathode, Browsertrix, JIRA, Screaming Frog |
|input=Live website content. | |input=Live website content. | ||
|output=Archived website content served from WARC files. | |output=Archived website content served from WARC files. | ||
Line 32: | Line 32: | ||
We can also add specific checks for a temporary period, for example when we need to update the contents of a particular field after a process change. | We can also add specific checks for a temporary period, for example when we need to update the contents of a particular field after a process change. | ||
− | |||
<b>Site Crawled</b> | <b>Site Crawled</b> | ||
− | |||
The crawl order is generated as an XML file and sent to our vendor. The vendor launches the crawls. | The crawl order is generated as an XML file and sent to our vendor. The vendor launches the crawls. | ||
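As an illustration of the step above, here is a minimal sketch of how a crawl order might be serialised to XML. The element names (`crawlOrder`, `crawl`, `seedUrl`, `priority`) and the input dictionary keys are assumptions made for the example; the schema actually agreed with the vendor is not described here and may well differ.

```python
import xml.etree.ElementTree as ET


def build_crawl_order(sites):
    """Serialise a list of site dicts into a crawl-order XML string."""
    root = ET.Element("crawlOrder")  # hypothetical root element name
    for site in sites:
        crawl = ET.SubElement(root, "crawl")  # one hypothetical <crawl> per site
        ET.SubElement(crawl, "seedUrl").text = site["seed_url"]
        # Fall back to "normal" when no priority has been recorded for the site.
        ET.SubElement(crawl, "priority").text = site.get("priority", "normal")
    return ET.tostring(root, encoding="unicode")


# Example input; the URLs are placeholders, not real crawl targets.
order_xml = build_crawl_order([
    {"seed_url": "https://www.example.gov.uk/", "priority": "high"},
    {"seed_url": "https://www.example.org/"},
])
```

The resulting string can be written to a file and passed on in whatever way the vendor expects; that hand-off is outside the scope of this sketch.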
− | |||
<b>Tracking and Prioritisation</b> | <b>Tracking and Prioritisation</b> | ||
− | + | We currently use JIRA as our tracking system for crawls. | |
− | We currently use | + | As soon as a crawl is launched a JIRA ticket is set up by our vendors, containing basic information about the crawl. |
− | As soon as a crawl is launched a | + | All correspondence between TNA and our vendors about the crawl takes place on the JIRA ticket. |
− | All correspondence between TNA and our vendors about the crawl takes place on the | + | TNA marks up the JIRA tickets of any crawls which need to be treated as ‘High Priority’. We add a standard label and a descriptive comment. |
− | TNA marks up the | ||
Common reasons for a site being considered High Priority include: | Common reasons for a site being considered High Priority include: | ||
Line 55: | Line 51: | ||
Our supplier will also leave a comment in the ticket if a problem is noticed during the crawl – for example if it is becoming much larger than expected or if the crawler is blocked. | Our supplier will also leave a comment in the ticket if a problem is noticed during the crawl – for example if it is becoming much larger than expected or if the crawler is blocked. | ||
− | Each site has a parent ticket (task) in | + | Each site has a parent ticket (task) in JIRA and each individual crawl has a child ticket (sub-task). This enables us to record information which applies to all crawls at parent level and to easily move between individual crawl tickets. |
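The parent/child ticket arrangement and the High Priority mark-up can be pictured with a small data model. This is only a sketch: the class names, the ticket keys and the `high-priority` label text are hypothetical, and it does not call the JIRA API.

```python
from dataclasses import dataclass, field
from typing import List

HIGH_PRIORITY_LABEL = "high-priority"  # hypothetical text for the standard label


@dataclass
class CrawlTicket:
    """Child ticket (sub-task): one per individual crawl."""
    key: str
    labels: List[str] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)

    def mark_high_priority(self, reason: str) -> None:
        # Add the standard label once, plus a descriptive comment.
        if HIGH_PRIORITY_LABEL not in self.labels:
            self.labels.append(HIGH_PRIORITY_LABEL)
        self.comments.append(reason)


@dataclass
class SiteTicket:
    """Parent ticket (task): one per site, holding site-wide information."""
    key: str
    site: str
    notes: List[str] = field(default_factory=list)
    crawls: List[CrawlTicket] = field(default_factory=list)

    def add_crawl(self, key: str) -> CrawlTicket:
        crawl = CrawlTicket(key=key)
        self.crawls.append(crawl)
        return crawl

    def high_priority_crawls(self) -> List[CrawlTicket]:
        return [c for c in self.crawls if HIGH_PRIORITY_LABEL in c.labels]


# Illustrative usage with made-up ticket keys.
site = SiteTicket(key="WA-1", site="example.gov.uk")
site.notes.append("Information that applies to every crawl of this site")
first = site.add_crawl("WA-2")
second = site.add_crawl("WA-3")
second.mark_high_priority("Example reason for prioritisation")
```

Keeping site-wide notes on the parent and per-crawl labels on the children mirrors the split described above: information recorded once at parent level, with each crawl reachable from it.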
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
==Purpose, Context and Content== | ==Purpose, Context and Content== | ||
<!-- Describe what your workflow is for - i.e. what it is designed to achieve, what the organisational context of the workflow is, and what content it is designed to work with --> | <!-- Describe what your workflow is for - i.e. what it is designed to achieve, what the organisational context of the workflow is, and what content it is designed to work with --> | ||
− | |||
− | |||
− | |||
− | |||
− | |||
==Evaluation/Review== | ==Evaluation/Review== | ||
<!-- How effective was the workflow? Was it replaced with a better workflow? Did it work well with some content but not others? What is the current status of the workflow? Does it relate to another workflow already described on the wiki? Link, explain and elaborate --> | <!-- How effective was the workflow? Was it replaced with a better workflow? Did it work well with some content but not others? What is the current status of the workflow? Does it relate to another workflow already described on the wiki? Link, explain and elaborate --> | ||
− | |||
==Further Information== | ==Further Information== | ||
Line 128: | Line 68: | ||
<!-- Note that your workflow will be marked with a CC3.0 licence --> | <!-- Note that your workflow will be marked with a CC3.0 licence --> | ||
− | |||
− |