Workflow:Browsertrix-crawler Workflow

{{Infobox COW
|status=Experimental
|tools=Browsertrix, Conifer
|input=Website
|output=WARC file
|organisation=UK Government Web Archive
|organisationurl=https://nationalarchives.gov.uk/webarchive
}}
==Workflow Description==

<!-- To add an image of your workflow, open the "Upload File" link on the left in a new browser tab and follow on screen instructions, then return to this page and add the name of your uploaded image to the line below - replacing "workflow.png" with the name of your file. Replace the text "Textual description" with a short description of your image. Filenames are case sensitive! If you don't want to add a workflow diagram or other image, delete the line below  -->

[[File:BX-workflow.png|Flowchart workflow for capturing a website with Browsertrix-crawler]]<br>

<!-- Describe your workflow here with an overview of the different steps or processes involved-->

This workflow covers the decision to capture a website with Browsertrix-crawler and the iterative process that follows: crawling the site with Browsertrix, carrying out quality assurance (QA) on the results in Conifer, and recrawling with adjusted settings.

==Purpose, Context and Content==
<!-- Describe what your workflow is for - i.e. what it is designed to achieve, what the organisational context of the workflow is, and what content it is designed to work with -->

The purpose of this workflow is to determine whether a site is suitable for capture with [https://github.com/webrecorder/browsertrix-crawler Browsertrix Crawler] and, if so, to run a Browsertrix crawl. The crawl is then subject to quality assurance (QA). If the crawl is found to be unsatisfactory, the Browsertrix settings are adjusted and the crawl is run again. This process may be repeated several times until a satisfactory crawl is completed. If no satisfactory crawl can be made in this way, the site is captured with Conifer.
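The crawl, QA and recrawl cycle described above can be sketched in Python as a simple retry loop. This is an illustrative sketch only, not the archive's actual tooling: the names <code>CrawlResult</code>, <code>capture_with_retries</code>, <code>run_crawl</code>, <code>passes_qa</code> and <code>adjust_settings</code> are hypothetical, and the <code>max_attempts</code> limit is an assumption (the workflow only says the process may be repeated several times).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CrawlResult:
    """Outcome of one crawl attempt (hypothetical structure)."""
    settings: dict
    warc_path: str


def capture_with_retries(
    url: str,
    initial_settings: dict,
    run_crawl: Callable[[str, dict], CrawlResult],
    passes_qa: Callable[[CrawlResult], bool],
    adjust_settings: Callable[[dict, CrawlResult], dict],
    max_attempts: int = 3,  # assumption: the workflow does not fix a limit
) -> Optional[CrawlResult]:
    """Crawl, QA, and recrawl with adjusted settings until QA passes.

    Returns the first satisfactory result, or None when every attempt
    fails QA, which signals the fallback to a Conifer capture.
    """
    settings = dict(initial_settings)
    for _ in range(max_attempts):
        result = run_crawl(url, settings)   # e.g. launch a Browsertrix crawl
        if passes_qa(result):               # manual or assisted QA step
            return result
        # QA failed: change the settings and try again
        settings = adjust_settings(settings, result)
    return None
```

The three callables are passed in because the workflow does not specify how the crawl is launched, how QA is judged, or which settings are changed between attempts. A return value of <code>None</code> corresponds to the point at which the site is handed over for capture with Conifer.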

The steps are as follows:

1. A site is identified for capture.

2. The site is assessed to determine which capture method is suitable. At this point we look at:

* How large is the site?
* Does the site contain interactive content?
* What is the planned capture frequency? (If the proposed capture is very frequent, we may be more likely to use an in-house tool like Browsertrix to reduce costs.)
* What level of fidelity is required?
* Have previous crawls of the site been attempted and, if so, what was the outcome?

3. An initial decision is made about which capture technology to use.
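The assessment in step 2 could be recorded in a small structure such as the Python sketch below. The field names, the <code>leans_towards_browsertrix</code> helper and the <code>frequent_threshold_per_year</code> value are all hypothetical. Only the capture frequency rule is taken from the text above; how the other criteria influence the decision is not specified here, so they are recorded but not scored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteAssessment:
    """Answers to the step 2 questions (hypothetical field names)."""
    estimated_pages: Optional[int]         # how large the site is, if known
    has_interactive_content: bool
    captures_per_year: int                 # planned capture frequency
    fidelity_required: str                 # free-text description
    previous_crawl_outcome: Optional[str]  # None if never crawled before


def leans_towards_browsertrix(
    assessment: SiteAssessment,
    frequent_threshold_per_year: int = 12,  # assumption: no threshold is given
) -> bool:
    """Apply the one rule stated in the workflow: very frequent captures
    make an in-house tool like Browsertrix more likely, to reduce costs.
    The remaining criteria are left to the archivist's judgement."""
    return assessment.captures_per_year >= frequent_threshold_per_year
```

For example, a site planned for weekly capture (<code>captures_per_year=52</code>) would lean towards Browsertrix under the assumed threshold, while a site captured once a year would not. The result is only an initial indication and does not replace the decision in step 3.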

==Evaluation/Review==
<!-- How effective was the workflow? Was it replaced with a better workflow? Did it work well with some content but not others? What is the current status of the workflow? Does it relate to another workflow already described on the wiki? Link, explain and elaborate -->

==Further Information==
<!-- Provide any further information or links to additional documentation here -->

<!-- Add four tildes below ("~~~~") to create an automatic signature, including your wiki username. Ensure your user page (click on your username to create it) includes an up to date contact email address so that people can contact you if they want to discuss your workflow -->

<!-- Note that your workflow will be marked with a CC3.0 licence -->