Search results

The page 'Web Crawl' does not exist on this wiki. You can fix that!

WarcManager
...ARC Manager is a web-based UI for managing and querying collections of web crawl data. |function=File Management, Web Capture

936 bytes (137 words) - 16:57, 26 November 2021
Brozzler
Brozzler is a distributed web crawler that uses a real browser (Chrome or Chromium) to fetch pages and em... Brozzler is designed to work in conjunction with warcprox for web archiving.

2 KB (275 words) - 16:16, 9 December 2021
Annotation Curation Tool (ACT)
|purpose=w3act is an annotation and curation tool for web archives |content=Web

873 bytes (130 words) - 16:11, 9 December 2021
Workflow:Browsertrix-crawler Workflow
|organisation=UK Government Web Archive ...several times until a satisfactory crawl is completed. If no satisfactory crawl can be made in this way, the site will be captured with Conifer.

5 KB (748 words) - 17:02, 9 December 2021
Heritrix
|purpose=Heritrix is an open-source web crawler, allowing users to target websites they wish to include in a collec... |function=Web Capture

5 KB (753 words) - 15:59, 26 November 2021
Workflow:Web Archiving Quality Assurance Lifecycle
...e QA itself does not rely on a single technology, but requires Web Archives crawl and replay tech.

3 KB (427 words) - 20:20, 2 June 2023
TubeKit
|platforms=Web based |function=Web Capture

926 bytes (133 words) - 16:55, 26 November 2021
Workflow:Quality Assurance: Iterative Seed Issue Decision Tree
|input=Web Archives visual replay and crawl report data |output=Adjustments to seed URLs and scopes; the results of a future crawl; documentation

3 KB (441 words) - 19:21, 15 June 2023
Workflow:Web Archiving Capture Assessment Response Processing Workflow
|input=Visual curatorial assessments of web archives captures. |output=Jira tickets, QA on web archives, emails.

3 KB (535 words) - 20:40, 2 June 2023
WebCite
|purpose=WebCite is an on-demand web archiving service that takes snapshots of Internet-accessible digital objec... |function=Persistent Identification, Web Capture, Citation and Impact Tracking

3 KB (436 words) - 16:46, 26 November 2021
Workflow:Web Archiving Quality Assurance (QA) Workflow
|organisation=The National Archives (UK), UK Government Web Archive [[File:TNA QA Process Flow v1 (1).png|UK Government Web Archive Quality Assurance (QA) Workflow]]<br>

10 KB (1,809 words) - 11:35, 8 February 2024
COPTR to do list
...Category:Web_Crawl]] is broader than just crawl. Could add an overarching "Web Archiving" category, then have sub categories. Would be nice to incorporate

1 KB (202 words) - 09:15, 1 December 2014
HTTrack
|formats_out=HTTrack Crawl |function=Web Capture

2 KB (299 words) - 15:57, 26 November 2021
HTTrack2Arc
|formats_in=HTTrack Crawl [[Category:Web]]

2 KB (357 words) - 21:57, 25 May 2021
