Web
Tools for this content type
Annotation Curation Tool (ACT) | w3act is an annotation and curation tool for web archives | X | |||||||
Archive-It | Archive-It is the leading web archiving service for collecting and accessing cultural heritage on the web. It is a service provided by the Internet Archive. | X | X | ||||||
ArchiveFacebook | ArchiveFacebook is a Firefox extension which allows individuals to save and manage Facebook web content. | X | |||||||
Brozzler | From GitHub (https://github.com/internetarchive/brozzler): Brozzler is a distributed web crawler that uses a real browser (Chrome or Chromium) to fetch pages and embedded URLs and to extract links. Brozzler is designed to work in conjunction with warcprox for web archiving. | X | |||||||
CINCH | CINCH (Capture INgest and CHecksum Tool) facilitates batch downloading and ingest of Internet-accessible documents and/or images to a central repository. | X | |||||||
DeepArc | Intended for preserving web sites from the back-end, this is a database-to-XML curation tool. | X | X | ||||||
GNU Wget | Non-interactive network downloader | X | |||||||
HTTrack | HTTrack is a website copying utility. | X | |||||||
Heritrix | Heritrix is an open-source web crawler, allowing users to target websites they wish to include in a collection and to harvest an instance of each site. | X | |||||||
IMacros | iMacros makes it easy to test web-based applications. | X | X | X | |||||
Khtml2png | khtml2png is a command line program to create screenshots of webpages. | X | |||||||
Libsafe | libsafe allows organizations to create a fully OAIS-compliant archive, including active and passive digital preservation workflows, and is particularly suited to master image files from digitization processes. | X | |||||||
NetarchiveSuite | NetarchiveSuite is a web archiving software package designed to plan, schedule and run web harvests of parts of the Internet. | X | |||||||
NutchWAX | NutchWAX is software for indexing ARC files (archived Web sites gathered using Heritrix) for full text search. | X | |||||||
PageVault | pageVault supports the archiving of all unique responses generated by a web server. | X | |||||||
Pagelyzer | Suite of tools for detecting changes in web pages and their rendering | X | X | X | |||||
Pearl Crescent Page Saver | Pearl Crescent Page Saver is an extension for Mozilla Firefox that lets you capture images of web pages, including Flash content. | X | |||||||
Perma.cc | A tool that captures, stores, plays-back and provides a new URL for web citation. Built and maintained at the Harvard Law School Library. | X | X | ||||||
Screen-scraper | screen-scraper is a tool for extracting data from websites. | X | |||||||
SiteStory | SiteStory is a transactional web archive. It archives resources of a web server it is associated with. | X | |||||||
Spadix software | Spadix Software can download websites from a starting URL, search engine results or web directories, and is able to follow external links. | X | |||||||
Storytracker | Tools for tracking stories on news homepages | X | |||||||
Teleport | Teleport is a web crawling tool that enables offline browsing. | X | |||||||
The DeDuplicator (Heritrix add-on module) | The DeDuplicator is an add-on module for Heritrix to reduce the amount of duplicate data collected in a series of snapshot crawls. | X | X | ||||||
UKWA Access API | Web archives access API | X | X | X | |||||
W3C Markup Validation Service | The World Wide Web Consortium's (W3C) service for validating the markup of web documents. | X | |||||||
WARCreate | Google Chrome browser extension for creating WARC files from web pages | X | X | ||||||
WAS (Web Archiving Service) | The Web Archiving Service (WAS) is a Web-based curatorial tool that enables libraries and archivists to capture, curate, analyze, and preserve Web-based government and political information. | X | |||||||
WAXToolbar | WAXToolbar is a Firefox extension that helps users with common tasks encountered when browsing a web archive. | X | |||||||
WCT (Web Curator Tool) | Web Curator Tool (WCT) is a workflow management application for selective web archiving. | X | X | ||||||
WERA (Web ARchive Access) | WERA (Web ARchive Access) is a freely available solution for searching and navigating archived web document collections. | X | |||||||
Warc-proxy | Warc-proxy is a simple tool to view WARC content in Firefox | X | X | ||||||
WarcManager | The WARC Manager is a web-based UI for managing and querying collections of web crawl data. | X | X | X | |||||
Warcit | Warcit is a command-line tool that converts directories (including nested directories), files (including HTML or other web assets and data files) and ZIP files to Web Archives (WARC). | X | |||||||
Warctools | Command line tools and libraries for handling and manipulating WARC files (and HTTP contents) | X | X | X | |||||
Warrick | Warrick is a free utility for reconstructing (or recovering) a website from web archives. | X | |||||||
Wayback Machine | The Wayback Machine is a powerful search and discovery tool for use with collections of Web site "snapshots" collected through Web harvesting, usually with Heritrix (ARC or WARC files). | X | X | ||||||
Web Archive Discovery | Indexing and discovery tools for web archives. | X | X | X | X | ||||
Web Recorder Player | A tool that replays WARC files on your local computer. | X | X | ||||||
Web Scraper Plus+ | Web Scraper Plus+ takes data from the web and puts it into a spreadsheet or database. | X | |||||||
WebCite | WebCite is an on-demand web archiving service that takes snapshots of Internet-accessible digital objects at the behest of users, stores the data on its own servers, and assigns unique identifiers to those instances of the material. | X | X | X | X | ||||
WebShot | WebShot allows you to take screenshots of web pages and save them as full sized images or thumbnails. | X | |||||||
Webkit2png | webkit2png is a command line tool that creates png screenshots of webpages. | X | |||||||
Webrecorder | Webrecorder is a hosted web archiving tool with which users can capture what they see as they browse websites and save that information (locally or to a free account). | X | |||||||
Xenu's Link Sleuth | The tool checks the hyperlinks on websites. | X | |||||||
yt-dlp | Supports downloading YouTube videos; a fork of the largely inactive youtube-dl. | X |
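Many of the tools above (WARCreate, Warcit, Warctools, the Wayback Machine) produce or read WARC files. As an illustration of what one WARC record looks like on disk, here is a minimal sketch using only the Python standard library. It assumes the WARC 1.0 record layout (version line, header fields, blank line, content block, two trailing CRLFs) and a SHA-1/base32 block digest. The function names `block_digest` and `write_resource_record` and the example `file:///` target URI are hypothetical; this is not how any listed tool is implemented, and real writers typically also emit a `warcinfo` record and gzip each record.

```python
import base64
import hashlib
import io
import uuid
from datetime import datetime, timezone


def block_digest(block: bytes) -> str:
    """Hypothetical helper: 'sha1:' followed by the base32-encoded SHA-1 of the block."""
    return "sha1:" + base64.b32encode(hashlib.sha1(block).digest()).decode("ascii")


def write_resource_record(out, target_uri: str, content_type: str, block: bytes) -> str:
    """Hypothetical helper: append one WARC/1.0 'resource' record to binary stream `out`.

    Returns the generated WARC-Record-ID so other records could refer to it.
    """
    record_id = f"<urn:uuid:{uuid.uuid4()}>"
    warc_date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", record_id),
        ("WARC-Date", warc_date),
        ("WARC-Target-URI", target_uri),
        ("WARC-Block-Digest", block_digest(block)),
        ("Content-Type", content_type),
        ("Content-Length", str(len(block))),  # length of the content block in bytes
    ]
    out.write(b"WARC/1.0\r\n")
    for name, value in headers:
        out.write(f"{name}: {value}\r\n".encode("utf-8"))
    out.write(b"\r\n")      # blank line ends the header block
    out.write(block)        # the archived content itself
    out.write(b"\r\n\r\n")  # two CRLFs terminate every record
    return record_id


if __name__ == "__main__":
    buf = io.BytesIO()
    rid = write_resource_record(buf, "file:///site/index.html", "text/html", b"<html></html>")
    print(rid)
    print(buf.getvalue().decode("utf-8"))
```

Wrapping a local file as a `resource` record in this way is roughly the idea behind converting a directory of files to WARC; crawler-produced WARCs instead store `request` and `response` records containing the full HTTP exchange.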
Also see the Web Archiving Community master list of software.