Semantic search

The DeDuplicator (Heritrix add-on module)
De-Duplication, Web Capture
The DeDuplicator is an add-on module for Heritrix to reduce the amount of duplicate data collected in a series of snapshot crawls.
Tree
File Management, Metadata Processing
Tree displays the directory structure of a path or of the disk in a drive graphically.
TreeSize
File Management
Manage disk space and scan your hard disks.
TubeKit
Web Capture
TubeKit is a toolkit for creating YouTube crawlers.
Tufts Submission-Agreement Builder Tool
Data capture and Deposit
SABT is a web-based tool that guides records creators and records managers through the process of creating submission agreements, both for single transfers and for standing submissions.
UKWA GSuite Add-On
Validation

GSuite functions for people working with web archives. The functions use the Memento API (specifically the TimeGate) to look up whether a given archive holds a given URL. It currently supports checks against:

  • UK Web Archive
  • UK Government Web Archive
  • Internet Archive
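
The TimeGate lookup described above can be sketched as follows. This is a minimal illustration, not the add-on's own code: the function name `build_timegate_request` is hypothetical, and the Internet Archive TimeGate prefix used in the example is an assumption that should be checked against the archive's documentation. In the Memento protocol, a client asks a TimeGate for a URL near a chosen date by sending an `Accept-Datetime` header.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def build_timegate_request(timegate_prefix, url, when):
    """Return (request_url, headers) for a Memento TimeGate lookup.

    timegate_prefix: the archive's TimeGate base (assumed, archive-specific).
    url: the original URL to look up.
    when: a timezone-aware datetime giving the preferred capture date.
    """
    # Memento expects the date in HTTP (RFC 1123) format, expressed in GMT.
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    headers = {"Accept-Datetime": http_date}
    # The original URL is appended to the TimeGate prefix as-is.
    return timegate_prefix + url, headers


# Assumed TimeGate prefix for the Internet Archive; verify before relying on it.
request_url, headers = build_timegate_request(
    "http://web.archive.org/web/",
    "http://example.com/",
    datetime(2019, 1, 1, tzinfo=timezone.utc),
)
print(request_url)
print(headers)
```

Sending a HEAD request with these headers and checking whether the TimeGate redirects to a capture (rather than answering 404) is one way to decide whether the archive holds the URL.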
UnArchiver is a native macOS utility which supports far more archive formats than other common archiving utilities.
Securely encrypts large amounts of files
Virtual CloneDrive
Disk Imaging
Virtual CloneDrive works and behaves just like a physical CD/DVD drive, but it exists only virtually.
WARCreate
Data capture and Deposit, Personal Archiving, Web Capture
Google Chrome browser extension for creating WARC files from web pages.
WAS (Web Archiving Service)
Web Capture
The Web Archiving Service (WAS) is a Web-based curatorial tool that enables libraries and archivists to capture, curate, analyze, and preserve Web-based government and political information.
WAXToolbar
Web Capture
WAXToolbar is a Firefox extension that helps users with common tasks encountered when browsing a web archive.
WCT (Web Curator Tool)
Metadata Processing, Web Capture
Web Curator Tool (WCT) is a workflow management application for selective web archiving.
WarcManager
File Management, Web Capture
The WARC Manager is a web-based UI for managing and querying collections of web crawl data.
Warrick
Web Capture
Warrick is a free utility for reconstructing (or recovering) a website from web archives.
Wayback Machine
Access, Web Capture
The Wayback Machine is a powerful search and discovery tool for use with collections of Web site "snapshots" collected through Web harvesting, usually with Heritrix (ARC or WARC files).
Web Scraper Plus+
Web Capture
Web Scraper Plus+ takes data from the web and puts it into a spreadsheet or database.
WebCite
Citation and Impact Tracking, Persistent Identification, Web Capture
WebCite is an on-demand web archiving service that takes snapshots of Internet-accessible digital objects at the behest of users, storing the data on their own servers and assigning unique identifiers to those instances of the material.
WebShot
Web Capture
WebShot allows you to take screenshots of web pages and save them as full sized images or thumbnails.
Webkit2png
Web Capture
webkit2png is a command-line tool that creates PNG screenshots of webpages.
Webrecorder
Web Capture
Webrecorder is a hosted web archiving tool with which users can capture what they see as they browse websites and save that information (locally or to a free account).
XXCopy
File Copy
XXCopy is an expanded version of Xcopy.
Xcopy
File Copy
Xcopy copies files and directories, including subdirectories.
Xenu's Link Sleuth
Web Capture
The tool checks the hyperlinks on websites.
YT-DLP (You Tube Download P)
Web Capture
Supports downloading YouTube videos; based on the now-defunct YT-DL.