WARC

From COPTR
Wikidata:Q7978505
File formats wiki:formats:WARC

Tools that have this format as input

| Tool | Purpose |
| --- | --- |
| JHOVE (Harvard Object Validation Environment) | JHOVE provides functions to perform format-specific identification, validation, and characterization of digital objects. |
| TweetSets | TweetSets provides a web interface that allows users to (1) select from existing datasets; (2) limit the dataset by querying on keywords, hashtags, and other parameters; and (3) generate and download dataset derivatives such as the list of tweet IDs and mention nodes/edges. |
| Warc Analyzer | A proof-of-concept client-side web app for analyzing WARC data using Webrecorder's warcio.js. No WARC data is uploaded anywhere; it runs entirely on your machine. It is intended for archivists who have been given a pile of WARC data and want to quickly find out what it contains. |
| Warc-proxy | Warc-proxy is a simple tool for viewing WARC content in Firefox. |
| Warctools | Command-line tools and libraries for handling and manipulating WARC files (and HTTP contents). |

Tools that have this format as output

| Tool | Purpose |
| --- | --- |
| ArchiveBox | ArchiveBox is an open source tool that lets organizations and individuals archive both public and private web content while retaining control over their data. |
| Perma.cc | A tool that captures, stores, and plays back web content, and provides a new URL for web citation. Built and maintained at the Harvard Law School Library. |
| SFM (Social Feed Manager) | Social Feed Manager is open source software that provides a web interface for harvesting social media data and web resources from Twitter and other social media platforms. |
| WARCreate | A Google Chrome browser extension for creating WARC files from web pages. |
| Warcit | Warcit is a command-line tool that converts directories (including nested directories), files (including HTML or other web assets and data files), and ZIP files to web archives (WARC). |
| Warctools | Command-line tools and libraries for handling and manipulating WARC files (and HTTP contents). |