Web Archive Discovery

From COPTR
Revision as of 10:22, 14 February 2014 by MediaWiki default (Added initial webarchive-discovery outline)


Indexing and discovery tools for web archives.
Homepage: https://github.com/ukwa/webarchive-discovery
License: Mixed
Platforms: Java

Description

A full-text indexing system for web archives that uses Apache Solr as its search back-end. It supports both command-line and large-scale MapReduce (Hadoop) processing of ARC and WARC files. It also integrates file format analysis and scans for some known preservation risks.
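
To give a rough feel for what processing a WARC file involves, the following is a minimal Python sketch, not the tool's own code (webarchive-discovery is written in Java). It assumes an uncompressed WARC stream, whereas real WARC files are usually gzip-compressed, and it skips all text extraction. The function names and the output field names (`url`, `crawl_date`, `record_type`, `content_length`) are hypothetical and are not taken from the project's Solr schema.

```python
import io


def read_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # skip the blank lines that separate records
        if not line.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line, got %r" % line)
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # Content-Length gives the exact size of the record payload in bytes.
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


def to_index_document(headers, payload):
    """Map one record to a flat dict; the field names are hypothetical."""
    return {
        "url": headers.get("WARC-Target-URI"),
        "crawl_date": headers.get("WARC-Date"),
        "record_type": headers.get("WARC-Type"),
        "content_length": len(payload),
    }


# A single hand-written record, used only to exercise the sketch.
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Target-URI: http://example.org/\r\n"
    b"WARC-Date: 2014-02-14T10:22:00Z\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
)

docs = [to_index_document(h, p) for h, p in read_warc_records(io.BytesIO(sample))]
print(docs)
```

In a real pipeline each such dictionary would be enriched with extracted text and format-identification results before being sent to Solr; those steps are left out here.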

User Experiences

  • Used by the UK Web Archive to provide access to their collections. More details TBA.

Development Activity
