Difference between revisions of "Web Archive Discovery"

From COPTR
(Added initial webarchive-discovery outline)
 
 
[[Category:Web]]
[[Category:Web Archive]]
 
  
 

Latest revision as of 15:29, 28 October 2014



Indexing and discovery tools for web archives.
Homepage: https://github.com/ukwa/webarchive-discovery
License: Mixed
Platforms: Java

Description

A full-text indexing system that uses Apache Solr as its search back-end. It supports both command-line and large-scale MapReduce (Hadoop) processing of ARC and WARC files, and it also integrates file format analysis and scans for some known preservation risks.
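The first step of this kind of indexing, turning an archived record into a flat search document, can be sketched in plain Java. This is a minimal, hypothetical illustration, not webarchive-discovery's own code or API: it parses one uncompressed WARC record with the standard library only and maps a few WARC headers onto illustrative index fields (`url`, `record_type`, `crawl_date`, `content_length`, `text`); the class name, method names and field names are all assumptions. A real pipeline would also need to handle GZip-compressed records, ARC files, HTTP response headers inside `response` records and format-specific text extraction before posting documents to Solr.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, NOT part of webarchive-discovery: turns one uncompressed
// WARC record into a flat map of illustrative index fields.
public class WarcRecordSketch {

    // Reads one LF- or CRLF-terminated header line (bytes mapped 1:1 to chars,
    // i.e. ISO-8859-1); returns null at end of stream.
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            sb.append((char) b);
        }
        if (b == -1 && sb.length() == 0) {
            return null;
        }
        int len = sb.length();
        if (len > 0 && sb.charAt(len - 1) == '\r') {
            sb.setLength(len - 1);
        }
        return sb.toString();
    }

    // Parses the version line, the named header fields and the content block of one record.
    // Output field names such as "url" and "record_type" are assumptions for illustration.
    static Map<String, String> toIndexFields(InputStream in) throws IOException {
        String version = readLine(in);
        if (version == null || !version.startsWith("WARC/")) {
            throw new IOException("Not a WARC record: " + version);
        }
        Map<String, String> headers = new LinkedHashMap<>();
        String line;
        while ((line = readLine(in)) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                // Header names are case-insensitive, so normalise them to lower case.
                headers.put(line.substring(0, colon).trim().toLowerCase(Locale.ROOT),
                        line.substring(colon + 1).trim());
            }
        }
        // Sketch limitation: assumes the content block fits in an int-sized array.
        int length = Integer.parseInt(headers.getOrDefault("content-length", "0"));
        byte[] block = in.readNBytes(length);

        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("url", headers.getOrDefault("warc-target-uri", ""));
        doc.put("record_type", headers.getOrDefault("warc-type", ""));
        doc.put("crawl_date", headers.getOrDefault("warc-date", ""));
        doc.put("content_length", Integer.toString(block.length));
        // Real indexers extract text per format; here the block is simply decoded as UTF-8.
        doc.put("text", new String(block, StandardCharsets.UTF_8));
        return doc;
    }

    public static void main(String[] args) throws IOException {
        // A made-up sample record, built in memory for demonstration only.
        String body = "Hello, web archive!";
        String record = "WARC/1.0\r\n"
                + "WARC-Type: resource\r\n"
                + "WARC-Target-URI: http://example.com/hello.txt\r\n"
                + "WARC-Date: 2014-01-01T00:00:00Z\r\n"
                + "Content-Type: text/plain\r\n"
                + "Content-Length: " + body.getBytes(StandardCharsets.UTF_8).length + "\r\n"
                + "\r\n"
                + body + "\r\n\r\n";
        Map<String, String> doc = toIndexFields(
                new ByteArrayInputStream(record.getBytes(StandardCharsets.UTF_8)));
        doc.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Running `main` prints one `field = value` line per illustrative field. In a full-scale setup, the same per-record mapping would run inside each map task, which is what lets the work spread across a Hadoop cluster.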

User Experiences

  • Used by the UK Web Archive to provide access to its collections. More details TBA.

Development Activity


Contributors

Andy Jackson (100.0%)