Web Archive Discovery
A full-text indexing system that uses Apache Solr as the search back-end. It supports either command-line processing or large-scale MapReduce (Hadoop) processing of ARC and WARC files. It also integrates file format analysis and scans for some known preservation risks.
- Used by the UK Web Archive to provide access to its collections. More details TBA.
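Because Solr is the search back-end, an index built this way can be queried over HTTP through Solr's standard `/select` handler. The following is a minimal sketch of building such a query URL, not the project's own client code. The host, the core name `discovery`, and the field name `content` are assumptions made for illustration; the real names depend on the Solr schema and deployment in use.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, core, query, rows=10):
    """Build a URL for Solr's standard /select request handler.

    base_url and core are deployment-specific; the values used in the
    example call below are hypothetical.
    """
    params = {
        "q": query,    # Solr query string, e.g. "field:term"
        "rows": rows,  # maximum number of documents to return
        "wt": "json",  # ask Solr for a JSON response
    }
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


# Hypothetical host, core, and field names, for illustration only.
url = build_solr_query_url(
    "http://localhost:8983/solr", "discovery", "content:preservation"
)
print(url)
```

The resulting URL can be fetched with any HTTP client. Building it separately from the request keeps the sketch runnable without a live Solr instance.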
All development activity is visible on GitHub: http://github.com/ukwa/webarchive-discovery
There is also a #webarchive-discovery channel on the IIPC Slack service. Contact https://twitter.com/NetPreserve for details.
Andy Jackson (100.0%)