Web Archive Discovery

From COPTR
Revision as of 14:16, 24 September 2018 by MediaWiki default (talk | contribs) (Added activity feeds.)


Indexing and discovery tools for web archives.
Homepage: https://github.com/ukwa/webarchive-discovery
License: Mixed
Platforms: Java

Description

Full-text indexing system, using Apache Solr as the search back-end. Supports command-line or large-scale map-reduce (Hadoop) processing of ARC and WARC files. Also integrates file format analysis and scans for some known preservation risks.
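Because the index lives in Apache Solr, searching it comes down to an HTTP request against Solr's standard /select handler. The following is a minimal sketch of how such a request URL might be built. It is not taken from the webarchive-discovery code base: the base URL, the collection name "discovery", the field name "content", and the class and method names are all assumptions made for illustration and should be checked against the actual Solr schema in use.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SolrQuerySketch {

    /**
     * Builds a URL for Solr's standard /select handler that runs a phrase
     * query against a single field. All argument values are supplied by the
     * caller; nothing here is specific to webarchive-discovery.
     */
    static String buildSelectUrl(String solrBase, String collection,
                                 String field, String phrase, int rows) {
        // Escape backslashes and double quotes so the phrase stays one quoted term.
        String escaped = phrase.replace("\\", "\\\\").replace("\"", "\\\"");
        String query = field + ":\"" + escaped + "\"";
        return solrBase + "/" + collection + "/select?q="
                + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&rows=" + rows
                + "&wt=json";
    }

    public static void main(String[] args) {
        // Hypothetical endpoint, collection and field name (assumptions).
        String url = buildSelectUrl("http://localhost:8983/solr",
                "discovery", "content", "web archive", 10);
        System.out.println(url);
    }
}
```

The sketch only assembles the URL and makes no network call, so it can be tried without a running Solr instance. The resulting URL can then be fetched with any HTTP client.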

User Experiences

  • Used by the UK Web Archive to provide access to their collections. More details TBA.

Development Activity

All development activity is visible on GitHub: https://github.com/ukwa/webarchive-discovery

There is also a #webarchive-discovery channel on the IIPC Slack service. Contact https://twitter.com/NetPreserve for details.

Release Feed

The 3 most recent releases:

2024-04-02 09:25:58
warc-discovery-3.3.1: Revert of source_file_path
by GilHoggarth
2023-06-02 11:04:22
warc-discovery-3.3.0
by anjackson
2020-11-27 12:25:29
warc-discovery-3.1.0
by anjackson


Activity Feed

The 5 most recent commits:

2024-04-02 09:24:57
Commit 13595bead029fd44f133ec6c18f689edde202e53: Update CHANGES.md
by GilHoggarth https://github.com/GilHoggarth
2024-04-02 08:33:41
Commit 2581409f298d2617fb21461edadd0044f70db617: Merge pull request #313 from thomasegense/master
by GilHoggarth https://github.com/GilHoggarth
2023-12-26 09:58:01
Commit f98deaddfde179051ee3ba67adb3263b8111fc81: typo fix
by teg@kb.dk
2023-12-24 09:02:03
Commit c7873c9a60e7029b70c57a3836690699dd74fa34: Added comment
by teg@kb.dk
2023-12-24 08:59:23
Commit 9f7e9105841a1aa64613cf39c8be0b9edd1b5947: Changed to debug. Some harvest tools generate a request record for every
by teg@kb.dk

