Web Archive Discovery
Revision as of 14:16, 24 September 2018 by MediaWiki default (talk | contribs) (Added activity feeds.)
Description
A full-text indexing system for web archives that uses Apache Solr as the search back-end. It supports either command-line processing or large-scale MapReduce (Hadoop) processing of ARC and WARC files. It also integrates file format analysis and scans for some known preservation risks.
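Because the index lives in Solr, it can be searched through Solr's standard /select request handler once the ARC or WARC files have been processed. The following is a minimal sketch of building such a query URL, not something taken from the project's documentation: the host, port, and the core name "discovery" are assumptions for illustration, and the query is matched against whatever default field the deployed schema defines.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, core, text, rows=10):
    """Build a URL for Solr's standard /select request handler.

    base_url and core are deployment-specific; the values used below
    are hypothetical and must be replaced with those of a real index.
    """
    params = {
        "q": text,     # query text, matched against the core's default field
        "rows": rows,  # maximum number of results to return
        "wt": "json",  # ask Solr for a JSON response
    }
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


# Hypothetical local Solr instance and core name.
url = build_solr_query_url("http://localhost:8983/solr", "discovery", "preservation risk")
print(url)
```

Fetching the resulting URL with any HTTP client returns the matching documents as JSON; the fields available in each document depend on the schema the index was built with.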
User Experiences
- Used by the UK Web Archive to provide access to their collections. More details TBA.
Development Activity
All development activity is visible on GitHub: http://github.com/ukwa/webarchive-discovery
There is also a #webarchive-discovery channel on the IIPC Slack service. Contact https://twitter.com/NetPreserve for details.
Release Feed
The last 3 releases are listed below:
- 2024-04-02 09:25:58: Revert of source_file_path (warc-discovery-3.3.1), by GilHoggarth
- 2023-06-02 11:04:22: warc-discovery-3.3.0, by anjackson
- 2020-11-27 12:25:29: warc-discovery-3.1.0, by anjackson
Activity Feed
The last 5 commits are listed below:
- 2024-04-02 09:24:57: Update CHANGES.md (commit 13595bead029fd44f133ec6c18f689edde202e53), by GilHoggarth https://github.com/GilHoggarth
- 2024-04-02 08:33:41: Merge pull request #313 from thomasegense/master (commit 2581409f298d2617fb21461edadd0044f70db617), by GilHoggarth https://github.com/GilHoggarth
- 2023-12-26 09:58:01: typo fix (commit f98deaddfde179051ee3ba67adb3263b8111fc81), by teg@kb.dk
- 2023-12-24 09:02:03: Added comment (commit c7873c9a60e7029b70c57a3836690699dd74fa34), by teg@kb.dk
- 2023-12-24 08:59:23: Changed to debug. Some harvest tools generate a request record for every (commit 9f7e9105841a1aa64613cf39c8be0b9edd1b5947), by teg@kb.dk