Editing Web Archive Discovery
Latest revision | Your text | ||
Line 1: | Line 1: | ||
− | {{ | + | <!-- Use the structure provided in this template, do not change it! --> |
+ | |||
+ | {{Infobox_tool | ||
|purpose=Indexing and discovery tools for web archives. | |purpose=Indexing and discovery tools for web archives. | ||
|homepage=https://github.com/ukwa/webarchive-discovery | |homepage=https://github.com/ukwa/webarchive-discovery | ||
|license=Mixed | |license=Mixed | ||
|platforms=Java | |platforms=Java | ||
− | |||
− | |||
− | |||
− | |||
− | |||
}} | }} | ||
− | |||
− | |||
− | |||
− | + | <!-- Add one or more categories to describe the function of the tool. Choose carefully, and view the list of existing categories first (see the Navigation sidebar on the left). The following are common category examples, remove those that don't apply --> | |
+ | [[Category:Metadata Extraction]] | ||
+ | [[Category:File Format Identification]] | ||
+ | [[Category:Content Profiling]] | ||
+ | [[Category:Discovery]] | ||
− | |||
− | + | <!-- Add relevant categories to describe the content type that the tool addresses. Choose carefully, and view the list of existing categories first (see the Navigation sidebar on the left). If the tool works on any content type, do not add a category. The following are common category examples, remove those that don't apply --> | |
+ | [[Category:Web]] | ||
+ | [[Category:Web_Archive]] | ||
− | + | == Description == | |
+ | <!-- Describe what the tool does, focusing on its digital preservation value. Keep it factual. --> | ||
+ | Full-text indexing system, using Apache Solr as the search back-end. Supports command-line or large-scale map-reduce (Hadoop) processing of ARC and WARC files. Also integrates file format analysis and scans for some known preservation risks. | ||
== User Experiences == | == User Experiences == | ||
<!-- Add hotlinks to user experiences with the tool (e.g. blog posts). These should illustrate the effectiveness (or otherwise) of the tool. --> | <!-- Add hotlinks to user experiences with the tool (e.g. blog posts). These should illustrate the effectiveness (or otherwise) of the tool. --> | ||
− | * Used by the [http://www.webarchive.org.uk/ UK Web Archive] to provide access to their collections. | + | * Used by the [http://www.webarchive.org.uk/ UK Web Archive] to provide access to their collections. More details TBA. |
− | |||
= Development Activity = | = Development Activity = | ||
Line 41: | Line 41: | ||
Below are the last 5 commits: | Below are the last 5 commits: | ||
<rss max=5>https://github.com/ukwa/webarchive-discovery/commits/master.atom</rss> | <rss max=5>https://github.com/ukwa/webarchive-discovery/commits/master.atom</rss> | ||
+ | |||
+ | |||
+ | {{Infobox_tool_details | ||
+ | |releases_rss= | ||
+ | |issues_rss= | ||
+ | |mailing_lists= | ||
+ | |ohloh_id=Heritrix | ||
+ | }} |