Apache Tika
Revision as of 10:58, 7 November 2014
Description
Java-based tool for identifying file formats using signatures and extracting metadata and text content from documents.
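To illustrate what "identifying file formats using signatures" means, here is a minimal sketch of magic-number matching in plain Java. It does not use Apache Tika's API, and Tika's own detector is considerably more elaborate. The class name `SignatureSniffer` and its four-entry signature table are hypothetical and chosen only for illustration. The byte patterns are the well-known leading signatures of PDF (`%PDF-`), PNG, ZIP and JPEG files.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of signature-based format identification.
// This is NOT Apache Tika's API; the signature table is a tiny sample.
public class SignatureSniffer {

    // Leading-byte signatures mapped to MIME types. Keys are only iterated
    // (never looked up), so using byte[] as a key type is safe here.
    private static final Map<byte[], String> SIGNATURES = new LinkedHashMap<>();

    static {
        SIGNATURES.put("%PDF-".getBytes(StandardCharsets.US_ASCII), "application/pdf");
        SIGNATURES.put(new byte[] {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'}, "image/png");
        SIGNATURES.put(new byte[] {'P', 'K', 0x03, 0x04}, "application/zip");
        SIGNATURES.put(new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF}, "image/jpeg");
    }

    /** Returns the MIME type whose signature prefixes {@code data}, or a generic fallback. */
    public static String identify(byte[] data) {
        for (Map.Entry<byte[], String> e : SIGNATURES.entrySet()) {
            byte[] sig = e.getKey();
            // Compare only the first sig.length bytes of the input.
            if (data.length >= sig.length
                    && Arrays.equals(Arrays.copyOf(data, sig.length), sig)) {
                return e.getValue();
            }
        }
        return "application/octet-stream"; // no signature matched
    }

    public static void main(String[] args) {
        byte[] pdfHeader = "%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(identify(pdfHeader));             // prints application/pdf
        System.out.println(identify(new byte[] {1, 2, 3}));  // prints application/octet-stream
    }
}
```

A fixed prefix table like this cannot tell apart formats that share a container (for example, many office formats are ZIP files). This is one reason real identification tools such as Tika and DROID go beyond simple leading-byte checks.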
User Experiences
- Comparing how Apache Tika and DROID perform HTML identification: How much of the UK's HTML is valid?
- Apache Tika is a core component of the Web Archive Discovery indexer and profiler.
- A number of pages on the OPF Wiki mention Tika.