Apache Tika
Description
Java-based tool for detecting file formats and extracting metadata and text content from documents. Apache Tika is a core component of the Web Archive Discovery indexer and profiler.
User Experiences
- Comparing how Apache Tika and DROID perform HTML identification: How much of the UK's HTML is valid?
- A number of pages on the OPF Wiki mention Tika.
Development Activity
Release Feed
http://projects.apache.org/feeds/rss/tika.xml (this feed could not be loaded: the HTTP request returned 404 Not Found)
Activity Feed