Description

Java-based tool for identifying file formats using signatures and for extracting metadata and text content from documents.
Homepage:	http://tika.apache.org/
License:	Apache License, Version 2.0
Platforms:	Java
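
To illustrate what "identifying file formats using signatures" means, here is a minimal sketch in plain Java. It is not Tika's code and does not use Tika's API; the class name `SignatureSniffer`, the method `identify`, and the four-entry signature table are hypothetical and chosen only for illustration. It assumes a format can be recognised by a fixed byte sequence at the start of the file, which holds for the formats listed but is a simplification of what a real detector does.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: a tiny signature (magic number) matcher, not Tika's implementation. */
public class SignatureSniffer {

    // Hypothetical signature table: media type -> leading bytes of the file.
    private static final Map<String, byte[]> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put("application/pdf", "%PDF-".getBytes(StandardCharsets.US_ASCII));
        SIGNATURES.put("image/png",
                new byte[] {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A});
        SIGNATURES.put("application/zip", new byte[] {'P', 'K', 0x03, 0x04});
        SIGNATURES.put("image/gif", "GIF8".getBytes(StandardCharsets.US_ASCII));
    }

    /** Returns the first media type whose signature prefixes the header, or a generic fallback. */
    public static String identify(byte[] header) {
        for (Map.Entry<String, byte[]> entry : SIGNATURES.entrySet()) {
            if (startsWith(header, entry.getValue())) {
                return entry.getKey();
            }
        }
        return "application/octet-stream";
    }

    // True if data begins with every byte of prefix, in order.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.7 sample".getBytes(StandardCharsets.US_ASCII);
        System.out.println(identify(sample));
    }
}
```

A production detector has to handle far more than this sketch does, for example signatures at non-zero offsets and container formats such as ZIP-based office documents, which is why a maintained signature database is preferable to a hand-written table.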

User Experiences

Comparing how Apache Tika and DROID perform HTML identification: How much of the UK's HTML is valid?
Apache Tika is a core component of the Web Archive Discovery indexer and profiler.
A number of pages on the OPF Wiki mention Tika.