Difference between revisions of "Kraken"

From COPTR
Latest revision as of 14:54, 8 June 2021


Open Source turn-key OCR system forked from ocropus
Homepage: http://kraken.re/
License: Apache 2.0 License
Platforms: Linux
Output Formats: ALTO (Analyzed Layout and Text Object)
Function: OCR
Content type: Image, Document





Description

kraken is a turn-key OCR system forked from ocropus. It is intended to rectify a number of issues while preserving (mostly) functional equivalence.

Main features:

  • Script detection and multi-script recognition support
  • Right-to-Left, BiDi, and Top-to-Bottom script support
  • ALTO, abbyyXML, and hOCR output
  • Word bounding boxes and character cuts
  • Public repository of model files
  • Lightweight model files
  • Variable recognition network architectures

Functionality unrelated to OCR and its prerequisite steps has been removed; for example, error rate measurement is no longer included.
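
Because ALTO output and word bounding boxes are both listed among the features, a downstream consumer might read word boxes out of an ALTO file roughly as sketched below. This is a minimal sketch that assumes ALTO v4 namespacing and uses only the Python standard library; it does not call kraken itself. The sample XML is a hand-written, simplified fragment rather than real kraken output, and the `word_boxes` helper is a hypothetical name introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified ALTO v4 fragment for illustration only
# (an assumption, not actual kraken output).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page>
      <PrintSpace>
        <TextBlock>
          <TextLine>
            <String CONTENT="Hello" HPOS="10" VPOS="20" WIDTH="50" HEIGHT="12"/>
            <SP/>
            <String CONTENT="world" HPOS="65" VPOS="20" WIDTH="48" HEIGHT="12"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""

# Assumed namespace: ALTO v4. Files written against other ALTO versions
# use a different namespace URI and would need this constant changed.
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v4#"


def word_boxes(alto_xml):
    """Return (text, hpos, vpos, width, height) for every String element."""
    root = ET.fromstring(alto_xml)
    boxes = []
    # In ALTO, each recognized word is a String element whose position and
    # size are stored in the HPOS, VPOS, WIDTH and HEIGHT attributes.
    for string in root.iter(f"{{{ALTO_NS}}}String"):
        boxes.append((
            string.get("CONTENT"),
            float(string.get("HPOS")),
            float(string.get("VPOS")),
            float(string.get("WIDTH")),
            float(string.get("HEIGHT")),
        ))
    return boxes


if __name__ == "__main__":
    for text, hpos, vpos, width, height in word_boxes(SAMPLE_ALTO):
        print(text, hpos, vpos, width, height)
```

The coordinates are parsed as floats because ALTO permits non-integer positions; the unit they are expressed in depends on the measurement unit declared in the file.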

User Experiences

Development Activity

Commits: https://github.com/mittagessen/kraken/commits

Issues: https://github.com/mittagessen/kraken/issues