Difference between revisions of "Kraken"

From COPTR
Jump to navigation Jump to search
Line 1: Line 1:
<!-- Use the structure provided in this template, do not change it! -->
+
{{Infobox tool
 
 
{{Infobox_tool
 
 
|purpose=Open Source turn-key OCR system forked from ocropus
 
|purpose=Open Source turn-key OCR system forked from ocropus
 
|homepage=http://kraken.re/
 
|homepage=http://kraken.re/
 
|license=Apache 2.0 License
 
|license=Apache 2.0 License
 
|platforms=Linux
 
|platforms=Linux
 +
|function=OCR
 +
|content=ALTO format
 
}}
 
}}
<!-- Note that to use the image field, you should leave the value as {{PAGENAMEE}}.png (or similar) and upload a copy of the image. Hot-linking is not supported. If you don't want an image, just remove that line. -->
+
{{Infobox tool details}}
 +
<!-- Use the structure provided in this template, do not change it! -->
  
<!-- Add one or more categories to describe the function of the tool, such as:
 
[[Category:Metadata Extraction]] or [[Category:Preservation System]] or [[Category:Backup]]
 
Choose carefully, and view the list of existing categories first (see the Navigation sidebar on the left) -->
 
[[Category:OCR]]
 
  
<!-- Add relevant categories to describe the content type that the tool addresses, such as:
+
<!-- Note that to use the image field, you should leave the value as {{PAGENAMEE}}.png (or similar) and upload a copy of the image. Hot-linking is not supported. If you don't want an image, just remove that line. -->
[[Category:Audio]] or [[Category:Document]] or [[Category:Research Data]]
 
Choose carefully, and view the list of existing categories first (see the Navigation sidebar on the left). If the tool works on any content type, do not add a category. -->
 
[[Category:ALTO format]]
 
  
 
== Description ==
 
== Description ==

Revision as of 09:42, 23 April 2021



Open Source turn-key OCR system forked from ocropus
Homepage:http://kraken.re/
License:Apache 2.0 License
Platforms:Linux
Function:OCR
Content type:ALTO format





Description

kraken is a turn-key OCR system forked from ocropus. It is intended to rectify a number of issues while preserving (mostly) functional equivalence.

main features:

  • Script detection and multi-script recognition support
  • Right-to-Left, BiDi, and Top-to-Bottom script support
  • ALTO, abbyXML, and hOCR output
  • Word bounding boxes and character cuts
  • Public repository of model files
  • Lightweight model files
  • Variable recognition network architectures

All functionality not pertaining to OCR and prerequisite steps has been removed, i.e. no more error rate measuring, etc.

User Experiences

Development Activity

Commits : https://github.com/mittagessen/kraken/commits

Issues : https://github.com/mittagessen/kraken/issues