Difference between revisions of "Kraken"

From COPTR
(2 intermediate revisions by 2 users not shown)
Latest revision as of 14:54, 8 June 2021


Open Source turn-key OCR system forked from ocropus
Homepage: http://kraken.re/
License: Apache 2.0 License
Platforms: Linux
Output Formats: ALTO (Analyzed Layout and Text Object)
Function: OCR
Content type: Image, Document





Description

kraken is a turn-key OCR system forked from ocropus (https://github.com/tmbarchive/ocropy). It is intended to rectify a number of issues in ocropus while preserving (mostly) functional equivalence.

Main features:

  • Script detection and multi-script recognition support
  • Right-to-Left, BiDi, and Top-to-Bottom script support
  • ALTO, abbyyXML, and hOCR output
  • Word bounding boxes and character cuts
  • Public repository of model files
  • Lightweight model files
  • Variable recognition network architectures

All functionality not pertaining to OCR and its prerequisite steps has been removed; for example, error rate measurement is no longer included.
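
As a rough illustration of the OCR and prerequisite steps mentioned above, the sketch below assembles a kraken command line that chains binarization, segmentation, and recognition. It assumes the `kraken -i <input> <output> binarize segment ocr` invocation form shown in the project's documentation, which may differ between kraken versions. The helper names `build_kraken_command` and `run_kraken` and the file names are hypothetical and not part of kraken.

```python
import shutil
import subprocess
from typing import List, Optional


def build_kraken_command(image: str, output: str,
                         model: Optional[str] = None) -> List[str]:
    """Assemble a kraken CLI call chaining binarize, segment, and ocr.

    Assumption: the installed kraken accepts
    `kraken -i <input> <output> binarize segment ocr [-m <model>]`.
    Check `kraken --help` for the version in use.
    """
    cmd = ["kraken", "-i", image, output, "binarize", "segment", "ocr"]
    if model is not None:
        # -m selects the recognition model file for the ocr step.
        cmd += ["-m", model]
    return cmd


def run_kraken(image: str, output: str,
               model: Optional[str] = None) -> subprocess.CompletedProcess:
    """Run the assembled command; raises if kraken is missing or fails."""
    if shutil.which("kraken") is None:
        raise RuntimeError("kraken executable not found on PATH")
    return subprocess.run(build_kraken_command(image, output, model),
                          check=True)


if __name__ == "__main__":
    # Hypothetical file names; only prints the command, does not run it.
    print(" ".join(build_kraken_command("page.tif", "page.txt")))
```

Keeping command construction separate from execution makes it easy to inspect or log the exact call before running it on a batch of images.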

User Experiences

Development Activity

Commits: https://github.com/mittagessen/kraken/commits

Issues: https://github.com/mittagessen/kraken/issues