Difference between revisions of "Tesseract-ocr"
{{Infobox_tool
|purpose=Open source OCR engine, accepting uncompressed TIFF files as input
|image=
|homepage=http://code.google.com/p/tesseract-ocr/
|license=Apache License 2.0, except tesseractTrainer.py, which is licensed under the GPL
Revision as of 19:25, 3 October 2014
Description
Tesseract is probably the most accurate open-source OCR engine available. Combined with the Leptonica Image Processing Library, it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top three engines in the 1995 UNLV Accuracy test. It saw little development between 1995 and 2006, but has since been improved extensively by Google.
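As a rough illustration of converting an image to text, the sketch below wraps the basic command-line form `tesseract imagename outputbase -l lang` in Python. The helper names `build_tesseract_command` and `ocr_image` are hypothetical and introduced only for this example. It assumes the `tesseract` executable and the requested language data are installed.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for a basic Tesseract command-line call.

    Tesseract writes the recognized text to '<output_base>.txt'.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_image(image_path, output_base, lang="eng"):
    """Run Tesseract on one image and return the recognized text."""
    # Fail early with a clear message if the executable is missing.
    if shutil.which("tesseract") is None:
        raise RuntimeError("the 'tesseract' executable was not found on PATH")
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    # Tesseract appends '.txt' to the output base name.
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

For example, `ocr_image("page.tif", "page_out")` would run Tesseract on a hypothetical file `page.tif` and return the contents of `page_out.txt`. Passing `lang="deu"` would select German, provided that language data is available.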
User Experiences
Tesseract was applied in an AQuA Mashup that resulted in the Solution page "Compare OCR results of the same source material in different formats (TIFF, JP2)".
Development Activity