COPTR - User contributions [en-gb]

Tesseract-ocr

2014-10-04T14:30:00Z

Lreilly:

{{Infobox_tool
|purpose=Open source OCR engine, accepting uncompressed TIFF files as input
|image=Tesseract.png
|homepage=http://code.google.com/p/tesseract-ocr/
|license=Apache 2.0 License EXCEPT the tesseractTrainer.py, which is licensed with GPL
|platforms=Linux, Windows, MacOSX
}}


[[Category:OCR]]

= Description =
Tesseract is probably the most accurate open source OCR engine available. Combined with the [http://leptonica.com/ Leptonica Image Processing Library] it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google.

== Provider ==
Development of Tesseract is sponsored by Google. Its chief developer is [http://research.google.com/pubs/author4479.html Ray Smith].

==Licensing and cost==
Tesseract is an Open Source OCR engine, available under the [http://www.apache.org/licenses/LICENSE-2.0 Apache 2.0 license]. It can be used directly, or (for programmers) using an [http://code.google.com/p/tesseract-ocr/source/browse/trunk/api/baseapi.h API].

==History==
It was initially developed at HP over a 10-year period from 1984 to 1994. After a decade of minimal development it was released as open source in 2005. Google has sponsored its development since 2006.

==Platform and interoperability==
*The latest downloads for Linux and Windows are found on [https://drive.google.com/folderview?id=0B7l10Bj_LprhQnpSRkpGMGV2eE0&usp=sharing GoogleDrive]. Older versions of Tesseract and its language packs are found on the discontinued [https://code.google.com/p/tesseract-ocr/downloads/list Google Code download page].
*The easiest way to install Tesseract on Mac OSX is with [https://www.macports.org/ MacPorts]. Once it is installed, you can install Tesseract by running the command ''sudo port install tesseract'', and any language with ''sudo port install tesseract-<langcode>''. A list of available langcodes can be found on the [https://www.macports.org/ports.php?by=name&substr=tesseract- MacPorts Tesseract page].
*Dependencies for running Tesseract include Autotools and [http://www.leptonica.org/ Leptonica]. The Windows version requires installation of [http://msdn.microsoft.com/en-us/vstudio/aa718325.aspx Visual Studio]. More information about required Ubuntu libraries and links to specific requirements are on the [https://code.google.com/p/tesseract-ocr/wiki/Compiling Tesseract Wiki].
*Other programs such as Scan Tailor, unpaper, ImageJ, Gimp or ImageMagick may be needed to properly prepare images for use in Tesseract.

==Functional notes==
===Input supported===
Any image readable by Leptonica is supported in Tesseract including BMP, PNM, PNG, JFIF, JPEG, and TIFF.
GIF is not supported [http://www.leptonica.com/library-overview.html http://www.leptonica.com/library-overview.html].
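
As a rough illustration of the format list above, the following Python sketch checks a file name against the formats named in this section. The extension spellings and the function name ''is_supported_input'' are assumptions made for the example; actual support depends on how Leptonica was built.

```python
from pathlib import Path

# Formats listed above as readable via Leptonica. The extension spellings
# are assumptions for illustration; real support depends on the Leptonica build.
SUPPORTED_EXTENSIONS = {
    ".bmp", ".pnm", ".png", ".jfif", ".jpg", ".jpeg", ".tif", ".tiff",
}


def is_supported_input(filename: str) -> bool:
    """Return True if the file extension matches a format listed above."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

A check like this only filters by file name. It does not inspect the file contents, so a GIF renamed to ''.png'' would still pass.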
===Output generated===
Tesseract outputs to TXT. PDF output was added in version 3.03.
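
A minimal sketch of how the two output types might be requested, assuming the ''tesseract'' executable is on the PATH and follows the usual ''tesseract imagename outputbase [-l lang] [configfile]'' calling pattern. The helper name ''build_tesseract_command'' and the file names are hypothetical, and the ''pdf'' config file is assumed to be present (version 3.03 or later).

```python
def build_tesseract_command(image, output_base, lang=None, pdf=False):
    """Build an argument list for the tesseract command-line tool.

    Nothing is executed here; the list can be passed to subprocess.run().
    """
    command = ["tesseract", image, output_base]
    if lang:
        # -l selects the language pack, e.g. "eng" or "deu".
        command += ["-l", lang]
    if pdf:
        # Assumption: the "pdf" config file ships with the installed version.
        command.append("pdf")
    return command
```

Without the ''pdf'' option the recognised text is written to ''<output_base>.txt''. To run the command, the list can be handed to ''subprocess.run(command, check=True)''.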

==Documentation and support==
*Smith, Ray (2007). An Overview of the Tesseract OCR Engine [http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf ]
*Installation information is found on the [https://code.google.com/p/tesseract-ocr/wiki/ReadMe ReadMe] page of the project site.
*Support is offered and issues are addressed on the [https://code.google.com/p/tesseract-ocr/issues/list Issues] page of the project site.

==Included in==
*Integration with the free [http://www.dcc.ac.uk/resources/external/xena-software-0 Xena-Digital Preservation Software] [http://sourceforge.net/projects/xena/?source=navbar http://sourceforge.net/projects/xena/?source=navbar]
*Integration with Free Online OCR [http://www.free-ocr.com/faq.html http://www.free-ocr.com/faq.html]

==Usability==
*Tesseract was primarily developed for English OCR capability, but 47 language packs have been developed for use with other languages [https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html]. Tesseract 2.0x and 3.0x are [https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 trainable] for other languages.
*There is no built-in GUI, but there are several available from the [https://code.google.com/p/tesseract-ocr/wiki/3rdParty 3rdParty] page.

= User Experiences =
*Lazorchak, Butch. (2014). Making Scanned Content Accessible Using Full-text Search and OCR [http://blogs.loc.gov/digitalpreservation/2014/08/making-scanned-content-accessible-using-full-text-search-and-ocr/ http://blogs.loc.gov/digitalpreservation/2014/08/making-scanned-content-accessible-using-full-text-search-and-ocr/]
*Texas A&M University. (2012-). Early Modern OCR Project Workflow [http://emop.tamu.edu/about http://emop.tamu.edu/about]
*Adams, Chris. (2014). Content Search on a Budget-using Tesseract on large TIFF files [http://chris.improbable.org/2014/3/17/content-search-on-a-budget/ http://chris.improbable.org/2014/3/17/content-search-on-a-budget/]
*PSNC Digital Libraries Team. (2011). Tesseract 3.0 installation on Ubuntu 10.10 server [http://dl.psnc.pl/2011/01/24/tesseract-3-0-installation-on-ubuntu-10-10-server/ http://dl.psnc.pl/2011/01/24/tesseract-3-0-installation-on-ubuntu-10-10-server/]
*Lacy, David. (2014). Digital Library upgrade provides enhanced discovery [http://blog.library.villanova.edu/digitallibrary/2014/02/18/digital-library-upgrade-provides-enhanced-discovery/#sthash.OwCtLlEc.dpuf http://blog.library.villanova.edu/digitallibrary/2014/02/18/digital-library-upgrade-provides-enhanced-discovery/#sthash.OwCtLlEc.dpuf]
*Applied in an AQuA Mashup that resulted in the Solution page: [http://wiki.opf-labs.org/display/AQuA/Compare+OCR+results+of+the+same+source+material+in+different+formats+%28TIFF%2C+JP2%29 Compare OCR results of the same source material in different formats (TIFF, JP2)]

= Activity Feeds =
==Google Code Source Feed==
Below are the last 5 source updates:
<rss max=5>https://code.google.com/feeds/p/tesseract-ocr/gitchanges/basic</rss>
==Google Code Wiki Feed==
Below are the last 3 wiki updates:
<rss max=3>https://code.google.com/feeds/p/tesseract-ocr/gitchanges/basic?repo=wiki</rss>
==Google Code Issue Feed==
Below are the last 3 issue updates:
<rss max=3>https://code.google.com/feeds/p/tesseract-ocr/issueupdates/basic</rss>

{{Infobox_tool_details
|ohloh_id=tesseract-ocr
}}

Tesseract-ocr

2014-10-04T14:28:45Z

Lreilly:

{{Infobox_tool
|purpose=Open source OCR engine, accepting uncompressed TIFF files as input
|image=Tesseract.png
|homepage=http://code.google.com/p/tesseract-ocr/
|license=Apache 2.0 License EXCEPT the tesseractTrainer.py, which is licensed with GPL
|platforms=Linux, Windows, MacOSX
}}


[[Category:OCR]]

= Description =
Tesseract is probably the most accurate open source OCR engine available. Combined with the [http://leptonica.com/ Leptonica Image Processing Library] it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google.

== Provider ==
Development of Tesseract is sponsored by Google. Its chief developer is [http://research.google.com/pubs/author4479.html Ray Smith].

==Licensing and cost==
Tesseract is an Open Source OCR engine, available under the [http://www.apache.org/licenses/LICENSE-2.0 Apache 2.0 license]. It can be used directly, or (for programmers) using an [http://code.google.com/p/tesseract-ocr/source/browse/trunk/api/baseapi.h API].

==History==
It was initially developed at HP over a 10-year period from 1984 to 1994. After a decade of minimal development it was released as open source in 2005. Google has sponsored its development since 2006.

==Platform and interoperability==
*The latest downloads for Linux and Windows are found on [https://drive.google.com/folderview?id=0B7l10Bj_LprhQnpSRkpGMGV2eE0&usp=sharing GoogleDrive]. Older versions of Tesseract and its language packs are found on the discontinued [https://code.google.com/p/tesseract-ocr/downloads/list Google Code download page].
*The easiest way to install Tesseract on Mac OSX is with [https://www.macports.org/ MacPorts]. Once it is installed, you can install Tesseract by running the command ''sudo port install tesseract'', and any language with ''sudo port install tesseract-<langcode>''. A list of available langcodes can be found on the [https://www.macports.org/ports.php?by=name&substr=tesseract- MacPorts Tesseract page].
*Dependencies for running Tesseract include Autotools and [http://www.leptonica.org/ Leptonica]. The Windows version requires installation of [http://msdn.microsoft.com/en-us/vstudio/aa718325.aspx Visual Studio]. More information about required Ubuntu libraries and links to specific requirements are on the [https://code.google.com/p/tesseract-ocr/wiki/Compiling Tesseract Wiki].
*Other programs such as Scan Tailor, unpaper, ImageJ, Gimp or ImageMagick may be needed to properly prepare images for use in Tesseract.

==Functional notes==
===Input supported===
Any image readable by Leptonica is supported in Tesseract including BMP, PNM, PNG, JFIF, JPEG, and TIFF.
GIF is not supported [http://www.leptonica.com/library-overview.html http://www.leptonica.com/library-overview.html].
===Output generated===
Tesseract outputs to TXT. PDF output was added in version 3.03.

==Documentation and support==
*Smith, Ray (2007). An Overview of the Tesseract OCR Engine [http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf ]
*Installation information is found on the [https://code.google.com/p/tesseract-ocr/wiki/ReadMe ReadMe] page of the project site.
*Support is offered and issues are addressed on the [https://code.google.com/p/tesseract-ocr/issues/list Issues] page of the project site.

==Included in==
*Integration with the free [http://www.dcc.ac.uk/resources/external/xena-software-0 Xena-Digital Preservation Software] [http://sourceforge.net/projects/xena/?source=navbar http://sourceforge.net/projects/xena/?source=navbar]
*Integration with Free Online OCR [http://www.free-ocr.com/faq.html http://www.free-ocr.com/faq.html]

==Usability==
*Tesseract was primarily developed for English OCR capability, but 47 language packs have been developed for use with other languages [https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html]. Tesseract 2.0x and 3.0x are [https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 trainable] for other languages.
*There is no built-in GUI, but there are several available from the [https://code.google.com/p/tesseract-ocr/wiki/3rdParty 3rdParty] page.

= User Experiences =
*Lazorchak, Butch. (2014). Making Scanned Content Accessible Using Full-text Search and OCR [http://blogs.loc.gov/digitalpreservation/2014/08/making-scanned-content-accessible-using-full-text-search-and-ocr/ http://blogs.loc.gov/digitalpreservation/2014/08/making-scanned-content-accessible-using-full-text-search-and-ocr/]
*Texas A&M University. (2012-). Early Modern OCR Project Workflow [http://emop.tamu.edu/about http://emop.tamu.edu/about]
*Adams, Chris. (2014). Content Search on a Budget-using Tesseract on large TIFF files [http://chris.improbable.org/2014/3/17/content-search-on-a-budget/ http://chris.improbable.org/2014/3/17/content-search-on-a-budget/]
*PSNC Digital Libraries Team. (2011). Tesseract 3.0 installation on Ubuntu 10.10 server [http://dl.psnc.pl/2011/01/24/tesseract-3-0-installation-on-ubuntu-10-10-server/ http://dl.psnc.pl/2011/01/24/tesseract-3-0-installation-on-ubuntu-10-10-server/]
*Lacy, David. (2014). Digital Library upgrade provides enhanced discovery [http://blog.library.villanova.edu/digitallibrary/2014/02/18/digital-library-upgrade-provides-enhanced-discovery/#sthash.OwCtLlEc.dpuf http://blog.library.villanova.edu/digitallibrary/2014/02/18/digital-library-upgrade-provides-enhanced-discovery/#sthash.OwCtLlEc.dpuf]
*Applied in an AQuA Mashup that resulted in the Solution page: [http://wiki.opf-labs.org/display/AQuA/Compare+OCR+results+of+the+same+source+material+in+different+formats+%28TIFF%2C+JP2%29 Compare OCR results of the same source material in different formats (TIFF, JP2)]

= Activity Feeds =
==Google Code Source Feed==
Below are the last 5 source updates:
<rss max=5>https://code.google.com/feeds/p/tesseract-ocr/gitchanges/basic</rss>
==Google Code Wiki Feed==
Below are the last 3 wiki updates:
<rss max=3>https://code.google.com/feeds/p/tesseract-ocr/gitchanges/basic?repo=wiki</rss>
==Google Code Issue Feed==
Below are the last 3 issue updates:
<rss max=3>https://code.google.com/feeds/p/tesseract-ocr/issueupdates/basic</rss>

{{Infobox_tool_details
|ohloh_id=tesseract-ocr
}}

Tesseract-ocr

2014-10-04T14:23:37Z

Lreilly:

{{Infobox_tool
|purpose=Open source OCR engine, accepting uncompressed TIFF files as input
|image=Tesseract.png
|homepage=http://code.google.com/p/tesseract-ocr/
|license=Apache 2.0 License EXCEPT the tesseractTrainer.py, which is licensed with GPL
|platforms=Linux, Windows, MacOSX
}}


[[Category:OCR]]

= Description =
Tesseract is probably the most accurate open source OCR engine available. Combined with the [http://leptonica.com/ Leptonica Image Processing Library] it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google.

== Provider ==
Development of Tesseract is sponsored by Google. Its chief developer is [http://research.google.com/pubs/author4479.html Ray Smith].

==Licensing and cost==
Tesseract is an Open Source OCR engine, available under the [http://www.apache.org/licenses/LICENSE-2.0 Apache 2.0 license]. It can be used directly, or (for programmers) using an [http://code.google.com/p/tesseract-ocr/source/browse/trunk/api/baseapi.h API].

==History==
It was initially developed at HP over a 10-year period from 1984 to 1994. After a decade of minimal development it was released as open source in 2005. Google has sponsored its development since 2006.

==Platform and interoperability==
*The latest downloads for Linux and Windows are found on [https://drive.google.com/folderview?id=0B7l10Bj_LprhQnpSRkpGMGV2eE0&usp=sharing GoogleDrive]. Older versions of Tesseract and its language packs are found on the discontinued [https://code.google.com/p/tesseract-ocr/downloads/list Google Code download page].
*The easiest way to install Tesseract on Mac OSX is with [https://www.macports.org/ MacPorts]. Once it is installed, you can install Tesseract by running the command ''sudo port install tesseract'', and any language with ''sudo port install tesseract-<langcode>''. A list of available langcodes can be found on the [https://www.macports.org/ports.php?by=name&substr=tesseract- MacPorts Tesseract page].
*Dependencies for running Tesseract include Autotools and [http://www.leptonica.org/ Leptonica]. The Windows version requires installation of [http://msdn.microsoft.com/en-us/vstudio/aa718325.aspx Visual Studio]. More information about required Ubuntu libraries and links to specific requirements are on the [https://code.google.com/p/tesseract-ocr/wiki/Compiling Tesseract Wiki].
*Other programs such as Scan Tailor, unpaper, ImageJ, Gimp or ImageMagick may be needed to properly prepare images for use in Tesseract.

==Functional notes==
===Input supported===
Any image readable by Leptonica is supported in Tesseract including BMP, PNM, PNG, JFIF, JPEG, and TIFF.
GIF is not supported [http://www.leptonica.com/library-overview.html http://www.leptonica.com/library-overview.html].
===Output generated===
Tesseract outputs to TXT. PDF output was added in version 3.03.

==Documentation and support==
*Smith, Ray (2007). An Overview of the Tesseract OCR Engine [http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf ]
*Installation information is found on the [https://code.google.com/p/tesseract-ocr/wiki/ReadMe ReadMe] page of the project site.
*Support is offered and issues are addressed on the [https://code.google.com/p/tesseract-ocr/issues/list Issues] page of the project site.

==Usability==
*Tesseract was primarily developed for English OCR capability, but 47 language packs have been developed for use with other languages [https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html]. Tesseract 2.0x and 3.0x are [https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 trainable] for other languages.
*There is no built-in GUI, but there are several available from the [https://code.google.com/p/tesseract-ocr/wiki/3rdParty 3rdParty] page.

= User Experiences =
==Experiences with Tesseract==
*Lazorchak, Butch. (2014). Making Scanned Content Accessible Using Full-text Search and OCR [http://blogs.loc.gov/digitalpreservation/2014/08/making-scanned-content-accessible-using-full-text-search-and-ocr/ http://blogs.loc.gov/digitalpreservation/2014/08/making-scanned-content-accessible-using-full-text-search-and-ocr/]
*Texas A&M University. (2012-). Early Modern OCR Project Workflow [http://emop.tamu.edu/about http://emop.tamu.edu/about]
*Adams, Chris. (2014). Content Search on a Budget-using Tesseract on large TIFF files [http://chris.improbable.org/2014/3/17/content-search-on-a-budget/ http://chris.improbable.org/2014/3/17/content-search-on-a-budget/]
*PSNC Digital Libraries Team. (2011). Tesseract 3.0 installation on Ubuntu 10.10 server [http://dl.psnc.pl/2011/01/24/tesseract-3-0-installation-on-ubuntu-10-10-server/ http://dl.psnc.pl/2011/01/24/tesseract-3-0-installation-on-ubuntu-10-10-server/]
*Lacy, David. (2014). Digital Library upgrade provides enhanced discovery [http://blog.library.villanova.edu/digitallibrary/2014/02/18/digital-library-upgrade-provides-enhanced-discovery/#sthash.OwCtLlEc.dpuf http://blog.library.villanova.edu/digitallibrary/2014/02/18/digital-library-upgrade-provides-enhanced-discovery/#sthash.OwCtLlEc.dpuf]
*Applied in an AQuA Mashup that resulted in the Solution page: [http://wiki.opf-labs.org/display/AQuA/Compare+OCR+results+of+the+same+source+material+in+different+formats+%28TIFF%2C+JP2%29 Compare OCR results of the same source material in different formats (TIFF, JP2)]
==Experiences with software integrating Tesseract==
*Integration with the free [http://www.dcc.ac.uk/resources/external/xena-software-0 Xena-Digital Preservation Software] [http://sourceforge.net/projects/xena/?source=navbar http://sourceforge.net/projects/xena/?source=navbar]
*Integration with Free Online OCR [http://www.free-ocr.com/faq.html http://www.free-ocr.com/faq.html]

= Activity Feeds =
==Google Code Source Feed==
Below are the last 5 source updates:
<rss max=5>https://code.google.com/feeds/p/tesseract-ocr/gitchanges/basic</rss>
==Google Code Wiki Feed==
Below are the last 3 wiki updates:
<rss max=3>https://code.google.com/feeds/p/tesseract-ocr/gitchanges/basic?repo=wiki</rss>
==Google Code Issue Feed==
Below are the last 3 issue updates:
<rss max=3>https://code.google.com/feeds/p/tesseract-ocr/issueupdates/basic</rss>

{{Infobox_tool_details
|ohloh_id=tesseract-ocr
}}

Tesseract-ocr

2014-10-04T14:12:41Z

Lreilly:

{{Infobox_tool
|purpose=Open source OCR engine, accepting uncompressed TIFF files as input
|image=Tesseract.png
|homepage=http://code.google.com/p/tesseract-ocr/
|license=Apache 2.0 License EXCEPT the tesseractTrainer.py, which is licensed with GPL
|platforms=Linux, Windows, MacOSX
}}


[[Category:OCR]]
[[Category:Quality Assurance]]

= Description =
Tesseract is probably the most accurate open source OCR engine available. Combined with the [http://leptonica.com/ Leptonica Image Processing Library] it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google.

== Provider ==
Development of Tesseract is sponsored by Google. Its chief developer is [http://research.google.com/pubs/author4479.html Ray Smith].

==Licensing and cost==
Tesseract is an Open Source OCR engine, available under the [http://www.apache.org/licenses/LICENSE-2.0 Apache 2.0 license]. It can be used directly, or (for programmers) using an [http://code.google.com/p/tesseract-ocr/source/browse/trunk/api/baseapi.h API].

==History==
It was initially developed at HP over a 10-year period from 1984 to 1994. After a decade of minimal development it was released as open source in 2005. Google has sponsored its development since 2006.

==Platform and interoperability==
*The latest downloads for Linux and Windows are found on [https://drive.google.com/folderview?id=0B7l10Bj_LprhQnpSRkpGMGV2eE0&usp=sharing GoogleDrive]. Older versions of Tesseract and its language packs are found on the discontinued [https://code.google.com/p/tesseract-ocr/downloads/list Google Code download page].
*The easiest way to install Tesseract on Mac OSX is with [https://www.macports.org/ MacPorts]. Once it is installed, you can install Tesseract by running the command ''sudo port install tesseract'', and any language with ''sudo port install tesseract-<langcode>''. A list of available langcodes can be found on the [https://www.macports.org/ports.php?by=name&substr=tesseract- MacPorts Tesseract page].
*Dependencies for running Tesseract include Autotools and [http://www.leptonica.org/ Leptonica]. The Windows version requires installation of [http://msdn.microsoft.com/en-us/vstudio/aa718325.aspx Visual Studio]. More information about required Ubuntu libraries and links to specific requirements are on the [https://code.google.com/p/tesseract-ocr/wiki/Compiling Tesseract Wiki].
*Other programs such as Scan Tailor, unpaper, ImageJ, Gimp or ImageMagick may be needed to properly prepare images for use in Tesseract.

==Functional notes==
===Input supported===
Any image readable by Leptonica is supported in Tesseract including BMP, PNM, PNG, JFIF, JPEG, and TIFF.
GIF is not supported [http://www.leptonica.com/library-overview.html http://www.leptonica.com/library-overview.html].
===Output generated===
Tesseract outputs to TXT. PDF output was added in version 3.03.

==Documentation and support==
*Smith, Ray (2007). An Overview of the Tesseract OCR Engine [http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf ]
*Installation information is found on the [https://code.google.com/p/tesseract-ocr/wiki/ReadMe ReadMe] page of the project site.
*Support is offered and issues are addressed on the [https://code.google.com/p/tesseract-ocr/issues/list Issues] page of the project site.

==Usability==
*Tesseract was primarily developed for English OCR capability, but 47 language packs have been developed for use with other languages [https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html]. Tesseract 2.0x and 3.0x are [https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 trainable] for other languages.
*There is no built-in GUI, but there are several available from the [https://code.google.com/p/tesseract-ocr/wiki/3rdParty 3rdParty] page.

= User Experiences =
==Experiences with Tesseract==
*Lazorchak, Butch. (2014). Making Scanned Content Accessible Using Full-text Search and OCR [http://blogs.loc.gov/digitalpreservation/2014/08/making-scanned-content-accessible-using-full-text-search-and-ocr/ http://blogs.loc.gov/digitalpreservation/2014/08/making-scanned-content-accessible-using-full-text-search-and-ocr/]
*Texas A&M University. (2012-). Early Modern OCR Project Workflow [http://emop.tamu.edu/about http://emop.tamu.edu/about]
*Adams, Chris. (2014). Content Search on a Budget-using Tesseract on large TIFF files [http://chris.improbable.org/2014/3/17/content-search-on-a-budget/ http://chris.improbable.org/2014/3/17/content-search-on-a-budget/]
*PSNC Digital Libraries Team. (2011). Tesseract 3.0 installation on Ubuntu 10.10 server [http://dl.psnc.pl/2011/01/24/tesseract-3-0-installation-on-ubuntu-10-10-server/ http://dl.psnc.pl/2011/01/24/tesseract-3-0-installation-on-ubuntu-10-10-server/]
*Lacy, David. (2014). Digital Library upgrade provides enhanced discovery [http://blog.library.villanova.edu/digitallibrary/2014/02/18/digital-library-upgrade-provides-enhanced-discovery/#sthash.OwCtLlEc.dpuf http://blog.library.villanova.edu/digitallibrary/2014/02/18/digital-library-upgrade-provides-enhanced-discovery/#sthash.OwCtLlEc.dpuf]
*Applied in an AQuA Mashup that resulted in the Solution page: [http://wiki.opf-labs.org/display/AQuA/Compare+OCR+results+of+the+same+source+material+in+different+formats+%28TIFF%2C+JP2%29 Compare OCR results of the same source material in different formats (TIFF, JP2)]
==Experiences with software integrating Tesseract==
*Integration with the free [http://www.dcc.ac.uk/resources/external/xena-software-0 Xena-Digital Preservation Software] [http://sourceforge.net/projects/xena/?source=navbar http://sourceforge.net/projects/xena/?source=navbar]
*Integration with Free Online OCR [http://www.free-ocr.com/faq.html http://www.free-ocr.com/faq.html]

= Activity Feeds =
==Google Code Source Feed==
Below are the last 5 source updates:
<rss max=5>https://code.google.com/feeds/p/tesseract-ocr/gitchanges/basic</rss>
==Google Code Wiki Feed==
Below are the last 3 wiki updates:
<rss max=3>https://code.google.com/feeds/p/tesseract-ocr/gitchanges/basic?repo=wiki</rss>
==Google Code Issue Feed==
Below are the last 3 issue updates:
<rss max=3>https://code.google.com/feeds/p/tesseract-ocr/issueupdates/basic</rss>

{{Infobox_tool_details
|ohloh_id=tesseract-ocr
}}

Tesseract-ocr

2014-10-04T14:06:25Z

Lreilly: Added activity feeds.

{{Infobox_tool
|purpose=Open source OCR engine, accepting uncompressed TIFF files as input
|image=Tesseract.png
|homepage=http://code.google.com/p/tesseract-ocr/
|license=Apache 2.0 License EXCEPT the tesseractTrainer.py, which is licensed with GPL
|platforms=Linux, Windows, MacOSX
}}


[[Category:OCR]]
[[Category:Quality Assurance]]

= Description =
Tesseract is probably the most accurate open source OCR engine available. Combined with the [http://leptonica.com/ Leptonica Image Processing Library] it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google.

== Provider ==
Development of Tesseract is sponsored by Google. Its chief developer is [http://research.google.com/pubs/author4479.html Ray Smith].

==Licensing and cost==
Tesseract is an Open Source OCR engine, available under the [http://www.apache.org/licenses/LICENSE-2.0 Apache 2.0 license]. It can be used directly, or (for programmers) using an [http://code.google.com/p/tesseract-ocr/source/browse/trunk/api/baseapi.h API].

==History==
It was initially developed at HP over a 10-year period from 1984 to 1994. After a decade of minimal development it was released as open source in 2005. Google has sponsored its development since 2006.

==Platform and interoperability==
*The latest downloads for Linux and Windows are found on [https://drive.google.com/folderview?id=0B7l10Bj_LprhQnpSRkpGMGV2eE0&usp=sharing GoogleDrive]. Older versions of Tesseract and its language packs are found on the discontinued [https://code.google.com/p/tesseract-ocr/downloads/list Google Code download page].
*The easiest way to install Tesseract on Mac OSX is with [https://www.macports.org/ MacPorts]. Once it is installed, you can install Tesseract by running the command ''sudo port install tesseract'', and any language with ''sudo port install tesseract-<langcode>''. A list of available langcodes can be found on the [https://www.macports.org/ports.php?by=name&substr=tesseract- MacPorts Tesseract page].
*Dependencies for running Tesseract on Linux include Autotools and [http://www.leptonica.org/ Leptonica]. The Windows version requires installation of [http://msdn.microsoft.com/en-us/vstudio/aa718325.aspx Visual Studio]. More information about required Ubuntu libraries and links to specific requirements are on the [https://code.google.com/p/tesseract-ocr/wiki/Compiling Tesseract Wiki].
*Other programs such as Scan Tailor, unpaper, ImageJ, Gimp or ImageMagick may be needed to properly prepare images for use in Tesseract.

==Functional notes==
===Input supported===
Any image readable by Leptonica is supported in Tesseract including BMP, PNM, PNG, JFIF, JPEG, TIFF.
GIF is not supported [http://www.leptonica.com/library-overview.html http://www.leptonica.com/library-overview.html].
===Output generated===
Tesseract outputs to TXT. PDF output was added in version 3.03.

==Documentation and support==
*Smith, Ray (2007). An Overview of the Tesseract OCR Engine [http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf ]
*Installation information is found on the [https://code.google.com/p/tesseract-ocr/wiki/ReadMe ReadMe] page of the project site.
*Support is offered and issues are addressed on the [https://code.google.com/p/tesseract-ocr/issues/list Issues] page of the project site.

==Usability==
*Tesseract was primarily developed for English OCR capability, but 47 language packs have been developed for use with other languages [https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html]. Tesseract 2.0x and 3.0x are [https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 trainable] for other languages.
*There is no built-in GUI, but there are several available from the [https://code.google.com/p/tesseract-ocr/wiki/3rdParty 3rdParty] page.

= User Experiences =
==Experiences with Tesseract==
*Lazorchak, Butch. (2014). Making Scanned Content Accessible Using Full-text Search and OCR [http://blogs.loc.gov/digitalpreservation/2014/08/making-scanned-content-accessible-using-full-text-search-and-ocr/ http://blogs.loc.gov/digitalpreservation/2014/08/making-scanned-content-accessible-using-full-text-search-and-ocr/]
*Texas A&M University. (2012-). Early Modern OCR Project Workflow [http://emop.tamu.edu/about http://emop.tamu.edu/about]
*Adams, Chris. (2014). Content Search on a Budget-using Tesseract on large TIFF files [http://chris.improbable.org/2014/3/17/content-search-on-a-budget/ http://chris.improbable.org/2014/3/17/content-search-on-a-budget/]
*PSNC Digital Libraries Team. (2011). Tesseract 3.0 installation on Ubuntu 10.10 server [http://dl.psnc.pl/2011/01/24/tesseract-3-0-installation-on-ubuntu-10-10-server/ http://dl.psnc.pl/2011/01/24/tesseract-3-0-installation-on-ubuntu-10-10-server/]
*Lacy, David. (2014). Digital Library upgrade provides enhanced discovery [http://blog.library.villanova.edu/digitallibrary/2014/02/18/digital-library-upgrade-provides-enhanced-discovery/#sthash.OwCtLlEc.dpuf http://blog.library.villanova.edu/digitallibrary/2014/02/18/digital-library-upgrade-provides-enhanced-discovery/#sthash.OwCtLlEc.dpuf]
*Applied in an AQuA Mashup that resulted in the Solution page: [http://wiki.opf-labs.org/display/AQuA/Compare+OCR+results+of+the+same+source+material+in+different+formats+%28TIFF%2C+JP2%29 Compare OCR results of the same source material in different formats (TIFF, JP2)]
==Experiences with software integrating Tesseract==
*Integration with the free [http://www.dcc.ac.uk/resources/external/xena-software-0 Xena-Digital Preservation Software] [http://sourceforge.net/projects/xena/?source=navbar http://sourceforge.net/projects/xena/?source=navbar]
*Integration with Free Online OCR [http://www.free-ocr.com/faq.html http://www.free-ocr.com/faq.html]

= Activity Feeds =
==Google Code Source Feed==
Below are the last 5 source updates:
<rss max=5>https://code.google.com/feeds/p/tesseract-ocr/gitchanges/basic</rss>
==Google Code Wiki Feed==
Below are the last 3 wiki updates:
<rss max=3>https://code.google.com/feeds/p/tesseract-ocr/gitchanges/basic?repo=wiki</rss>
==Google Code Issue Feed==
Below are the last 3 issue updates:
<rss max=3>https://code.google.com/feeds/p/tesseract-ocr/issueupdates/basic</rss>

{{Infobox_tool_details
|ohloh_id=tesseract-ocr
}}

Tesseract-ocr

2014-10-04T13:45:17Z

Lreilly: /* Experiences with software integrated with Tesseract */

{{Infobox_tool
|purpose=Open source OCR engine, accepting uncompressed TIFF files as input
|image=Tesseract.png
|homepage=http://code.google.com/p/tesseract-ocr/
|license=Apache 2.0 License EXCEPT the tesseractTrainer.py, which is licensed with GPL
|platforms=Linux, Windows, MacOSX
}}


[[Category:OCR]]
[[Category:Quality Assurance]]

= Description =
Tesseract is probably the most accurate open source OCR engine available. Combined with the [http://leptonica.com/ Leptonica Image Processing Library] it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google.

== Provider ==
Development of Tesseract is sponsored by Google. Its chief developer is [http://research.google.com/pubs/author4479.html Ray Smith].

==Licensing and cost==
Tesseract is an Open Source OCR engine, available under the [http://www.apache.org/licenses/LICENSE-2.0 Apache 2.0 license]. It can be used directly, or (for programmers) using an [http://code.google.com/p/tesseract-ocr/source/browse/trunk/api/baseapi.h API].

==History==
It was initially developed at HP during a 10 year period from 1984 to 1994. After a decade of minimal development it was released in 2005 for open source. Google acquired Tesseract in 2006 and currently maintains its development.

==Platform and interoperability==
*The latest downloads for Linux and Windows are found on [https://drive.google.com/folderview?id=0B7l10Bj_LprhQnpSRkpGMGV2eE0&usp=sharing GoogleDrive]. Older versions of Tesseract and its language packs are found on the discontinued [https://code.google.com/p/tesseract-ocr/downloads/list Google Code download page].
*The easiest way to install Tesseract on Mac OSX is with [https://www.macports.org/ MacPorts]. Once it is installed, you can install Tesseract by running the command ''sudo port install tesseract'', and any language with ''sudo port install tesseract-<langcode>''. A list of available langcodes can be found on the [https://www.macports.org/ports.php?by=name&substr=tesseract- MacPorts Tesseract page].
*Dependencies for running Tesseract on a Linux system include Autotools and [http://www.leptonica.org/ Leptonica]. The Windows version requires installation of [http://msdn.microsoft.com/en-us/vstudio/aa718325.aspx Visual Studio]. More information about required Ubuntu libraries and links to specific requirements are on the [https://code.google.com/p/tesseract-ocr/wiki/Compiling Tesseract Wiki].
*Other programs such as Scan Tailor, unpaper, ImageJ, Gimp or ImageMagick may be needed to properly prepare images for use in Tesseract.

==Functional notes==
===Input supported===
Any image readable by Leptonica is supported in Tesseract including BMP, PNM, PNG, JFIF, JPEG, TIFF.
GIF is not supported [http://www.leptonica.com/library-overview.html http://www.leptonica.com/library-overview.html].
===Output generated===
Tesseract outputs to TXT. PDF output was added in version 3.03.
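
The input rules above can be checked before a batch of files is handed to Tesseract. The following is a minimal sketch, assuming that a file's format can be judged by its extension; the extension list is an illustrative mapping of the formats named above, and the function name is hypothetical.

```python
import os

# Assumed file extensions for the formats listed above (BMP, PNM, PNG,
# JFIF, JPEG, TIFF). GIF is deliberately absent because it is not supported.
SUPPORTED_EXTENSIONS = {
    ".bmp", ".pnm", ".png", ".jfif", ".jpg", ".jpeg", ".tif", ".tiff",
}


def is_supported_input(path):
    """Return True if the file extension matches a listed input format."""
    extension = os.path.splitext(path)[1].lower()
    return extension in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["scan.TIF", "cover.png", "logo.gif"]:
        print(name, is_supported_input(name))
```

An extension check is only a first filter: a file with a misleading extension would still need to be opened by Leptonica to confirm that it is readable.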

==Documentation and support==
*Smith, Ray (2007). An Overview of the Tesseract OCR Engine [http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf ]
*Installation information is found on the [https://code.google.com/p/tesseract-ocr/wiki/ReadMe ReadMe] page of the project site.
*Support is offered and issues are addressed on the [https://code.google.com/p/tesseract-ocr/issues/list Issues] page of the project site.

==Usability==
*Tesseract was primarily developed for English OCR capability, but 47 language packs have been developed for use with other languages [https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html]. Tesseract 2.0x and 3.0x are [https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 trainable] for other languages.
*There is no built-in GUI, but there are several available from the [https://code.google.com/p/tesseract-ocr/wiki/3rdParty 3rdParty] page.

= User Experiences =
==Experiences with Tesseract==
*Lazorchak, Butch. (2014). Making Scanned Content Accessible Using Full-text Search and OCR [http://blogs.loc.gov/digitalpreservation/2014/08/making-scanned-content-accessible-using-full-text-search-and-ocr/ http://blogs.loc.gov/digitalpreservation/2014/08/making-scanned-content-accessible-using-full-text-search-and-ocr/]
*Texas A&M University. (2012-). Early Modern OCR Project Workflow [http://emop.tamu.edu/about http://emop.tamu.edu/about]
*Adams, Chris. (2014). Content Search on a Budget-using Tesseract on large TIFF files [http://chris.improbable.org/2014/3/17/content-search-on-a-budget/ http://chris.improbable.org/2014/3/17/content-search-on-a-budget/]
*PSNC Digital Libraries Team. (2011). Tesseract 3.0 installation on Ubuntu 10.10 server [http://dl.psnc.pl/2011/01/24/tesseract-3-0-installation-on-ubuntu-10-10-server/ http://dl.psnc.pl/2011/01/24/tesseract-3-0-installation-on-ubuntu-10-10-server/]
*Lacy, David. (2014). Digital Library upgrade provides enhanced discovery [http://blog.library.villanova.edu/digitallibrary/2014/02/18/digital-library-upgrade-provides-enhanced-discovery/#sthash.OwCtLlEc.dpuf http://blog.library.villanova.edu/digitallibrary/2014/02/18/digital-library-upgrade-provides-enhanced-discovery/#sthash.OwCtLlEc.dpuf]
*Applied in an AQuA Mashup that resulted in the Solution page: [http://wiki.opf-labs.org/display/AQuA/Compare+OCR+results+of+the+same+source+material+in+different+formats+%28TIFF%2C+JP2%29 Compare OCR results of the same source material in different formats (TIFF, JP2)]
==Experiences with software integrating Tesseract==
*Integration with the free [http://www.dcc.ac.uk/resources/external/xena-software-0 Xena-Digital Preservation Software] [http://sourceforge.net/projects/xena/?source=navbar http://sourceforge.net/projects/xena/?source=navbar]
*Integration with Free Online OCR [http://www.free-ocr.com/faq.html http://www.free-ocr.com/faq.html]

= Development Activity =

{{Infobox_tool_details
|ohloh_id=tesseract-ocr
}}

Tesseract-ocr

2014-10-04T13:39:25Z

Lreilly: Major update of user experiences.

{{Infobox_tool
|purpose=Open source OCR engine, accepting uncompressed TIFF files as input
|image=Tesseract.png
|homepage=http://code.google.com/p/tesseract-ocr/
|license=Apache 2.0 License EXCEPT the tesseractTrainer.py, which is licensed with GPL
|platforms=Linux, Windows, MacOSX
}}


[[Category:OCR]]
[[Category:Quality Assurance]]

= Description =
Tesseract is probably the most accurate open source OCR engine available. Combined with the [http://leptonica.com/ Leptonica Image Processing Library] it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google.

== Provider ==
Development of Tesseract is sponsored by Google. Its chief developer is [http://research.google.com/pubs/author4479.html Ray Smith].

==Licensing and cost==
Tesseract is an Open Source OCR engine, available under the [http://www.apache.org/licenses/LICENSE-2.0 Apache 2.0 license]. It can be used directly, or (for programmers) using an [http://code.google.com/p/tesseract-ocr/source/browse/trunk/api/baseapi.h API].

==History==
It was initially developed at HP during a 10 year period from 1984 to 1994. After a decade of minimal development it was released in 2005 for open source. Google acquired Tesseract in 2006 and currently maintains its development.

==Platform and interoperability==
*The latest downloads for Linux and Windows are found on [https://drive.google.com/folderview?id=0B7l10Bj_LprhQnpSRkpGMGV2eE0&usp=sharing GoogleDrive]. Older versions of Tesseract and its language packs are found on the discontinued [https://code.google.com/p/tesseract-ocr/downloads/list Google Code download page].
*The easiest way to install Tesseract on Mac OSX is with [https://www.macports.org/ MacPorts]. Once it is installed, you can install Tesseract by running the command ''sudo port install tesseract'', and any language with ''sudo port install tesseract-<langcode>''. A list of available langcodes can be found on the [https://www.macports.org/ports.php?by=name&substr=tesseract- MacPorts Tesseract page].
*Dependencies for running Tesseract on a Linux system include Autotools and [http://www.leptonica.org/ Leptonica]. The Windows version requires installation of [http://msdn.microsoft.com/en-us/vstudio/aa718325.aspx Visual Studio]. More information about required Ubuntu libraries and links to specific requirements are on the [https://code.google.com/p/tesseract-ocr/wiki/Compiling Tesseract Wiki].
*Other programs such as Scan Tailor, unpaper, ImageJ, Gimp or ImageMagick may be needed to properly prepare images for use in Tesseract.

==Functional notes==
===Input supported===
Any image readable by Leptonica is supported in Tesseract including BMP, PNM, PNG, JFIF, JPEG, TIFF.
GIF is not supported [http://www.leptonica.com/library-overview.html http://www.leptonica.com/library-overview.html].
===Output generated===
Tesseract outputs to TXT. PDF output was added in version 3.03.

==Documentation and support==
*Smith, Ray (2007). An Overview of the Tesseract OCR Engine [http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf ]
*Installation information is found on the [https://code.google.com/p/tesseract-ocr/wiki/ReadMe ReadMe] page of the project site.
*Support is offered and issues are addressed on the [https://code.google.com/p/tesseract-ocr/issues/list Issues] page of the project site.

==Usability==
*Tesseract was primarily developed for English OCR capability, but 47 language packs have been developed for use with other languages [https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html]. Tesseract 2.0x and 3.0x are [https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 trainable] for other languages.
*There is no built-in GUI, but there are several available from the [https://code.google.com/p/tesseract-ocr/wiki/3rdParty 3rdParty] page.

= User Experiences =
==Experiences with Tesseract==
*Lazorchak, Butch. (2014). Making Scanned Content Accessible Using Full-text Search and OCR [http://blogs.loc.gov/digitalpreservation/2014/08/making-scanned-content-accessible-using-full-text-search-and-ocr/ http://blogs.loc.gov/digitalpreservation/2014/08/making-scanned-content-accessible-using-full-text-search-and-ocr/]
*Texas A&M University. (2012-). Early Modern OCR Project Workflow [http://emop.tamu.edu/about http://emop.tamu.edu/about]
*Adams, Chris. (2014). Content Search on a Budget-using Tesseract on large TIFF files [http://chris.improbable.org/2014/3/17/content-search-on-a-budget/ http://chris.improbable.org/2014/3/17/content-search-on-a-budget/]
*PSNC Digital Libraries Team. (2011). Tesseract 3.0 installation on Ubuntu 10.10 server [http://dl.psnc.pl/2011/01/24/tesseract-3-0-installation-on-ubuntu-10-10-server/ http://dl.psnc.pl/2011/01/24/tesseract-3-0-installation-on-ubuntu-10-10-server/]
*Lacy, David. (2014). Digital Library upgrade provides enhanced discovery [http://blog.library.villanova.edu/digitallibrary/2014/02/18/digital-library-upgrade-provides-enhanced-discovery/#sthash.OwCtLlEc.dpuf http://blog.library.villanova.edu/digitallibrary/2014/02/18/digital-library-upgrade-provides-enhanced-discovery/#sthash.OwCtLlEc.dpuf]
*Applied in an AQuA Mashup that resulted in the Solution page: [http://wiki.opf-labs.org/display/AQuA/Compare+OCR+results+of+the+same+source+material+in+different+formats+%28TIFF%2C+JP2%29 Compare OCR results of the same source material in different formats (TIFF, JP2)]
==Experiences with software integrated with Tesseract==
*Integration with the free [http://www.dcc.ac.uk/resources/external/xena-software-0 Xena-Digital Preservation Software] [http://sourceforge.net/projects/xena/?source=navbar http://sourceforge.net/projects/xena/?source=navbar]
*Integration with Free Online OCR [http://www.free-ocr.com/faq.html http://www.free-ocr.com/faq.html]

= Development Activity =

{{Infobox_tool_details
|ohloh_id=tesseract-ocr
}}

Tesseract-ocr

2014-10-04T12:47:08Z

Lreilly: /* Platform and interoperability */

Tesseract-ocr

2014-10-04T12:45:49Z

Lreilly: /* History */

Tesseract-ocr

2014-10-04T12:35:51Z

Lreilly: Major update to description.

Tesseract-ocr

2014-10-04T11:57:05Z

Lreilly: updated infobox

Tesseract-ocr

2014-10-03T19:25:26Z

Lreilly:

{{Infobox_tool
|purpose=Open source OCR engine, accepting uncompressed TIFF files as input
|image=
|homepage=http://code.google.com/p/tesseract-ocr/
|license=Apache 2.0 License EXCEPT the tesseractTrainer.py, which is licensed with GPL
|platforms=
}}


[[Category:OCR]]
[[Category:Quality Assurance]]

= Description =
Tesseract is probably the most accurate open source OCR engine available. Combined with the [http://leptonica.com/ Leptonica Image Processing Library] it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google.

= User Experiences =
Applied in an AQuA Mashup that resulted in the Solution page: [http://wiki.opf-labs.org/display/AQuA/Compare+OCR+results+of+the+same+source+material+in+different+formats+%28TIFF%2C+JP2%29 Compare OCR results of the same source material in different formats (TIFF, JP2)]

= Development Activity =

{{Infobox_tool_details
|ohloh_id=tesseract-ocr
}}

Tesseract-ocr

2014-10-03T19:24:43Z

Lreilly:

{{Infobox_tool
|purpose=Open source OCR engine, accepting uncompressed TIFF files as input
|image=[[[File:Tesseract.png]]]
|homepage=http://code.google.com/p/tesseract-ocr/
|license=Apache 2.0 License EXCEPT the tesseractTrainer.py, which is licensed with GPL
|platforms=
}}


[[Category:OCR]]
[[Category:Quality Assurance]]

= Description =
Tesseract is probably the most accurate open source OCR engine available. Combined with the [http://leptonica.com/ Leptonica Image Processing Library] it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google.

= User Experiences =
Applied in an AQuA Mashup that resulted in the Solution page: [http://wiki.opf-labs.org/display/AQuA/Compare+OCR+results+of+the+same+source+material+in+different+formats+%28TIFF%2C+JP2%29 Compare OCR results of the same source material in different formats (TIFF, JP2)]

= Development Activity =

{{Infobox_tool_details
|ohloh_id=tesseract-ocr
}}

File:Tesseract.png

2014-10-03T19:17:46Z

Lreilly: Lreilly uploaded a new version of "File:Tesseract.png"

Title image including a tesseract image found in the creative commons at http://commons.wikimedia.org/wiki/File:Tesseract_frame.png.

File:Tesseract.png

2014-10-03T19:05:40Z

Lreilly: Title image including a tesseract image found in the creative commons at http://commons.wikimedia.org/wiki/File:Tesseract_frame.png.

Title image including a tesseract image found in the creative commons at http://commons.wikimedia.org/wiki/File:Tesseract_frame.png.

Tesseract-ocr

2014-10-03T18:47:39Z

Lreilly: /* Description */