<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-GB">
	<id>https://coptr.digipres.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Lreilly</id>
	<title>COPTR - User contributions [en-gb]</title>
	<link rel="self" type="application/atom+xml" href="https://coptr.digipres.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Lreilly"/>
	<link rel="alternate" type="text/html" href="https://coptr.digipres.org/Special:Contributions/Lreilly"/>
	<updated>2026-04-06T17:34:59Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.35.14</generator>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=Tesseract-ocr&amp;diff=1998</id>
		<title>Tesseract-ocr</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=Tesseract-ocr&amp;diff=1998"/>
		<updated>2014-10-04T14:30:00Z</updated>

		<summary type="html">&lt;p&gt;Lreilly: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox_tool&lt;br /&gt;
|purpose=Open source OCR engine, accepting uncompressed TIFF files as input&lt;br /&gt;
|image=Tesseract.png&lt;br /&gt;
|homepage=http://code.google.com/p/tesseract-ocr/&lt;br /&gt;
|license=Apache 2.0 License EXCEPT the tesseractTrainer.py, which is licensed with GPL&lt;br /&gt;
|platforms=Linux, Windows, MacOSX&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Delete the Categories that do not apply --&amp;gt;&lt;br /&gt;
[[Category:OCR]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Description =&lt;br /&gt;
Tesseract is probably the most accurate open source OCR engine available. Combined with the [http://leptonica.com/ Leptonica Image Processing Library] it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google.&lt;br /&gt;
&lt;br /&gt;
== Provider ==&lt;br /&gt;
Development of Tesseract is sponsored by Google. Its chief developer is [http://research.google.com/pubs/author4479.html Ray Smith].&lt;br /&gt;
&lt;br /&gt;
==Licensing and cost==&lt;br /&gt;
Tesseract is an Open Source OCR engine, available under the [http://www.apache.org/licenses/LICENSE-2.0 Apache 2.0 license]. It can be used directly, or (for programmers) using an [http://code.google.com/p/tesseract-ocr/source/browse/trunk/api/baseapi.h API].&lt;br /&gt;
&lt;br /&gt;
==History==&lt;br /&gt;
Tesseract was initially developed at HP over a 10-year period from 1984 to 1994. After a decade of minimal development, it was released as open source in 2005. Google has sponsored its development since 2006.&lt;br /&gt;
&lt;br /&gt;
==Platform and interoperability==&lt;br /&gt;
*The latest downloads for Linux and Windows are found on [https://drive.google.com/folderview?id=0B7l10Bj_LprhQnpSRkpGMGV2eE0&amp;amp;usp=sharing GoogleDrive]. Older versions of Tesseract and its language packs are found on the discontinued [https://code.google.com/p/tesseract-ocr/downloads/list Google Code download page].&lt;br /&gt;
*The easiest way to install Tesseract on Mac OS X is with [https://www.macports.org/ MacPorts]. Once MacPorts is installed, install Tesseract by running ''sudo port install tesseract'', and add any language pack with ''sudo port install tesseract-&amp;lt;langcode&amp;gt;''. A list of available langcodes can be found on the [https://www.macports.org/ports.php?by=name&amp;amp;substr=tesseract- MacPorts Tesseract page].&lt;br /&gt;
*Dependencies for running Tesseract include Autotools and [http://www.leptonica.org/ Leptonica]. The Windows version requires installation of [http://msdn.microsoft.com/en-us/vstudio/aa718325.aspx Visual Studio]. More information about required Ubuntu libraries and links to specific requirements are on the [https://code.google.com/p/tesseract-ocr/wiki/Compiling Tesseract Wiki].&lt;br /&gt;
*Other programs such as Scan Tailor, unpaper, ImageJ, Gimp or ImageMagick may be needed to properly prepare images for use in Tesseract.&lt;br /&gt;
&lt;br /&gt;
==Functional notes==&lt;br /&gt;
===Input supported===&lt;br /&gt;
Any image format readable by Leptonica is supported in Tesseract, including BMP, PNM, PNG, JFIF, JPEG, and TIFF.&lt;br /&gt;
GIF is not supported [http://www.leptonica.com/library-overview.html http://www.leptonica.com/library-overview.html].&lt;br /&gt;
===Output generated===&lt;br /&gt;
Tesseract outputs to TXT. PDF output was added in version 3.03.&lt;br /&gt;
&lt;br /&gt;
==Documentation and support==&lt;br /&gt;
*Smith, Ray (2007). An Overview of the Tesseract OCR Engine [http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf ]&lt;br /&gt;
*Installation information is found on the [https://code.google.com/p/tesseract-ocr/wiki/ReadMe ReadMe] page of the project site.&lt;br /&gt;
*Support is offered and issues are addressed on the [https://code.google.com/p/tesseract-ocr/issues/list Issues] page of the project site.&lt;br /&gt;
&lt;br /&gt;
==Included in==&lt;br /&gt;
*Integration with the free [http://www.dcc.ac.uk/resources/external/xena-software-0 Xena-Digital Preservation Software][http://sourceforge.net/projects/xena/?source=navbar http://sourceforge.net/projects/xena/?source=navbar]&lt;br /&gt;
*Integration with Free Online OCR [http://www.free-ocr.com/faq.html http://www.free-ocr.com/faq.html]&lt;br /&gt;
&lt;br /&gt;
==Usability==&lt;br /&gt;
*Tesseract was primarily developed for English OCR capability, but 47 language packs have been developed for use with other languages [https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html]. Tesseract 2.0x and 3.0x are [https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 trainable] for other languages.&lt;br /&gt;
*There is no built-in GUI, but there are several available from the [https://code.google.com/p/tesseract-ocr/wiki/3rdParty 3rdParty] page.&lt;br /&gt;
&lt;br /&gt;
= User Experiences =&lt;br /&gt;
*Lazorchak, Butch. (2014). Making Scanned Content Accessible Using Full-text Search and OCR [http://blogs.loc.gov/digitalpreservation/2014/08/making-scanned-content-accessible-using-full-text-search-and-ocr/ http://blogs.loc.gov/digitalpreservation/2014/08/making-scanned-content-accessible-using-full-text-search-and-ocr/]&lt;br /&gt;
*Texas A&amp;amp;M University. (2012-). Early Modern OCR Project Workflow [http://emop.tamu.edu/about http://emop.tamu.edu/about]&lt;br /&gt;
*Adams, Chris. (2014). Content Search on a Budget-using Tesseract on large TIFF files [http://chris.improbable.org/2014/3/17/content-search-on-a-budget/ http://chris.improbable.org/2014/3/17/content-search-on-a-budget/]&lt;br /&gt;
*PSNC Digital Libraries Team. (2011). Tesseract 3.0 installation on Ubuntu 10.10 server [http://dl.psnc.pl/2011/01/24/tesseract-3-0-installation-on-ubuntu-10-10-server/ http://dl.psnc.pl/2011/01/24/tesseract-3-0-installation-on-ubuntu-10-10-server/]&lt;br /&gt;
*Lacy, David. (2014). Digital Library upgrade provides enhanced discovery [http://blog.library.villanova.edu/digitallibrary/2014/02/18/digital-library-upgrade-provides-enhanced-discovery/#sthash.OwCtLlEc.dpuf http://blog.library.villanova.edu/digitallibrary/2014/02/18/digital-library-upgrade-provides-enhanced-discovery/#sthash.OwCtLlEc.dpuf]&lt;br /&gt;
*Applied in an AQuA Mashup that resulted in the Solution page: [http://wiki.opf-labs.org/display/AQuA/Compare+OCR+results+of+the+same+source+material+in+different+formats+%28TIFF%2C+JP2%29 Compare OCR results of the same source material in different formats (TIFF, JP2)]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Activity Feeds =&lt;br /&gt;
==Google Code Source Feed==&lt;br /&gt;
Below are the last 5 source updates:&lt;br /&gt;
&amp;lt;rss max=5&amp;gt;https://code.google.com/feeds/p/tesseract-ocr/gitchanges/basic&amp;lt;/rss&amp;gt;&lt;br /&gt;
==Google Code Wiki Feed==&lt;br /&gt;
Below are the last 3 wiki updates:&lt;br /&gt;
&amp;lt;rss max=3&amp;gt;https://code.google.com/feeds/p/tesseract-ocr/gitchanges/basic?repo=wiki&amp;lt;/rss&amp;gt;&lt;br /&gt;
==Google Code Issue Feed==&lt;br /&gt;
Below are the last 3 issue updates:&lt;br /&gt;
&amp;lt;rss max=3&amp;gt;https://code.google.com/feeds/p/tesseract-ocr/issueupdates/basic&amp;lt;/rss&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Infobox_tool_details&lt;br /&gt;
|ohloh_id=tesseract-ocr&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Lreilly</name></author>
	</entry>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=Tesseract-ocr&amp;diff=1997</id>
		<title>Tesseract-ocr</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=Tesseract-ocr&amp;diff=1997"/>
		<updated>2014-10-04T14:28:45Z</updated>

		<summary type="html">&lt;p&gt;Lreilly: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox_tool&lt;br /&gt;
|purpose=Open source OCR engine, accepting uncompressed TIFF files as input&lt;br /&gt;
|image=Tesseract.png&lt;br /&gt;
|homepage=http://code.google.com/p/tesseract-ocr/&lt;br /&gt;
|license=Apache 2.0 License EXCEPT the tesseractTrainer.py, which is licensed with GPL&lt;br /&gt;
|platforms=Linux, Windows, MacOSX&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Delete the Categories that do not apply --&amp;gt;&lt;br /&gt;
[[Category:OCR]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Description =&lt;br /&gt;
Tesseract is probably the most accurate open source OCR engine available. Combined with the [http://leptonica.com/ Leptonica Image Processing Library] it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google.&lt;br /&gt;
&lt;br /&gt;
== Provider ==&lt;br /&gt;
Development of Tesseract is sponsored by Google. Its chief developer is [http://research.google.com/pubs/author4479.html Ray Smith].&lt;br /&gt;
&lt;br /&gt;
==Licensing and cost==&lt;br /&gt;
Tesseract is an Open Source OCR engine, available under the [http://www.apache.org/licenses/LICENSE-2.0 Apache 2.0 license]. It can be used directly, or (for programmers) using an [http://code.google.com/p/tesseract-ocr/source/browse/trunk/api/baseapi.h API].&lt;br /&gt;
&lt;br /&gt;
==History==&lt;br /&gt;
Tesseract was initially developed at HP over a 10-year period from 1984 to 1994. After a decade of minimal development, it was released as open source in 2005. Google has sponsored its development since 2006.&lt;br /&gt;
&lt;br /&gt;
==Platform and interoperability==&lt;br /&gt;
*The latest downloads for Linux and Windows are found on [https://drive.google.com/folderview?id=0B7l10Bj_LprhQnpSRkpGMGV2eE0&amp;amp;usp=sharing GoogleDrive]. Older versions of Tesseract and its language packs are found on the discontinued [https://code.google.com/p/tesseract-ocr/downloads/list Google Code download page].&lt;br /&gt;
*The easiest way to install Tesseract on Mac OS X is with [https://www.macports.org/ MacPorts]. Once MacPorts is installed, install Tesseract by running ''sudo port install tesseract'', and add any language pack with ''sudo port install tesseract-&amp;lt;langcode&amp;gt;''. A list of available langcodes can be found on the [https://www.macports.org/ports.php?by=name&amp;amp;substr=tesseract- MacPorts Tesseract page].&lt;br /&gt;
*Dependencies for running Tesseract include Autotools and [http://www.leptonica.org/ Leptonica]. The Windows version requires installation of [http://msdn.microsoft.com/en-us/vstudio/aa718325.aspx Visual Studio]. More information about required Ubuntu libraries and links to specific requirements are on the [https://code.google.com/p/tesseract-ocr/wiki/Compiling Tesseract Wiki].&lt;br /&gt;
*Other programs such as Scan Tailor, unpaper, ImageJ, Gimp or ImageMagick may be needed to properly prepare images for use in Tesseract.&lt;br /&gt;
&lt;br /&gt;
==Functional notes==&lt;br /&gt;
===Input supported===&lt;br /&gt;
Any image format readable by Leptonica is supported in Tesseract, including BMP, PNM, PNG, JFIF, JPEG, and TIFF.&lt;br /&gt;
GIF is not supported [http://www.leptonica.com/library-overview.html http://www.leptonica.com/library-overview.html].&lt;br /&gt;
===Output generated===&lt;br /&gt;
Tesseract outputs to TXT. PDF output was added in version 3.03.&lt;br /&gt;
&lt;br /&gt;
==Documentation and support==&lt;br /&gt;
*Smith, Ray (2007). An Overview of the Tesseract OCR Engine [http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf ]&lt;br /&gt;
*Installation information is found on the [https://code.google.com/p/tesseract-ocr/wiki/ReadMe ReadMe] page of the project site.&lt;br /&gt;
*Support is offered and issues are addressed on the [https://code.google.com/p/tesseract-ocr/issues/list Issues] page of the project site.&lt;br /&gt;
&lt;br /&gt;
==Included in==&lt;br /&gt;
*Integration with the free [http://www.dcc.ac.uk/resources/external/xena-software-0 Xena-Digital Preservation Software][http://sourceforge.net/projects/xena/?source=navbar http://sourceforge.net/projects/xena/?source=navbar]&lt;br /&gt;
*Integration with Free Online OCR [http://www.free-ocr.com/faq.html http://www.free-ocr.com/faq.html]&lt;br /&gt;
&lt;br /&gt;
==Usability==&lt;br /&gt;
*Tesseract was primarily developed for English OCR capability, but 47 language packs have been developed for use with other languages [https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html]. Tesseract 2.0x and 3.0x are [https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 trainable] for other languages.&lt;br /&gt;
*There is no built-in GUI, but there are several available from the [https://code.google.com/p/tesseract-ocr/wiki/3rdParty 3rdParty] page.&lt;br /&gt;
&lt;br /&gt;
= User Experiences =&lt;br /&gt;
*Lazorchak, Butch. (2014). Making Scanned Content Accessible Using Full-text Search and OCR [http://blogs.loc.gov/digitalpreservation/2014/08/making-scanned-content-accessible-using-full-text-search-and-ocr/ http://blogs.loc.gov/digitalpreservation/2014/08/making-scanned-content-accessible-using-full-text-search-and-ocr/]&lt;br /&gt;
*Texas A&amp;amp;M University. (2012-). Early Modern OCR Project Workflow [http://emop.tamu.edu/about http://emop.tamu.edu/about]&lt;br /&gt;
*Adams, Chris. (2014). Content Search on a Budget-using Tesseract on large TIFF files [http://chris.improbable.org/2014/3/17/content-search-on-a-budget/ http://chris.improbable.org/2014/3/17/content-search-on-a-budget/]&lt;br /&gt;
*PSNC Digital Libraries Team. (2011). Tesseract 3.0 installation on Ubuntu 10.10 server [http://dl.psnc.pl/2011/01/24/tesseract-3-0-installation-on-ubuntu-10-10-server/ http://dl.psnc.pl/2011/01/24/tesseract-3-0-installation-on-ubuntu-10-10-server/]&lt;br /&gt;
*Lacy, David. (2014). Digital Library upgrade provides enhanced discovery [http://blog.library.villanova.edu/digitallibrary/2014/02/18/digital-library-upgrade-provides-enhanced-discovery/#sthash.OwCtLlEc.dpuf http://blog.library.villanova.edu/digitallibrary/2014/02/18/digital-library-upgrade-provides-enhanced-discovery/#sthash.OwCtLlEc.dpuf]&lt;br /&gt;
*Applied in an AQuA Mashup that resulted in the Solution page: [http://wiki.opf-labs.org/display/AQuA/Compare+OCR+results+of+the+same+source+material+in+different+formats+%28TIFF%2C+JP2%29 Compare OCR results of the same source material in different formats (TIFF, JP2)]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Activity Feeds =&lt;br /&gt;
==Google Code Source Feed==&lt;br /&gt;
Below are the last 5 source updates:&lt;br /&gt;
&amp;lt;rss max=5&amp;gt;https://code.google.com/feeds/p/tesseract-ocr/gitchanges/basic&amp;lt;/rss&amp;gt;&lt;br /&gt;
==Google Code Wiki Feed==&lt;br /&gt;
Below are the last 3 wiki updates:&lt;br /&gt;
&amp;lt;rss max=3&amp;gt;https://code.google.com/feeds/p/tesseract-ocr/gitchanges/basic?repo=wiki&amp;lt;/rss&amp;gt;&lt;br /&gt;
==Google Code Issue Feed==&lt;br /&gt;
Below are the last 3 issue updates:&lt;br /&gt;
&amp;lt;rss max=3&amp;gt;https://code.google.com/feeds/p/tesseract-ocr/issueupdates/basic&amp;lt;/rss&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Infobox_tool_details&lt;br /&gt;
|ohloh_id=tesseract-ocr&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Lreilly</name></author>
	</entry>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=Tesseract-ocr&amp;diff=1996</id>
		<title>Tesseract-ocr</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=Tesseract-ocr&amp;diff=1996"/>
		<updated>2014-10-04T14:23:37Z</updated>

		<summary type="html">&lt;p&gt;Lreilly: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox_tool&lt;br /&gt;
|purpose=Open source OCR engine, accepting uncompressed TIFF files as input&lt;br /&gt;
|image=Tesseract.png&lt;br /&gt;
|homepage=http://code.google.com/p/tesseract-ocr/&lt;br /&gt;
|license=Apache 2.0 License EXCEPT the tesseractTrainer.py, which is licensed with GPL&lt;br /&gt;
|platforms=Linux, Windows, MacOSX&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Delete the Categories that do not apply --&amp;gt;&lt;br /&gt;
[[Category:OCR]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Description =&lt;br /&gt;
Tesseract is probably the most accurate open source OCR engine available. Combined with the [http://leptonica.com/ Leptonica Image Processing Library] it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google.&lt;br /&gt;
&lt;br /&gt;
== Provider ==&lt;br /&gt;
Development of Tesseract is sponsored by Google. Its chief developer is [http://research.google.com/pubs/author4479.html Ray Smith].&lt;br /&gt;
&lt;br /&gt;
==Licensing and cost==&lt;br /&gt;
Tesseract is an Open Source OCR engine, available under the [http://www.apache.org/licenses/LICENSE-2.0 Apache 2.0 license]. It can be used directly, or (for programmers) using an [http://code.google.com/p/tesseract-ocr/source/browse/trunk/api/baseapi.h API].&lt;br /&gt;
&lt;br /&gt;
==History==&lt;br /&gt;
Tesseract was initially developed at HP over a 10-year period from 1984 to 1994. After a decade of minimal development, it was released as open source in 2005. Google has sponsored its development since 2006.&lt;br /&gt;
&lt;br /&gt;
==Platform and interoperability==&lt;br /&gt;
*The latest downloads for Linux and Windows are found on [https://drive.google.com/folderview?id=0B7l10Bj_LprhQnpSRkpGMGV2eE0&amp;amp;usp=sharing GoogleDrive]. Older versions of Tesseract and its language packs are found on the discontinued [https://code.google.com/p/tesseract-ocr/downloads/list Google Code download page].&lt;br /&gt;
*The easiest way to install Tesseract on Mac OS X is with [https://www.macports.org/ MacPorts]. Once MacPorts is installed, install Tesseract by running ''sudo port install tesseract'', and add any language pack with ''sudo port install tesseract-&amp;lt;langcode&amp;gt;''. A list of available langcodes can be found on the [https://www.macports.org/ports.php?by=name&amp;amp;substr=tesseract- MacPorts Tesseract page].&lt;br /&gt;
*Dependencies for running Tesseract include Autotools and [http://www.leptonica.org/ Leptonica]. The Windows version requires installation of [http://msdn.microsoft.com/en-us/vstudio/aa718325.aspx Visual Studio]. More information about required Ubuntu libraries and links to specific requirements are on the [https://code.google.com/p/tesseract-ocr/wiki/Compiling Tesseract Wiki].&lt;br /&gt;
*Other programs such as Scan Tailor, unpaper, ImageJ, Gimp or ImageMagick may be needed to properly prepare images for use in Tesseract.&lt;br /&gt;
&lt;br /&gt;
==Functional notes==&lt;br /&gt;
===Input supported===&lt;br /&gt;
Any image format readable by Leptonica is supported in Tesseract, including BMP, PNM, PNG, JFIF, JPEG, and TIFF.&lt;br /&gt;
GIF is not supported [http://www.leptonica.com/library-overview.html http://www.leptonica.com/library-overview.html].&lt;br /&gt;
===Output generated===&lt;br /&gt;
Tesseract outputs to TXT. PDF output was added in version 3.03.&lt;br /&gt;
&lt;br /&gt;
==Documentation and support==&lt;br /&gt;
*Smith, Ray (2007). An Overview of the Tesseract OCR Engine [http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf ]&lt;br /&gt;
*Installation information is found on the [https://code.google.com/p/tesseract-ocr/wiki/ReadMe ReadMe] page of the project site.&lt;br /&gt;
*Support is offered and issues are addressed on the [https://code.google.com/p/tesseract-ocr/issues/list Issues] page of the project site.&lt;br /&gt;
&lt;br /&gt;
==Usability==&lt;br /&gt;
*Tesseract was primarily developed for English OCR capability, but 47 language packs have been developed for use with other languages [https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html]. Tesseract 2.0x and 3.0x are [https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 trainable] for other languages.&lt;br /&gt;
*There is no built-in GUI, but there are several available from the [https://code.google.com/p/tesseract-ocr/wiki/3rdParty 3rdParty] page.&lt;br /&gt;
&lt;br /&gt;
= User Experiences =&lt;br /&gt;
==Experiences with Tesseract==&lt;br /&gt;
*Lazorchak, Butch. (2014). Making Scanned Content Accessible Using Full-text Search and OCR [http://blogs.loc.gov/digitalpreservation/2014/08/making-scanned-content-accessible-using-full-text-search-and-ocr/ http://blogs.loc.gov/digitalpreservation/2014/08/making-scanned-content-accessible-using-full-text-search-and-ocr/]&lt;br /&gt;
*Texas A&amp;amp;M University. (2012-). Early Modern OCR Project Workflow [http://emop.tamu.edu/about http://emop.tamu.edu/about]&lt;br /&gt;
*Adams, Chris. (2014). Content Search on a Budget-using Tesseract on large TIFF files [http://chris.improbable.org/2014/3/17/content-search-on-a-budget/ http://chris.improbable.org/2014/3/17/content-search-on-a-budget/]&lt;br /&gt;
*PSNC Digital Libraries Team. (2011). Tesseract 3.0 installation on Ubuntu 10.10 server [http://dl.psnc.pl/2011/01/24/tesseract-3-0-installation-on-ubuntu-10-10-server/ http://dl.psnc.pl/2011/01/24/tesseract-3-0-installation-on-ubuntu-10-10-server/]&lt;br /&gt;
*Lacy, David. (2014). Digital Library upgrade provides enhanced discovery [http://blog.library.villanova.edu/digitallibrary/2014/02/18/digital-library-upgrade-provides-enhanced-discovery/#sthash.OwCtLlEc.dpuf http://blog.library.villanova.edu/digitallibrary/2014/02/18/digital-library-upgrade-provides-enhanced-discovery/#sthash.OwCtLlEc.dpuf]&lt;br /&gt;
*Applied in an AQuA Mashup that resulted in the Solution page: [http://wiki.opf-labs.org/display/AQuA/Compare+OCR+results+of+the+same+source+material+in+different+formats+%28TIFF%2C+JP2%29 Compare OCR results of the same source material in different formats (TIFF, JP2)]&lt;br /&gt;
==Experiences with software integrating Tesseract==&lt;br /&gt;
*Integration with the free [http://www.dcc.ac.uk/resources/external/xena-software-0 Xena-Digital Preservation Software][http://sourceforge.net/projects/xena/?source=navbar http://sourceforge.net/projects/xena/?source=navbar]&lt;br /&gt;
*Integration with Free Online OCR [http://www.free-ocr.com/faq.html http://www.free-ocr.com/faq.html]&lt;br /&gt;
&lt;br /&gt;
= Activity Feeds =&lt;br /&gt;
==Google Code Source Feed==&lt;br /&gt;
Below are the last 5 source updates:&lt;br /&gt;
&amp;lt;rss max=5&amp;gt;https://code.google.com/feeds/p/tesseract-ocr/gitchanges/basic&amp;lt;/rss&amp;gt;&lt;br /&gt;
==Google Code Wiki Feed==&lt;br /&gt;
Below are the last 3 wiki updates:&lt;br /&gt;
&amp;lt;rss max=3&amp;gt;https://code.google.com/feeds/p/tesseract-ocr/gitchanges/basic?repo=wiki&amp;lt;/rss&amp;gt;&lt;br /&gt;
==Google Code Issue Feed==&lt;br /&gt;
Below are the last 3 issue updates:&lt;br /&gt;
&amp;lt;rss max=3&amp;gt;https://code.google.com/feeds/p/tesseract-ocr/issueupdates/basic&amp;lt;/rss&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Infobox_tool_details&lt;br /&gt;
|ohloh_id=tesseract-ocr&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Lreilly</name></author>
	</entry>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=Tesseract-ocr&amp;diff=1995</id>
		<title>Tesseract-ocr</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=Tesseract-ocr&amp;diff=1995"/>
		<updated>2014-10-04T14:12:41Z</updated>

		<summary type="html">&lt;p&gt;Lreilly: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox_tool&lt;br /&gt;
|purpose=Open source OCR engine, accepting uncompressed TIFF files as input&lt;br /&gt;
|image=Tesseract.png&lt;br /&gt;
|homepage=http://code.google.com/p/tesseract-ocr/&lt;br /&gt;
|license=Apache 2.0 License EXCEPT the tesseractTrainer.py, which is licensed with GPL&lt;br /&gt;
|platforms=Linux, Windows, MacOSX&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Delete the Categories that do not apply --&amp;gt;&lt;br /&gt;
[[Category:OCR]]&lt;br /&gt;
[[Category:Quality Assurance]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Description =&lt;br /&gt;
Tesseract is probably the most accurate open source OCR engine available. Combined with the [http://leptonica.com/ Leptonica Image Processing Library] it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google.&lt;br /&gt;
&lt;br /&gt;
== Provider ==&lt;br /&gt;
Development of Tesseract is sponsored by Google. Its chief developer is [http://research.google.com/pubs/author4479.html Ray Smith].&lt;br /&gt;
&lt;br /&gt;
==Licensing and cost==&lt;br /&gt;
Tesseract is an Open Source OCR engine, available under the [http://www.apache.org/licenses/LICENSE-2.0 Apache 2.0 license]. It can be used directly, or (for programmers) using an [http://code.google.com/p/tesseract-ocr/source/browse/trunk/api/baseapi.h API].&lt;br /&gt;
&lt;br /&gt;
==History==&lt;br /&gt;
Tesseract was initially developed at HP over a 10-year period from 1984 to 1994. After a decade of minimal development, it was released as open source in 2005. Google has sponsored its development since 2006.&lt;br /&gt;
&lt;br /&gt;
==Platform and interoperability==&lt;br /&gt;
*The latest downloads for Linux and Windows are found on [https://drive.google.com/folderview?id=0B7l10Bj_LprhQnpSRkpGMGV2eE0&amp;amp;usp=sharing GoogleDrive]. Older versions of Tesseract and its language packs are found on the discontinued [https://code.google.com/p/tesseract-ocr/downloads/list Google Code download page].&lt;br /&gt;
*The easiest way to install Tesseract on Mac OS X is with [https://www.macports.org/ MacPorts]. Once MacPorts is installed, install Tesseract by running ''sudo port install tesseract'', and add any language pack with ''sudo port install tesseract-&amp;lt;langcode&amp;gt;''. A list of available langcodes can be found on the [https://www.macports.org/ports.php?by=name&amp;amp;substr=tesseract- MacPorts Tesseract page].&lt;br /&gt;
*Dependencies for running Tesseract include Autotools and [http://www.leptonica.org/ Leptonica]. The Windows version requires installation of [http://msdn.microsoft.com/en-us/vstudio/aa718325.aspx Visual Studio]. More information about required Ubuntu libraries and links to specific requirements are on the [https://code.google.com/p/tesseract-ocr/wiki/Compiling Tesseract Wiki].&lt;br /&gt;
*Other programs such as Scan Tailor, unpaper, ImageJ, Gimp or ImageMagick may be needed to properly prepare images for use in Tesseract.&lt;br /&gt;
&lt;br /&gt;
==Functional notes==&lt;br /&gt;
===Input supported===&lt;br /&gt;
Any image format readable by Leptonica is supported in Tesseract, including BMP, PNM, PNG, JFIF, JPEG, and TIFF.&lt;br /&gt;
GIF is not supported [http://www.leptonica.com/library-overview.html http://www.leptonica.com/library-overview.html].&lt;br /&gt;
===Output generated===&lt;br /&gt;
Tesseract outputs to TXT. PDF output was added in version 3.03.&lt;br /&gt;
&lt;br /&gt;
==Documentation and support==&lt;br /&gt;
*Smith, Ray (2007). An Overview of the Tesseract OCR Engine [http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf ]&lt;br /&gt;
*Installation information is found on the [https://code.google.com/p/tesseract-ocr/wiki/ReadMe ReadMe] page of the project site.&lt;br /&gt;
*Support is offered and issues are addressed on the [https://code.google.com/p/tesseract-ocr/issues/list Issues] page of the project site.&lt;br /&gt;
&lt;br /&gt;
==Usability==&lt;br /&gt;
*Tesseract was primarily developed for English OCR capability, but 47 language packs have been developed for use with other languages [https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html]. Tesseract 2.0x and 3.0x are [https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 trainable] for other languages.&lt;br /&gt;
*There is no built-in GUI, but there are several available from the [https://code.google.com/p/tesseract-ocr/wiki/3rdParty 3rdParty] page.&lt;br /&gt;
&lt;br /&gt;
= User Experiences =&lt;br /&gt;
==Experiences with Tesseract==&lt;br /&gt;
*Lazorchak, Butch. (2014). Making Scanned Content Accessible Using Full-text Search and OCR [http://blogs.loc.gov/digitalpreservation/2014/08/making-scanned-content-accessible-using-full-text-search-and-ocr/ http://blogs.loc.gov/digitalpreservation/2014/08/making-scanned-content-accessible-using-full-text-search-and-ocr/]&lt;br /&gt;
*Texas A&amp;amp;M University. (2012-). Early Modern OCR Project Workflow [http://emop.tamu.edu/about http://emop.tamu.edu/about]&lt;br /&gt;
*Adams, Chris. (2014). Content Search on a Budget-using Tesseract on large TIFF files [http://chris.improbable.org/2014/3/17/content-search-on-a-budget/ http://chris.improbable.org/2014/3/17/content-search-on-a-budget/]&lt;br /&gt;
*PSNC Digital Libraries Team. (2011). Tesseract 3.0 installation on Ubuntu 10.10 server [http://dl.psnc.pl/2011/01/24/tesseract-3-0-installation-on-ubuntu-10-10-server/ http://dl.psnc.pl/2011/01/24/tesseract-3-0-installation-on-ubuntu-10-10-server/]&lt;br /&gt;
*Lacy, David. (2014). Digital Library upgrade provides enhanced discovery [http://blog.library.villanova.edu/digitallibrary/2014/02/18/digital-library-upgrade-provides-enhanced-discovery/#sthash.OwCtLlEc.dpuf http://blog.library.villanova.edu/digitallibrary/2014/02/18/digital-library-upgrade-provides-enhanced-discovery/#sthash.OwCtLlEc.dpuf]&lt;br /&gt;
*Applied in an AQuA Mashup that resulted in the Solution page: [http://wiki.opf-labs.org/display/AQuA/Compare+OCR+results+of+the+same+source+material+in+different+formats+%28TIFF%2C+JP2%29 Compare OCR results of the same source material in different formats (TIFF, JP2)]&lt;br /&gt;
==Experiences with software integrating Tesseract==&lt;br /&gt;
*Integration with the free [http://www.dcc.ac.uk/resources/external/xena-software-0 Xena-Digital Preservation Software][http://sourceforge.net/projects/xena/?source=navbar http://sourceforge.net/projects/xena/?source=navbar]&lt;br /&gt;
*Integration with Free Online OCR [http://www.free-ocr.com/faq.html http://www.free-ocr.com/faq.html]&lt;br /&gt;
&lt;br /&gt;
= Activity Feeds =&lt;br /&gt;
==Google Code Source Feed==&lt;br /&gt;
Below are the last 5 source updates:&lt;br /&gt;
&amp;lt;rss max=5&amp;gt;https://code.google.com/feeds/p/tesseract-ocr/gitchanges/basic&amp;lt;/rss&amp;gt;&lt;br /&gt;
==Google Code Wiki Feed==&lt;br /&gt;
Below are the last 3 wiki updates:&lt;br /&gt;
&amp;lt;rss max=3&amp;gt;https://code.google.com/feeds/p/tesseract-ocr/gitchanges/basic?repo=wiki&amp;lt;/rss&amp;gt;&lt;br /&gt;
==Google Code Issue Feed==&lt;br /&gt;
Below are the last 3 issue updates:&lt;br /&gt;
&amp;lt;rss max=3&amp;gt;https://code.google.com/feeds/p/tesseract-ocr/issueupdates/basic&amp;lt;/rss&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Infobox_tool_details&lt;br /&gt;
|ohloh_id=tesseract-ocr&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Lreilly</name></author>
	</entry>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=Tesseract-ocr&amp;diff=1994</id>
		<title>Tesseract-ocr</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=Tesseract-ocr&amp;diff=1994"/>
		<updated>2014-10-04T14:06:25Z</updated>

		<summary type="html">&lt;p&gt;Lreilly: Added activity feeds.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox_tool&lt;br /&gt;
|purpose=Open source OCR engine, accepting uncompressed TIFF files as input&lt;br /&gt;
|image=Tesseract.png&lt;br /&gt;
|homepage=http://code.google.com/p/tesseract-ocr/&lt;br /&gt;
|license=Apache 2.0 License EXCEPT the tesseractTrainer.py, which is licensed with GPL&lt;br /&gt;
|platforms=Linux, Windows, MacOSX&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Delete the Categories that do not apply --&amp;gt;&lt;br /&gt;
[[Category:OCR]]&lt;br /&gt;
[[Category:Quality Assurance]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Description =&lt;br /&gt;
Tesseract is probably the most accurate open source OCR engine available. Combined with the [http://leptonica.com/ Leptonica Image Processing Library] it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google.&lt;br /&gt;
&lt;br /&gt;
== Provider ==&lt;br /&gt;
Development of Tesseract is sponsored by Google. Its chief developer is [http://research.google.com/pubs/author4479.html Ray Smith].&lt;br /&gt;
&lt;br /&gt;
==Licensing and cost==&lt;br /&gt;
Tesseract is an Open Source OCR engine, available under the [http://www.apache.org/licenses/LICENSE-2.0 Apache 2.0 license]. It can be used directly, or (for programmers) using an [http://code.google.com/p/tesseract-ocr/source/browse/trunk/api/baseapi.h API].&lt;br /&gt;
&lt;br /&gt;
==History==&lt;br /&gt;
Tesseract was initially developed at HP over a 10-year period from 1984 to 1994. After a decade of minimal development, it was released as open source in 2005. Google acquired Tesseract in 2006 and currently maintains its development.&lt;br /&gt;
&lt;br /&gt;
==Platform and interoperability==&lt;br /&gt;
*The latest downloads for Linux and Windows are found on [https://drive.google.com/folderview?id=0B7l10Bj_LprhQnpSRkpGMGV2eE0&amp;amp;usp=sharing GoogleDrive]. Older versions of Tesseract and its language packs are found on the discontinued [https://code.google.com/p/tesseract-ocr/downloads/list Google Code download page].&lt;br /&gt;
*The easiest way to install Tesseract on Mac OSX is with [https://www.macports.org/ MacPorts]. Once MacPorts is installed, you can install Tesseract by running the command ''sudo port install tesseract'', and any language with ''sudo port install tesseract-&amp;lt;langcode&amp;gt;''. A list of available langcodes can be found on the [https://www.macports.org/ports.php?by=name&amp;amp;substr=tesseract- MacPorts Tesseract page].&lt;br /&gt;
*Dependencies for running Tesseract on Linux include Autotools and [http://www.leptonica.org/ Leptonica]. The Windows version requires installation of [http://msdn.microsoft.com/en-us/vstudio/aa718325.aspx Visual Studio]. More information about required Ubuntu libraries and links to specific requirements are on the [https://code.google.com/p/tesseract-ocr/wiki/Compiling Tesseract Wiki].&lt;br /&gt;
*Other programs such as Scan Tailor, unpaper, ImageJ, Gimp or ImageMagick may be needed to properly prepare images for use in Tesseract.&lt;br /&gt;
&lt;br /&gt;
==Functional notes==&lt;br /&gt;
===Input supported===&lt;br /&gt;
Any image readable by Leptonica is supported in Tesseract including BMP, PNM, PNG, JFIF, JPEG, TIFF. &lt;br /&gt;
GIF is not supported [http://www.leptonica.com/library-overview.html http://www.leptonica.com/library-overview.html].  &lt;br /&gt;
===Output generated===&lt;br /&gt;
Tesseract outputs to TXT. PDF output was added in version 3.03.&lt;br /&gt;
&lt;br /&gt;
==Documentation and support==&lt;br /&gt;
*Smith, Ray (2007). An Overview of the Tesseract OCR Engine [http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf ]&lt;br /&gt;
*Installation information is found on the [https://code.google.com/p/tesseract-ocr/wiki/ReadMe ReadMe] page of the project site.&lt;br /&gt;
*Support is offered and issues are addressed on the [https://code.google.com/p/tesseract-ocr/issues/list Issues] page of the project site.&lt;br /&gt;
&lt;br /&gt;
==Usability==&lt;br /&gt;
*Tesseract was primarily developed for English OCR capability, but 47 language packs have been developed for use with other languages [https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html]. Tesseract 2.0x and 3.0x are [https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 trainable] for other languages.&lt;br /&gt;
*There is no built-in GUI, but there are several available from the [https://code.google.com/p/tesseract-ocr/wiki/3rdParty 3rdParty] page.&lt;br /&gt;
&lt;br /&gt;
= User Experiences =&lt;br /&gt;
==Experiences with Tesseract==&lt;br /&gt;
*Lazorchak, Butch. (2014). Making Scanned Content Accessible Using Full-text Search and OCR [http://blogs.loc.gov/digitalpreservation/2014/08/making-scanned-content-accessible-using-full-text-search-and-ocr/ http://blogs.loc.gov/digitalpreservation/2014/08/making-scanned-content-accessible-using-full-text-search-and-ocr/]&lt;br /&gt;
*Texas A&amp;amp;M University. (2012-). Early Modern OCR Project Workflow [http://emop.tamu.edu/about http://emop.tamu.edu/about]&lt;br /&gt;
*Adams, Chris. (2014). Content Search on a Budget-using Tesseract on large TIFF files [http://chris.improbable.org/2014/3/17/content-search-on-a-budget/ http://chris.improbable.org/2014/3/17/content-search-on-a-budget/]&lt;br /&gt;
*PSNC Digital Libraries Team. (2011). Tesseract 3.0 installation on Ubuntu 10.10 server [http://dl.psnc.pl/2011/01/24/tesseract-3-0-installation-on-ubuntu-10-10-server/ http://dl.psnc.pl/2011/01/24/tesseract-3-0-installation-on-ubuntu-10-10-server/]&lt;br /&gt;
*Lacy, David. (2014). Digital Library upgrade provides enhanced discovery [http://blog.library.villanova.edu/digitallibrary/2014/02/18/digital-library-upgrade-provides-enhanced-discovery/#sthash.OwCtLlEc.dpuf http://blog.library.villanova.edu/digitallibrary/2014/02/18/digital-library-upgrade-provides-enhanced-discovery/#sthash.OwCtLlEc.dpuf]&lt;br /&gt;
*Applied in an AQuA Mashup that resulted in the Solution page: [http://wiki.opf-labs.org/display/AQuA/Compare+OCR+results+of+the+same+source+material+in+different+formats+%28TIFF%2C+JP2%29 Compare OCR results of the same source material in different formats (TIFF, JP2)]&lt;br /&gt;
==Experiences with software integrating Tesseract==&lt;br /&gt;
*Integration with the free [http://www.dcc.ac.uk/resources/external/xena-software-0 Xena-Digital Preservation Software][http://sourceforge.net/projects/xena/?source=navbar http://sourceforge.net/projects/xena/?source=navbar]&lt;br /&gt;
*Integration with Free Online OCR [http://www.free-ocr.com/faq.html http://www.free-ocr.com/faq.html]&lt;br /&gt;
&lt;br /&gt;
= Activity Feeds =&lt;br /&gt;
==Google Code Source Feed==&lt;br /&gt;
Below are the last 5 source updates:&lt;br /&gt;
&amp;lt;rss max=5&amp;gt;https://code.google.com/feeds/p/tesseract-ocr/gitchanges/basic&amp;lt;/rss&amp;gt;&lt;br /&gt;
==Google Code Wiki Feed==&lt;br /&gt;
Below are the last 3 wiki updates:&lt;br /&gt;
&amp;lt;rss max=3&amp;gt;https://code.google.com/feeds/p/tesseract-ocr/gitchanges/basic?repo=wiki&amp;lt;/rss&amp;gt;&lt;br /&gt;
==Google Code Issue Feed==&lt;br /&gt;
Below are the last 3 issue updates:&lt;br /&gt;
&amp;lt;rss max=3&amp;gt;https://code.google.com/feeds/p/tesseract-ocr/issueupdates/basic&amp;lt;/rss&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Infobox_tool_details&lt;br /&gt;
|ohloh_id=tesseract-ocr&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Lreilly</name></author>
	</entry>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=Tesseract-ocr&amp;diff=1993</id>
		<title>Tesseract-ocr</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=Tesseract-ocr&amp;diff=1993"/>
		<updated>2014-10-04T13:45:17Z</updated>

		<summary type="html">&lt;p&gt;Lreilly: /* Experiences with software integrated with Tesseract */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox_tool&lt;br /&gt;
|purpose=Open source OCR engine, accepting uncompressed TIFF files as input&lt;br /&gt;
|image=Tesseract.png&lt;br /&gt;
|homepage=http://code.google.com/p/tesseract-ocr/&lt;br /&gt;
|license=Apache 2.0 License EXCEPT the tesseractTrainer.py, which is licensed with GPL&lt;br /&gt;
|platforms=Linux, Windows, MacOSX&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Delete the Categories that do not apply --&amp;gt;&lt;br /&gt;
[[Category:OCR]]&lt;br /&gt;
[[Category:Quality Assurance]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Description =&lt;br /&gt;
Tesseract is probably the most accurate open source OCR engine available. Combined with the [http://leptonica.com/ Leptonica Image Processing Library] it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google.&lt;br /&gt;
&lt;br /&gt;
== Provider ==&lt;br /&gt;
Development of Tesseract is sponsored by Google. Its chief developer is [http://research.google.com/pubs/author4479.html Ray Smith].&lt;br /&gt;
&lt;br /&gt;
==Licensing and cost==&lt;br /&gt;
Tesseract is an Open Source OCR engine, available under the [http://www.apache.org/licenses/LICENSE-2.0 Apache 2.0 license]. It can be used directly, or (for programmers) using an [http://code.google.com/p/tesseract-ocr/source/browse/trunk/api/baseapi.h API].&lt;br /&gt;
&lt;br /&gt;
==History==&lt;br /&gt;
Tesseract was initially developed at HP over a 10-year period from 1984 to 1994. After a decade of minimal development, it was released as open source in 2005. Google acquired Tesseract in 2006 and currently maintains its development.&lt;br /&gt;
&lt;br /&gt;
==Platform and interoperability==&lt;br /&gt;
*The latest downloads for Linux and Windows are found on [https://drive.google.com/folderview?id=0B7l10Bj_LprhQnpSRkpGMGV2eE0&amp;amp;usp=sharing GoogleDrive]. Older versions of Tesseract and its language packs are found on the discontinued [https://code.google.com/p/tesseract-ocr/downloads/list Google Code download page].&lt;br /&gt;
*The easiest way to install Tesseract on Mac OSX is with [https://www.macports.org/ MacPorts]. Once MacPorts is installed, you can install Tesseract by running the command ''sudo port install tesseract'', and any language with ''sudo port install tesseract-&amp;lt;langcode&amp;gt;''. A list of available langcodes can be found on the [https://www.macports.org/ports.php?by=name&amp;amp;substr=tesseract- MacPorts Tesseract page].&lt;br /&gt;
*Dependencies for running Tesseract on Linux include Autotools and [http://www.leptonica.org/ Leptonica]. The Windows version requires installation of [http://msdn.microsoft.com/en-us/vstudio/aa718325.aspx Visual Studio]. More information about required Ubuntu libraries and links to specific requirements are on the [https://code.google.com/p/tesseract-ocr/wiki/Compiling Tesseract Wiki].&lt;br /&gt;
*Other programs such as Scan Tailor, unpaper, ImageJ, Gimp or ImageMagick may be needed to properly prepare images for use in Tesseract.&lt;br /&gt;
&lt;br /&gt;
==Functional notes==&lt;br /&gt;
===Input supported===&lt;br /&gt;
Any image readable by Leptonica is supported in Tesseract including BMP, PNM, PNG, JFIF, JPEG, TIFF. &lt;br /&gt;
GIF is not supported [http://www.leptonica.com/library-overview.html http://www.leptonica.com/library-overview.html].  &lt;br /&gt;
===Output generated===&lt;br /&gt;
Tesseract outputs to TXT. PDF output was added in version 3.03.&lt;br /&gt;
&lt;br /&gt;
==Documentation and support==&lt;br /&gt;
*Smith, Ray (2007). An Overview of the Tesseract OCR Engine [http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf ]&lt;br /&gt;
*Installation information is found on the [https://code.google.com/p/tesseract-ocr/wiki/ReadMe ReadMe] page of the project site.&lt;br /&gt;
*Support is offered and issues are addressed on the [https://code.google.com/p/tesseract-ocr/issues/list Issues] page of the project site.&lt;br /&gt;
&lt;br /&gt;
==Usability==&lt;br /&gt;
*Tesseract was primarily developed for English OCR capability, but 47 language packs have been developed for use with other languages [https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html]. Tesseract 2.0x and 3.0x are [https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 trainable] for other languages.&lt;br /&gt;
*There is no built-in GUI, but there are several available from the [https://code.google.com/p/tesseract-ocr/wiki/3rdParty 3rdParty] page.&lt;br /&gt;
&lt;br /&gt;
= User Experiences =&lt;br /&gt;
==Experiences with Tesseract==&lt;br /&gt;
*Lazorchak, Butch. (2014). Making Scanned Content Accessible Using Full-text Search and OCR [http://blogs.loc.gov/digitalpreservation/2014/08/making-scanned-content-accessible-using-full-text-search-and-ocr/ http://blogs.loc.gov/digitalpreservation/2014/08/making-scanned-content-accessible-using-full-text-search-and-ocr/]&lt;br /&gt;
*Texas A&amp;amp;M University. (2012-). Early Modern OCR Project Workflow [http://emop.tamu.edu/about http://emop.tamu.edu/about]&lt;br /&gt;
*Adams, Chris. (2014). Content Search on a Budget-using Tesseract on large TIFF files [http://chris.improbable.org/2014/3/17/content-search-on-a-budget/ http://chris.improbable.org/2014/3/17/content-search-on-a-budget/]&lt;br /&gt;
*PSNC Digital Libraries Team. (2011). Tesseract 3.0 installation on Ubuntu 10.10 server [http://dl.psnc.pl/2011/01/24/tesseract-3-0-installation-on-ubuntu-10-10-server/ http://dl.psnc.pl/2011/01/24/tesseract-3-0-installation-on-ubuntu-10-10-server/]&lt;br /&gt;
*Lacy, David. (2014). Digital Library upgrade provides enhanced discovery [http://blog.library.villanova.edu/digitallibrary/2014/02/18/digital-library-upgrade-provides-enhanced-discovery/#sthash.OwCtLlEc.dpuf http://blog.library.villanova.edu/digitallibrary/2014/02/18/digital-library-upgrade-provides-enhanced-discovery/#sthash.OwCtLlEc.dpuf]&lt;br /&gt;
*Applied in an AQuA Mashup that resulted in the Solution page: [http://wiki.opf-labs.org/display/AQuA/Compare+OCR+results+of+the+same+source+material+in+different+formats+%28TIFF%2C+JP2%29 Compare OCR results of the same source material in different formats (TIFF, JP2)]&lt;br /&gt;
==Experiences with software integrating Tesseract==&lt;br /&gt;
*Integration with the free [http://www.dcc.ac.uk/resources/external/xena-software-0 Xena-Digital Preservation Software][http://sourceforge.net/projects/xena/?source=navbar http://sourceforge.net/projects/xena/?source=navbar]&lt;br /&gt;
*Integration with Free Online OCR [http://www.free-ocr.com/faq.html http://www.free-ocr.com/faq.html]&lt;br /&gt;
&lt;br /&gt;
= Development Activity =&lt;br /&gt;
&lt;br /&gt;
{{Infobox_tool_details&lt;br /&gt;
|ohloh_id=tesseract-ocr&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Lreilly</name></author>
	</entry>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=Tesseract-ocr&amp;diff=1992</id>
		<title>Tesseract-ocr</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=Tesseract-ocr&amp;diff=1992"/>
		<updated>2014-10-04T13:39:25Z</updated>

		<summary type="html">&lt;p&gt;Lreilly: Major update of user experiences.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox_tool&lt;br /&gt;
|purpose=Open source OCR engine, accepting uncompressed TIFF files as input&lt;br /&gt;
|image=Tesseract.png&lt;br /&gt;
|homepage=http://code.google.com/p/tesseract-ocr/&lt;br /&gt;
|license=Apache 2.0 License EXCEPT the tesseractTrainer.py, which is licensed with GPL&lt;br /&gt;
|platforms=Linux, Windows, MacOSX&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Delete the Categories that do not apply --&amp;gt;&lt;br /&gt;
[[Category:OCR]]&lt;br /&gt;
[[Category:Quality Assurance]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Description =&lt;br /&gt;
Tesseract is probably the most accurate open source OCR engine available. Combined with the [http://leptonica.com/ Leptonica Image Processing Library] it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google.&lt;br /&gt;
&lt;br /&gt;
== Provider ==&lt;br /&gt;
Development of Tesseract is sponsored by Google. Its chief developer is [http://research.google.com/pubs/author4479.html Ray Smith].&lt;br /&gt;
&lt;br /&gt;
==Licensing and cost==&lt;br /&gt;
Tesseract is an Open Source OCR engine, available under the [http://www.apache.org/licenses/LICENSE-2.0 Apache 2.0 license]. It can be used directly, or (for programmers) using an [http://code.google.com/p/tesseract-ocr/source/browse/trunk/api/baseapi.h API].&lt;br /&gt;
&lt;br /&gt;
==History==&lt;br /&gt;
Tesseract was initially developed at HP over a 10-year period from 1984 to 1994. After a decade of minimal development, it was released as open source in 2005. Google acquired Tesseract in 2006 and currently maintains its development.&lt;br /&gt;
&lt;br /&gt;
==Platform and interoperability==&lt;br /&gt;
*The latest downloads for Linux and Windows are found on [https://drive.google.com/folderview?id=0B7l10Bj_LprhQnpSRkpGMGV2eE0&amp;amp;usp=sharing GoogleDrive]. Older versions of Tesseract and its language packs are found on the discontinued [https://code.google.com/p/tesseract-ocr/downloads/list Google Code download page].&lt;br /&gt;
*The easiest way to install Tesseract on Mac OSX is with [https://www.macports.org/ MacPorts]. Once MacPorts is installed, you can install Tesseract by running the command ''sudo port install tesseract'', and any language with ''sudo port install tesseract-&amp;lt;langcode&amp;gt;''. A list of available langcodes can be found on the [https://www.macports.org/ports.php?by=name&amp;amp;substr=tesseract- MacPorts Tesseract page].&lt;br /&gt;
*Dependencies for running Tesseract on Linux include Autotools and [http://www.leptonica.org/ Leptonica]. The Windows version requires installation of [http://msdn.microsoft.com/en-us/vstudio/aa718325.aspx Visual Studio]. More information about required Ubuntu libraries and links to specific requirements are on the [https://code.google.com/p/tesseract-ocr/wiki/Compiling Tesseract Wiki].&lt;br /&gt;
*Other programs such as Scan Tailor, unpaper, ImageJ, Gimp or ImageMagick may be needed to properly prepare images for use in Tesseract.&lt;br /&gt;
&lt;br /&gt;
==Functional notes==&lt;br /&gt;
===Input supported===&lt;br /&gt;
Any image readable by Leptonica is supported in Tesseract including BMP, PNM, PNG, JFIF, JPEG, TIFF. &lt;br /&gt;
GIF is not supported [http://www.leptonica.com/library-overview.html http://www.leptonica.com/library-overview.html].  &lt;br /&gt;
===Output generated===&lt;br /&gt;
Tesseract outputs to TXT. PDF output was added in version 3.03.&lt;br /&gt;
&lt;br /&gt;
==Documentation and support==&lt;br /&gt;
*Smith, Ray (2007). An Overview of the Tesseract OCR Engine [http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf ]&lt;br /&gt;
*Installation information is found on the [https://code.google.com/p/tesseract-ocr/wiki/ReadMe ReadMe] page of the project site.&lt;br /&gt;
*Support is offered and issues are addressed on the [https://code.google.com/p/tesseract-ocr/issues/list Issues] page of the project site.&lt;br /&gt;
&lt;br /&gt;
==Usability==&lt;br /&gt;
*Tesseract was primarily developed for English OCR capability, but 47 language packs have been developed for use with other languages [https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html]. Tesseract 2.0x and 3.0x are [https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 trainable] for other languages.&lt;br /&gt;
*There is no built-in GUI, but there are several available from the [https://code.google.com/p/tesseract-ocr/wiki/3rdParty 3rdParty] page.&lt;br /&gt;
&lt;br /&gt;
= User Experiences =&lt;br /&gt;
==Experiences with Tesseract==&lt;br /&gt;
*Lazorchak, Butch. (2014). Making Scanned Content Accessible Using Full-text Search and OCR [http://blogs.loc.gov/digitalpreservation/2014/08/making-scanned-content-accessible-using-full-text-search-and-ocr/ http://blogs.loc.gov/digitalpreservation/2014/08/making-scanned-content-accessible-using-full-text-search-and-ocr/]&lt;br /&gt;
*Texas A&amp;amp;M University. (2012-). Early Modern OCR Project Workflow [http://emop.tamu.edu/about http://emop.tamu.edu/about]&lt;br /&gt;
*Adams, Chris. (2014). Content Search on a Budget-using Tesseract on large TIFF files [http://chris.improbable.org/2014/3/17/content-search-on-a-budget/ http://chris.improbable.org/2014/3/17/content-search-on-a-budget/]&lt;br /&gt;
*PSNC Digital Libraries Team. (2011). Tesseract 3.0 installation on Ubuntu 10.10 server [http://dl.psnc.pl/2011/01/24/tesseract-3-0-installation-on-ubuntu-10-10-server/ http://dl.psnc.pl/2011/01/24/tesseract-3-0-installation-on-ubuntu-10-10-server/]&lt;br /&gt;
*Lacy, David. (2014). Digital Library upgrade provides enhanced discovery [http://blog.library.villanova.edu/digitallibrary/2014/02/18/digital-library-upgrade-provides-enhanced-discovery/#sthash.OwCtLlEc.dpuf http://blog.library.villanova.edu/digitallibrary/2014/02/18/digital-library-upgrade-provides-enhanced-discovery/#sthash.OwCtLlEc.dpuf]&lt;br /&gt;
*Applied in an AQuA Mashup that resulted in the Solution page: [http://wiki.opf-labs.org/display/AQuA/Compare+OCR+results+of+the+same+source+material+in+different+formats+%28TIFF%2C+JP2%29 Compare OCR results of the same source material in different formats (TIFF, JP2)]&lt;br /&gt;
==Experiences with software integrated with Tesseract==&lt;br /&gt;
*Integration with the free [http://www.dcc.ac.uk/resources/external/xena-software-0 Xena-Digital Preservation Software][http://sourceforge.net/projects/xena/?source=navbar http://sourceforge.net/projects/xena/?source=navbar]&lt;br /&gt;
*Integration with Free Online OCR [http://www.free-ocr.com/faq.html http://www.free-ocr.com/faq.html]&lt;br /&gt;
&lt;br /&gt;
= Development Activity =&lt;br /&gt;
&lt;br /&gt;
{{Infobox_tool_details&lt;br /&gt;
|ohloh_id=tesseract-ocr&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Lreilly</name></author>
	</entry>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=Tesseract-ocr&amp;diff=1991</id>
		<title>Tesseract-ocr</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=Tesseract-ocr&amp;diff=1991"/>
		<updated>2014-10-04T12:47:08Z</updated>

		<summary type="html">&lt;p&gt;Lreilly: /* Platform and interoperability */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox_tool&lt;br /&gt;
|purpose=Open source OCR engine, accepting uncompressed TIFF files as input&lt;br /&gt;
|image=Tesseract.png&lt;br /&gt;
|homepage=http://code.google.com/p/tesseract-ocr/&lt;br /&gt;
|license=Apache 2.0 License EXCEPT the tesseractTrainer.py, which is licensed with GPL&lt;br /&gt;
|platforms=Linux, Windows, MacOSX&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Delete the Categories that do not apply --&amp;gt;&lt;br /&gt;
[[Category:OCR]]&lt;br /&gt;
[[Category:Quality Assurance]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Description =&lt;br /&gt;
Tesseract is probably the most accurate open source OCR engine available. Combined with the [http://leptonica.com/ Leptonica Image Processing Library] it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google.&lt;br /&gt;
&lt;br /&gt;
== Provider ==&lt;br /&gt;
Development of Tesseract is sponsored by Google. Its chief developer is [http://research.google.com/pubs/author4479.html Ray Smith].&lt;br /&gt;
&lt;br /&gt;
==Licensing and cost==&lt;br /&gt;
Tesseract is an Open Source OCR engine, available under the [http://www.apache.org/licenses/LICENSE-2.0 Apache 2.0 license]. It can be used directly, or (for programmers) using an [http://code.google.com/p/tesseract-ocr/source/browse/trunk/api/baseapi.h API].&lt;br /&gt;
&lt;br /&gt;
==History==&lt;br /&gt;
Tesseract was initially developed at HP over a 10-year period from 1984 to 1994. After a decade of minimal development, it was released as open source in 2005. Google acquired Tesseract in 2006 and currently maintains its development.&lt;br /&gt;
&lt;br /&gt;
==Platform and interoperability==&lt;br /&gt;
*The latest downloads for Linux and Windows are found on [https://drive.google.com/folderview?id=0B7l10Bj_LprhQnpSRkpGMGV2eE0&amp;amp;usp=sharing GoogleDrive]. Older versions of Tesseract and its language packs are found on the discontinued [https://code.google.com/p/tesseract-ocr/downloads/list Google Code download page].&lt;br /&gt;
*The easiest way to install Tesseract on Mac OSX is with [https://www.macports.org/ MacPorts]. Once MacPorts is installed, you can install Tesseract by running the command ''sudo port install tesseract'', and any language with ''sudo port install tesseract-&amp;lt;langcode&amp;gt;''. A list of available langcodes can be found on the [https://www.macports.org/ports.php?by=name&amp;amp;substr=tesseract- MacPorts Tesseract page].&lt;br /&gt;
*Dependencies for running Tesseract on Linux include Autotools and [http://www.leptonica.org/ Leptonica]. The Windows version requires installation of [http://msdn.microsoft.com/en-us/vstudio/aa718325.aspx Visual Studio]. More information about required Ubuntu libraries and links to specific requirements are on the [https://code.google.com/p/tesseract-ocr/wiki/Compiling Tesseract Wiki].&lt;br /&gt;
*Other programs such as Scan Tailor, unpaper, ImageJ, Gimp or ImageMagick may be needed to properly prepare images for use in Tesseract.&lt;br /&gt;
&lt;br /&gt;
==Functional notes==&lt;br /&gt;
===Input supported===&lt;br /&gt;
Any image readable by Leptonica is supported in Tesseract including BMP, PNM, PNG, JFIF, JPEG, TIFF. &lt;br /&gt;
GIF is not supported [http://www.leptonica.com/library-overview.html http://www.leptonica.com/library-overview.html].  &lt;br /&gt;
===Output generated===&lt;br /&gt;
Tesseract outputs to TXT. PDF output was added in version 3.03.&lt;br /&gt;
&lt;br /&gt;
==Documentation and support==&lt;br /&gt;
*Smith, Ray (2007). An Overview of the Tesseract OCR Engine [http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf ]&lt;br /&gt;
*Installation information is found on the [https://code.google.com/p/tesseract-ocr/wiki/ReadMe ReadMe] page of the project site.&lt;br /&gt;
*Support is offered and issues are addressed on the [https://code.google.com/p/tesseract-ocr/issues/list Issues] page of the project site.&lt;br /&gt;
&lt;br /&gt;
==Usability==&lt;br /&gt;
*Tesseract was primarily developed for English OCR capability, but 47 language packs have been developed for use with other languages [https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html]. Tesseract 2.0x and 3.0x are [https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 trainable] for other languages.&lt;br /&gt;
*There is no built-in GUI, but there are several available from the [https://code.google.com/p/tesseract-ocr/wiki/3rdParty 3rdParty] page.&lt;br /&gt;
&lt;br /&gt;
= User Experiences =&lt;br /&gt;
Applied in an AQuA Mashup that resulted in the Solution page: [http://wiki.opf-labs.org/display/AQuA/Compare+OCR+results+of+the+same+source+material+in+different+formats+%28TIFF%2C+JP2%29 Compare OCR results of the same source material in different formats (TIFF, JP2)]&lt;br /&gt;
&lt;br /&gt;
= Development Activity =&lt;br /&gt;
&lt;br /&gt;
{{Infobox_tool_details&lt;br /&gt;
|ohloh_id=tesseract-ocr&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Lreilly</name></author>
	</entry>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=Tesseract-ocr&amp;diff=1990</id>
		<title>Tesseract-ocr</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=Tesseract-ocr&amp;diff=1990"/>
		<updated>2014-10-04T12:45:49Z</updated>

		<summary type="html">&lt;p&gt;Lreilly: /* History */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox_tool&lt;br /&gt;
|purpose=Open source OCR engine, accepting uncompressed TIFF files as input&lt;br /&gt;
|image=Tesseract.png&lt;br /&gt;
|homepage=http://code.google.com/p/tesseract-ocr/&lt;br /&gt;
|license=Apache 2.0 License EXCEPT the tesseractTrainer.py, which is licensed with GPL&lt;br /&gt;
|platforms=Linux, Windows, MacOSX&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Delete the Categories that do not apply --&amp;gt;&lt;br /&gt;
[[Category:OCR]]&lt;br /&gt;
[[Category:Quality Assurance]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Description =&lt;br /&gt;
Tesseract is probably the most accurate open source OCR engine available. Combined with the [http://leptonica.com/ Leptonica Image Processing Library] it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google.&lt;br /&gt;
&lt;br /&gt;
== Provider ==&lt;br /&gt;
Development of Tesseract is sponsored by Google. Its chief developer is [http://research.google.com/pubs/author4479.html Ray Smith].&lt;br /&gt;
&lt;br /&gt;
==Licensing and cost==&lt;br /&gt;
Tesseract is an Open Source OCR engine, available under the [http://www.apache.org/licenses/LICENSE-2.0 Apache 2.0 license]. It can be used directly, or (for programmers) using an [http://code.google.com/p/tesseract-ocr/source/browse/trunk/api/baseapi.h API].&lt;br /&gt;
&lt;br /&gt;
==History==&lt;br /&gt;
It was initially developed at HP over a 10-year period from 1984 to 1994. After a decade of minimal development it was released as open source in 2005. Google began sponsoring Tesseract in 2006 and currently maintains its development.&lt;br /&gt;
&lt;br /&gt;
==Platform and interoperability==&lt;br /&gt;
*The latest downloads for Linux and Windows may be found on [https://drive.google.com/folderview?id=0B7l10Bj_LprhQnpSRkpGMGV2eE0&amp;amp;usp=sharing GoogleDrive]. Older versions of Tesseract and its language packs may be found on the discontinued [https://code.google.com/p/tesseract-ocr/downloads/list Google Code download page].&lt;br /&gt;
*The easiest way to install Tesseract on Mac OS X is with [https://www.macports.org/ MacPorts]. Once MacPorts is installed, you can install Tesseract by running the command ''sudo port install tesseract'', and any language with ''sudo port install tesseract-&amp;lt;langcode&amp;gt;''. A list of available langcodes can be found on the [https://www.macports.org/ports.php?by=name&amp;amp;substr=tesseract- MacPorts Tesseract page].&lt;br /&gt;
*Dependencies for running Tesseract on Linux include Autotools and [http://www.leptonica.org/ Leptonica]. The Windows version requires installation of [http://msdn.microsoft.com/en-us/vstudio/aa718325.aspx Visual Studio]. More information about the required Ubuntu libraries, with links to specific requirements, is on the [https://code.google.com/p/tesseract-ocr/wiki/Compiling Tesseract Wiki].&lt;br /&gt;
*Other programs such as Scan Tailor, unpaper, ImageJ, Gimp or ImageMagick may be needed to properly prepare images for use in Tesseract.&lt;br /&gt;
&lt;br /&gt;
==Functional notes==&lt;br /&gt;
===Input supported===&lt;br /&gt;
Any image readable by Leptonica is supported in Tesseract including BMP, PNM, PNG, JFIF, JPEG, TIFF. &lt;br /&gt;
GIF is not supported [http://www.leptonica.com/library-overview.html http://www.leptonica.com/library-overview.html].  &lt;br /&gt;
===Output generated===&lt;br /&gt;
Tesseract outputs to TXT. PDF output was added in version 3.03.&lt;br /&gt;
&lt;br /&gt;
==Documentation and support==&lt;br /&gt;
*Smith, Ray (2007). An Overview of the Tesseract OCR Engine [http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf ]&lt;br /&gt;
*Installation information is found on the [https://code.google.com/p/tesseract-ocr/wiki/ReadMe ReadMe] page of the project site.&lt;br /&gt;
*Support is offered and issues are addressed on the [https://code.google.com/p/tesseract-ocr/issues/list Issues] page of the project site.&lt;br /&gt;
&lt;br /&gt;
==Usability==&lt;br /&gt;
*Tesseract was primarily developed for English OCR capability, but 47 language packs have been developed for use with other languages [https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html]. Tesseract 2.0x and 3.0x are [https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 trainable] for other languages.&lt;br /&gt;
*There is no built-in GUI, but there are several available from the [https://code.google.com/p/tesseract-ocr/wiki/3rdParty 3rdParty] page.&lt;br /&gt;
&lt;br /&gt;
= User Experiences =&lt;br /&gt;
Applied in an AQuA Mashup that resulted in the Solution page: [http://wiki.opf-labs.org/display/AQuA/Compare+OCR+results+of+the+same+source+material+in+different+formats+%28TIFF%2C+JP2%29 Compare OCR results of the same source material in different formats (TIFF, JP2)]&lt;br /&gt;
&lt;br /&gt;
= Development Activity =&lt;br /&gt;
&lt;br /&gt;
{{Infobox_tool_details&lt;br /&gt;
|ohloh_id=tesseract-ocr&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Lreilly</name></author>
	</entry>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=Tesseract-ocr&amp;diff=1989</id>
		<title>Tesseract-ocr</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=Tesseract-ocr&amp;diff=1989"/>
		<updated>2014-10-04T12:35:51Z</updated>

		<summary type="html">&lt;p&gt;Lreilly: Major update to description.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox_tool&lt;br /&gt;
|purpose=Open source OCR engine, accepting uncompressed TIFF files as input&lt;br /&gt;
|image=Tesseract.png&lt;br /&gt;
|homepage=http://code.google.com/p/tesseract-ocr/&lt;br /&gt;
|license=Apache 2.0 License EXCEPT the tesseractTrainer.py, which is licensed with GPL&lt;br /&gt;
|platforms=Linux, Windows, MacOSX&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Delete the Categories that do not apply --&amp;gt;&lt;br /&gt;
[[Category:OCR]]&lt;br /&gt;
[[Category:Quality Assurance]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Description =&lt;br /&gt;
Tesseract is probably the most accurate open source OCR engine available. Combined with the [http://leptonica.com/ Leptonica Image Processing Library] it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google.&lt;br /&gt;
&lt;br /&gt;
== Provider ==&lt;br /&gt;
Development of Tesseract is sponsored by Google. Its chief developer is [http://research.google.com/pubs/author4479.html Ray Smith].&lt;br /&gt;
&lt;br /&gt;
==Licensing and cost==&lt;br /&gt;
Tesseract is an Open Source OCR engine, available under the [http://www.apache.org/licenses/LICENSE-2.0 Apache 2.0 license]. It can be used directly, or (for programmers) using an [http://code.google.com/p/tesseract-ocr/source/browse/trunk/api/baseapi.h API].&lt;br /&gt;
&lt;br /&gt;
==History==&lt;br /&gt;
It was initially developed at HP over a 10-year period from 1984 to 1994. After years of testing it was released as open source in 2005. Google began sponsoring Tesseract in 2006 and currently maintains its development.&lt;br /&gt;
&lt;br /&gt;
==Platform and interoperability==&lt;br /&gt;
*The latest downloads for Linux and Windows may be found on [https://drive.google.com/folderview?id=0B7l10Bj_LprhQnpSRkpGMGV2eE0&amp;amp;usp=sharing GoogleDrive]. Older versions of Tesseract and its language packs may be found on the discontinued [https://code.google.com/p/tesseract-ocr/downloads/list Google Code download page].&lt;br /&gt;
*The easiest way to install Tesseract on Mac OS X is with [https://www.macports.org/ MacPorts]. Once MacPorts is installed, you can install Tesseract by running the command ''sudo port install tesseract'', and any language with ''sudo port install tesseract-&amp;lt;langcode&amp;gt;''. A list of available langcodes can be found on the [https://www.macports.org/ports.php?by=name&amp;amp;substr=tesseract- MacPorts Tesseract page].&lt;br /&gt;
*Dependencies for running Tesseract on Linux include Autotools and [http://www.leptonica.org/ Leptonica]. The Windows version requires installation of [http://msdn.microsoft.com/en-us/vstudio/aa718325.aspx Visual Studio]. More information about the required Ubuntu libraries, with links to specific requirements, is on the [https://code.google.com/p/tesseract-ocr/wiki/Compiling Tesseract Wiki].&lt;br /&gt;
*Other programs such as Scan Tailor, unpaper, ImageJ, Gimp or ImageMagick may be needed to properly prepare images for use in Tesseract.&lt;br /&gt;
&lt;br /&gt;
==Functional notes==&lt;br /&gt;
===Input supported===&lt;br /&gt;
Any image readable by Leptonica is supported in Tesseract including BMP, PNM, PNG, JFIF, JPEG, TIFF. &lt;br /&gt;
GIF is not supported [http://www.leptonica.com/library-overview.html http://www.leptonica.com/library-overview.html].  &lt;br /&gt;
===Output generated===&lt;br /&gt;
Tesseract outputs to TXT. PDF output was added in version 3.03.&lt;br /&gt;
&lt;br /&gt;
==Documentation and support==&lt;br /&gt;
*Smith, Ray (2007). An Overview of the Tesseract OCR Engine [http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/33418.pdf ]&lt;br /&gt;
*Installation information is found on the [https://code.google.com/p/tesseract-ocr/wiki/ReadMe ReadMe] page of the project site.&lt;br /&gt;
*Support is offered and issues are addressed on the [https://code.google.com/p/tesseract-ocr/issues/list Issues] page of the project site.&lt;br /&gt;
&lt;br /&gt;
==Usability==&lt;br /&gt;
*Tesseract was primarily developed for English OCR capability, but 47 language packs have been developed for use with other languages [https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html]. Tesseract 2.0x and 3.0x are [https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 trainable] for other languages.&lt;br /&gt;
*There is no built-in GUI, but there are several available from the [https://code.google.com/p/tesseract-ocr/wiki/3rdParty 3rdParty] page.&lt;br /&gt;
&lt;br /&gt;
= User Experiences =&lt;br /&gt;
Applied in an AQuA Mashup that resulted in the Solution page: [http://wiki.opf-labs.org/display/AQuA/Compare+OCR+results+of+the+same+source+material+in+different+formats+%28TIFF%2C+JP2%29 Compare OCR results of the same source material in different formats (TIFF, JP2)]&lt;br /&gt;
&lt;br /&gt;
= Development Activity =&lt;br /&gt;
&lt;br /&gt;
{{Infobox_tool_details&lt;br /&gt;
|ohloh_id=tesseract-ocr&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Lreilly</name></author>
	</entry>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=Tesseract-ocr&amp;diff=1988</id>
		<title>Tesseract-ocr</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=Tesseract-ocr&amp;diff=1988"/>
		<updated>2014-10-04T11:57:05Z</updated>

		<summary type="html">&lt;p&gt;Lreilly: updated infobox&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox_tool&lt;br /&gt;
|purpose=Open source OCR engine, accepting uncompressed TIFF files as input&lt;br /&gt;
|image=Tesseract.png&lt;br /&gt;
|homepage=http://code.google.com/p/tesseract-ocr/&lt;br /&gt;
|license=Apache 2.0 License EXCEPT the tesseractTrainer.py, which is licensed with GPL&lt;br /&gt;
|platforms=Linux, Windows, MacOSX&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Delete the Categories that do not apply --&amp;gt;&lt;br /&gt;
[[Category:OCR]]&lt;br /&gt;
[[Category:Quality Assurance]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Description =&lt;br /&gt;
Tesseract is probably the most accurate open source OCR engine available. Combined with the [http://leptonica.com/ Leptonica Image Processing Library] it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google.&lt;br /&gt;
&lt;br /&gt;
= User Experiences =&lt;br /&gt;
Applied in an AQuA Mashup that resulted in the Solution page: [http://wiki.opf-labs.org/display/AQuA/Compare+OCR+results+of+the+same+source+material+in+different+formats+%28TIFF%2C+JP2%29 Compare OCR results of the same source material in different formats (TIFF, JP2)]&lt;br /&gt;
&lt;br /&gt;
= Development Activity =&lt;br /&gt;
&lt;br /&gt;
{{Infobox_tool_details&lt;br /&gt;
|ohloh_id=tesseract-ocr&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Lreilly</name></author>
	</entry>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=Tesseract-ocr&amp;diff=1987</id>
		<title>Tesseract-ocr</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=Tesseract-ocr&amp;diff=1987"/>
		<updated>2014-10-03T19:25:26Z</updated>

		<summary type="html">&lt;p&gt;Lreilly: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox_tool&lt;br /&gt;
|purpose=Open source OCR engine, accepting uncompressed TIFF files as input&lt;br /&gt;
|image=&lt;br /&gt;
|homepage=http://code.google.com/p/tesseract-ocr/&lt;br /&gt;
|license=Apache 2.0 License EXCEPT the tesseractTrainer.py, which is licensed with GPL&lt;br /&gt;
|platforms=&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Delete the Categories that do not apply --&amp;gt;&lt;br /&gt;
[[Category:OCR]]&lt;br /&gt;
[[Category:Quality Assurance]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Description =&lt;br /&gt;
Tesseract is probably the most accurate open source OCR engine available. Combined with the [http://leptonica.com/ Leptonica Image Processing Library] it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google.&lt;br /&gt;
&lt;br /&gt;
= User Experiences =&lt;br /&gt;
Applied in an AQuA Mashup that resulted in the Solution page: [http://wiki.opf-labs.org/display/AQuA/Compare+OCR+results+of+the+same+source+material+in+different+formats+%28TIFF%2C+JP2%29 Compare OCR results of the same source material in different formats (TIFF, JP2)]&lt;br /&gt;
&lt;br /&gt;
= Development Activity =&lt;br /&gt;
&lt;br /&gt;
{{Infobox_tool_details&lt;br /&gt;
|ohloh_id=tesseract-ocr&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Lreilly</name></author>
	</entry>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=Tesseract-ocr&amp;diff=1986</id>
		<title>Tesseract-ocr</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=Tesseract-ocr&amp;diff=1986"/>
		<updated>2014-10-03T19:24:43Z</updated>

		<summary type="html">&lt;p&gt;Lreilly: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox_tool&lt;br /&gt;
|purpose=Open source OCR engine, accepting uncompressed TIFF files as input&lt;br /&gt;
|image=[[[File:Tesseract.png]]]&lt;br /&gt;
|homepage=http://code.google.com/p/tesseract-ocr/&lt;br /&gt;
|license=Apache 2.0 License EXCEPT the tesseractTrainer.py, which is licensed with GPL&lt;br /&gt;
|platforms=&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Delete the Categories that do not apply --&amp;gt;&lt;br /&gt;
[[Category:OCR]]&lt;br /&gt;
[[Category:Quality Assurance]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Description =&lt;br /&gt;
Tesseract is probably the most accurate open source OCR engine available. Combined with the [http://leptonica.com/ Leptonica Image Processing Library] it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google.&lt;br /&gt;
&lt;br /&gt;
= User Experiences =&lt;br /&gt;
Applied in an AQuA Mashup that resulted in the Solution page: [http://wiki.opf-labs.org/display/AQuA/Compare+OCR+results+of+the+same+source+material+in+different+formats+%28TIFF%2C+JP2%29 Compare OCR results of the same source material in different formats (TIFF, JP2)]&lt;br /&gt;
&lt;br /&gt;
= Development Activity =&lt;br /&gt;
&lt;br /&gt;
{{Infobox_tool_details&lt;br /&gt;
|ohloh_id=tesseract-ocr&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Lreilly</name></author>
	</entry>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=File:Tesseract.png&amp;diff=1985</id>
		<title>File:Tesseract.png</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=File:Tesseract.png&amp;diff=1985"/>
		<updated>2014-10-03T19:17:46Z</updated>

		<summary type="html">&lt;p&gt;Lreilly: Lreilly uploaded a new version of &amp;amp;quot;File:Tesseract.png&amp;amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Title image including a tesseract image found in the creative commons at http://commons.wikimedia.org/wiki/File:Tesseract_frame.png.&lt;/div&gt;</summary>
		<author><name>Lreilly</name></author>
	</entry>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=File:Tesseract.png&amp;diff=1984</id>
		<title>File:Tesseract.png</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=File:Tesseract.png&amp;diff=1984"/>
		<updated>2014-10-03T19:05:40Z</updated>

		<summary type="html">&lt;p&gt;Lreilly: Title image including a tesserarct image found in the creative commons at http://commons.wikimedia.org/wiki/File:Tesseract_frame.png.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Title image including a tesseract image found in the creative commons at http://commons.wikimedia.org/wiki/File:Tesseract_frame.png.&lt;/div&gt;</summary>
		<author><name>Lreilly</name></author>
	</entry>
	<entry>
		<id>https://coptr.digipres.org/index.php?title=Tesseract-ocr&amp;diff=1983</id>
		<title>Tesseract-ocr</title>
		<link rel="alternate" type="text/html" href="https://coptr.digipres.org/index.php?title=Tesseract-ocr&amp;diff=1983"/>
		<updated>2014-10-03T18:47:39Z</updated>

		<summary type="html">&lt;p&gt;Lreilly: /* Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox_tool&lt;br /&gt;
|purpose=Open source OCR engine, accepting uncompressed TIFF files as input&lt;br /&gt;
|image=&lt;br /&gt;
|homepage=http://code.google.com/p/tesseract-ocr/&lt;br /&gt;
|license=Apache 2.0 License EXCEPT the tesseractTrainer.py, which is licensed with GPL&lt;br /&gt;
|platforms=&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Delete the Categories that do not apply --&amp;gt;&lt;br /&gt;
[[Category:OCR]]&lt;br /&gt;
[[Category:Quality Assurance]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Description =&lt;br /&gt;
Tesseract is probably the most accurate open source OCR engine available. Combined with the [http://leptonica.com/ Leptonica Image Processing Library] it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google.&lt;br /&gt;
&lt;br /&gt;
= User Experiences =&lt;br /&gt;
Applied in an AQuA Mashup that resulted in the Solution page: [http://wiki.opf-labs.org/display/AQuA/Compare+OCR+results+of+the+same+source+material+in+different+formats+%28TIFF%2C+JP2%29 Compare OCR results of the same source material in different formats (TIFF, JP2)]&lt;br /&gt;
&lt;br /&gt;
= Development Activity =&lt;br /&gt;
&lt;br /&gt;
{{Infobox_tool_details&lt;br /&gt;
|ohloh_id=tesseract-ocr&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Lreilly</name></author>
	</entry>
</feed>