Difference between revisions of "ALTO (Analyzed Layout and Text Object)"
Prwheatley (talk | contribs) (Created page with "{{Infobox format |Wikidata ID=Q2819247 }}") |
Prwheatley (talk | contribs) |
||
Line 2: | Line 2: | ||
|Wikidata ID=Q2819247 | |Wikidata ID=Q2819247 | ||
}} | }} | ||
+ | ALTO (Analyzed Layout and Text Object) is an XML Schema that defines technical metadata for describing the layout and content of physical text resources, such as pages of a book or a newspaper. It most commonly serves as an extension schema [https://www.loc.gov/standards/alto/techcenter/use-with-mets.html used with METS]. However, an ALTO instance can also exist as a standalone document, independent of METS. | ||
+ | |||
+ | Each ALTO file contains a styles section, which lists the paragraph and font styles used, and a layout section, which describes the content of each page. A page is divided into five regions: the print space and the left, right, top, and bottom margins. Each region lists all the objects detected inside it. | ||
+ | |||
+ | More information is available at [https://www.loc.gov/standards/alto/ the official ALTO website] and [https://github.com/altoxml the official ALTO GitHub page]. |
Revision as of 15:02, 8 June 2021
Tools that have this format as input
Tool | Purpose |
---|---|
BnL Mets Exporter | Command Line Interface (CLI) to export METS/ALTO documents to other formats. |
Bnlviewer | METS/ALTO viewer written in Java and JavaScript |
Namalysator | Tool for METS/ALTO validation and quality control |
Veridian | Online search, discovery, and display of digitized newspaper collections |
Tools that have this format as output
Tool | Purpose |
---|---|
Docworks | Document digitization workflow software |
Goobi | Workflow Management Tool |
Kraken | Open Source turn-key OCR system forked from ocropus |
Limb Processing | Software for processing, enhancing and converting cultural heritage into digital cultural heritage |
Namalysator | Tool for METS/ALTO validation and quality control |
NumaHOP | Platform for digitization projects management |
Tesseract-ocr | Open source OCR engine, accepting uncompressed TIFF files as input |
ALTO (Analyzed Layout and Text Object) is an XML Schema that defines technical metadata for describing the layout and content of physical text resources, such as pages of a book or a newspaper. It most commonly serves as an extension schema used with METS. However, an ALTO instance can also exist as a standalone document, independent of METS.
Each ALTO file contains a styles section, which lists the paragraph and font styles used, and a layout section, which describes the content of each page. A page is divided into five regions: the print space and the left, right, top, and bottom margins. Each region lists all the objects detected inside it.
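The structure described above can be sketched with a short Python example. The embedded document is a hand-written, heavily simplified sample that has not been validated against the schema; it assumes the ALTO version 4 namespace, and the IDs, style values, and coordinates are invented for illustration. The helper functions `style_ids` and `words_by_region` are hypothetical names, not part of any ALTO tooling.

```python
import xml.etree.ElementTree as ET

# Assumption: ALTO v4 namespace; older files use ns-v2# or ns-v3# instead.
NS = {"alto": "http://www.loc.gov/standards/alto/ns-v4#"}

# Hypothetical, simplified sample (not schema-validated).
ALTO_SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Styles>
    <TextStyle ID="TXT_0" FONTFAMILY="Times New Roman" FONTSIZE="10"/>
    <ParagraphStyle ID="PAR_0" ALIGN="Left"/>
  </Styles>
  <Layout>
    <Page ID="P1" WIDTH="2000" HEIGHT="3000">
      <TopMargin ID="P1_TM"/>
      <LeftMargin ID="P1_LM"/>
      <RightMargin ID="P1_RM"/>
      <BottomMargin ID="P1_BM"/>
      <PrintSpace ID="P1_PS">
        <TextBlock ID="TB1" STYLEREFS="TXT_0 PAR_0">
          <TextLine ID="TL1">
            <String CONTENT="Hello" HPOS="100" VPOS="100" WIDTH="200" HEIGHT="50"/>
            <SP/>
            <String CONTENT="ALTO" HPOS="320" VPOS="100" WIDTH="180" HEIGHT="50"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def style_ids(root):
    """Return the IDs of all styles listed in the styles section."""
    styles = root.find("alto:Styles", NS)
    return [] if styles is None else [style.get("ID") for style in styles]


def words_by_region(root):
    """Map each page region (margins, print space) to the words found inside it."""
    regions = {}
    for page in root.iterfind("alto:Layout/alto:Page", NS):
        for region in page:
            # Strip the "{namespace}" prefix ElementTree puts on tag names.
            name = region.tag.rsplit("}", 1)[-1]
            words = [s.get("CONTENT") for s in region.iterfind(".//alto:String", NS)]
            regions.setdefault(name, []).extend(words)
    return regions


root = ET.fromstring(ALTO_SAMPLE)
print(style_ids(root))
print(words_by_region(root))
```

In this sample only the print space holds text, so the four margin regions map to empty lists. Real ALTO files typically also carry a description section and further block types, which this sketch ignores.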
More information is available at the official ALTO website (https://www.loc.gov/standards/alto/) and the official ALTO GitHub page (https://github.com/altoxml).