ALTO (Analyzed Layout and Text Object)
ALTO (Analyzed Layout and Text Object) is an XML schema that specifies technical metadata for describing the layout and content of physical text resources, such as the pages of a book or a newspaper. It is most commonly used as an extension schema alongside METS, but ALTO instances can also exist as standalone documents, independent of METS.
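In METS-based workflows, one common arrangement is for the METS file section to reference a separate ALTO file for each page. The following is a minimal Python sketch, using only the standard library, that collects those references. It assumes the ALTO files sit in a fileGrp whose USE attribute is "ALTO". That value is a hypothetical convention, because projects use different USE values. The function name alto_hrefs and the sample paths are also illustrative only.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
NS = {"mets": METS_NS}


def alto_hrefs(mets_text, file_group_use="ALTO"):
    """Return the xlink:href of each file in fileGrp elements with a matching USE.

    The default USE value "ALTO" is an assumption; adjust it to the
    convention of the METS profile actually in use.
    """
    root = ET.fromstring(mets_text)
    hrefs = []
    for grp in root.iter(f"{{{METS_NS}}}fileGrp"):
        if grp.get("USE") != file_group_use:
            continue
        # Only direct mets:file children, so nested groups are not counted twice.
        for flocat in grp.findall("mets:file/mets:FLocat", NS):
            hrefs.append(flocat.get(f"{{{XLINK_NS}}}href"))
    return hrefs


# Hypothetical, simplified METS fragment (not validated against the METS schema).
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="IMAGE">
      <mets:file ID="IMG1" MIMETYPE="image/tiff">
        <mets:FLocat LOCTYPE="URL" xlink:href="images/0001.tif"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="ALTO">
      <mets:file ID="ALTO1" MIMETYPE="text/xml">
        <mets:FLocat LOCTYPE="URL" xlink:href="alto/0001.xml"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""

print(alto_hrefs(SAMPLE_METS))  # ['alto/0001.xml']
```

A standalone ALTO file needs no such lookup. It can be parsed directly, as in the structure example for the styles and layout sections.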
Each ALTO file contains a styles section, which lists the styles used (for paragraphs and fonts), and a layout section, which describes what is on the page. A page is divided into several regions (print space, left margin, right margin, top margin and bottom margin), and each region lists all the objects that have been detected inside it.
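To illustrate this structure, the following is a minimal Python sketch using only the standard library. It reads an ALTO document, collects the font styles declared in the styles section, and lists the text lines found in each page region. The element and attribute names (Styles, TextStyle, Page, PrintSpace, TextLine, String/@CONTENT, and so on) follow ALTO. The helper names summarize_alto and local_name are hypothetical, and the sample is a simplified, hand-written fragment that has not been validated against the ALTO schema. Namespaces are stripped so the same code can read files from different ALTO versions.

```python
import xml.etree.ElementTree as ET

# Child elements of <Page> that ALTO uses for the page regions.
REGION_NAMES = {"TopMargin", "LeftMargin", "RightMargin", "BottomMargin", "PrintSpace"}


def local_name(tag):
    """Drop the '{namespace}' prefix so any ALTO namespace version matches."""
    return tag.rsplit("}", 1)[-1]


def summarize_alto(xml_text):
    """Return (styles, regions) for an ALTO document given as a string.

    styles maps each TextStyle ID to (FONTFAMILY, FONTSIZE);
    regions maps each region name to the text of the lines inside it.
    """
    root = ET.fromstring(xml_text)
    styles = {}
    regions = {}
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "TextStyle":
            styles[elem.get("ID")] = (elem.get("FONTFAMILY"), elem.get("FONTSIZE"))
        elif name == "Page":
            for region in elem:
                region_name = local_name(region.tag)
                if region_name not in REGION_NAMES:
                    continue
                lines = regions.setdefault(region_name, [])
                # iter() also reaches TextLines nested in blocks inside the region.
                for line in region.iter():
                    if local_name(line.tag) == "TextLine":
                        # Join the words (String/@CONTENT) of the line with spaces.
                        words = [s.get("CONTENT", "") for s in line
                                 if local_name(s.tag) == "String"]
                        lines.append(" ".join(words))
    return styles, regions


# Hypothetical, simplified ALTO v4 fragment for demonstration only.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Styles>
    <TextStyle ID="FONT0" FONTFAMILY="Times New Roman" FONTSIZE="10"/>
  </Styles>
  <Layout>
    <Page ID="P1" WIDTH="2000" HEIGHT="3000" PHYSICAL_IMG_NR="1">
      <TopMargin ID="TM1" HPOS="0" VPOS="0" WIDTH="2000" HEIGHT="100"/>
      <PrintSpace ID="PS1" HPOS="100" VPOS="100" WIDTH="1800" HEIGHT="2800">
        <TextBlock ID="TB1">
          <TextLine ID="TL1">
            <String ID="S1" CONTENT="Hello" STYLEREFS="FONT0"/>
            <SP/>
            <String ID="S2" CONTENT="world" STYLEREFS="FONT0"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""

styles, regions = summarize_alto(SAMPLE)
print(styles)   # {'FONT0': ('Times New Roman', '10')}
print(regions)  # {'TopMargin': [], 'PrintSpace': ['Hello world']}
```

Because the sketch matches on local element names only, it will also accept non-ALTO XML that happens to use the same names. A production reader would check the namespace or validate against the ALTO schema first.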
More information is available on the official ALTO website and the official ALTO GitHub page.
Tools that have this format as input
Tool | Purpose |
---|---|
BnL Mets Exporter | Command-line interface (CLI) for exporting METS/ALTO documents to other formats |
Bnlviewer | METS/ALTO viewer written in Java and JavaScript |
Namalysator | Tool for METS/ALTO validation and quality control |
Veridian | Online search, discovery, and display of digitized newspaper collections |
Tools that have this format as output
Tool | Purpose |
---|---|
Docworks | Document digitization workflow software |
Goobi | Workflow management tool |
Kraken | Open-source, turnkey OCR system forked from OCRopus |
Limb Processing | Software for processing and enhancing cultural heritage materials and converting them into digital cultural heritage |
Namalysator | Tool for METS/ALTO validation and quality control |
NumaHOP | Platform for digitization projects management |
Tesseract-ocr | Open-source OCR engine that accepts uncompressed TIFF files as input |