Difference between revisions of "ALTO (Analyzed Layout and Text Object)"

From COPTR
ALTO (Analyzed Layout and Text Object) is an XML Schema that defines technical metadata for describing the layout and content of physical text resources, such as pages of a book or a newspaper. It most commonly serves as an extension schema [https://www.loc.gov/standards/alto/techcenter/use-with-mets.html used with METS]. However, an ALTO instance can also exist as a standalone document, used independently of METS.

Each ALTO file contains a styles section that lists the paragraph and font styles used in the document, and a layout section that describes what is on the page. A page is divided into several regions: print space, left margin, right margin, top margin and bottom margin. Each region lists all the objects that were detected inside it.
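The structure described above can be illustrated with a minimal Python sketch that reads an ALTO document using only the standard library. The sample document is invented for illustration, the namespace assumes ALTO version 4, and the function name summarize_alto is hypothetical; the element names (Styles, Layout, Page, PrintSpace, TextBlock, TextLine, String) are taken from the ALTO schema as commonly published, so check them against the schema version you actually use.

```python
import xml.etree.ElementTree as ET

# Invented sample document; real ALTO files carry far more detail
# (coordinates, confidence values, processing metadata).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Styles>
    <TextStyle ID="TXT_0" FONTFAMILY="Times" FONTSIZE="10"/>
    <ParagraphStyle ID="PAR_0" ALIGN="Left"/>
  </Styles>
  <Layout>
    <Page ID="P1" WIDTH="2000" HEIGHT="3000">
      <TopMargin ID="P1_TM"/>
      <LeftMargin ID="P1_LM"/>
      <RightMargin ID="P1_RM"/>
      <BottomMargin ID="P1_BM"/>
      <PrintSpace ID="P1_PS">
        <TextBlock ID="TB1" STYLEREFS="TXT_0 PAR_0">
          <TextLine ID="TL1">
            <String CONTENT="Hello"/>
            <SP/>
            <String CONTENT="ALTO"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Drop the '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def child(parent, name):
    """Return the first direct child with the given local name, or None."""
    for element in parent:
        if local_name(element.tag) == name:
            return element
    return None


def summarize_alto(xml_text):
    """Return (style IDs, [(page ID, {region name: [words]})])."""
    root = ET.fromstring(xml_text)

    # Styles section: collect the ID of every declared style.
    styles = child(root, "Styles")
    style_ids = [s.get("ID") for s in styles] if styles is not None else []

    # Layout section: one entry per page, one word list per region.
    pages = []
    layout = child(root, "Layout")
    for page in (layout if layout is not None else []):
        if local_name(page.tag) != "Page":
            continue
        regions = {}
        for region in page:
            words = [
                e.get("CONTENT")
                for e in region.iter()
                if local_name(e.tag) == "String"
            ]
            regions[local_name(region.tag)] = words
        pages.append((page.get("ID"), regions))
    return style_ids, pages


if __name__ == "__main__":
    style_ids, pages = summarize_alto(SAMPLE_ALTO)
    print(style_ids)
    for page_id, regions in pages:
        print(page_id, regions)
```

Matching on local element names rather than a fixed namespace is a design choice made here so that the same sketch can read files from different ALTO versions; a stricter tool would validate against the schema instead.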
More information is available at [https://www.loc.gov/standards/alto/ the official ALTO website] and [https://github.com/altoxml the official ALTO GitHub page].

Revision as of 15:02, 8 June 2021


ALTO (Analyzed Layout and Text Object)
Wikidata:Q2819247

Tools that have this format as input

{| class="wikitable"
! Tool !! Purpose
|-
| BnL Mets Exporter || Command Line Interface (CLI) to export METS/ALTO documents to other formats.
|-
| Bnlviewer || METS/ALTO viewer written in Java and JavaScript
|-
| Namalysator || Tool for METS/ALTO validation and quality control
|-
| Veridian || Online search, discovery, and display of digitized newspaper collections
|}

Tools that have this format as output

{| class="wikitable"
! Tool !! Purpose
|-
| Docworks || Document digitization workflow software
|-
| Goobi || Workflow Management Tool
|-
| Kraken || Open Source turn-key OCR system forked from ocropus
|-
| Limb Processing || Software for processing, enhancing and converting cultural heritage into digital cultural heritage
|-
| Namalysator || Tool for METS/ALTO validation and quality control
|-
| NumaHOP || Platform for digitization projects management
|-
| Tesseract-ocr || Open source OCR engine, accepting uncompressed TIFF files as input
|}
