Content definition: Tools that support the preservation of text-based data formats such as MS Word or PDF.

Tools for this content type

3-Heights(TM) PDF Validator 3-Heights(TM) PDF Validator from PDF-Tools AG. X
ADIGRES ADIGRES is a powerful cross-platform Document Management System written in Java. XXX
Antiword Antiword is a free MS Word reader for Linux and RISC OS. X
Apache PDFBox Java PDF library for creation, manipulation, validation and content extraction of PDF documents XXX
Apache POI - the Java API for Microsoft Documents The Apache POI Project's mission is to create and maintain Java APIs for manipulating various file formats based upon the Office Open XML standards (OOXML) and Microsoft's OLE 2 Compound Document format (OLE2). XXX
CSV Validator Validation of CSV files against user-defined schema X
Calibre An e-book management tool, including viewer, migration, and file conversion features among others. X
Catdoc & xls2csv catdoc is a program that reads one or more Microsoft Word files and outputs text to standard output. X
Converseen A GUI for ImageMagick supporting mass operations X
DFG Viewer Browser-based viewer for digital objects X
Dependency Discovery Tool The Dependency Discovery Tool searches through binary office files (.doc, .xls and .ppt) and tries to find any documents or files that are linked to the document. X
EpubCheck Validator for EPUB files XX
Exempi Exempi is a library for handling XMP metadata, based on the Adobe XMP SDK XX
Flint Validates a file against a policy, using common validation tools X
GImageReader A customisable GUI for Tesseract X
IText PDF library for manipulation, content extraction and creation XX
KOST-Val KOST-Val is an open source validator for different file formats (TIFF, SIARD, PDF/A, JP2, JPEG) and Submission Information Package (SIP). XX
Kraken Open Source turn-key OCR system forked from ocropus X
Library of Congress Newspaper Viewer The Library of Congress Newspaper Viewer is a web application used to ingest and view digitized newspaper pages meeting the National Digital Newspaper Program specification. X
Libreoffice An office suite with command line options for PDF/A conversions X
Libsafe libsafe allows organizations to create a fully OAIS-compliant archive, including active and passive digital preservation workflows, and is particularly suited to master image files from digitization processes. X
Limb Processing Software for processing, enhancing and converting cultural heritage into digital cultural heritage XX
LuraDocument PDF Compressor LuraDocument PDF Compressor is a document conversion engine. X
METS Navigator METS-based system for displaying and navigating sets of page images or other multi-part digital objects. XX
MPP Viewer MPP Viewer is a viewer for Microsoft Project files XX
Metadata Interrogator The Metadata Interrogator is a standalone, offline GUI tool for extracting and analysing metadata from a wide variety of file formats. XX
Metadata++ Freeware tool to view, edit, modify, extract, copy metadata of various formats. XX
Nitro Pro A PDF handling tool including PDF/A X
ODF Validator ODF Validator is a tool that validates OpenDocument files and checks them for certain conformance criteria. XX
… is a Python script that parses the format of OLE compound documents used by Microsoft Office applications. XX
PDF Tools (by Didier Stevens) Tools for parsing and analysing PDF documents XX
PDFTron PDF-A Manager PDF/A Manager is a PDF/A (ISO 19005) validation and conversion software. XX
PDFsam PDFsam splits and merges PDF files XXX
Pandoc A universal converter that converts files from one markup format into another X
PdfaPilot pdfaPilot: Conversion of documents and emails into robust, searchable PDF or PDF/A files XXX
Pdfcpu A Go library and command line tool for PDF processing, including validation XX
Pdftk PDF manipulation tool XXX
Peepdf peepdf is a Python tool to explore PDF files in order to find out if the file can be harmful or not. XX
Python XMP Toolkit Library for working with XMP metadata, as well as reading/writing XMP metadata stored in many different file formats XX
Qpdf QPDF is a command-line program that does structural, content-preserving transformations on PDF files XXX
Rescarta The ResCarta Tools software empowers users to create non-proprietary digital objects with LOC standard METS, MODS, MIX and AudioMD metadata from existing TIFF, JPEG, PDF and WAV data through user-friendly interfaces. XX
Tabula Extract tabular data from PDF files X
VeraPDF PDF/A validation tool X
Veridian Online search, discovery, and display of digitized newspaper collections X
WordHoard WordHoard is an application for the close reading and scholarly analysis of deeply tagged texts. XXX
Xpdf Open source PDF viewer that includes PDF information extractor and font analyzer XXX
Yara Pattern matching tool XX