Metadata Extraction
Tools for this function
Tool | Purpose |
---|---|
ALTAG3D | An open source archive software |
Aaru Data Preservation Suite | Media dump software and disc image manager |
Adobe Photoshop Elements | A commercial image editor with a metadata module (Organizer). |
Apache PDFBox | Java PDF library for creation, manipulation, validation and content extraction of PDF documents |
Apache POI - the Java API for Microsoft Documents | The Apache POI Project's mission is to create and maintain Java APIs for manipulating various file formats based upon the Office Open XML standards (OOXML) and Microsoft's OLE 2 Compound Document format (OLE2). |
Apache Tika | Java-based tool for identifying file formats using signatures and extracting metadata and text content from documents. |
BWF MetaEdit | BWF MetaEdit permits embedding, validating, and exporting of metadata in Broadcast WAVE Format (BWF) files. |
BitCurator | The BitCurator Environment is an Ubuntu Linux distribution geared to the needs of archivists and librarians. It includes a suite of open source digital forensics and data analysis tools to help collecting institutions process born-digital materials. |
Brunnhilde | Siegfried-based characterization of directories and disk images |
C3PO | C3PO is a content profiling tool for visualization and preservation analysis |
CloudCompare | CloudCompare is a tool for editing and processing 3D point clouds and triangular meshes. |
CyberChef | A forensic tool with workflow capabilities to analyse files and containers |
DIMAG | A software suite supporting archives with preservation of digital information for eternity |
DIMAG IngestList | Accompanies ingest process from donor to archive, logs process steps. |
DROID (Digital Record Object Identification) | DROID (Digital Record Object Identification) is a software tool developed to perform automated batch identification of file formats. |
DUMPBIN Utility | The DUMPBIN utility, which is provided with the 32-bit version of Microsoft Visual C++, combines the abilities of the LINK, LIB, and EXEHDR utilities. |
Demystify | Format Identification Analysis and Reporting |
DiPS (Digital Preservation Solution) | DiPS (OAIS compliant Digital Preservation Solution) |
Directory List & Print | A universal metadata extractor |
Disktype | Tool for detecting the content format of a disk or disk image. It knows about common file systems, partition tables, and boot codes. |
Duke Data Accessioner | Data Accessioner provides a graphical user interface to aid in migrating data from physical media to a dedicated file server, documenting the process and using MD5 checksums to identify any errors introduced in transfer. |
EMET (Embedded Metadata Extraction Tool) | EMET is a stand-alone tool designed to extract metadata embedded in JPEG and TIFF files. |
EPADD | ePADD is a software package developed by Stanford University's Special Collections & University Archives that supports archival processes around the appraisal, ingest, processing, discovery, and delivery of email archives. |
EXE Explorer | EXE Explorer reads and displays executable file properties and structure. |
EXIF to DC XML normaliser | Extract EXIF data and normalise it to DC XML. |
Easy CD-DA Extractor | Easy CD-DA Extractor is a CD ripper, music converter, audio converter, metadata editor, and CD/DVD burning application. |
EpubCheck | Validator for EPUB files |
Exact Audio Copy | Exact Audio Copy is an audio grabber for audio CDs using standard CD and DVD-ROM drives on Windows only. |
Exempi | Exempi is a library for handling XMP metadata, based on the Adobe XMP SDK |
ExifTool | Properties extraction, identification, metadata editing |
FFAStrans | Task automation engine, mostly used in audio and video content management. |
FIDO (Format Identification for Digital Objects) | A PRONOM-based command-line file format identification tool written in Python |
FITS (File Information Tool Set) | FITS allows data curators to identify, validate, and extract technical metadata for the objects in their digital repository. |
File Analyzer and Metadata Harvester V2 | The File Analyzer is a general purpose desktop (and command line) tool designed to automate simple, file-based operations. The File Analyzer assembles a toolkit of tasks a user can perform. The tasks that have been written into the File Analyzer code base have been optimized for use by libraries, archives, and other cultural heritage institutions. |
FileAlyzer | FileAlyzer allows a basic analysis of files (showing file properties and file contents in hex dump form) and is able to interpret common file contents such as resource structures (text, graphics, HTML, media and PE). |
FileTrove | FileTrove indexes files and creates metadata from them. The single binary application walks a directory tree and identifies all regular files by type with Siegfried. |
Filestar | Universal file converter for 900+ file types. |
Fq | Tool, language and decoders for working with binary data. |
GNU libextractor | GNU libextractor is a library used to extract metadata from files of arbitrary type. |
Geosetter | A tool that sets coordinates and edits all kinds of embedded image metadata. |
GetID3() | Extracts technical and embedded descriptive metadata from common multimedia file formats. |
IText | PDF library for manipulation, content extraction and creation |
InBoxer | InBoxer is a next generation email archiving, IM archiving, e-discovery, and policy management system. |
Index.dat Analyzer v2.5 | Index.dat Analyzer is a tool to view, examine and delete contents of index.dat files. |
JHOVE (Harvard Object Validation Environment) | JHOVE provides functions to perform format-specific identification, validation, and characterization of digital objects. |
JHOVE2 | JHOVE2 allows data curators to characterise the digital objects in their repositories. |
JWAT | Java Web Archive Toolkit |
Jp2StructCheck | Simple JP2 file structure checker |
Jpylyzer | JP2 validation + properties extraction |
Keith Humphreys' PhraseRate | PhraseRate is a program, developed by Keith Humphreys, for extracting a set of meaningful, attractive keywords and key phrases from a web page describing the content of that page. |
Lingfo | Lingfo provides a library for developers to use to extract information from Microsoft Excel spreadsheet files. |
METS Reader Writer | Python library for processing and outputting METS/PREMIS XML according to the Archivematica METS profile. |
MP3::Tag | MP3::Tag is a module for reading tags of MP3 audio files. |
MailStore Home | Unifies your private emails into one searchable, platform-independent repository |
Mdqc | Tool for managing and comparing digital asset metadata |
MediaInfo | Supplies technical and tag information about a video or audio file. |
Metadata Extraction Tool | Metadata Extraction Tool automatically extracts a limited set of metadata from the headers of digital files. |
Metadata Interrogator | The Metadata Interrogator is a standalone, offline GUI tool for extracting and analysing metadata from a wide variety of file formats. |
Metadata transformer | A simple tool for creating new CSV and HTML reports based on the metadata files generated by the Data Accessioner |
Metadata++ | Freeware tool to view, edit, modify, extract, copy metadata of various formats. |
Metadata2Go | Web-based EXIF data viewer |
NARA File Analyzer and Metadata Harvester | NARA File Analyzer and Metadata Harvester allows a user to analyze the contents of a file system or external drive and generates statistics about the contents of the contained directories. |
NARA Video Frame Analyzer | NARA Video Frame Analyzer analyzes technical properties of individual frames of a video file in order to detect quality issues within digitized video files. |
Nanite | A friendly swarm of format-identifying robots |
ODF Validator | ODF Validator is a tool that validates OpenDocument files and checks them for certain conformance criteria. |
Officeparser.py | officeparser.py is a Python script that parses the format of OLE compound documents used by Microsoft Office applications. |
OpenJPEG | The OpenJPEG library is an open-source JPEG 2000 codec written in C language. |
PDF Tools (by Didier Stevens) | Tools for parsing and analysing PDF documents |
PET (PERICLES Extraction Tool) | A tool to capture contextual information in a sheer curation scenario |
Pagelyzer | Suite of tools for detecting changes in web pages and their rendering |
PdfaPilot | pdfaPilot: Conversion of documents and emails into robust, searchable PDF or PDF/A files |
Pdfcpu | A Go library and command line tool for PDF processing, including validation |
Pdftk | PDF manipulation tool |
Peepdf | peepdf is a Python tool to explore PDF files in order to find out if the file can be harmful or not. |
Pre-Ingest Tool | A tool for generating an OAIS SIP for digital preservation. It produces a METS document that contains metadata for digital preservation. |
Premissh | Premissh is a simple prototype tool for automatically creating PREMIS XML from a file, using DROID, BASH and XSLT. |
Python XMP Toolkit | Library for working with XMP metadata, as well as reading/writing XMP metadata stored in many different file formats |
Qpdf | QPDF is a command-line program that does structural, content-preserving transformations on PDF files |
RATOM | Review, Appraisal, and Triage of Mail (RATOM) is software to assist archives and other collecting organizations with email analysis, selection, and appraisal tasks |
Sheeko | Machine learning implementation package to generate descriptive metadata for digitized historical images. |
Smithsonian Cook | Processing of 3D model, mesh, and texture data including the option to define custom processing workflows, where a set of files is processed by multiple tools. |
Warctools | Command line tools and libraries for handling and manipulating WARC files (and HTTP contents) |
Web Archive Discovery | Indexing and discovery tools for web archives. |
WordHoard | WordHoard is an application for the close reading and scholarly analysis of deeply tagged texts. |
Xpdf | Open source PDF viewer that includes PDF information extractor and font analyzer |
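
Several of the tools listed above walk a directory tree and record basic properties for each file, and at least one (Duke Data Accessioner) uses MD5 checksums to detect transfer errors. As a rough illustration of that kind of file-system-level extraction, here is a minimal sketch using only the Python standard library. It is not the implementation of any listed tool; the function names `extract_basic_metadata` and `walk_directory` are hypothetical, and the MIME type is guessed from the file extension only, not from signatures as DROID, FIDO or Siegfried would do.

```python
import hashlib
import mimetypes
import os
from datetime import datetime, timezone


def extract_basic_metadata(path):
    """Return file-system-level metadata for one file as a dict (hypothetical helper)."""
    stat = os.stat(path)
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            md5.update(chunk)
    guessed_type, _encoding = mimetypes.guess_type(path)
    return {
        "path": path,
        "size_bytes": stat.st_size,
        "modified_utc": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc
        ).isoformat(),
        # Assumption: an extension-based guess, not signature-based identification.
        "guessed_mime_type": guessed_type,
        "md5": md5.hexdigest(),
    }


def walk_directory(root):
    """Yield one metadata record per regular file under root (hypothetical helper)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            if os.path.isfile(full_path):
                yield extract_basic_metadata(full_path)
```

Calling `list(walk_directory("some/folder"))` returns one dictionary per file, which could then be written out as CSV or XML. Embedded metadata (EXIF, XMP, ID3 and similar) is out of scope for this sketch and requires a format-aware tool such as those in the table.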