File Format Identification
Tools for this function
Tool | Purpose |
---|---|
Apache Tika | Java based tool for identifying file formats using signatures and extracting metadata and text content from documents. |
BorgFormat | A web application and service that combines multiple tools for format identification and validation. |
Cloc | Cloc (Count Lines of Code) counts lines of code and also guesses the programming language of each file, so it can be used to identify source files. It is an easy-to-use command line tool. |
Crazy-fast-image-scan | A script to scan media very quickly to find out what kind of content it contains |
DROID (Digital Record Object Identification) | DROID (Digital Record Object Identification) is a software tool developed to perform automated batch identification of file formats. |
DUMPBIN Utility | The DUMPBIN utility, which is provided with the 32-bit version of Microsoft Visual C++, combines the abilities of the LINK, LIB, and EXEHDR utilities. |
DiPS (Digital Preservation Solution) | An OAIS-compliant digital preservation solution. |
DiskFormatID | Identify floppy disk formats from kryoflux stream files |
Duke Data Accessioner | Data Accessioner provides a graphical user interface to aid in migrating data from physical media to a dedicated file server, documenting the process and using MD5 checksums to identify any errors introduced in transfer. |
FFAStrans | Task automation engine, mostly used in audiovisual content management. |
FIDO (Format Identification for Digital Objects) | A PRONOM-based, command line, file format identification tool written in Python. |
FITS (File Information Tool Set) | FITS allows data curators to identify, validate, and extract technical metadata for the objects in their digital repository. |
File Format Identification Pronom | Perl API to analyze and handle droid (PRONOM) signatures |
Filestar | Universal file converter for 900+ file types. |
Fine Free File Command | The open source implementation of the file(1) command that ships with every free operating system (OpenBSD, Linux, NetBSD, FreeBSD, etc.). |
Fq | Tool, language and decoders for working with binary data. |
Gvfs-info | gvfs-info - print information about files and directories |
JHOVE (Harvard Object Validation Environment) | JHOVE provides functions to perform format-specific identification, validation, and characterization of digital objects. |
JHOVE2 | JHOVE2 allows data curators to characterise the digital objects in their repositories. |
Libmagic-dev | This library can be used to classify files according to magic number tests. |
Libsharedmime | This is an implementation for libsharedmime. |
MediaConch | MediaConch is a file validation software. |
NARA File Analyzer and Metadata Harvester | NARA File Analyzer and Metadata Harvester allows a user to analyze the contents of a file system or external drive and generates statistics about the contents of the contained directories. |
Nanite | A friendly swarm of format-identifying robots |
Officeparser.py | officeparser.py is a Python script that parses the format of OLE compound documents used by Microsoft Office applications. |
Ohcount | Analyses plain text files, looking for code (scripting languages etc.) |
PRONOM Signature Development Utility | Output DROID compatible file format signature files using PRONOM syntax |
Puremagic | Puremagic is a cross-platform, pure Python module that identifies a file based on its magic numbers. |
Siegfried | A PRONOM-based, command line, file format identification tool using Aho-Corasick matching and no buffer limits. |
TrID File Identifier | TrID is a utility designed to identify file types from their binary signatures. |
Web Archive Discovery | Indexing and discovery tools for web archives. |
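
Several of the tools listed above (for example the file command, Libmagic-dev, Puremagic and TrID) identify files by comparing their leading bytes against known signatures, often called magic numbers. The following is a minimal Python sketch of that general idea, not the implementation used by any of these tools. The signature table, the function names `identify_bytes` and `identify_file`, and the `probe_size` parameter are hypothetical and chosen only for illustration; real tools rely on far larger signature sets such as PRONOM or the magic database.

```python
from typing import Optional

# Hypothetical, deliberately tiny signature table: (offset, magic bytes, label).
# Real identification tools use much larger, curated signature databases.
SIGNATURES = [
    (0, b"\x89PNG\r\n\x1a\n", "image/png"),
    (0, b"%PDF-", "application/pdf"),
    (0, b"GIF87a", "image/gif"),
    (0, b"GIF89a", "image/gif"),
    (0, b"\xff\xd8\xff", "image/jpeg"),
    (0, b"PK\x03\x04", "application/zip"),
]


def identify_bytes(data: bytes) -> Optional[str]:
    """Return the label of the first signature found at its offset, or None."""
    for offset, magic, label in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return label
    return None


def identify_file(path: str, probe_size: int = 64) -> Optional[str]:
    """Read the first probe_size bytes of a file and match them against SIGNATURES."""
    with open(path, "rb") as handle:
        return identify_bytes(handle.read(probe_size))
```

A signature match only suggests a format. For instance, many container formats share the ZIP signature, which is one reason several of the tools above combine signature matching with validation or deeper parsing.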