File Format Identification

Function definition: Tools that enable the automatic identification of the file format of a particular file, typically by examining characteristic codes (often termed file format magic) in the file header.
Lifecycle stage: Ingest

Tools for this function

Apache Tika: Java-based tool for identifying file formats using signatures and extracting metadata and text content from documents.
Cloc: Cloc (Count Lines of Code) not only counts lines of code but also guesses the programming language, so it can be used to identify files. It is an easy-to-use command-line tool.
Crazy-fast-image-scan: A script to scan media very quickly to find out what kind of content it contains.
DROID (Digital Record Object Identification): A software tool developed to perform automated batch identification of file formats.
DUMPBIN Utility: The DUMPBIN utility, which is provided with the 32-bit version of Microsoft Visual C++, combines the abilities of the LINK, LIB, and EXEHDR utilities.
DiPS (Digital Preservation Solution): OAIS-compliant digital preservation solution.
DiskFormatID: Identifies floppy disk formats from KryoFlux stream files.
Duke Data Accessioner: Data Accessioner provides a graphical user interface to aid in migrating data from physical media to a dedicated file server, documenting the process and using MD5 checksums to identify any errors introduced in transfer.
FFAStrans: Task automation engine, mostly used in audio and video content management.
FIDO (Format Identification for Digital Objects): A PRONOM-based command-line file format identification tool written in Python.
FITS (File Information Tool Set): FITS allows data curators to identify, validate, and extract technical metadata for the objects in their digital repository.
File Format Identification Pronom: Perl API to analyze and handle DROID (PRONOM) signatures.
Filestar: Universal file converter for 900+ file types.
Fine Free File Command: The open source implementation of the file(1) command that ships with every free operating system (OpenBSD, Linux, NetBSD, FreeBSD, etc.).
Fq: Tool, language and decoders for working with binary data.
Gvfs-info: Prints information about files and directories.
JHOVE (Harvard Object Validation Environment): JHOVE provides functions to perform format-specific identification, validation, and characterization of digital objects.
JHOVE2: JHOVE2 allows data curators to characterise the digital objects in their repositories.
Libmagic-dev: This library can be used to classify files according to magic number tests.
Libsharedmime: An implementation for libsharedmime.
MediaConch: MediaConch is file validation software.
NARA File Analyzer and Metadata Harvester: Allows a user to analyze the contents of a file system or external drive and generates statistics about the contents of the contained directories.
Nanite: Nanite ("a friendly swarm of format-identifying robots") is a Python script that parses the format of OLE compound documents used by Microsoft Office applications.
Ohcount: Analyses plain text files, looking for code (scripting languages etc.).
PRONOM Signature Development Utility: Outputs DROID-compatible file format signature files using PRONOM syntax.
Puremagic: A cross-platform, pure-Python module that identifies a file based on its magic numbers.
Siegfried: A PRONOM-based command-line file format identification tool using Aho-Corasick matching and no buffer limits.
TrID File Identifier: TrID is a utility designed to identify file types from their binary signatures.
Web Archive Discovery: Indexing and discovery tools for web archives.
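
Example: identification by file format magic

The function definition at the top of this page describes identification by examining characteristic codes in the file header. The following is a minimal sketch of that idea in Python, not the method used by any tool listed here. The signature table, the function names identify_bytes and identify_file, and the 512-byte header size are all illustrative assumptions; real tools rely on much larger signature registries (PRONOM, for example) and handle more complex patterns.

```python
# Illustrative signature table (an assumption for this sketch, not a real registry).
# Each entry is (byte offset, magic bytes expected at that offset, label).
SIGNATURES = [
    (0, b"\x89PNG\r\n\x1a\n", "PNG image"),
    (0, b"%PDF-", "PDF document"),
    (0, b"GIF87a", "GIF image"),
    (0, b"GIF89a", "GIF image"),
    (0, b"\xff\xd8\xff", "JPEG image"),
    (0, b"PK\x03\x04", "ZIP container"),
    (257, b"ustar", "tar archive"),
]


def identify_bytes(data: bytes) -> str:
    """Return the label of the first signature matching the given header bytes."""
    for offset, magic, label in SIGNATURES:
        # Slicing past the end of the data yields a shorter string, so short
        # inputs simply fail to match rather than raising an error.
        if data[offset:offset + len(magic)] == magic:
            return label
    return "unknown"


def identify_file(path: str, header_size: int = 512) -> str:
    """Read only the first header_size bytes of a file and identify them."""
    with open(path, "rb") as handle:
        return identify_bytes(handle.read(header_size))
```

Calling identify_file on a path reads only the start of the file, which is why header-based identification is fast on large collections. A match on a container signature such as ZIP is only a coarse result: formats built on ZIP share the same leading bytes, so dedicated tools inspect the container contents to tell them apart.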