JSONID

From COPTR
Identification of JSON, YAML, and TOML document types
Homepage: https://github.com/ffdev-info/jsonid
Source Code: https://github.com/ffdev-info/jsonid
License: Apache 2.0
Cost: Free and Open Source (FOSS)
Platforms: Windows, Linux, macOS
Language: Python
Input Formats: JSON, JSONL, YAML, TOML
Function: File Format Identification



Description

Identification of JSON, YAML, and TOML document types.

Functionality

JSONID parses serialization/deserialization formats ("serde") such as JSON, YAML, and TOML to provide unambiguous identification. JSONID also introduces a declarative syntax for writing document type signatures to enable identification of specific serde document types. Key-value attributes can be shared across formats, and so signatures for JSON and YAML, for example, need only be written once.
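The idea of writing a signature once and applying it to several formats can be sketched in a few lines of Python. This is a minimal illustration only: the marker layout (`key`, `is`, `exists`) and the `matches` function are hypothetical and are not JSONID's actual signature syntax or API. The point is that once a document is parsed into key-value data, the same checks apply regardless of the format it came from.

```python
import json

# Hypothetical signature: every marker must hold for the document to match.
# This layout is an assumption for illustration, not JSONID's real syntax.
SIGNATURE = [
    {"key": "hello", "is": "world"},
    {"key": "values", "exists": True},
]


def matches(document: dict, signature: list) -> bool:
    """Return True when every marker in the signature holds for the document."""
    for marker in signature:
        key = marker["key"]
        if key not in document:
            return False
        if "is" in marker and document[key] != marker["is"]:
            return False
    return True


# Parsed from JSON text.
json_doc = json.loads('{"hello": "world", "values": [1, 2, 3.142]}')
# The mapping a YAML or TOML parser would produce for the same data.
yaml_like_doc = {"hello": "world", "values": [1, 2, 3.142]}
# A document that does not satisfy the signature.
other_doc = json.loads('{"hello": "there"}')

for doc in (json_doc, yaml_like_doc, other_doc):
    print(matches(doc, SIGNATURE))
```

The first two documents match and the third does not, because the signature only ever inspects parsed keys and values.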

Registry

As a temporary placeholder, JSONID signatures are available in a registry. The long-term goal of this project is to enable other registries, e.g. PRONOM and Wikidata, to deliver JSONID-compatible signatures and so remove the need for a centralized resource like this.

Technical characteristics

JSONID explores potential technical characteristics that can be attributed to serde formats. An example for a basic JSON object might look as follows:

{
  "content_length": 82,
  "number_of_lines": 9,
  "line_warning": false,
  "top_level_keys_count": 1,
  "top_level_keys": [
    "key1"
  ],
  "top_level_types": [
    "map"
  ],
  "depth": 4,
  "heterogeneous_list_types": false,
  "fingerprint": {
    "unf": "UNF:6:YEKQWBGm75JsN6H+8SzYRg==",
    "cid": "bafkreidexnd3r76r5h3invwvu554573px5z4fg4uglw4pextmqc765kz64"
  },
  "doctype": "JSON",
  "encoding": "UTF-8",
  "agent": "jsonid/0.12.0 (ffdev-info)"
}

The use of these technical characteristics will be explored in the documentation and future writing.
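As a rough sketch of how some of these characteristics could be derived, the following Python uses only the standard library. It is not JSONID's implementation: the `depth` rule (scalars count as 0, each map or list adds one level) and every type name other than "map" are assumptions, so the numbers it produces may differ from JSONID's own output.

```python
import json


def depth(value) -> int:
    """Assumed depth rule: scalars are 0, each map or list adds one level."""
    if isinstance(value, dict):
        return 1 + max((depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((depth(v) for v in value), default=0)
    return 0


def type_name(value) -> str:
    """Assumed type labels; only "map" appears in the example output above."""
    if isinstance(value, dict):
        return "map"
    if isinstance(value, list):
        return "list"
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    return "null"


def characteristics(text: str) -> dict:
    """Collect a few characteristics of a JSON document whose root is a map."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("this sketch only handles a top-level map")
    return {
        "content_length": len(text.encode("utf-8")),
        "number_of_lines": len(text.splitlines()),
        "top_level_keys_count": len(data),
        "top_level_keys": list(data),
        "top_level_types": [type_name(v) for v in data.values()],
        "depth": depth(data),
    }


sample = '{\n  "key1": {\n    "key2": [1, 2]\n  }\n}\n'
report = characteristics(sample)
print(json.dumps(report, indent=2))
```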

Universal fingerprint

JSONID exports two universal fingerprints enabling the assertion of equivalence between different data objects. Universal Numerical Fingerprint (UNF) is also used in the Dataverse project. Content Identifiers (CIDs) come from the IPFS project and enable content-addressed storage within that ecosystem and others.

The significance of these fingerprinting techniques is their application to identical data structures stored in different file formats.

Checksums of the same data stored in different file formats will always differ, but by analysing the files as data structures we can begin to appraise data beyond its presentation.

Fingerprinting example

Content (JSON):

{
    "hello": "world",
    "goodbye": false,
    "values": [
        1,
        2,
        3.142
    ]
}

Content (YAML):

goodbye: false
hello: world
values:
- 1
- 2
- 3.142

Content (TOML):

hello = "world"
values = [1, 2, 3.142]
goodbye = false

Type: JSON
Checksum (MD5): bcd5a37f36ada2e4b72144d90a1427d5
UNF: UNF:6:97EfAWBIQlObVCVwa7kc0g==
CID: bafkreiawsimwdn4blnb7scz2cfwtdksifrayccsl3z6gmxam6uxddctkoy

Type: YAML
Checksum (MD5): b5e75fdc100032f2744eff1e1bdf5b88
UNF: UNF:6:97EfAWBIQlObVCVwa7kc0g==
CID: bafkreiawsimwdn4blnb7scz2cfwtdksifrayccsl3z6gmxam6uxddctkoy

Type: TOML
Checksum (MD5): 04b33e24e0cc208c6bd70fabaef3a9c5
UNF: UNF:6:97EfAWBIQlObVCVwa7kc0g==
CID: bafkreiawsimwdn4blnb7scz2cfwtdksifrayccsl3z6gmxam6uxddctkoy


Example identification output

JSONID has, for now, settled on line-by-line output that resembles both the Fine Free File Command (File) and MIME types. Fields are tab-delimited and human-readable.

/media/govdocs/govdocs_selected/LOG_1/074500.log	[1]	application/json; charset=UTF-8; doctype="JavaScript Object Notation (JSON)"; ref=jrid:JSON
/media/govdocs/govdocs_selected/HTML_24/009970.html	[1]	application/json; charset=UTF-8; doctype="JavaScript Object Notation (JSON)"; ref=jrid:JSON
/media/govdocs/govdocs_selected/TEXT_6/131522.txt	[1]	application/json; charset=UTF-8; doctype="JavaScript Object Notation (JSON)"; ref=jrid:JSON
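Because the fields are tab-separated and the last field follows a MIME-like `type; name=value` pattern, a line of this output can be split apart with a few lines of Python. This is a sketch under assumptions rather than a supported parser: the function and field names are hypothetical, it assumes exactly three tab-separated fields and no semicolons inside quoted values, and it keeps the bracketed second field as raw text.

```python
LINE = (
    "/media/govdocs/govdocs_selected/LOG_1/074500.log\t[1]\t"
    'application/json; charset=UTF-8; '
    'doctype="JavaScript Object Notation (JSON)"; ref=jrid:JSON'
)


def parse_line(line: str) -> dict:
    """Split one output line into its path, bracketed field, and MIME-like parts."""
    path, bracket_field, mime = line.rstrip("\n").split("\t")
    # The first segment is the media type; the rest are name=value parameters.
    media_type, *params = [part.strip() for part in mime.split(";")]
    parsed = {}
    for param in params:
        name, _, value = param.partition("=")
        parsed[name] = value.strip('"')  # drop surrounding quotes, if any
    return {
        "path": path,
        "bracket_field": bracket_field,  # kept as raw text
        "media_type": media_type,
        "params": parsed,
    }


result = parse_line(LINE)
for name, value in result.items():
    print(name, "=", value)
```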

It is anticipated that users will help guide the ongoing development of JSONID's output to better support both user and repository workflows.

PRONOM signature development

JSONID provides a high-level language for the output of PRONOM-compatible signatures. The feature set is still in beta, but JSONID provides two distinct capabilities:

Registry output

JSONID's registry can be output using the `--pronom` flag. A signature file will be created under `jsonid_pronom.xml` which can be imported into DROID for identification of document types registered with JSONID.

JSONID's registry is output alongside a handful of baseline JSON signatures designed to capture "plain" JSON that is not yet encoded in the registry.

Signature development

A standalone `json2pronom` utility is provided for creation of potentially robust DROID compatible signatures.

Because it is a high-level language, signatures can be defined in an easy-to-understand syntax and then output consistently via the `json2pronom` utility. Signatures include sensible defaults for whitespace and other aspects that are difficult for signature developers to anticipate consistently when writing JSON-based signatures.

See the JSONID docs for more information.

Sample files

Sample files used for the development of JSONID signatures can be found in their own repository.

User experiences

Development Activity

All development activity is visible on GitHub: https://github.com/ffdev-info/jsonid/commits

Release Feed

The last 3 releases:

2026-04-08 07:17:21
0.12.4
by github-actions[bot]
2026-01-15 15:59:23
0.12.3
by github-actions[bot]
2026-01-04 23:05:32
0.12.2
by github-actions[bot]

Activity Feed

The last 5 commits:

2026-04-08 07:10:46
Update registry entries (6b874c6a3ce1831df92294b52a92243eb29ccbf2)
by ross-spencer https://github.com/ross-spencer
2026-04-08 07:10:29
Update ToC (497ed60a1175a555cbbc371195af337eb6543831)
by ross-spencer https://github.com/ross-spencer
2026-04-08 07:09:11
Update readme and apply linting (33e773cfdf5d187c9b31cf4e6dc48d43d6ed3c21)
by ross-spencer https://github.com/ross-spencer
2026-04-08 07:01:47
Add DROID reference file (bf8dd4d8b5a79adebbbd0738f4c37665c48a952e)
by ross-spencer https://github.com/ross-spencer
2026-04-08 06:56:27
Add pylint exception (2a17f07b8d288d3b1c418bb614f715d2fc0a7550)
by ross-spencer https://github.com/ross-spencer