JData Specification Draft-4

@fangq fangq released this 21 Apr 03:10

JData Specification Draft-4 Release Notes

Version: V1 (Draft-4)
Release date: 2026-04-08
URL: https://neurojson.org/jdata/draft4
Previous release: Draft-3 (2025-03-24)


Overview

Draft-4 extends the JData specification with new array annotation keywords, a
formal schema attachment mechanism, richer table and enumeration semantics,
expanded compression codec support, and alignment with BJData Draft-4. The
introduction has been substantially rewritten to frame JData as a
"source-code format for scientific data" with explicit attention to FAIR
principles and AI-tool interoperability.


New Keywords

Metadata

  • _DataSchema_ — associate a JSON Schema object or URI with any data
    node for formal structure validation. Parsers that do not recognize it
    must silently ignore it. The CouchDB-compatible alias ".dataschema" is
    also accepted (alongside the existing ".datainfo" alias for _DataInfo_).
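
As a rough illustration of how a reader might locate the schema attachment, the sketch below checks a node for _DataSchema_ or its ".dataschema" alias and reports whether it holds an inline schema object or a URI string. The helper name find_data_schema, the sample node, and its field names are hypothetical; the authoritative placement rules are those in the specification, not this sketch.

```python
# Keyword and CouchDB-compatible alias named in the release notes.
SCHEMA_KEYS = ("_DataSchema_", ".dataschema")


def find_data_schema(node):
    """Return ("inline", schema_dict), ("uri", uri_string), or None.

    Hypothetical helper: assumes the keyword sits directly inside the
    annotated node as a sibling of the data fields.
    """
    if not isinstance(node, dict):
        return None
    for key in SCHEMA_KEYS:
        if key in node:
            value = node[key]
            if isinstance(value, dict):
                return ("inline", value)
            if isinstance(value, str):
                return ("uri", value)
    return None


# Hypothetical data node carrying an inline JSON Schema object.
node = {
    "_DataSchema_": {"type": "object", "required": ["subject"]},
    "subject": "sub-01",
}
kind, schema = find_data_schema(node)
```

A node without either key yields None, which matches the behavior expected of a reader that simply does not act on the keyword.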

N-D Array

  • _ArrayCoords_ — map dimension names (from _ArrayLabel_) to 1-D
    coordinate arrays, mirroring the coordinate-variable model of xarray and
    NetCDF for lossless round-trip conversion.
  • _ArrayUnits_ — specify physical units per dimension (UDUNITS-2
    convention, e.g. "mm", "ms", "kg/m^3").
  • _ArrayFillValue_ — scalar sentinel for missing/invalid entries in
    integer-typed arrays where IEEE 754 NaN cannot be represented.
  • _ArrayChunks_ — tile shape for partitioning the pre-processed array
    into independently compressible blocks; when present, _ArrayData_ /
    _ArrayZipData_ becomes a 1-D array of per-chunk payloads.
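
To make the role of _ArrayFillValue_ concrete, here is a minimal sketch of how a reader could mask sentinel entries in an integer array after loading. The function name mask_fill_values, the sample values, and the choice of -9999 as the sentinel are illustrative assumptions; only the keyword names come from the specification.

```python
def mask_fill_values(values, fill_value):
    """Replace every entry equal to the sentinel with None (missing)."""
    return [None if v == fill_value else v for v in values]


# Hypothetical annotated array; -9999 is an arbitrary sentinel that fits int16.
array_node = {
    "_ArrayType_": "int16",
    "_ArraySize_": [1, 5],
    "_ArrayFillValue_": -9999,
    "_ArrayData_": [12, -9999, 7, 3, -9999],
}

masked = mask_fill_values(
    array_node["_ArrayData_"], array_node["_ArrayFillValue_"]
)
```

Using None here is only one option; a NumPy-based reader might prefer a masked array instead.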

Table

  • _TableIndex_ — name one or more columns as the unique row index
    (analogous to a SQL primary key or pandas.DataFrame.index).
  • _TableSortOrder_ — declare the stored sort order of records; a
    leading "-" prefix on a column name denotes descending order.
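
The "-" prefix convention for _TableSortOrder_ can be sketched as follows. This assumes the keyword holds a list of column-name strings and that rows have already been loaded into a list of dictionaries; both the in-memory layout and the helper name sort_records are assumptions made for illustration, not the on-disk table encoding.

```python
def sort_records(records, sort_order):
    """Sort records by several columns; a leading "-" means descending.

    Relies on the stability of Python's sort: keys are applied from the
    least significant to the most significant.
    """
    result = list(records)
    for key in reversed(sort_order):
        descending = key.startswith("-")
        column = key[1:] if descending else key
        result.sort(key=lambda row: row[column], reverse=descending)
    return result


# Hypothetical rows and sort declaration: age descending, then subject ascending.
rows = [
    {"subject": "a", "age": 30},
    {"subject": "b", "age": 25},
    {"subject": "c", "age": 30},
]
ordered = sort_records(rows, ["-age", "subject"])
```

A reader could use the same routine to verify that stored records actually follow the declared order before relying on it for binary search.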

Enumeration

  • _EnumOrdered_ — boolean flag marking a categorical array as ordered
    (ordinal), equivalent to ordered=True in a pandas CategoricalDtype or
    an R ordered factor.
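
The sketch below shows one way to decode an enumeration and to shift its 1-based _EnumValue_ indices to the 0-based codes expected by tools such as Apache Arrow. It assumes the category labels live in a companion _EnumKey_ list; the helper names and sample data are hypothetical.

```python
def decode_enum(enum_key, enum_value):
    """Map 1-based indices in _EnumValue_ to their category labels."""
    return [enum_key[i - 1] for i in enum_value]


def to_zero_based(enum_value):
    """Shift 1-based indices to 0-based codes for other formats."""
    return [i - 1 for i in enum_value]


# Hypothetical ordered categorical array.
enum_node = {
    "_EnumKey_": ["low", "medium", "high"],
    "_EnumValue_": [3, 1, 2, 2],
    "_EnumOrdered_": True,
}

labels = decode_enum(enum_node["_EnumKey_"], enum_node["_EnumValue_"])
codes = to_zero_based(enum_node["_EnumValue_"])

# With _EnumOrdered_ set, comparing indices is meaningful:
# the largest index corresponds to the highest category.
highest = enum_node["_EnumKey_"][max(enum_node["_EnumValue_"]) - 1]
```

When _EnumOrdered_ is absent or false, the indices are plain identifiers and comparisons such as max() carry no meaning.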

Extended Keywords

N-D Array — _ArrayShape_ new shape IDs

The following structured-matrix and compact-range shapes are added alongside
the existing "toeplitz" shape:

Shape ID      Description
"circulant"   Circulant matrix; only first row stored.
"hankel"      Hankel matrix; first row and last column stored.
"identity"    Identity (or scaled-identity) matrix; scalar diagonal value stored.
"zero"        All-zeros matrix; no data storage required beyond _ArraySize_.
"range"       Uniformly spaced 1-D vector or N-D separable grid; compact [start, end] pair per dimension.
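
To illustrate why a single stored row is enough for the "circulant" shape, the following sketch expands a first row into the full matrix. It uses the common convention in which each row is the previous row cyclically shifted one position to the right; the shift direction and the function name expand_circulant are assumptions, and the specification text governs the actual layout.

```python
def expand_circulant(first_row):
    """Rebuild an n-by-n circulant matrix from its first row.

    Row i is the first row rotated right by i positions, so
    entry (i, j) equals first_row[(j - i) mod n].
    """
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]


full = expand_circulant([1, 2, 3])
```

For the input [1, 2, 3] this produces the rows [1, 2, 3], [3, 1, 2], and [2, 3, 1], so storage drops from n*n values to n.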

N-D Array — float type aliases

float16, float32, and float64 are now accepted as aliases for half,
single, and double, respectively, to improve interoperability with NumPy
and Apache Arrow toolchains. Canonical names remain preferred for new files.
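
A reader could fold the aliases into the canonical names with a small lookup, as in this sketch. The function name canonical_array_type is hypothetical; the alias pairs are the ones listed above.

```python
# Alias -> canonical name, as listed in the release notes.
FLOAT_ALIASES = {
    "float16": "half",
    "float32": "single",
    "float64": "double",
}


def canonical_array_type(type_name):
    """Return the canonical type name, leaving other names unchanged."""
    return FLOAT_ALIASES.get(type_name, type_name)


normalized = [canonical_array_type(t) for t in ["float32", "double", "int32"]]
```

Writers that follow the recommendation for new files would emit the canonical name even when the in-memory type came from NumPy or Arrow.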

_ArrayZipType_ — expanded codec table

_ArrayZipType_ identifiers now follow the Numcodecs codec registry
(also adopted by Zarr). Newly recognized identifiers include:

Identifier        Description
"zstd"            Zstandard (RFC 8878)
"lz4"             LZ4 block compression
"blosc2"          Blosc2 meta-compressor (default: BloscLZ)
"blosc2lz4"       Blosc2 with LZ4
"blosc2lz4hc"     Blosc2 with LZ4-HC
"blosc2blosclz"   Blosc2 with BloscLZ
"blosc2zstd"      Blosc2 with Zstandard
"blosc2zlib"      Blosc2 with zlib
"base64"          Base64 encoding only (no compression)

Note: "zlib" and "gzip" are distinct formats and must not be treated as
synonyms. Only Blosc2 (not Blosc v1) is supported.
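
The difference between "zlib" and "gzip" is easy to demonstrate with Python's standard library: the two codecs wrap the same DEFLATE data in different headers, and a strict decoder for one rejects the other. The dispatch function decode and the sample payload are hypothetical; only the two identifier strings come from the codec table.

```python
import gzip
import zlib

payload = b"JData " * 100

zlib_stream = zlib.compress(payload)   # zlib container (RFC 1950)
gzip_stream = gzip.compress(payload)   # gzip container (RFC 1952)


def looks_like_gzip(data):
    """gzip streams start with the magic bytes 0x1f 0x8b."""
    return data[:2] == b"\x1f\x8b"


def decode(data, zip_type):
    """Decode strictly by identifier; no fallback between formats."""
    if zip_type == "zlib":
        return zlib.decompress(data)
    if zip_type == "gzip":
        return gzip.decompress(data)
    raise ValueError(f"unsupported codec in this sketch: {zip_type}")


try:
    decode(gzip_stream, "zlib")
    mismatch_rejected = False
except zlib.error:
    # A zlib decoder fails on the gzip header instead of guessing.
    mismatch_rejected = True
```

In a text-based JData file the compressed bytes would additionally be Base64-encoded, which this sketch leaves out.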

Table — "datetime" DataType

The "datetime" value is added to the DataType field in _TableCols_ /
_TableRows_, representing ISO 8601 date-time strings. Values without an
explicit time-zone offset are interpreted as UTC.
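
The UTC default can be applied at parse time, as in this minimal sketch using Python's datetime module. The function name parse_table_datetime and the sample strings are illustrative assumptions.

```python
from datetime import datetime, timezone


def parse_table_datetime(text):
    """Parse an ISO 8601 string; treat offset-free values as UTC."""
    value = datetime.fromisoformat(text)
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)
    return value


naive = parse_table_datetime("2026-04-08T12:30:00")
offset = parse_table_datetime("2026-04-08T12:30:00+02:00")
```

Note that datetime.fromisoformat accepts a trailing "Z" only in Python 3.11 and later, so older interpreters need the explicit "+00:00" form.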

_DataInfo_ — additional recommended properties

License, GeneratedBy, DerivedFrom, and SourceFormat are added to the
list of recommended _DataInfo_ properties, with standardized definitions
(SPDX identifiers, URI references, ISO 8601 timestamps).


BJData Draft-4 Alignment

JData Draft-4 is aligned with BJData Specification Draft-4:

  • New binary data type marker [B]: byte (bringing the total new markers to 5).
  • New Extended data type [E] for custom binary type extensions.
  • New optimized structure-of-array (SOA) container for packed object storage.
  • References updated from BJData Draft-3 to Draft-4.

Clarifications and Editorial Changes

  • Linked list: corrected keyword names from _LinkNext_/_LinkPrior_ to
    _ListNext_/_ListPrior_ throughout; added inline metadata Length=N on
    _LinkedList_ to declare node count for parser pre-allocation.
  • Undirected graphs: clarified that _GraphEdges0_ encodes undirected
    edges (each [A,B] entry implies [B,A]; parsers must not double-store)
    and that _GraphMatrix_ is symmetric when paired with _GraphEdges0_.
  • Enumeration indexing: explicitly documented the 1-based convention of
    _EnumValue_ and the adjustment required when converting to 0-based
    formats such as JSONPath or Apache Arrow DictionaryArray.
  • _ArrayShuffle_: corrected the example byte-stream reordering and
    clarified bit-wise shuffle semantics.
  • Bug fixes in examples: corrected typos _ArrayTye_ → _ArrayType_,
    _ArraySize → _ArraySize_, _ArrayIsSparse → _ArrayIsSparse_ in the
    graph section examples.
  • toeplitz shape: minor prose fix (removed stray space before comma).

New Artifacts

  • schema/jdata_format_schema.json — a JSON Schema (Draft-07) document
    formally describing all JData annotation keywords, enabling automated
    validation of JData documents.

Rewritten Introduction

The introduction has been reorganised into four focused subsections:

  1. Background — motivates JData through the lens of FAIR principles and
    the fragility of opaque binary formats for long-term data sharing.
  2. JData as the source-code format for scientific data — positions JData
    as a human-readable, AI-tool-compatible representation analogous to
    source code for data.
  3. Building on the JSON ecosystem — highlights JSON Schema, JSONPath,
    JSON-LD, and NoSQL database integration.
  4. Binary JData for performance-sensitive applications — concise summary
    of BJData advantages.

ChangeLog

2026-04-19 [b43a1d0] [bug] restrict ArrayCoord to only array of arrays
2026-04-17 [06f842c] [schema] remove incorrect slash-separator note from ArrayZipType
2026-04-17 [ffac6af] [schema] fix ArrayZipSize, ArrayIsComplex, ArrayChunks, ArrayZipData
2026-04-17 [b708d71] [bug] fix ArrayChunks and ArrayZipSize
2026-04-08 [a1b14fc] [doc] update README, tag Draft-4
2026-04-08 [a88fca8] [schema] update json schema
2026-04-08 [e0dfdd5] [doc] update header info, tag Draft-4
2026-04-08 [c8bfe9f] [spellcheck] fix errors in spellcheck ci
2026-04-08 [a800056] spec: extend annotation keywords, array shapes, and fix editorial issues
2026-04-08 [57de727] Update introduction
2025-08-24 [05c6e04] [schema] add json schema for jdata annotations