Commit ffac6af

fangq and claude committed
[schema] fix _ArrayZipSize_, _ArrayIsComplex_, _ArrayChunks_, _ArrayZipData_
- _ArrayZipSize_ description: was wrong ("shape of a full chunk"); now correctly states it stores the shape of the FULL pre-processed array and that decoders use ceil(ZipSize/Chunks) for tile counts
- _ArrayIsComplex_ description: was wrong ("interleaved"); now states that _ArrayData_ stores real parts in row 1 and imaginary parts in row 2 (separate rows, not byte-interleaved)
- _ArrayChunks_: add minItems:1; update description to cross-reference _ArrayZipSize_ requirement
- _ArrayZipType_: clarify that slash-separated form (blosc2/lz4) is an implementation alias for the concatenated form (blosc2lz4)
- _ArrayZipData_: chunk items now allow _DataLink_ objects in addition to base64 strings, enabling distributed/lazy-loaded chunk storage

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent b708d71 commit ffac6af

1 file changed

Lines changed: 22 additions & 9 deletions

schema/jdata_format_schema.json

@@ -300,7 +300,7 @@
 },
 "_ArrayIsComplex_": {
   "type": "boolean",
-  "description": "True if array contains complex numbers (real and imaginary parts interleaved)"
+  "description": "True if array contains complex numbers. _ArrayData_ is a 2-D array whose first row holds the serialized real part and second row holds the serialized imaginary part of the complex array (separate rows, not byte-interleaved)."
 },
 "_ArrayIsSparse_": {
   "type": "boolean",
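
The corrected _ArrayIsComplex_ description can be illustrated with a minimal Python sketch. The helper names `split_complex` and `join_complex` are hypothetical and not part of any JData library; other annotation fields are omitted, and the sketch assumes the complex array has already been serialized to a flat sequence.

```python
def split_complex(values):
    """Build the two-row _ArrayData_ layout: row 1 real parts, row 2 imaginary parts."""
    return [[z.real for z in values], [z.imag for z in values]]


def join_complex(array_data):
    """Rebuild complex values from the two separate rows (not interleaved)."""
    real_row, imag_row = array_data
    return [complex(re, im) for re, im in zip(real_row, imag_row)]


values = [1 + 2j, 3 - 4j, 5j]
annotated = {
    "_ArrayIsComplex_": True,
    "_ArrayData_": split_complex(values),  # [[reals...], [imags...]]
}
assert join_complex(annotated["_ArrayData_"]) == values
```

An interleaved layout would instead store `[re0, im0, re1, im1, ...]` in a single row, which is the reading the old description wrongly suggested.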
@@ -382,7 +382,8 @@
     "type": "integer",
     "minimum": 1
   },
-  "description": "Tile (chunk) shape for partitioning the pre-processed array into independently compressible blocks. Length must equal the number of dimensions of the pre-processed array. When present, _ArrayData_ or _ArrayZipData_ becomes a 1-D array of per-chunk payloads in row-major order. The last chunk along any dimension may be smaller than the declared shape."
+  "minItems": 1,
+  "description": "Tile (chunk) shape for partitioning the pre-processed array into independently compressible blocks. Length must equal the number of dimensions of the pre-processed array. When present, _ArrayData_ or _ArrayZipData_ becomes a 1-D array of per-chunk payloads in row-major order. The last chunk along any dimension may be smaller than the declared shape. _ArrayZipSize_ must also be present and stores the shape of the full pre-processed array (not the chunk shape)."
 },
 "_ArrayZipType_": {
   "type": "string",
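
As a rough illustration of the _ArrayChunks_ description, the sketch below enumerates tiles in row-major order and shrinks the boundary tiles. It assumes "row-major" means the last dimension of the chunk grid varies fastest; `iter_chunks` is a hypothetical helper, not an existing API.

```python
from itertools import product
from math import ceil


def iter_chunks(full_shape, chunk_shape):
    """Yield (origin, shape) per tile, last grid dimension varying fastest."""
    if len(full_shape) != len(chunk_shape):
        raise ValueError("_ArrayChunks_ length must match the array dimensions")
    counts = [ceil(n / c) for n, c in zip(full_shape, chunk_shape)]
    for index in product(*(range(k) for k in counts)):
        origin = tuple(i * c for i, c in zip(index, chunk_shape))
        # Boundary tiles are clipped to what remains of the full array.
        shape = tuple(min(c, n - o)
                      for c, n, o in zip(chunk_shape, full_shape, origin))
        yield origin, shape


tiles = list(iter_chunks((5, 4), (2, 3)))
```

The position of a tile in `tiles` would then be its index in the 1-D payload array.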
@@ -393,7 +394,7 @@
     "blosc2blosclz", "blosc2zstd", "blosc2zlib",
     "base64"
   ],
-  "description": "Compression codec identifier following the Numcodecs registry (also used by Zarr). Note: zlib (RFC 1950) and gzip (RFC 1952) are distinct formats. Only Blosc2 (not Blosc v1) is supported. blosc2 defaults to BloscLZ internal codec."
+  "description": "Compression codec identifier. Note: zlib (RFC 1950) and gzip (RFC 1952) are distinct formats. Only Blosc2 (not Blosc v1) is supported. 'blosc2' defaults to the BloscLZ internal codec; 'blosc2lz4' selects LZ4, 'blosc2lz4hc' LZ4-HC, 'blosc2zstd' Zstandard, 'blosc2zlib' zlib. Additional codec-specific parameters (typesize, clevel, shuffle, nthreads) may be passed via _ArrayZipOptions_. Note: some implementations use a slash separator (e.g. 'blosc2/lz4') as an alternative to the concatenated form ('blosc2lz4'); both refer to the same codec."
 },
 "_ArrayZipSize_": {
   "oneOf": [
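
A reader that wants to accept both spellings mentioned in the _ArrayZipType_ description might normalize the slash-separated alias before looking up the codec. This is only a sketch; `normalize_zip_type` is a hypothetical name, and it assumes the alias applies solely to the `blosc2` family.

```python
def normalize_zip_type(name):
    """Map 'blosc2/<codec>' to the concatenated form 'blosc2<codec>'."""
    prefix, sep, codec = name.partition("/")
    if sep and prefix == "blosc2":
        return prefix + codec
    return name  # already in concatenated form, or not a blosc2 alias


print(normalize_zip_type("blosc2/lz4"))
```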
@@ -403,15 +404,16 @@
         "type": "integer",
         "minimum": 0
       },
-      "description": "Dimensions of the pre-processed array (multi-dimensional)"
+      "minItems": 1,
+      "description": "Shape of the pre-processed array (multi-dimensional form)"
     },
     {
       "type": "integer",
       "minimum": 0,
-      "description": "Total element count of the pre-processed array (scalar shorthand)"
+      "description": "Total element count of the pre-processed array (scalar shorthand for 1-D case)"
     }
   ],
-  "description": "Dimensions of the pre-processed array before compression. When _ArrayChunks_ is present, gives the shape of a full (non-boundary) chunk."
+  "description": "Shape of the full pre-processed array before compression. When _ArrayChunks_ is present, this field MUST store the shape of the complete pre-processed array (NOT the chunk shape). The decoder uses ceil(_ArrayZipSize_ / _ArrayChunks_) to determine the number of chunks per dimension and the size of boundary tiles."
 },
 "_ArrayZipData_": {
   "oneOf": [
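
The ceil(_ArrayZipSize_ / _ArrayChunks_) rule from the new description can be sketched as follows. The function names are hypothetical, and the sketch assumes the scalar shorthand is equivalent to a one-element shape, as the "1-D case" wording suggests.

```python
from math import ceil, prod


def normalize_zip_size(zip_size):
    """Treat the scalar shorthand as a 1-D shape; copy a list form as-is."""
    return [zip_size] if isinstance(zip_size, int) else list(zip_size)


def chunk_grid(zip_size, chunks):
    """Number of chunks per dimension: ceil(_ArrayZipSize_ / _ArrayChunks_)."""
    shape = normalize_zip_size(zip_size)
    if len(shape) != len(chunks):
        raise ValueError("_ArrayZipSize_ and _ArrayChunks_ must have equal length")
    return [ceil(n / c) for n, c in zip(shape, chunks)]


def expected_payload_count(zip_size, chunks):
    """Length a decoder would expect for the 1-D per-chunk payload array."""
    return prod(chunk_grid(zip_size, chunks))


grid = chunk_grid([100, 70], [32, 32])
```

Under the old (incorrect) reading, where _ArrayZipSize_ held the chunk shape, this division would always yield one chunk per dimension, so the full array extent could not be recovered.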
@@ -421,11 +423,22 @@
     },
     {
       "type": "array",
-      "items": { "type": "string" },
-      "description": "1-D array of Base64-encoded per-chunk compressed payloads (chunked array, when _ArrayChunks_ is present)"
+      "items": {
+        "oneOf": [
+          {
+            "type": "string",
+            "description": "Base64-encoded compressed payload for one chunk"
+          },
+          {
+            "$ref": "#/definitions/DataLinkDef",
+            "description": "_DataLink_ reference to an externally stored chunk payload (enables distributed/lazy-loaded chunk storage)"
+          }
+        ]
+      },
+      "description": "1-D array of per-chunk compressed payloads in row-major order (when _ArrayChunks_ is present). Each element is either a Base64-encoded string or a _DataLink_ pointing to an external chunk."
     }
   ],
-  "description": "Compressed and Base64-encoded array data"
+  "description": "Compressed and Base64-encoded array data, or a 1-D cell of per-chunk payloads when _ArrayChunks_ is present"
 },
 "_ArrayZipEndian_": {
   "type": "string",
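
With the widened _ArrayZipData_ item type, a decoder has to handle two kinds of chunk entries. The sketch below is one possible approach, not the reference implementation: it assumes a link object carries its target under a `_DataLink_` key (the diff does not show the `DataLinkDef` definition), and `fetch_link` is a hypothetical caller-supplied function that retrieves the external bytes.

```python
import base64


def load_chunk_payload(item, fetch_link):
    """Return the still-compressed bytes for one _ArrayZipData_ chunk entry."""
    if isinstance(item, str):
        # Inline chunk: Base64 text holding the compressed payload.
        return base64.b64decode(item)
    if isinstance(item, dict) and "_DataLink_" in item:
        # External chunk: defer to the caller, which enables lazy loading.
        return fetch_link(item["_DataLink_"])
    raise TypeError("chunk item must be a Base64 string or a _DataLink_ object")


inline = base64.b64encode(b"abc").decode("ascii")
payload = load_chunk_payload(inline, fetch_link=None)
```

The returned bytes would still need to be decompressed according to _ArrayZipType_.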
