
feat: Add LargeList support for JavaScript bindings #438

Open
Karakatiza666 wants to merge 1 commit into apache:main from Karakatiza666:main

Conversation


Karakatiza666 commented May 19, 2026

This PR was co-authored with Claude Code.


Summary

This PR builds on the unresolved PR #299 to implement full support for the LargeList data type in the Apache Arrow JavaScript bindings. LargeList uses 64-bit offsets (BigInt64Array) instead of 32-bit offsets, which lifts the limit of 2^31 − 1 child elements that 32-bit offsets impose on a list array (the offsets index child elements, not bytes).

Where possible, code size was reduced by factoring out helpers shared by List and LargeList.

Related Issues

Closes #70

Implementation Details

Core Type System

  • Added Type.LargeList = 21 enum value
  • Implemented LargeList<T> class with BigInt64Array offset support
  • Added DataType.isLargeList() type guard
  • Added LargeListDataProps interface and MakeDataVisitor.visitLargeList (widens 32-bit offsets via toBigInt64Array)
  • Mapped LargeList and LargeListBuilder into TypeToDataType, TypeToBuilder, and DataTypeToBuilder in interfaces.ts
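
To illustrate the widening step mentioned above, here is a minimal sketch of turning 32-bit offsets into 64-bit ones. The name `widenOffsets` is hypothetical and only stands in for the `toBigInt64Array` utility the PR refers to; it is not the code from this PR.

```typescript
// Hypothetical helper (assumption): widens list offsets to 64 bits.
// Stands in for the toBigInt64Array step described in the PR.
function widenOffsets(offsets: Int32Array | BigInt64Array): BigInt64Array {
    // Already 64-bit: hand back the same array without copying.
    if (offsets instanceof BigInt64Array) {
        return offsets;
    }
    const wide = new BigInt64Array(offsets.length);
    for (let i = 0; i < offsets.length; i++) {
        wide[i] = BigInt(offsets[i]); // number -> bigint, element by element
    }
    return wide;
}

// Offsets for the three lists [a, b], [], [c, d, e] over a 5-element child.
const narrowOffsets = Int32Array.from([0, 2, 2, 5]);
const wideOffsets = widenOffsets(narrowOffsets);
```

The logical content is unchanged by widening; only the element width of the offsets buffer grows from 4 to 8 bytes.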

Visitor Pattern Implementation

Wired visitLargeList() across every visitor, factoring shared helpers where the offset width was the only difference:

  • GetVisitor / SetVisitor: merged getList / setList into single helpers using bigIntToNumber at the offset boundary — one implementation covers both List and LargeList
  • IteratorVisitor, IndexOfVisitor: register visitLargeList (the generic implementations are offset-width agnostic)
  • TypeComparator: widened compareList to List | LargeList (structural comparison only)
  • VectorAssembler: generalized assembleListVector to coerce begin/end via bigIntToNumber; registers visitLargeList
  • VectorLoader: visitLargeList mirrors visitList; base readOffsets already honors OffsetArrayType (BigInt64Array)
  • JSONVectorAssembler: emits OFFSET via bigNumsToStrings, matching the LargeUtf8 / LargeBinary pattern
  • TypeAssembler / JSONTypeAssembler: FlatBuffers + JSON type serialization
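
The merged get helper described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `toIndex` is a hypothetical stand-in for `bigIntToNumber`, and the child is modelled as a plain array rather than an Arrow `Data` object.

```typescript
type ListOffsets = Int32Array | BigInt64Array;

// Hypothetical stand-in (assumption) for the bigIntToNumber utility:
// narrows a 64-bit offset to a number, refusing values that would lose precision.
function toIndex(value: number | bigint): number {
    if (typeof value === 'bigint' &&
        (value > BigInt(Number.MAX_SAFE_INTEGER) || value < BigInt(Number.MIN_SAFE_INTEGER))) {
        throw new TypeError(`${value} is not safe to convert to a number.`);
    }
    return Number(value);
}

// One implementation for both offset widths: the only width-specific step
// is the coercion at the offset boundary.
function getListSlice<T>(offsets: ListOffsets, child: readonly T[], index: number): T[] {
    const begin = toIndex(offsets[index]);
    const end = toIndex(offsets[index + 1]);
    return child.slice(begin, end);
}

const childValues = [1, 2, 3, 4, 5];
const offsets32 = Int32Array.from([0, 2, 2, 5]);
const offsets64 = BigInt64Array.from([0, 2, 2, 5].map((x) => BigInt(x)));

const fromList = getListSlice(offsets32, childValues, 2);      // third list, 32-bit offsets
const fromLargeList = getListSlice(offsets64, childValues, 2); // third list, 64-bit offsets
```

Both calls read the same slice of the child, which is why a single helper can serve List and LargeList.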

IPC Support

  • ipc/metadata/message.ts: decodeFieldType handles Type.LargeList
  • Read and write paths both round-trip via the assembler/loader registrations above
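
For the JSON form, 64-bit offsets cannot be passed to `JSON.stringify` directly, because it throws on BigInt values. A minimal sketch of the string-based round trip is shown below; `offsetsToJson` and `offsetsFromJson` are hypothetical names for illustration and are not the `bigNumsToStrings` helper or the loader used in the PR.

```typescript
// Hypothetical helpers (assumption): serialize 64-bit offsets as decimal strings.
function offsetsToJson(offsets: BigInt64Array): string {
    const asStrings: string[] = [];
    for (let i = 0; i < offsets.length; i++) {
        asStrings.push(offsets[i].toString()); // bigint -> decimal string, no precision loss
    }
    return JSON.stringify({ OFFSET: asStrings });
}

function offsetsFromJson(json: string): BigInt64Array {
    const parsed = JSON.parse(json) as { OFFSET: string[] };
    return BigInt64Array.from(parsed.OFFSET.map((s) => BigInt(s)));
}

// The last offset is above Number.MAX_SAFE_INTEGER, so it would not survive
// a round trip through a JSON number.
const bigOffsets = BigInt64Array.from([BigInt(0), BigInt(2), BigInt("9007199254740993")]);
const roundTripped = offsetsFromJson(offsetsToJson(bigOffsets));
```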

Latent Bug Fix

  • util/buffer.ts: rebaseValueOffsets now coerces its number offset to BigInt when the offsets array is a BigInt64Array. Previously, a non-zero offset on a 64-bit offsets array would throw a TypeError on bigint += number. The fix is required for LargeList IPC writes on sliced data, and it also resolves the same latent issue for LargeUtf8 / LargeBinary.
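
The failure mode and the shape of the fix can be sketched as follows. `rebaseOffsets` is a hypothetical simplification written for this illustration; its signature is an assumption and does not match `rebaseValueOffsets` in util/buffer.ts.

```typescript
// Mixing bigint and number in arithmetic throws at runtime.
// The `any` annotation is only there to get the expression past the type checker.
function mixingThrowsTypeError(): boolean {
    try {
        const big: any = BigInt(1);
        void (big + 1);
        return false;
    } catch (err) {
        return err instanceof TypeError;
    }
}

// Hypothetical simplification (assumption): shift every offset by `shift`,
// coercing the shift to bigint when the offsets are 64-bit.
function rebaseOffsets(shift: number, offsets: Int32Array | BigInt64Array): Int32Array | BigInt64Array {
    if (offsets instanceof BigInt64Array) {
        const delta = BigInt(shift); // coerce once, then stay in bigint arithmetic
        const out = new BigInt64Array(offsets.length);
        for (let i = 0; i < offsets.length; i++) {
            out[i] = offsets[i] + delta;
        }
        return out;
    }
    const out = new Int32Array(offsets.length);
    for (let i = 0; i < offsets.length; i++) {
        out[i] = offsets[i] + shift;
    }
    return out;
}

// A slice whose first offset is 2 is rebased so that it starts at 0.
const slicedOffsets = BigInt64Array.from([2, 5, 9].map((x) => BigInt(x)));
const rebased = rebaseOffsets(-2, slicedOffsets);
```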

Builders

  • New src/builder/largelist.ts (LargeListBuilder), mirroring ListBuilder with BigInt() for offset accumulation and Number() coercion when passing the start index to child.set
  • Widened VariableWidthBuilder bound to include LargeList in builder.ts
  • GetBuilderCtor.visitLargeList returns LargeListBuilder
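
The offset handling described for the builder can be sketched like this. `TinyLargeListBuilder` is a hypothetical toy class for illustration only; it does not reproduce the structure of LargeListBuilder or of the Builder base classes.

```typescript
// Hypothetical toy builder (assumption): accumulates offsets as bigint and
// narrows to number only when indexing into the child values.
class TinyLargeListBuilder {
    private offsets: bigint[] = [BigInt(0)];
    private child: number[] = [];

    append(values: readonly number[]): this {
        const last = this.offsets[this.offsets.length - 1];
        const start = Number(last); // Number() coercion for the child start index
        for (let i = 0; i < values.length; i++) {
            this.child[start + i] = values[i];
        }
        this.offsets.push(last + BigInt(values.length)); // BigInt() for accumulation
        return this;
    }

    finish(): { offsets: BigInt64Array; child: number[] } {
        return { offsets: BigInt64Array.from(this.offsets), child: this.child.slice() };
    }
}

const built = new TinyLargeListBuilder().append([1, 2]).append([]).append([3]).finish();
```

Keeping the accumulation in bigint and narrowing only at the child boundary mirrors the split the PR description gives for the real builder.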

Testing

  • test/generate-test-data.ts:
    • Factored a shared generateListLike helper used by both generateList (Int32) and generateLargeList (BigInt64)
    • Added createVariableWidthOffsets64; it truncates min / max at entry so that a fractional stride from childVec.length / (length - nullCount) doesn't throw a RangeError in BigInt()
  • test/unit/generated-data-tests.ts: LargeList added to the matrix
  • test/unit/builders/builder-tests.ts: LargeListBuilder entry added alongside ListBuilder / FixedSizeListBuilder / MapBuilder
  • test/unit/visitor-tests.ts: visitLargeList added to BasicVisitor / FeatureVisitor and to both describe matrices
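
The RangeError mentioned for createVariableWidthOffsets64 comes from calling BigInt() on a non-integer number. A small sketch of the problem and the truncation fix follows; `toBigIntStride` is a hypothetical name, and the sketch assumes the valid count is greater than zero.

```typescript
// BigInt() rejects non-integer numbers with a RangeError.
function fractionalBigIntThrows(): boolean {
    try {
        BigInt(2.5);
        return false;
    } catch (err) {
        return err instanceof RangeError;
    }
}

// Hypothetical helper (assumption): computes a stride and truncates it
// before converting. Assumes validCount > 0.
function toBigIntStride(childLength: number, validCount: number): bigint {
    const stride = childLength / validCount; // may be fractional, e.g. 10 / 4 = 2.5
    return BigInt(Math.trunc(stride));
}

const stride = toBigIntStride(10, 4);
```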

Public API

  • Exported LargeList and LargeListBuilder from src/Arrow.ts and src/Arrow.dom.ts

Test Plan

All existing tests continue to pass, plus the LargeList path is exercised by:

  • ✅ Generated-data matrix: get / set / iterator / indexOf / slice / concat / IPC round-trip
  • ✅ Builder matrix: no-nulls / with-nulls / length=518
  • ✅ Visitor dispatch (BasicVisitor + FeatureVisitor)
  • ✅ IPC stream round-trip (16 IPC suites green, including JSON form via JSONVectorAssembler / JSONVectorLoader)

All tests across 45 suites pass.

The tests were run with:

npx jest --config jestconfigs/jest.src.config.js

Checklist

  • Implementation follows existing code patterns
  • All visitor methods implemented (get / set / iterator / indexOf / TypeComparator / VectorAssembler / VectorLoader / JSONVectorAssembler / TypeAssembler / JSONTypeAssembler)
  • IPC serialization/deserialization support added (binary + JSON form)
  • LargeListBuilder added and wired through GetBuilderCtor + interfaces.ts
  • Latent rebaseValueOffsets bigint bug fixed
  • Comprehensive tests added using existing test framework
  • All tests passing
  • Public API exports added
  • No breaking changes

Notes

  • This implementation provides full LargeList support: IPC read/write (binary + JSON form), in-memory access and mutation, type comparison, and construction via LargeListBuilder. It parallels the existing List type, just with 64-bit offsets.
  • Storage and wire format are genuinely 64-bit (BigInt64Array end-to-end). The only narrowing happens at JS-runtime boundaries where Data.slice accepts number, identical to the LargeUtf8 / LargeBinary policy upstream.
  • Helpers were merged across List/LargeList only where the offset width was the sole difference and bigIntToNumber coercion at the boundary kept the merged code clear; LargeListBuilder stays separate because the BigInt() / Number() coercions in _flushPending would obscure a merged version.

Author

Karakatiza666 commented May 19, 2026

As for the ASF Generative Tooling Guidance:

Anthropic's Commercial Terms still state:

Anthropic agrees that Customer (a) retains all rights to its Inputs, and (b) owns its Outputs.

So, I can confirm that:

  • The terms and conditions of the generative AI tool do not place any restrictions on use of the output that would be inconsistent with the Open Source Definition.
  • The output is not copyrightable subject matter (and would not be even if produced by a human).
  • No third party materials are included in the output.

Member

kou commented May 20, 2026

@kylebarron @supermacro @pmaciolek @GeorgeLeePatterson Could you review this?
(You were involved in the related issue/PR: #70, #299.)


Copilot AI left a comment


Pull request overview

This PR adds end-to-end support for the Arrow LargeList data type (64-bit offsets via BigInt64Array) to the JavaScript bindings, including IPC round-tripping, visitor dispatch, builders, and test coverage.

Changes:

  • Introduces Type.LargeList and LargeList<T> with 64-bit offset handling, plus DataType.isLargeList() and makeData() support.
  • Wires visitLargeList() across core visitors (get/set/iterator/indexOf, assemblers/loaders, type comparator, JSON + FlatBuffers type/vec assembly) and IPC field-type decoding.
  • Adds LargeListBuilder and expands tests to include LargeList in generated-data, builder, and visitor matrices.

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/type.ts | Adds LargeList<T> type and DataType.isLargeList() guard. |
| src/enum.ts | Adds Type.LargeList = 21. |
| src/data.ts | Adds LargeListDataProps + MakeDataVisitor.visitLargeList (BigInt64 offsets). |
| src/visitor.ts | Adds visitLargeList dispatch and dtype inference support. |
| src/ipc/metadata/message.ts | Decodes LargeList field types from IPC metadata. |
| src/visitor/* | Implements or registers visitLargeList across visitors (loader/assembler/get/set/etc.). |
| src/util/buffer.ts | Fixes rebaseValueOffsets to work with BigInt64Array. |
| src/builder/largelist.ts | Introduces LargeListBuilder. |
| src/builder.ts / src/interfaces.ts / src/visitor/builderctor.ts | Wires LargeListBuilder into builder/type mappings and ctor selection. |
| src/Arrow.ts / src/Arrow.dom.ts | Exports LargeList and LargeListBuilder as public API. |
| test/generate-test-data.ts + unit tests | Adds LargeList generators and includes LargeList in test matrices. |


Comment thread src/builder/largelist.ts
Comment on lines +45 to +50
const v = value as T['TValue'];
const n = v.length;
const start = Number(offsets.set(index, BigInt(n)).buffer[index]);
for (let i = -1; ++i < n;) {
    child.set(start + i, v[i]);
}
Author


Good catch, fixed

Implement full support for the LargeList data type, which uses 64-bit
offsets (BigInt64Array) instead of 32-bit offsets, lifting the limit of
2^31 − 1 child elements that 32-bit offsets impose on a list array.

Type and data:
- Add LargeList type class with BigInt64Array offset support,
  DataType.isLargeList guard, and Type.LargeList enum entry
- Add MakeDataVisitor.visitLargeList and LargeListDataProps overload in
  data.ts (widens 32-bit offsets via toBigInt64Array)
- Map LargeList through TypeToDataType, TypeToBuilder, and
  DataTypeToBuilder in interfaces.ts

Visitors (read/write/compare/mutate):
- visitor.ts: base visitLargeList + Type.LargeList dispatch in
  getVisitFnByTypeId and inferDType
- get.ts: merge getList/getLargeList into a single helper using
  bigIntToNumber at the offset boundary (works for both Int32Array and
  BigInt64Array offsets)
- set.ts: same merge for setList; register visitLargeList
- iterator.ts, indexof.ts: register visitLargeList (vectorIterator /
  indexOfValue work unchanged)
- typecomparator.ts: widen compareList to List | LargeList and register
  for visitLargeList (structural comparison is offset-width agnostic)
- typeassembler.ts, jsontypeassembler.ts: emit LargeList flatbuffer
  node and JSON name
- vectorloader.ts: visitLargeList mirrors visitList; base readOffsets
  honors OffsetArrayType (BigInt64Array for LargeList)
- vectorassembler.ts: generalize assembleListVector to cover LargeList
  via bigIntToNumber, register visitLargeList
- jsonvectorassembler.ts: visitLargeList emits OFFSET via
  bigNumsToStrings, matching LargeUtf8 / LargeBinary
- ipc/metadata/message.ts: decodeFieldType handles Type.LargeList

Builders:
- New src/builder/largelist.ts (LargeListBuilder), mirroring ListBuilder
  with BigInt() for offset accumulation and Number() coercion when
  passing the start index to child.set
- Widen VariableWidthBuilder bound to include LargeList in builder.ts
- builderctor.ts: GetBuilderCtor.visitLargeList returns LargeListBuilder
- Export LargeListBuilder from Arrow.ts and Arrow.dom.ts

Latent bug fix:
- util/buffer.ts: rebaseValueOffsets now coerces its number offset to
  BigInt when the offsets array is BigInt64Array. Previously a non-zero
  offset on a 64-bit offsets array would TypeError on bigint += number;
  fix is required for LargeList IPC writes with non-zero slice offsets
  and also fixes the same latent issue on LargeUtf8 / LargeBinary

Tests:
- generate-test-data.ts: factor a shared generateListLike helper used
  by both generateList (Int32Array offsets) and generateLargeList
  (BigInt64Array offsets); truncate min/max in
  createVariableWidthOffsets64 so fractional stride values don't
  RangeError in BigInt()
- generated-data-tests.ts: LargeList case added to the matrix
- builders/builder-tests.ts: LargeListBuilder entry added alongside
  ListBuilder / FixedSizeListBuilder / MapBuilder
- visitor-tests.ts: visitLargeList added to BasicVisitor / FeatureVisitor
  plus describe entries; fix missing comma in the import list that
  would have broken compilation

API surface:
- Export LargeList and LargeListBuilder from Arrow.ts and Arrow.dom.ts

The implementation follows existing code patterns. All tests pass.

Closes apache#70

Co-Authored-By: Claude Code <noreply@anthropic.com>
Member

domoritz left a comment


Could you compare with #299 and explain what's different?


Copilot AI left a comment


Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 1 comment.

}
public visitLargeList<T extends LargeList>(data: Data<T>) {
    return {
        'OFFSET': [...bigNumsToStrings(data.valueOffsets, 2)],


Development

Successfully merging this pull request may close these issues.

[JS] Support LargeList

4 participants