feat: Add LargeList support for JavaScript bindings #438
As for the ASF Generative Tooling Guidance: Anthropic's Commercial Terms still state:
So, I can confirm that:
@kylebarron @supermacro @pmaciolek @GeorgeLeePatterson Could you review this?
Pull request overview
This PR adds end-to-end support for the Arrow LargeList data type (64-bit offsets via BigInt64Array) to the JavaScript bindings, including IPC round-tripping, visitor dispatch, builders, and test coverage.
Changes:
- Introduces `Type.LargeList` and `LargeList<T>` with 64-bit offset handling, plus `DataType.isLargeList()` and `makeData()` support.
- Wires `visitLargeList()` across core visitors (get/set/iterator/indexOf, assemblers/loaders, type comparator, JSON + FlatBuffers type/vec assembly) and IPC field-type decoding.
- Adds `LargeListBuilder` and expands tests to include LargeList in generated-data, builder, and visitor matrices.
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/type.ts` | Adds `LargeList<T>` type and `DataType.isLargeList()` guard. |
| `src/enum.ts` | Adds `Type.LargeList = 21`. |
| `src/data.ts` | Adds `LargeListDataProps` + `MakeDataVisitor.visitLargeList` (BigInt64 offsets). |
| `src/visitor.ts` | Adds `visitLargeList` dispatch and dtype inference support. |
| `src/ipc/metadata/message.ts` | Decodes LargeList field types from IPC metadata. |
| `src/visitor/*` | Implements and registers `visitLargeList` across visitors (loader/assembler/get/set/etc.). |
| `src/util/buffer.ts` | Fixes `rebaseValueOffsets` to work with `BigInt64Array`. |
| `src/builder/largelist.ts` | Introduces `LargeListBuilder`. |
| `src/builder.ts` / `src/interfaces.ts` / `src/visitor/builderctor.ts` | Wires `LargeListBuilder` into builder/type mappings and ctor selection. |
| `src/Arrow.ts` / `src/Arrow.dom.ts` | Exports `LargeList` and `LargeListBuilder` as public API. |
| `test/generate-test-data.ts` + unit tests | Adds LargeList generators and includes LargeList in test matrices. |
```ts
const v = value as T['TValue'];
const n = v.length;
const start = Number(offsets.set(index, BigInt(n)).buffer[index]);
for (let i = -1; ++i < n;) {
    child.set(start + i, v[i]);
}
```
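The snippet above records a list's length as a 64-bit offset and narrows the start index to a plain number before writing child values. As a rough standalone illustration of that pattern (not the PR's `LargeListBuilder`; `appendListOffsets` and its parameters are hypothetical names, and it uses a raw `BigInt64Array` rather than Arrow's offsets buffer builder), one could write:

```typescript
// Hypothetical helper, not the LargeListBuilder from the PR: records the end
// offset of list `index` and returns the child index where that list starts.
function appendListOffsets(
    offsets: BigInt64Array, // offsets[i]..offsets[i + 1] bound list i
    index: number,          // slot of the list being written
    length: number          // number of child values in that list
): number {
    const start = offsets[index];
    // Accumulate in bigint space so the stored offset keeps its 64-bit width.
    offsets[index + 1] = start + BigInt(length);
    // Child values are addressed with plain numbers, so narrow at the boundary.
    return Number(start);
}

const offsets = new BigInt64Array(4); // room for 3 lists (n + 1 offsets)
const lists = [[1, 2], [], [3, 4, 5]];
const child: number[] = [];

lists.forEach((list, i) => {
    const start = appendListOffsets(offsets, i, list.length);
    list.forEach((v, j) => { child[start + j] = v; });
});
```

The sketch keeps all offset arithmetic in `bigint` and converts only the value handed to the child writer, which is the same split the reviewed lines make with `BigInt(n)` and `Number(...)`.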
Implement full support for the LargeList data type, which uses 64-bit offsets (BigInt64Array) instead of 32-bit offsets, enabling list values larger than 2GB.

Type and data:
- Add LargeList type class with BigInt64Array offset support, DataType.isLargeList guard, and Type.LargeList enum entry
- Add MakeDataVisitor.visitLargeList and LargeListDataProps overload in data.ts (widens 32-bit offsets via toBigInt64Array)
- Map LargeList through TypeToDataType, TypeToBuilder, and DataTypeToBuilder in interfaces.ts

Visitors (read/write/compare/mutate):
- visitor.ts: base visitLargeList + Type.LargeList dispatch in getVisitFnByTypeId and inferDType
- get.ts: merge getList/getLargeList into a single helper using bigIntToNumber at the offset boundary (works for both Int32Array and BigInt64Array offsets)
- set.ts: same merge for setList; register visitLargeList
- iterator.ts, indexof.ts: register visitLargeList (vectorIterator / indexOfValue work unchanged)
- typecomparator.ts: widen compareList to List | LargeList and register for visitLargeList (structural comparison is offset-width agnostic)
- typeassembler.ts, jsontypeassembler.ts: emit LargeList flatbuffer node and JSON name
- vectorloader.ts: visitLargeList mirrors visitList; base readOffsets honors OffsetArrayType (BigInt64Array for LargeList)
- vectorassembler.ts: generalize assembleListVector to cover LargeList via bigIntToNumber, register visitLargeList
- jsonvectorassembler.ts: visitLargeList emits OFFSET via bigNumsToStrings, matching LargeUtf8 / LargeBinary
- ipc/metadata/message.ts: decodeFieldType handles Type.LargeList

Builders:
- New src/builder/largelist.ts (LargeListBuilder), mirroring ListBuilder with BigInt() for offset accumulation and Number() coercion when passing the start index to child.set
- Widen VariableWidthBuilder bound to include LargeList in builder.ts
- builderctor.ts: GetBuilderCtor.visitLargeList returns LargeListBuilder
- Export LargeListBuilder from Arrow.ts and Arrow.dom.ts

Latent bug fix:
- util/buffer.ts: rebaseValueOffsets now coerces its number offset to BigInt when the offsets array is BigInt64Array. Previously a non-zero offset on a 64-bit offsets array would TypeError on bigint += number; fix is required for LargeList IPC writes with non-zero slice offsets and also fixes the same latent issue on LargeUtf8 / LargeBinary

Tests:
- generate-test-data.ts: factor a shared generateListLike helper used by both generateList (Int32Array offsets) and generateLargeList (BigInt64Array offsets); truncate min/max in createVariableWidthOffsets64 so fractional stride values don't RangeError in BigInt()
- generated-data-tests.ts: LargeList case added to the matrix
- builders/builder-tests.ts: LargeListBuilder entry added alongside ListBuilder / FixedSizeListBuilder / MapBuilder
- visitor-tests.ts: visitLargeList added to BasicVisitor / FeatureVisitor plus describe entries; fix missing comma in the import list that would have broken compilation

API surface:
- Export LargeList and LargeListBuilder from Arrow.ts and Arrow.dom.ts

The implementation follows existing code patterns. All tests pass.

Closes apache#70

Co-Authored-By: Claude Code <noreply@anthropic.com>
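The latent bug named in the commit message (a `TypeError` on `bigint += number`) can be reproduced outside Arrow. The following is only a sketch of the idea under assumed names: `shiftOffsets` and `OffsetArray` are hypothetical and do not mirror the real signature of `rebaseValueOffsets`.

```typescript
type OffsetArray = Int32Array | BigInt64Array;

// Hypothetical stand-in for the idea behind rebaseValueOffsets: return a copy
// of `offsets` with `delta` added to every entry, whatever the offset width.
function shiftOffsets(offsets: OffsetArray, delta: number): OffsetArray {
    if (offsets instanceof BigInt64Array) {
        // Coerce once up front: bigint and number cannot be mixed in arithmetic.
        const bigDelta = BigInt(delta);
        return offsets.map((o) => o + bigDelta);
    }
    return offsets.map((o) => o + delta);
}

// Offsets of a slice whose first list starts at child index 3.
const sliced = BigInt64Array.of(BigInt(3), BigInt(5), BigInt(9));
const rebased = shiftOffsets(sliced, -3); // zero-based copy

// The failure mode being fixed: adding a number to a bigint element throws.
let mixedTypesThrow = false;
try {
    (sliced as any)[0] += 1; // `any` is used only to get past the type checker
} catch (err) {
    mixedTypesThrow = err instanceof TypeError;
}
```

Branching on the array type keeps the 32-bit path unchanged, which matches the commit's description of the fix as a coercion applied only when the offsets array is a `BigInt64Array`.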
```ts
}
public visitLargeList<T extends LargeList>(data: Data<T>) {
    return {
        'OFFSET': [...bigNumsToStrings(data.valueOffsets, 2)],
```
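The `OFFSET` field in the snippet above is emitted as strings rather than numbers. One reason such a conversion is needed is that `JSON.stringify` cannot serialize `bigint` values, and converting to `number` can lose precision above 2^53. The sketch below illustrates this with a hypothetical `offsetsToStrings` helper; it is not Arrow's `bigNumsToStrings`, whose signature and behavior may differ.

```typescript
// Hypothetical helper (not Arrow's bigNumsToStrings): render 64-bit offsets
// as decimal strings so they survive JSON serialization without precision loss.
function offsetsToStrings(offsets: BigInt64Array): string[] {
    return Array.from(offsets, (o) => o.toString());
}

const jsonOffsets = BigInt64Array.of(BigInt(0), BigInt(2), BigInt(5));
const json = JSON.stringify({ OFFSET: offsetsToStrings(jsonOffsets) });

// Serializing a raw bigint is rejected by JSON.stringify with a TypeError.
let rawBigIntRejected = false;
try {
    JSON.stringify({ OFFSET: [BigInt(1)] });
} catch (err) {
    rawBigIntRejected = err instanceof TypeError;
}

// A value beyond Number.MAX_SAFE_INTEGER keeps every digit as a string.
const beyondSafe = offsetsToStrings(BigInt64Array.of(BigInt('9007199254740993')));
```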
This PR was co-authored with Claude Code.
Summary
This PR builds on the unresolved #299 to implement full support for the `LargeList` data type in the Apache Arrow JavaScript bindings. `LargeList` uses 64-bit offsets (`BigInt64Array`) instead of 32-bit offsets, enabling list values larger than 2GB.

Where possible, the code size was reduced by distilling helpers used in both `List` and `LargeList`.

Related Issues
Closes #70
Implementation Details
Core Type System
- `Type.LargeList = 21` enum value
- `LargeList<T>` class with `BigInt64Array` offset support
- `DataType.isLargeList()` type guard
- `LargeListDataProps` interface and `MakeDataVisitor.visitLargeList` (widens 32-bit offsets via `toBigInt64Array`)
- Wired `LargeList` and `LargeListBuilder` into `TypeToDataType`, `TypeToBuilder`, and `DataTypeToBuilder` in `interfaces.ts`

Visitor Pattern Implementation
Wired `visitLargeList()` across every visitor, factoring shared helpers where the offset width was the only difference:

- `GetVisitor` / `SetVisitor`: merged `getList` / `setList` into single helpers using `bigIntToNumber` at the offset boundary, so one implementation covers both `List` and `LargeList`
- `IteratorVisitor`, `IndexOfVisitor`: register `visitLargeList` (the generic implementations are offset-width agnostic)
- `TypeComparator`: widened `compareList` to `List | LargeList` (structural comparison only)
- `VectorAssembler`: generalized `assembleListVector` to coerce begin/end via `bigIntToNumber`; registers `visitLargeList`
- `VectorLoader`: `visitLargeList` mirrors `visitList`; base `readOffsets` already honors `OffsetArrayType` (`BigInt64Array`)
- `JSONVectorAssembler`: emits `OFFSET` via `bigNumsToStrings`, matching the `LargeUtf8` / `LargeBinary` pattern
- `TypeAssembler` / `JSONTypeAssembler`: FlatBuffers + JSON type serialization

IPC Support
- `ipc/metadata/message.ts`: `decodeFieldType` handles `Type.LargeList`

Latent Bug Fix
- `util/buffer.ts`: `rebaseValueOffsets` now coerces its number offset to `BigInt` when the offsets array is `BigInt64Array`. Previously a non-zero offset on a 64-bit offsets array would throw a `TypeError` on `bigint += number`. The fix is required for `LargeList` IPC writes on sliced data, and also fixes the same latent issue for `LargeUtf8` / `LargeBinary`.

Builders
- New `src/builder/largelist.ts` (`LargeListBuilder`), mirroring `ListBuilder` with `BigInt()` for offset accumulation and `Number()` coercion when passing the start index to `child.set`
- Widened `VariableWidthBuilder` bound to include `LargeList` in `builder.ts`
- `GetBuilderCtor.visitLargeList` returns `LargeListBuilder`

Testing
- `test/generate-test-data.ts`:
  - shared `generateListLike` helper used by both `generateList` (Int32) and `generateLargeList` (BigInt64)
  - `createVariableWidthOffsets64` truncates `min`/`max` at entry so a fractional stride from `childVec.length / (length - nullCount)` doesn't throw a `RangeError` in `BigInt()`
- `test/unit/generated-data-tests.ts`: `LargeList` added to the matrix
- `test/unit/builders/builder-tests.ts`: `LargeListBuilder` entry added alongside `ListBuilder` / `FixedSizeListBuilder` / `MapBuilder`
- `test/unit/visitor-tests.ts`: `visitLargeList` added to `BasicVisitor` / `FeatureVisitor` and to both describe matrices

Public API
- Exports `LargeList` and `LargeListBuilder` from `src/Arrow.ts` and `src/Arrow.dom.ts`

Test Plan
All existing tests continue to pass, and the `LargeList` path is exercised by:

- `get`/`set`/`iterator`/`indexOf`/`slice`/`concat`/IPC round-trip
- `BasicVisitor` + `FeatureVisitor`
- `JSONVectorAssembler` / `JSONVectorLoader`

All tests across 45 suites pass.
The tests were run with:
Checklist
- Visitors: `get`/`set`/`iterator`/`indexOf`/`TypeComparator`/`VectorAssembler`/`VectorLoader`/`JSONVectorAssembler`/`TypeAssembler`/`JSONTypeAssembler`
- `LargeListBuilder` added and wired through `GetBuilderCtor` + `interfaces.ts`
- `rebaseValueOffsets` bigint bug fixed

Notes
- `LargeList` support: IPC read/write (binary + JSON form), in-memory access and mutation, type comparison, and construction via `LargeListBuilder`, parallel to the existing `List` type, just with 64-bit offsets.
- Offsets stay `BigInt64Array` end-to-end. The only narrowing happens at JS-runtime boundaries where `Data.slice` accepts number, identical to the `LargeUtf8` / `LargeBinary` policy upstream.
- Helpers are shared between `List` / `LargeList` only where the offset width was the sole difference and `bigIntToNumber` coercion at the boundary made the merge non-confusing; `LargeListBuilder` stays separate because the `BigInt()` / `Number()` coercions in `_flushPending` would obscure a merged version.
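
To make the "narrow only at the boundary" policy from the notes concrete, here is a minimal sketch of a single list accessor that works for both offset widths. All names (`Offsets`, `toSafeNumber`, `listBounds`, `getList`) are hypothetical and operate on plain arrays; Arrow's own `bigIntToNumber` and merged `getList` helper may behave differently.

```typescript
type Offsets = Int32Array | BigInt64Array;

// Hypothetical narrowing helper; Arrow's own bigIntToNumber may differ.
function toSafeNumber(value: number | bigint): number {
    if (typeof value === 'number') return value;
    if (value > BigInt(Number.MAX_SAFE_INTEGER) || value < BigInt(Number.MIN_SAFE_INTEGER)) {
        throw new TypeError(`${value} is not safe to convert to a number`);
    }
    return Number(value);
}

// Read the [begin, end) child range of list `index`, narrowing both bounds.
function listBounds(offsets: Offsets, index: number): [number, number] {
    return [toSafeNumber(offsets[index]), toSafeNumber(offsets[index + 1])];
}

// One accessor for both widths: only the bounds lookup touches the offsets.
function getList<T>(values: readonly T[], offsets: Offsets, index: number): T[] {
    const [begin, end] = listBounds(offsets, index);
    return values.slice(begin, end);
}

const values = ['a', 'b', 'c', 'd', 'e'];
const from32 = getList(values, Int32Array.of(0, 2, 2, 5), 2);
const from64 = getList(
    values,
    BigInt64Array.of(BigInt(0), BigInt(2), BigInt(2), BigInt(5)),
    2
);
```

Because the width-specific work is confined to `toSafeNumber`, the rest of the accessor is identical for `Int32Array` and `BigInt64Array` offsets, which is the property the notes cite as the condition for merging helpers.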