Arena ToT: incremental multi-page construction #549
Replace the one-shot Arena (a single slab sized by a pre-walk of every inner range) with a multi-page bump allocator, so an arena-backed ToT outer tile no longer has to know all inner-tensor sizes before construction.

Arena (arena.h):
- A growing list of pages; each page buffer is a stable heap block, so the raw Cell* of every ArenaTensor view stays valid as pages are added. claim_bytes() bumps the current page and appends a fresh one on a miss; a request larger than a page (or needing finer alignment) gets its own dedicated, exactly-sized page.
- reserve_page() lays down a single exact page -- the up-front path uses it so kernel/einsum-built tiles keep their one contiguous slab, byte-identical to before.
- Page size is a knob (TILEDARRAY_ARENA_PAGE_BYTES). Construction is single-threaded by contract (one task per outer tile); the bump path is intentionally unsynchronized.
- Drop the unused plan()/ArenaPlan helpers.

arena_kernels.h:
- arena_outer_init reimplemented against the new Arena (reserve_page + sequential claim_bytes); signature unchanged, so einsum/tensor.h callers are untouched.
- ArenaToTBuilder: one-pass incremental construction -- the caller discovers each inner range and fills the returned cell in a single step, driving its own loop. A cell larger than a page and a single-cell tile both route to an exactly-sized dedicated page.
- arena_compact: coalesce a multi-page incrementally-built tile into one contiguous slab.

Tests: rewrite tests/arena.cpp for the new API (page rollover, oversized/dedicated pages, single exact page, aliasing survival); add ArenaToTBuilder + arena_compact coverage for both TA::Tensor and ArenaTensor inner cells.
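The paging behaviour described in this commit can be sketched roughly as follows. This is a toy model, not the code in arena.h: the class `SketchArena`, its `Page` struct, the `page_count()` accessor, and the choice to round every request up to `alignof(std::max_align_t)` are all assumptions made for illustration. Only the names `claim_bytes` and `reserve_page` are borrowed from the commit message, and their real signatures may differ.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical illustration only; not TiledArray's actual Arena.
class SketchArena {
 public:
  explicit SketchArena(std::size_t page_bytes) : page_bytes_(page_bytes) {}

  // Lay down one exactly-sized page and make it the current bump target.
  void reserve_page(std::size_t bytes) {
    pages_.push_back(Page{std::make_unique<std::byte[]>(bytes), bytes, 0});
    current_ = pages_.size() - 1;
  }

  // Bump the current page; on a miss, append a standard page, or a
  // dedicated exactly-sized page when the request exceeds the page size.
  std::byte* claim_bytes(std::size_t bytes) {
    bytes = round_up(bytes);
    const bool fits = current_ != npos &&
                      pages_[current_].used + bytes <= pages_[current_].size;
    if (!fits) {
      if (bytes > page_bytes_) {
        // Dedicated page: fully used at once; the current page is kept.
        pages_.push_back(
            Page{std::make_unique<std::byte[]>(bytes), bytes, bytes});
        return pages_.back().buf.get();
      }
      reserve_page(page_bytes_);
    }
    Page& p = pages_[current_];
    std::byte* out = p.buf.get() + p.used;
    p.used += bytes;
    return out;
  }

  std::size_t page_count() const { return pages_.size(); }

 private:
  struct Page {
    std::unique_ptr<std::byte[]> buf;  // heap block; never moves
    std::size_t size;
    std::size_t used;
  };
  static constexpr std::size_t npos = static_cast<std::size_t>(-1);

  // Assumption: keep every claim aligned by rounding its size up.
  static std::size_t round_up(std::size_t n) {
    constexpr std::size_t a = alignof(std::max_align_t);
    return (n + a - 1) / a * a;
  }

  std::vector<Page> pages_;
  std::size_t page_bytes_;
  std::size_t current_ = npos;
};
```

Growing `pages_` may relocate the `Page` records, but each buffer is owned through a `std::unique_ptr`, so pointers handed out earlier keep pointing at the same heap block. That is the property the commit relies on for `Cell*` stability. Like the bump path described above, the sketch is unsynchronized.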
Add a test that builds a TA::DistArray<Tensor<ArenaTensor>> by calling ArenaToTBuilder inside the init_tiles callback -- each outer tile's inner cells are sized (jagged) and filled one at a time, with no up-front range_fn. Confirms the incremental builder composes with init_tiles and needs no new DistArray API.
The arena-ToT construction paths pre-walked their cells twice: the two-pass make_nested_tile invoked its source once to size each cell and again to fill it, so callers with a single-pass source materialized the whole outer tile into a temporary vector first. ArenaToTBuilder makes a single ascending pass possible everywhere.

- make_nested_tile (arena_kernels.h): rebuilt on ArenaToTBuilder -- inner_range_fn and inner_fill_fn are now interleaved per cell instead of two full passes; no separate all-ranges walk. Cells stay zero-initialized so the no-op-fill (shape-only) path is unchanged.
- DistArray::make_arena_nested_tile: rebuilt on ArenaToTBuilder; cell_source is invoked exactly once per cell in ascending order.
- DistArray::init_elements (arena branch): drops the std::vector<R> that collected every inner tensor of the outer tile before building.
- DistArray::set(i, InIter) (arena branch): drops the std::vector that buffered the single-pass iterator; it now feeds straight through.
- ArrayImpl retile (arena-ToT branch): builds each target tile with ArenaToTBuilder, one source-cell lookup per cell instead of two.

Eliminates a peak-memory doubling during construction (the temporary held the whole tile's data alongside the arena slab).

foreach / make_array were also reviewed: both are tile-type-agnostic (the result tile is default-constructed and the user op populates it) -- no two-pass machinery there, nothing to relax.
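To make the single-pass idea above concrete, here is a minimal sketch of sizing and filling each cell in one ascending walk over a source that can be read only once. Everything in it is hypothetical: `Cell`, `SketchBuilder`, `fill_from_stream`, and `build_demo_tile` are illustrative names, the page size is counted in `double` elements rather than bytes, and the real `ArenaToTBuilder` interface may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <memory>
#include <sstream>
#include <vector>

// Hypothetical inner cell: a non-owning view into page storage.
struct Cell {
  double* data = nullptr;
  std::size_t size = 0;
};

// Hypothetical builder: hands out zero-initialized storage cell by cell.
class SketchBuilder {
 public:
  SketchBuilder(std::size_t n_cells, std::size_t page_elems)
      : cells_(n_cells), page_elems_(page_elems) {}

  Cell& emplace(std::size_t ord, std::size_t volume) {
    assert(ord < cells_.size());
    Cell& c = cells_[ord];
    if (volume == 0) return c;  // zero-volume cell stays null
    if (volume > page_elems_) {
      // Oversized cell: dedicated page; the current page is untouched.
      pages_.push_back(std::make_unique<double[]>(volume));
      c.data = pages_.back().get();
    } else {
      if (volume > left_) {  // miss: start a fresh standard page
        pages_.push_back(std::make_unique<double[]>(page_elems_));
        cur_ = pages_.back().get();
        left_ = page_elems_;
      }
      c.data = cur_;
      cur_ += volume;
      left_ -= volume;
    }
    c.size = volume;
    return c;
  }

  const std::vector<Cell>& cells() const { return cells_; }
  std::size_t page_count() const { return pages_.size(); }

 private:
  std::vector<Cell> cells_;
  std::vector<std::unique_ptr<double[]>> pages_;
  std::size_t page_elems_;
  double* cur_ = nullptr;
  std::size_t left_ = 0;
};

// One ascending pass: read a cell's size, then write its values straight
// into the storage emplace() returned. No temporary holds the whole tile.
inline void fill_from_stream(std::istream& in, SketchBuilder& b,
                             std::size_t n_cells) {
  for (std::size_t ord = 0; ord < n_cells; ++ord) {
    std::size_t n = 0;
    in >> n;
    Cell& c = b.emplace(ord, n);
    for (std::size_t i = 0; i < n; ++i) in >> c.data[i];
  }
}

// Usage: three jagged cells (sizes 2, 0, 3) read from a stream.
inline SketchBuilder build_demo_tile() {
  std::istringstream in("2 1.5 2.5 0 3 7 8 9");
  SketchBuilder b(3, 4);  // 3 cells, 4 doubles per page
  fill_from_stream(in, b, 3);
  return b;
}
```

A stream stands in for any single-pass source here. A two-pass size-then-fill scheme would have to buffer the whole stream first, which is the temporary this commit removes.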
Pull request overview
This PR enables incremental construction of arena-backed tensor-of-tensors tiles by replacing one-shot slab allocation assumptions with a multi-page arena and a new builder API.
Changes:
- Adds multi-page `Arena` allocation with standard/dedicated pages and allocation accounting.
- Introduces `ArenaToTBuilder` and `arena_compact` for one-pass ToT construction and compaction.
- Updates DistArray/retile paths and tests to exercise incremental arena-backed construction.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/TiledArray/tensor/arena.h | Reworks Arena into a multi-page bump allocator and updates ArenaResource. |
| src/TiledArray/tensor/arena_kernels.h | Adds ArenaToTBuilder, updates nested tile construction, and adds compaction support. |
| src/TiledArray/dist_array.h | Removes buffering in arena ToT set/init_elements paths via one-pass construction. |
| src/TiledArray/array_impl.h | Updates arena ToT retile construction to use the incremental builder. |
| tests/arena.cpp | Updates arena unit tests for multi-page allocation behavior. |
| tests/arena_kernels.cpp | Adds builder/compaction tests for Tensor<Tensor<...>> inners. |
| tests/arena_tensor_kernels.cpp | Adds builder/compaction and DistArray incremental construction tests for ArenaTensor inners. |
    /// fill. A zero-volume range leaves the cell null. Outer element indices
    /// translate via `outer_range().ordinal(idx)`.
    inner_t& emplace(std::size_t ord, inner_range_t inner_range) {
      TA_ASSERT(ord < n_cells_);
      inner_t& cell = data_[ord];
      const std::size_t vol = inner_range.volume();
      if (vol == 0) return cell;  // stays null
      constexpr bool arena = is_arena_tensor_v<inner_t>;
Good catch — fixed in a02f28e. ArenaToTBuilder::emplace now mirrors arena_outer_init's non-arena zero-volume handling: an owning (non-view) inner given a rank>0 zero-volume range is built as inner_t(range) (empty but rank-preserving); a rank-0 range — and any arena view inner, which cannot carry a standalone range — stays null. Added a regression test builder_zero_volume_nonscalar_range_keeps_rank.
ArenaToTBuilder::emplace, given a zero-volume range, used to leave the
cell default/null. For an owning (non-view) inner that drops the range
metadata -- arena_outer_init keeps a rank>0 zero-volume range as an
empty-but-ranked tensor and only collapses a rank-0 range to null. Since
make_nested_tile now routes through the builder, mirror that handling so
a TA::Tensor inner with e.g. Range{0} stays an empty rank-1 tensor.
Arena view inners (which cannot carry a standalone range) still go null.
Adds a regression test.
Also drops a stale type_traits.h comment that listed TensorInterface as
an is_tensor_view specialization -- it is deliberately not a view.
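The zero-volume rule this fix describes can be written down as a small decision function. The sketch below is only a restatement of the commit message: `CellState` and `classify_cell` are hypothetical names, and the real `ArenaToTBuilder::emplace` works on range and tensor types rather than on a bare rank and volume.

```cpp
#include <cstddef>

// Hypothetical summary of what a cell ends up as after emplace().
enum class CellState { Null, EmptyRanked, Allocated };

// inner_is_arena_view: true for an arena view inner (cannot carry a
// standalone range), false for an owning inner such as TA::Tensor.
constexpr CellState classify_cell(bool inner_is_arena_view, std::size_t rank,
                                  std::size_t volume) {
  if (volume != 0) return CellState::Allocated;
  // Owning inner with a rank>0 zero-volume range: keep the range metadata.
  if (!inner_is_arena_view && rank > 0) return CellState::EmptyRanked;
  // Rank-0 range, or any arena view inner: the cell stays null.
  return CellState::Null;
}

// e.g. a TA::Tensor inner given Range{0}: an empty rank-1 tensor.
static_assert(classify_cell(false, 1, 0) == CellState::EmptyRanked, "");
static_assert(classify_cell(false, 0, 0) == CellState::Null, "");
static_assert(classify_cell(true, 1, 0) == CellState::Null, "");
```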
Summary
Lets an arena-backed tensor-of-tensors outer tile (`Tensor<ArenaTensor>`) be built incrementally, with inner cells sized and filled one at a time, instead of requiring every inner range up front via a `range_fn`.

Stacks on #548 (base is `feature/arena_tensor`); retarget to `master` once #548 merges.
- `Arena` → multi-page bump allocator. A growing list of pages; each page buffer is a stable heap block, so the raw `Cell*` of every `ArenaTensor` view stays valid as pages are added. `claim_bytes()` bumps the current page and appends a fresh one on a miss; a request larger than a page gets its own dedicated, exactly-sized page. `reserve_page()` lays down a single exact page; the up-front path uses it, so kernel/einsum-built tiles keep their one contiguous slab, byte-identical to before. Page size is a knob (`TILEDARRAY_ARENA_PAGE_BYTES`).
- `ArenaToTBuilder`: one-pass incremental construction. The caller discovers each inner range and fills the returned cell in a single step. `arena_compact` coalesces a multi-page tile into one slab.
- `DistArray` construction. `make_nested_tile`, `make_arena_nested_tile`, `init_elements`, `set(i, InIter)`, and the `ArrayImpl` retile path are rebuilt on the builder: the two-pass size-then-fill walk and the `std::vector` temporaries that buffered a whole outer tile's worth of inner tensors are gone. `foreach` / `make_array` were reviewed: tile-type-agnostic, no two-pass machinery, no change needed.
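For readers unfamiliar with the compaction step mentioned above, the idea can be sketched as copying every cell's data into one freshly allocated slab and repointing the cells. The names `PagedCell` and `compact_cells` are hypothetical, and whether the real `arena_compact` rewrites a tile in place or returns a new one is not stated here, so treat this as one possible shape only.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical inner cell: a non-owning view whose data may live on any page.
struct PagedCell {
  double* data = nullptr;
  std::size_t size = 0;
};

// Copy all cells, in ordinal order, into one contiguous slab and repoint
// them at it. The caller keeps the returned slab alive and may then drop
// the old pages.
inline std::unique_ptr<double[]> compact_cells(std::vector<PagedCell>& cells) {
  std::size_t total = 0;
  for (const PagedCell& c : cells) total += c.size;

  auto slab = std::make_unique<double[]>(total);
  double* cursor = slab.get();
  for (PagedCell& c : cells) {
    if (c.size == 0) continue;  // null cells stay null
    std::copy(c.data, c.data + c.size, cursor);
    c.data = cursor;
    cursor += c.size;
  }
  return slab;
}
```

After the call the non-empty cells sit back to back in ordinal order, which is the single-slab layout the up-front path produces directly.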
Test plan
- `Arena` unit suite: page rollover, oversized/dedicated pages, single exact page, aliasing survival.
- `ArenaToTBuilder` + `arena_compact` coverage for both `TA::Tensor` and `ArenaTensor` inner cells.
- `DistArray`-level incremental construction via `init_tiles` + `ArenaToTBuilder`.
- tiledarray/unit/run-np-1) passes 100%.