From f330d7336ad00a85275b614836497a70bb8c4d18 Mon Sep 17 00:00:00 2001 From: Weiyao Luo <9347182+SeliMeli@users.noreply.github.com> Date: Mon, 11 May 2026 08:40:35 +0000 Subject: [PATCH 01/14] docs(rfc): integrate PiPNN as an alternative graph-index build algorithm Adds an RFC proposing PiPNN (arXiv:2602.21247) as a second graph-index build algorithm for DiskANN's disk index. Integration is two-stage: Stage 1 lands PiPNN behind a build-algorithm selector with Vamana as default; Stage 2 (conditional on Stage 1 milestones) retires Vamana's full-rebuild path while keeping it for incremental inserts via the hybrid update model. Co-Authored-By: Claude Opus 4.7 (1M context) --- rfcs/00000-pipnn-integration.md | 312 ++++++++++++++++++++++++++++++++ 1 file changed, 312 insertions(+) create mode 100644 rfcs/00000-pipnn-integration.md diff --git a/rfcs/00000-pipnn-integration.md b/rfcs/00000-pipnn-integration.md new file mode 100644 index 000000000..65752c9b7 --- /dev/null +++ b/rfcs/00000-pipnn-integration.md @@ -0,0 +1,312 @@ +# Integrate PiPNN as an Alternative Graph-Index Build Algorithm + +| | | +|---|---| +| **Authors** | Weiyao Luo | +| **Contributors** | DiskANN team | +| **Created** | 2026-05-11 | +| **Updated** | 2026-05-11 | + +## Summary + +Add **PiPNN** (Pick-in-Partitions Nearest Neighbors, arXiv:2602.21247) as a second graph-construction algorithm for DiskANN's disk index. PiPNN produces a graph byte-compatible with Vamana's disk format and search API, at **up to 6.3× lower build time** on the workloads we have measured. Vamana remains the default and the only algorithm supported for incremental inserts; PiPNN is the proposed faster path for full rebuilds. + +## Motivation + +### Background + +DiskANN currently builds the disk index with a single algorithm — **Vamana** (`diskann-disk/src/build/builder/`). Vamana incrementally inserts each point into a graph, running a greedy search + `RobustPrune` for each insertion, producing the on-disk format documented in `diskann-disk/src/storage/`. + +**PiPNN** (Pick-in-Partitions Nearest Neighbors, arXiv:2602.21247) is a partition-based **batch** graph builder, in contrast to Vamana's **incremental** insert + prune. The construction has four phases: + +1. **Partition** — Randomized Ball Carving (RBC) recursively splits the dataset into small *overlapping* leaf clusters. Each point lands in `fanout` of its nearest cluster leaders at every recursion level, so every point appears in multiple leaves. Recursion stops when a cluster fits a configured leaf-size cap (`c_max`, typically 256–1024 points). +2. **Local k-NN per leaf** — For each leaf, compute the full pairwise distance matrix in one batched GEMM call, then extract each point's `leaf_k` nearest neighbors inside the leaf. GEMM batching is the source of most of PiPNN's wall-clock advantage over per-point greedy search. +3. **HashPrune merge** — Edges from all leaves are merged into a per-point reservoir of bounded size (`l_max`, ~64–128). The pruner is keyed by an LSH **angular bucket** of each candidate neighbor: at most one candidate per bucket is retained, and on collision the closer candidate wins. This produces a diverse short-list per point using O(`l_max`) memory per node and O(1) amortized insert work. +4. **Optional final prune** — A single RobustPrune-style pass (same algorithm Vamana uses, with a configurable `alpha`) applies geometric occlusion to the HashPrune candidates. Used when the workload benefits from explicit graph diversification. 
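To make phase 3 concrete, the sketch below shows one plausible shape for the per-point reservoir: bounded at `l_max` entries, at most one entry per angular bucket, closer candidate wins on collision. The type and method names are illustrative rather than the actual `diskann-pipnn` internals, and the behavior when the reservoir is full is an assumption (the description above does not pin it down).

```rust
use std::collections::HashMap;

/// Illustrative sketch of the phase-3 reservoir; names and the full-reservoir
/// policy are assumptions, not the actual `diskann-pipnn` implementation.
struct HashPruneReservoir {
    l_max: usize,                    // reservoir cap (~64–128 per point)
    slots: HashMap<u32, (u32, f32)>, // angular bucket -> (neighbor id, distance)
}

impl HashPruneReservoir {
    fn new(l_max: usize) -> Self {
        Self { l_max, slots: HashMap::with_capacity(l_max) }
    }

    /// O(1) amortized insert of one candidate edge. `bucket` is the candidate's
    /// LSH angular bucket (assumed: the sign pattern of the point->neighbor
    /// direction against `num_hash_planes` random hyperplanes).
    fn offer(&mut self, bucket: u32, neighbor: u32, dist: f32) {
        match self.slots.get_mut(&bucket) {
            // At most one candidate per bucket; on collision the closer wins.
            Some(slot) if dist < slot.1 => *slot = (neighbor, dist),
            Some(_) => {}
            None if self.slots.len() < self.l_max => {
                self.slots.insert(bucket, (neighbor, dist));
            }
            // Reservoir full and bucket unseen: drop the candidate
            // (one plausible policy; evicting the farthest entry is another).
            None => {}
        }
    }

    /// Drain into the per-point short-list handed to the optional final prune.
    fn into_shortlist(self) -> Vec<(u32, f32)> {
        let mut out: Vec<_> = self.slots.into_values().collect();
        out.sort_by(|a, b| a.1.total_cmp(&b.1));
        out
    }
}
```

Each entry is one `(u32, f32)` pair, which lines up with the `l_max × 8 bytes` per-point accounting used in the Trade-offs section.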
The output is `Vec<Vec<u32>>` adjacency lists in the same shape Vamana produces, then handed to the existing disk-layout writer. PQ training and search-side data structures are unchanged.

The structural trade-off: Vamana is sequential per insert (with fine-grained parallelism) and memory-efficient; PiPNN is batch-parallel across leaves, with higher peak working memory in exchange for far shorter wall-clock builds.

### Problem Statement

Vamana's incremental design scales linearly in points × per-insert search cost, which makes full rebuilds expensive at the scales we operate. Measured baselines:

| Dataset | Vamana build time |
|---|---:|
| Enron 1M (1.087M × 384, fp16, cosine_normalized) | 70s |
| BigANN 10M (10M × 128, fp16, squared_l2) | 358s |
| Enron 10M (10M × 384, fp16, cosine_normalized) | 844s |

Frequent rebuilds (driven by data churn or parameter sweeps) and full rebuilds at 10M-scale and above are the bottleneck. PiPNN's offline benchmarks at matching recall budgets complete the same builds **up to 6.3× faster** while writing the same disk format (full numbers in the Benchmark Results section). This RFC proposes landing PiPNN so teams can opt into faster builds and so we can collect production-relevant signal on whether PiPNN can eventually replace Vamana's full-rebuild path.

#### Hybrid update model (Stage 2 direction)

Vamana and PiPNN write the same on-disk graph format, so a graph built by either algorithm can be *read* (and incrementally edited) by either. We exploit this for the production update story:

- **Bulk / full rebuild → PiPNN.** When data churn is large enough to justify a full rebuild, PiPNN is used because it is several times faster than Vamana at this job.
- **Incremental insert → Vamana.** Between full rebuilds, individual inserts use Vamana's existing greedy-search + RobustPrune insert path. PiPNN's batch design has no natural single-point-insert API and we do not plan to build one.
- **Quality decay → trigger PiPNN rebuild.** When recall on the live graph degrades past a configured threshold (driven by accumulated incremental inserts), the system schedules a PiPNN full rebuild from the current dataset snapshot.

Because both algorithms produce the same disk format, switching between "fresh PiPNN build" and "Vamana-edited delta" is transparent to search-side consumers. This answers "should PiPNN implement incremental inserts?" — no, we keep Vamana for that, and use the disk index format as the integration point.
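A hypothetical sketch of the trigger logic in the third bullet; the types, names, and single-threshold policy are illustrative only, standing in for whatever health signal production ends up using:

```rust
/// Hypothetical orchestration sketch for the hybrid model; none of these
/// names exist in the codebase.
struct GraphHealth {
    live_recall: f64,  // recall measured on the live, Vamana-edited graph
    recall_floor: f64, // configured quality-decay threshold
}

enum UpdateAction {
    /// Per-point greedy-search + RobustPrune on the existing graph (Vamana).
    VamanaInsert,
    /// Full batch rebuild from the current dataset snapshot (PiPNN).
    PipnnFullRebuild,
}

fn next_action(health: &GraphHealth) -> UpdateAction {
    if health.live_recall < health.recall_floor {
        // Accumulated incremental edits have degraded quality past the
        // threshold: schedule the fast batch rebuild, then resume absorbing
        // churn through Vamana's insert path on the fresh graph.
        UpdateAction::PipnnFullRebuild
    } else {
        UpdateAction::VamanaInsert
    }
}
```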
#### Two-stage rollout

- **Stage 1 (this RFC):** Land PiPNN behind a build-algorithm selector. Vamana stays default; PiPNN is opt-in. Stage 1 has explicit milestones (in Future Work) that gate readiness for Stage 2.
- **Stage 2 (separate proposal, conditional on Stage 1 milestones):** Retire the Vamana **full-rebuild** path. Vamana remains the implementation for incremental inserts via the hybrid model above.

### Goals

1. **Algorithm-level pluggability**: introduce a build-algorithm selector to the build pipeline that routes between Vamana (existing) and PiPNN (new). Existing build sites continue to default to Vamana with no behavior change.
2. **Disk format compatibility**: the PiPNN-built index is byte-compatible with Vamana-built indexes on disk — search, PQ, and storage layouts are unchanged. This is the foundation for the hybrid update model.
3. **Public API compatibility**: the disk-index public API surface (`DiskIndexBuilder::new`, `IndexConfiguration`, `DiskIndexWriter`, JSON config schema) remains backward-compatible. PiPNN configuration is added under a new tagged enum variant.
4. **Feature-parity milestones**: deliver the Vamana capabilities PiPNN needs for a full-rebuild role in production (see Future Work below).
5. **Documented memory mitigation**: provide a configuration knob (three-tier build) that brings PiPNN's peak RSS to or below Vamana's at the cost of build time.

## Proposal

### Workspace structure

Add a new crate, `diskann-pipnn`, that depends on the existing `diskann`, `diskann-linalg`, `diskann-vector`, `diskann-quantization`, and `diskann-utils` crates (but not on `diskann-disk`, which would be circular). PiPNN lives outside `diskann-disk` so the core disk path has no compile-time dependency on PiPNN; the disk builder takes a typed `BuildAlgorithm` and only depends on PiPNN behind a feature flag.

```text
diskann/               # core types, traits, search
diskann-disk/          # disk index layout, builder, search
  └── feature "pipnn"  #   opt-in dependency on diskann-pipnn
diskann-pipnn/         # new: PiPNN builder
diskann-linalg/        # GEMM/SVD (used by both Vamana and PiPNN)
diskann-quantization/  # PQ/SQ training (used by both)
```

### `BuildAlgorithm` enum

Introduce a tagged enum in `diskann-disk/src/build/configuration/build_algorithm.rs`:

```rust
#[derive(Debug, Clone, Default, PartialEq, Serialize, Deserialize)]
#[serde(tag = "algorithm")]
pub enum BuildAlgorithm {
    /// Default Vamana graph construction.
    #[default]
    Vamana,

    /// PiPNN: Pick-in-Partitions Nearest Neighbors.
    #[cfg(feature = "pipnn")]
    PiPNN {
        c_max: usize,            // maximum leaf partition size
        c_min: usize,            // minimum cluster size before merging
        p_samp: f64,             // RBC leader sampling fraction
        fanout: Vec<usize>,      // per-level fanout
        leaf_k: usize,           // k-NN within each leaf
        replicas: usize,         // independent partitioning passes
        l_max: usize,            // HashPrune reservoir cap
        num_hash_planes: usize,  // LSH hyperplane count
        final_prune: bool,       // optional RobustPrune final pass
        leader_cap: usize,       // hard cap on leaders per level
        saturate_after_prune: bool,
    },
}
```

`Vamana` is the `Default` so every existing call site that constructs `DiskIndexBuildParameters` without specifying an algorithm keeps the existing behavior.

`DiskIndexBuildParameters` gains a `build_algorithm: BuildAlgorithm` field and a constructor pair: `new` (defaults to Vamana, no PiPNN dep) and `new_with_algorithm` (explicit). The JSON schema for benchmark configs gains an optional `build_algorithm` block that, when present, deserializes via `#[serde(tag = "algorithm")]` into one of the variants above.

### Builder dispatch

In `DiskIndexBuilder::build()` (or the new equivalent), dispatch on `BuildAlgorithm`:

```rust
match build_parameters.build_algorithm() {
    BuildAlgorithm::Vamana =>
        self.build_inmem_vamana_index().await,
    #[cfg(feature = "pipnn")]
    BuildAlgorithm::PiPNN { .. } =>
        self.build_inmem_pipnn_index().await,
}
```

The PiPNN path produces a `Vec<Vec<u32>>` adjacency list using `diskann_pipnn::builder::build_typed`, then hands it to the existing disk-layout writer (`DiskIndexWriter`), which emits the same format Vamana does (header, per-node adjacency, frozen start-point block). PQ training and disk-sector layout are reused unchanged.
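To illustrate the config surface, here is a minimal sketch of how the optional block would deserialize, assuming the enum above plus `serde_json`; `BenchmarkConfig` and its field layout are placeholders, not the real benchmark schema:

```rust
#[derive(serde::Deserialize)]
struct BenchmarkConfig {
    // Optional block: when absent, BuildAlgorithm::default() == Vamana,
    // so existing configs parse with no behavior change.
    #[serde(default)]
    build_algorithm: BuildAlgorithm,
}

fn parse_examples() -> Result<(), serde_json::Error> {
    // No block at all: the Vamana default.
    let cfg: BenchmarkConfig = serde_json::from_str("{}")?;
    assert!(matches!(cfg.build_algorithm, BuildAlgorithm::Vamana));

    // Explicit selection is internally tagged on "algorithm"; a PiPNN block
    // would carry its parameters as sibling fields of the tag.
    let cfg: BenchmarkConfig =
        serde_json::from_str(r#"{ "build_algorithm": { "algorithm": "Vamana" } }"#)?;
    assert!(matches!(cfg.build_algorithm, BuildAlgorithm::Vamana));

    // In a binary built without the `pipnn` feature, the PiPNN variant does
    // not exist, so `"algorithm": "PiPNN"` fails here with serde's
    // unknown-variant error rather than silently falling back to Vamana.
    Ok(())
}
```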
### Compatibility surface

| Surface | Status |
|---|---|
| On-disk graph format (header + adjacency + frozen start point) | unchanged |
| PQ codes / SQ codes on disk | unchanged (trained the same way) |
| Search API (`DiskANNIndex::search`, beam_width, search_list, recall_at, num_nodes_to_cache, search_io_limit, filters API) | unchanged |
| Public Rust types (`IndexConfiguration`, `DiskIndexWriter`, `DiskIndexBuildParameters`) | additive only (new field with default) |
| Benchmark JSON config | additive only (new optional `build_algorithm` field) |
| C/C++ FFI (if any) | unchanged |

Since the produced graph and PQ/SQ artifacts are byte-identical in format, a search-only consumer cannot tell which builder wrote the index.

### Feature gating

- The `diskann-disk` crate gains a `pipnn` Cargo feature. With it disabled, `BuildAlgorithm::PiPNN` does not exist at the type level — no runtime branch, no extra binary size, no dependency on `diskann-pipnn`.
- The benchmark binary and any production binary that wants PiPNN must enable the `pipnn` feature on `diskann-disk` (or transitively).
- The default feature set continues to exclude `pipnn`, matching the principle that the existing Vamana path is what ships unchanged.

### What this RFC does *not* change

- Distance metrics, vector representations, storage layouts.
- The greedy-search / RobustPrune logic used by Vamana — both stay as-is for the Vamana path. PiPNN brings its own equivalents internally (HashPrune + optional final RobustPrune).
- PQ training, search-time decoders, and the disk layout.
- Public traits, types, or method signatures outside the new optional fields/variants described above.

## Trade-offs

### PiPNN is algorithmically batch-only

This is a property of the algorithm, not of our implementation. The PiPNN paper (arXiv:2602.21247) is explicit that the design departs from incremental methods by "eliminating search from the graph-building process altogether": instead of running a greedy search for each new point's neighbors, PiPNN partitions the dataset, then computes neighbors for all points within each leaf as a single batched operation. The paper describes no per-point insertion algorithm and reports no streaming results. The framing throughout is "fast one-shot construction on a static dataset."

Where this batch assumption is load-bearing:

- **Partition (RBC)** samples leaders from the global dataset distribution and recursively splits into overlapping leaves. Leader quality depends on representativeness of the full data. Adding new points to an existing partition works mechanically (assign to fanout nearest existing leaders), but the *partition itself* is a one-shot decision — the cluster structure can drift as the data distribution shifts.
- **Leaf k-NN via GEMM** is where PiPNN gets its speed. A leaf's pairwise distance matrix is computed in one batched matrix multiplication and amortizes per-leaf overhead across `c_max²` distance evaluations. **This is the algorithm's central optimization, and it requires knowing the leaf membership before computing distances.** Inserting one point against an existing leaf reduces to `c_max` individual distance computations, which is no faster than what Vamana already does per insert — the batching advantage evaporates at batch size 1.
- **HashPrune** is the one PiPNN component that *is* online — it accepts an arbitrary stream of `(point, neighbor, distance)` edges and maintains a bounded reservoir per point. So the merge stage doesn't structurally object to incremental updates. But by the time you have edges to feed it, you've already paid for the partition assignment and the per-leaf distance work.
- **Final RobustPrune** is per-point and naturally re-runnable.

In other words: of PiPNN's four phases, two (partition, leaf k-NN) are batch-by-design and would need to be replaced for true incremental construction. Replacing them defeats the purpose — the algorithm degenerates into something more like Vamana but without Vamana's online-friendly graph-search structure.

The realistic alternatives for "PiPNN-like incremental" are all mini-batch variants (accumulate N new points → run a partial partition + leaf-build), which work fine but are not really incremental algorithms. Vamana already does per-point online inserts correctly; we keep it for that role.

This is why the Motivation section's hybrid update model exists: **PiPNN for full rebuilds, Vamana for inserts**, with the disk format as the integration point. PiPNN is not a drop-in replacement for code paths that rely on `insert(point)` semantics — and the limitation is the algorithm, not just our crate's API surface.

### Memory vs build speed

PiPNN's batch design holds more working memory during build than Vamana's incremental design. The dominant overhead is the **HashPrune reservoir** — a bounded per-point candidate list (`l_max × 8 bytes` per point) that PiPNN needs to merge edges from overlapping leaves. Vamana has no equivalent: it writes neighbors directly into the final adjacency list as it inserts each point.

For example, on BigANN 10M (10M × 128 fp16, `c_max=256, fanout=[10,3], leaf_k=3, l_max=64`):

| | PiPNN one-shot | Vamana |
|---|---:|---:|
| Peak RSS | 10.8 GB | 6.3 GB |

That delta — roughly **+4.5 GB**, dominated by HashPrune (`10M × 64 × 8 ≈ 5 GB`) plus smaller PiPNN-only working buffers (LSH sketches, partition leaf indices) — is the cost of the batch design and not a bug. It is the working set the algorithm explicitly needs. The next subsection describes the mitigation.

### Memory mitigation: three-tier build

For deployments that need PiPNN's build speed but cannot afford its working memory, we reuse the same **`MemoryBudget`** parameter Vamana already uses for sharded builds. When `build_ram_limit_gb` is below a threshold, PiPNN switches to a chunked path that spills HashPrune reservoirs to disk between leaf batches. Measurements on the same dataset as the table above (BigANN 10M):

| Strategy | Peak RSS | Build time | Recall@10 L=50 | Trigger |
|---|---:|---:|---:|---|
| **One-shot** (in-memory) | 10.8 GB | 133s | 95.00% | RAM ≥ ~32 GB |
| **Disk-edges** (per-batch reservoir flush) | 6.4 GB | 126s | 95.00% | RAM 8-32 GB |
| **Merged shards** (per-shard graph, then merge) | 3.3 GB | 332s | 95.31% | RAM 4-8 GB |

The merged-shards path **uses less peak RSS than Vamana** (3.3 GB vs Vamana's 6.3 GB on this same dataset) at a 2.5× build-time cost. The disk-edges path matches Vamana on RAM at 3× the build speed.

The control knob is the existing `build_ram_limit_gb` config; no new parameter is introduced. The dispatch happens inside `build_inmem_pipnn_index()`.
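A minimal sketch of that dispatch, using the thresholds from the trigger column above (measured at 10M scale; a real implementation would presumably scale them with dataset size). The names are assumptions; only the `build_ram_limit_gb` knob and the three strategies come from this RFC:

```rust
/// Illustrative three-tier selection; the enum and function names are
/// assumptions, not the shipped API.
enum PipnnBuildStrategy {
    OneShot,      // everything resident: fastest, highest peak RSS
    DiskEdges,    // HashPrune reservoirs flushed to disk per leaf batch
    MergedShards, // independent per-shard graphs, then merge
}

fn choose_strategy(build_ram_limit_gb: Option<u32>) -> PipnnBuildStrategy {
    match build_ram_limit_gb {
        // No budget configured: take the fast path.
        None => PipnnBuildStrategy::OneShot,
        Some(gb) if gb >= 32 => PipnnBuildStrategy::OneShot,
        Some(gb) if gb >= 8 => PipnnBuildStrategy::DiskEdges,
        Some(_) => PipnnBuildStrategy::MergedShards,
    }
}
```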
### Stage-1 separate path vs immediate-replace

We considered three options:

**A. (Chosen) Add PiPNN as an alternative behind a feature flag.** Default is Vamana, opt-in for PiPNN. Existing users see no change. Lets us collect production validation signal without risk.
**B. Replace Vamana with PiPNN immediately.** Cleaner code, smaller binary. Rejected because: (1) PiPNN lacks checkpoint, full quantization, and label-filtered search support today — replacing now is a regression; (2) we have not validated PiPNN under the full production workload mix; (3) recall behavior on edge-case datasets is not yet characterized at production scale.

**C. Maintain PiPNN as a fully separate top-level binary/crate.** Rejected because it would duplicate the PQ training, disk-layout writer, search pipeline, and benchmark harness — adding maintenance burden with no compatibility benefit.

### Algorithm risks

PiPNN's recall depends on partition overlap (controlled by `fanout`) and reservoir size (`l_max`). On the workloads in the benchmark section recall matches or beats Vamana at the chosen settings, but the parameter space is larger than Vamana's `R`/`L_build`. Stage 1 mitigates this by keeping Vamana as the default and by providing reference parameter sets in code comments and benchmark configs, such as the one below.
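As a reference point, here is the BigANN 10M benchmark setting from the next section, written out as the enum variant. Mapping `hp` to `num_hash_planes` is an inference, and the fields the benchmark config does not report are filled with placeholder values:

```rust
// Reference build configuration for BigANN 10M (10M × 128, squared_l2),
// matching the benchmark config below. Fields marked "placeholder" are not
// reported anywhere in this RFC and are illustrative only.
fn bigann_10m_reference() -> BuildAlgorithm {
    BuildAlgorithm::PiPNN {
        c_max: 256,
        c_min: 32,                   // placeholder
        p_samp: 0.01,                // placeholder
        fanout: vec![10, 3],
        leaf_k: 3,
        replicas: 1,                 // placeholder
        l_max: 64,
        num_hash_planes: 12,         // "hp=12", assuming hp == num_hash_planes
        final_prune: false,          // "no final_prune"
        leader_cap: 4096,            // placeholder
        saturate_after_prune: false, // placeholder
    }
}
```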
## Benchmark Results

All benchmarks run on Azure `Standard_L16s_v3` (Intel Xeon Platinum 8370C, 16 threads, NVMe), with `RUSTFLAGS=-C target-cpu=native`.

### Build time

| Dataset | Vamana | PiPNN (one-shot) | Speedup |
|---|---:|---:|---:|
| Enron 1M (1.087M × 384, fp16, cosine_normalized) | 70s | 13s | 5.4× |
| BigANN 10M (10M × 128, fp16, squared_l2) | 358s | 80.2s | 4.5× |
| Enron 10M (10M × 384, fp16, cosine_normalized) | 844s | 133s | 6.3× |

### Recall / QPS — BigANN 10M

Config: PiPNN `c_max=256, fanout=[10,3], leaf_k=3, l_max=64, hp=12, pq_chunks=64, no final_prune`. Vamana `R=64, L=64, pq_chunks=64`.

| L | PiPNN Recall@10 | PiPNN QPS | Vamana Recall@10 | Vamana QPS |
|---|---:|---:|---:|---:|
| 10 | 77.76% | 10,670 | 79.23% | 11,618 |
| 50 | 96.31% | 5,574 | 97.10% | 5,940 |
| 100 | 98.61% | 3,430 | 99.01% | 3,568 |

With the higher-recall PiPNN config (`c_max=512, fanout=[10,4], leaf_k=3, l_max=128, final_prune`), PiPNN exceeds Vamana on recall at L=50 (97.22% vs 97.10%) and L=100 (99.21% vs 99.01%) at the cost of a 143s build time (still 2.5× faster than Vamana's 358s).

### Recall / QPS — Enron 10M (384d)

Config: PiPNN `c_max=256, fanout=[8,3], leaf_k=2, l_max=64, hp=14, pq_chunks=192`. Vamana `R=64, L=72, pq_chunks=192`.

| L | PiPNN Recall@1000 | PiPNN QPS | Vamana Recall@1000 | Vamana QPS |
|---|---:|---:|---:|---:|
| 1000 | 89.99% | 378 | 89.33% | 384 |
| 1500 | 95.19% | 255 | 94.12% | 258 |
| 2000 | 96.46% | 192 | 95.36% | 195 |
| 2500 | 97.23% | 154 | 96.15% | 155 |
| 3000 | 97.74% | 129 | 96.68% | 130 |

PiPNN beats Vamana on recall at every L on the 384d Enron 10M workload, at parity QPS and a 6.3× faster build.

## Future Work

The Stage 1 milestones below are gating items for Stage 2 (retiring Vamana's full-rebuild path). Each must be addressed before that proposal is credible. M0 is the foundation shipped by this RFC; M1–M7 are deferred to follow-on work and ordered by dependency, not strict calendar sequence — some can run in parallel.

### M0 — Skeleton integration

The foundation that ships first: introduce the `diskann-pipnn` crate, the `BuildAlgorithm` enum, and the dispatch in `DiskIndexBuilder` behind a `pipnn` Cargo feature. The JSON config gains an optional `build_algorithm` block; default behavior is unchanged. PiPNN-built indexes are read by the existing search pipeline unchanged (the on-disk format is identical) and produce recall numbers within the tolerances the existing disk-index test suite enforces. CI runs the benchmark binary with `--features pipnn` on a small smoke test (SIFT-1M).

This milestone delivers the opt-in alternative described in this RFC. M1-M4 close the feature-parity gaps; M5-M7 are validation.

### M1 — Feature parity: checkpoint / resume

Add checkpoint/resume to the PiPNN build pipeline using the existing `CheckpointManager` / `ChunkingConfig` infrastructure in `diskann-disk/src/build/chunking/`. The natural checkpoint boundaries are the partition output (`Vec`), per-leaf HashPrune flush, and post-extract graph. Behavior matches Vamana's: a killed build resumes from the last checkpoint instead of starting over. Validation is a kill-and-resume test on BigANN 10M at three different checkpoint phases; the final graph is byte-identical to a non-interrupted build given the same seeds.

### M2 — Feature parity: quantized vector support

PiPNN currently has only a `SQ1` (1-bit) build path. Extend the build to accept `QuantizationType::SQ { nbits, standard_deviation }` for the same `nbits` values Vamana supports (`SQ_2`, `SQ_4`, `SQ_8`). Reuse the trained `ScalarQuantizer` from `diskann-quantization` rather than duplicating quantizer training. The leaf-build distance kernel needs an `nbits`-aware path; the current implementation is either FP (GEMM) or 1-bit Hamming. Validation: PiPNN at `SQ_8` produces recall within 0.5% of FP for BigANN 10M and Enron 10M, matching the Vamana SQ_8 baseline.

*Note: Build-time Product Quantization (PQ-distance during graph construction) is not currently used by Vamana in any production path and is out of scope.*

### M3 — Feature parity: label-filtered indexes

PiPNN-built graphs already work with the existing search-time filter pipeline (`diskann-label-filter`) because the disk format is the same. The build-time flow for filter-aware indexes (`FilteredIndex`, `vector_filter_file`) has not been exercised end-to-end. M3 runs the filter benchmark JSON configs with `BuildAlgorithm::PiPNN` and confirms filter-recall numbers match Vamana's. If gaps surface — for example, the partition phase needing label-aware leaf assignment for high-cardinality labels — they are documented as M3 follow-ups.

### M4 — Memory mitigation: three-tier dispatch

Implement two memory-constrained PiPNN paths and select among them via the existing `build_ram_limit_gb` knob:

- **Disk-edges**: HashPrune reservoirs spill to disk between leaf batches when `MemoryBudget` is below a threshold (currently ~8 GB for 10M-scale workloads).
- **Merged-shards**: per-shard graphs built independently then merged, mirroring Vamana's `build_merged_vamana_index` pipeline at `diskann-disk/src/build/builder/build.rs:327`. The existing shard merger is reused.

Dispatch happens inside `build_inmem_pipnn_index()` — no new public parameter. Validation: at `build_ram_limit_gb=4`, the PiPNN-merged path on BigANN 10M produces peak RSS ≤ 4 GB and recall within 1% of one-shot PiPNN.

### M5 — Production validation: recall × QPS × dimensionality matrix

End-to-end validation on the full production workload mix. At minimum three dataset families (BigANN, Enron, plus one production-representative), scales of 10M and 100M (and one billion-scale sample if hardware permits), and both `squared_l2` and `cosine_normalized` metrics. The pass criterion for each (dataset, scale, metric) cell: PiPNN recall@K is within Vamana's recall ±1% at matching QPS, *or* higher QPS at matching recall. 
Cells that fall outside the band are documented as "PiPNN not yet recommended for X" rather than blocking Stage 2 entirely. + +### M6 — Production validation: hybrid update model + +Validate the Stage-2 hybrid loop end-to-end: build a graph with PiPNN, apply N incremental Vamana inserts representing production churn, measure recall decay vs. graph age, trigger a PiPNN rebuild from the current snapshot, and confirm post-rebuild recall is restored. The output is a recommended "quality decay threshold" for production triggers based on the measured curve. M6 also confirms that Vamana's incremental-insert path reads the PiPNN-produced graph correctly — this is the disk-format compatibility test that matters most for the hybrid model. + +### M7 — Operational readiness + +Build-time telemetry: emit per-phase timing and peak RSS via the existing OpenTelemetry tracer, comparable to Vamana's spans. Documentation: replace the experimental notes in `CLAUDE.md` with a permanent doc covering recommended parameters per workload class (dim × scale × metric). Runbook: failure modes (OOM under one-shot, partition timeout, l_max saturation), how to diagnose, how to recover. Default parameter recommendations are baked into the JSON config builder so users don't hand-tune for common cases. + +### Out of scope (intentionally not on this list) + +- **Build-time PQ distance kernel.** Not used by Vamana in production paths today; deferred indefinitely. +- **PiPNN incremental insert API.** The hybrid model (PiPNN rebuild + Vamana inserts) removes the need. +- **PiPNN incremental delete API.** Same reason. +- **Frozen-point semantics differences.** PiPNN writes the dataset medoid as the single frozen start point, same as Vamana's default. Already byte-compatible; no work required. +- **Multi-vector index support.** Out of scope for Stage 1; revisit only if a production workload requires it. + +## References + +1. [PiPNN: Pick-in-Partitions Nearest Neighbors (arXiv:2602.21247)](https://arxiv.org/abs/2602.21247) +2. [Vamana / DiskANN (NeurIPS 2019)](https://papers.nips.cc/paper/9527-rand-nsg-fast-accurate-billion-point-nearest-neighbor-search-on-a-single-node.pdf) +3. Existing disk index layout: `diskann-disk/src/storage/` +4. Existing Vamana builder: `diskann-disk/src/build/builder/build.rs` From e8d1cd250ef7754c2794f09f6ff42b9ddf7376d1 Mon Sep 17 00:00:00 2001 From: Weiyao Luo <9347182+SeliMeli@users.noreply.github.com> Date: Mon, 11 May 2026 08:40:35 +0000 Subject: [PATCH 02/14] docs(rfc): rename to 01049-pipnn-integration.md after PR creation Per rfcs/README.md step 4: rename from 00000-short-title.md to NNNNN-short-title.md using the zero-padded PR number (#1049). 
Co-Authored-By: Claude Opus 4.7 (1M context) --- rfcs/{00000-pipnn-integration.md => 01049-pipnn-integration.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename rfcs/{00000-pipnn-integration.md => 01049-pipnn-integration.md} (100%) diff --git a/rfcs/00000-pipnn-integration.md b/rfcs/01049-pipnn-integration.md similarity index 100% rename from rfcs/00000-pipnn-integration.md rename to rfcs/01049-pipnn-integration.md From 4ba6d4a600976d6c6a20c306476ad157fab59c9e Mon Sep 17 00:00:00 2001 From: Weiyao Luo <9347182+SeliMeli@users.noreply.github.com> Date: Mon, 11 May 2026 08:40:35 +0000 Subject: [PATCH 03/14] docs(rfc): add M1 in-memory build/search milestone, list-format milestones MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Add new M1 for in-memory build/search parity with Vamana (PiPNN today only feeds into DiskIndexWriter; a path that populates a DiskANNIndex directly for in-mem-only consumers is missing). - Renumber M1-M7 → M2-M8. - Convert each milestone's plain-text paragraph into bullet lists (Scope / Validation / etc.) for readability per RFC reviewer feedback. Co-Authored-By: Claude Opus 4.7 (1M context) --- rfcs/01049-pipnn-integration.md | 80 ++++++++++++++++++++++++--------- 1 file changed, 59 insertions(+), 21 deletions(-) diff --git a/rfcs/01049-pipnn-integration.md b/rfcs/01049-pipnn-integration.md index 65752c9b7..7f5f94acc 100644 --- a/rfcs/01049-pipnn-integration.md +++ b/rfcs/01049-pipnn-integration.md @@ -253,48 +253,86 @@ PiPNN beats Vamana on recall at every L on the 384d Enron 10M workload, at parit ## Future Work -The Stage 1 milestones below are gating items for Stage 2 (retiring Vamana's full-rebuild path). Each must be addressed before that proposal is credible. M0 is the foundation shipped by this RFC; M1–M7 are deferred to follow-on work and ordered by dependency, not strict calendar sequence — some can run in parallel. +The Stage 1 milestones below are gating items for Stage 2 (retiring Vamana's full-rebuild path). Each must be addressed before that proposal is credible. M0 is the foundation shipped by this RFC; M1–M8 are deferred to follow-on work and ordered by dependency, not strict calendar sequence — some can run in parallel. ### M0 — Skeleton integration -The foundation that ships first: introduce the `diskann-pipnn` crate, the `BuildAlgorithm` enum, and the dispatch in `DiskIndexBuilder` behind a `pipnn` Cargo feature. The JSON config gains an optional `build_algorithm` block; default behavior is unchanged. PiPNN-built indexes are read by the existing search pipeline unchanged (the on-disk format is identical) and produce recall numbers within the tolerances the existing disk-index test suite enforces. CI runs the benchmark binary with `--features pipnn` on a small smoke test (SIFT-1M). +The foundation that ships first. -This milestone delivers the opt-in alternative described in this RFC. M1-M4 close the feature-parity gaps; M5-M7 are validation. +- **Scope:** introduce the `diskann-pipnn` crate, the `BuildAlgorithm` enum, and the dispatch in `DiskIndexBuilder` behind a `pipnn` Cargo feature. +- **Config surface:** JSON config gains an optional `build_algorithm` block; default behavior unchanged. +- **Compatibility:** PiPNN-built indexes are read by the existing search pipeline unchanged (the on-disk format is identical) and produce recall numbers within the tolerances the existing disk-index test suite enforces. 
+- **CI:** benchmark binary runs with `--features pipnn` on a small smoke test (SIFT-1M). -### M1 — Feature parity: checkpoint / resume +M1–M5 close the feature-parity gaps; M6–M8 are validation and operational readiness. -Add checkpoint/resume to the PiPNN build pipeline using the existing `CheckpointManager` / `ChunkingConfig` infrastructure in `diskann-disk/src/build/chunking/`. The natural checkpoint boundaries are the partition output (`Vec`), per-leaf HashPrune flush, and post-extract graph. Behavior matches Vamana's: a killed build resumes from the last checkpoint instead of starting over. Validation is a kill-and-resume test on BigANN 10M at three different checkpoint phases; final graph is byte-identical to a non-interrupted build given the same seeds. +### M1 — Feature parity: in-memory build / search -### M2 — Feature parity: quantized vector support +Vamana supports both a **disk-resident** build/search path (via `diskann-disk`) and an **in-memory only** path (via `diskann::graph::index::DiskANNIndex`). PiPNN today only produces graphs handed to `DiskIndexWriter`; an in-mem-only consumer that wants PiPNN's speed has no entry point. -PiPNN currently has only a `SQ1` (1-bit) build path. Extend the build to accept `QuantizationType::SQ { nbits, standard_deviation }` for the same `nbits` values Vamana supports (`SQ_2`, `SQ_4`, `SQ_8`). Reuse the trained `ScalarQuantizer` from `diskann-quantization` rather than duplicating quantizer training. The leaf-build distance kernel needs an `nbits`-aware path; the current implementation is either FP (GEMM) or 1-bit Hamming. Validation: PiPNN at `SQ_8` produces recall within 0.5% of FP for BigANN 10M and Enron 10M, matching the Vamana SQ_8 baseline. +- **Scope:** expose `diskann_pipnn::build_typed` output (`Vec>`) as a populated in-memory `DiskANNIndex` so callers can build + search without touching disk. +- **API:** add `diskann_pipnn::build_into_inmem_index(...)` returning an in-memory index that is read by the existing `DiskANNIndex::search` path unchanged. +- **Validation:** in-mem search recall on Enron 1M with PiPNN-built graph matches the disk-build + load round-trip recall within noise. -*Note: Build-time Product Quantization (PQ-distance during graph construction) is not currently used by Vamana in any production path and is out of scope.* +### M2 — Feature parity: checkpoint / resume -### M3 — Feature parity: label-filtered indexes +- **Scope:** add checkpoint/resume to the PiPNN build pipeline using the existing `CheckpointManager` / `ChunkingConfig` infrastructure in `diskann-disk/src/build/chunking/`. +- **Boundaries:** natural checkpoint points are partition output (`Vec`), per-leaf HashPrune flush, post-extract graph. +- **Behavior:** matches Vamana's — a killed build resumes from the last checkpoint instead of starting over. +- **Validation:** kill-and-resume test on BigANN 10M at three different checkpoint phases; final graph byte-identical to a non-interrupted build given the same seeds. -PiPNN-built graphs already work with the existing search-time filter pipeline (`diskann-label-filter`) because the disk format is the same. The build-time flow for filter-aware indexes (`FilteredIndex`, `vector_filter_file`) has not been exercised end-to-end. M3 runs the filter benchmark JSON configs with `BuildAlgorithm::PiPNN` and confirms filter-recall numbers match Vamana's. If gaps surface — for example, the partition phase needing label-aware leaf assignment for high-cardinality labels — they are documented as M3 follow-ups. 
+### M3 — Feature parity: quantized vector support -### M4 — Memory mitigation: three-tier dispatch +PiPNN currently has only a `SQ1` (1-bit) build path. -Implement two memory-constrained PiPNN paths and select among them via the existing `build_ram_limit_gb` knob: +- **Scope:** extend the build to accept `QuantizationType::SQ { nbits, standard_deviation }` for the same `nbits` values Vamana supports (`SQ_2`, `SQ_4`, `SQ_8`). +- **Reuse:** trained `ScalarQuantizer` from `diskann-quantization`; do not duplicate quantizer training. +- **Implementation:** the leaf-build distance kernel needs an `nbits`-aware path. Today the kernel is either FP (GEMM) or 1-bit Hamming. +- **Validation:** PiPNN at `SQ_8` produces recall within 0.5% of FP for BigANN 10M and Enron 10M, matching the Vamana SQ_8 baseline. -- **Disk-edges**: HashPrune reservoirs spill to disk between leaf batches when `MemoryBudget` is below a threshold (currently ~8 GB for 10M-scale workloads). -- **Merged-shards**: per-shard graphs built independently then merged, mirroring Vamana's `build_merged_vamana_index` pipeline at `diskann-disk/src/build/builder/build.rs:327`. The existing shard merger is reused. +*Note: build-time Product Quantization (PQ-distance during graph construction) is not currently used by Vamana in any production path and is out of scope.* -Dispatch happens inside `build_inmem_pipnn_index()` — no new public parameter. Validation: at `build_ram_limit_gb=4`, the PiPNN-merged path on BigANN 10M produces peak RSS ≤ 4 GB and recall within 1% of one-shot PiPNN. +### M4 — Feature parity: label-filtered indexes -### M5 — Production validation: recall × QPS × dimensionality matrix +PiPNN-built graphs already work with the existing search-time filter pipeline (`diskann-label-filter`) because the disk format is the same. The build-time flow for filter-aware indexes has not been exercised end-to-end. -End-to-end validation on the full production workload mix. At minimum three dataset families (BigANN, Enron, plus one production-representative), scales of 10M and 100M (and one billion-scale sample if hardware permits), and both `squared_l2` and `cosine_normalized` metrics. The pass criterion for each (dataset, scale, metric) cell: PiPNN recall@K is within Vamana's recall ±1% at matching QPS, *or* higher QPS at matching recall. Cells that fall outside the band are documented as "PiPNN not yet recommended for X" rather than blocking Stage 2 entirely. +- **Scope:** run the filter benchmark JSON configs with `BuildAlgorithm::PiPNN`; confirm filter-recall numbers match Vamana's. +- **Risk:** the partition phase may need label-aware leaf assignment for high-cardinality labels. +- **Validation:** filter-recall on a representative labeled dataset within ±1% of Vamana's filter-recall. -### M6 — Production validation: hybrid update model +### M5 — Memory mitigation: three-tier dispatch -Validate the Stage-2 hybrid loop end-to-end: build a graph with PiPNN, apply N incremental Vamana inserts representing production churn, measure recall decay vs. graph age, trigger a PiPNN rebuild from the current snapshot, and confirm post-rebuild recall is restored. The output is a recommended "quality decay threshold" for production triggers based on the measured curve. M6 also confirms that Vamana's incremental-insert path reads the PiPNN-produced graph correctly — this is the disk-format compatibility test that matters most for the hybrid model. 
+Implement two memory-constrained PiPNN paths and select among them via the existing `build_ram_limit_gb` knob. -### M7 — Operational readiness +- **Disk-edges:** HashPrune reservoirs spill to disk between leaf batches when `MemoryBudget` is below a threshold (currently ~8 GB for 10M-scale workloads). +- **Merged-shards:** per-shard graphs built independently then merged, mirroring Vamana's `build_merged_vamana_index` pipeline at `diskann-disk/src/build/builder/build.rs:327`. The existing shard merger is reused. +- **Dispatch:** inside `build_inmem_pipnn_index()` — no new public parameter. +- **Validation:** at `build_ram_limit_gb=4`, the PiPNN-merged path on BigANN 10M produces peak RSS ≤ 4 GB and recall within 1% of one-shot PiPNN. -Build-time telemetry: emit per-phase timing and peak RSS via the existing OpenTelemetry tracer, comparable to Vamana's spans. Documentation: replace the experimental notes in `CLAUDE.md` with a permanent doc covering recommended parameters per workload class (dim × scale × metric). Runbook: failure modes (OOM under one-shot, partition timeout, l_max saturation), how to diagnose, how to recover. Default parameter recommendations are baked into the JSON config builder so users don't hand-tune for common cases. +### M6 — Production validation: recall × QPS × dimensionality matrix + +End-to-end validation on the full production workload mix. + +- **Datasets:** at minimum three families (BigANN, Enron, plus one production-representative). +- **Scales:** 10M and 100M; one billion-scale sample if hardware permits. +- **Metrics:** `squared_l2` and `cosine_normalized`. +- **Pass criterion:** for each (dataset, scale, metric) cell, PiPNN recall@K is within Vamana's recall ±1% at matching QPS, *or* higher QPS at matching recall. +- **Out-of-band cells** are documented as "PiPNN not yet recommended for X" rather than blocking Stage 2 entirely. + +### M7 — Production validation: hybrid update model + +Validate the Stage-2 hybrid loop end-to-end. + +- **Sequence:** PiPNN build → N incremental Vamana inserts representing production churn → measure recall decay vs. graph age → trigger PiPNN rebuild from snapshot → confirm post-rebuild recall restored. +- **Output:** a recommended "quality decay threshold" for production rebuild triggers, derived from the measured decay curve. +- **Disk-format compatibility test:** confirm Vamana's incremental-insert path reads PiPNN-produced graphs correctly. This is the load-bearing compatibility check for the hybrid model. + +### M8 — Operational readiness + +- **Telemetry:** emit per-phase timing and peak RSS via the existing OpenTelemetry tracer, comparable to Vamana's spans. +- **Documentation:** replace experimental notes in `CLAUDE.md` with a permanent doc covering recommended parameters per workload class (dim × scale × metric). +- **Runbook:** failure modes (OOM under one-shot, partition timeout, `l_max` saturation), diagnosis, recovery. +- **Defaults:** parameter recommendations baked into the JSON config builder so users don't hand-tune for common cases. ### Out of scope (intentionally not on this list) From 4fe210f51468b52288295c6db0561fe36a0cd087 Mon Sep 17 00:00:00 2001 From: Weiyao Luo <9347182+SeliMeli@users.noreply.github.com> Date: Mon, 11 May 2026 08:40:35 +0000 Subject: [PATCH 04/14] docs(rfc): address Copilot review comments - Explicitly document feature-gated deserialization behavior: configs with "algorithm": "PiPNN" fail at parse time in non-pipnn binaries with a serde unknown-variant error. 
Not a backward-compatibility regression; configs without build_algorithm parse identically across feature combinations. - Add explanation for disk-edges path being not-slower than one-shot despite extra I/O (smaller working set, sequential append spills overlap with compute). Co-Authored-By: Claude Opus 4.7 (1M context) --- rfcs/01049-pipnn-integration.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/rfcs/01049-pipnn-integration.md b/rfcs/01049-pipnn-integration.md index 7f5f94acc..0a0ebc6cc 100644 --- a/rfcs/01049-pipnn-integration.md +++ b/rfcs/01049-pipnn-integration.md @@ -112,6 +112,8 @@ pub enum BuildAlgorithm { `DiskIndexBuildParameters` gains a `build_algorithm: BuildAlgorithm` field and a constructor pair: `new` (defaults to Vamana, no PiPNN dep) and `new_with_algorithm` (explicit). The JSON schema for benchmark configs gains an optional `build_algorithm` block that, when present, deserializes via `#[serde(tag = "algorithm")]` into one of the variants above. +**Deserialization behavior when the `pipnn` feature is disabled**: because `BuildAlgorithm::PiPNN` is gated by `#[cfg(feature = "pipnn")]`, a binary built without the feature does not see that variant. A JSON config containing `"algorithm": "PiPNN"` fed to such a binary fails at parse time with a serde error along the lines of `unknown variant 'PiPNN', expected 'Vamana'`. This is a clear, fail-fast diagnostic — not a backward-compatibility regression. Configs that omit `build_algorithm` (or set `"algorithm": "Vamana"`) parse identically across feature combinations. Documentation alongside the config schema will call this out so users know that PiPNN configs require a PiPNN-enabled build. + ### Builder dispatch In `DiskIndexBuilder::build()` (or the new equivalent), dispatch on `BuildAlgorithm`: @@ -195,6 +197,8 @@ For deployments that need PiPNN's build speed but cannot afford its working memo | **Disk-edges** (per-batch reservoir flush) | 6.4 GB | 126s | 95.00% | RAM 8-32 GB | | **Merged shards** (per-shard graph, then merge) | 3.3 GB | 332s | 95.31% | RAM 4-8 GB | +Note on disk-edges build time (~126s vs one-shot's ~133s): the disk-edges path is not slower despite the extra I/O. The smaller resident working set means HashPrune inserts touch fewer cache lines per operation, and the spill to disk is sequential append-only and overlaps with leaf-build compute. Net: roughly the same wall-clock as one-shot in this benchmark, with significantly lower peak RSS. + The merged-shards path **uses less peak RSS than Vamana** (3.3 GB vs Vamana's 6.3 GB on this same dataset) at a 2.5× build-time cost. The disk-edges path matches Vamana on RAM at 3× the build speed. The control knob is the existing `build_ram_limit_gb` config; no new parameter is introduced. The dispatch happens inside `build_inmem_pipnn_index()`. 
From 77950b3e5d623f4e94a87467bf3e625001460082 Mon Sep 17 00:00:00 2001
From: Weiyao Luo <9347182+SeliMeli@users.noreply.github.com>
Date: Thu, 14 May 2026 02:56:27 +0000
Subject: [PATCH 05/14] docs(rfc): address reviewer comments from PR #1049
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Add three index-update modes (incremental/full/partitioned) to background
- Clarify leaf k-NN is pairwise GEMM (N×N), distinct from 1×N flat scan
- Note HashPrune is streamable; clarify M5 disk-edges variants
- Add concrete trade-off hypothesis with fixed-resource build-time table; state honestly that neither algorithm converts surplus RAM into faster builds
- Reframe hybrid update model: Vamana inserts apply to in-memory graph, disk index is rebuilt rather than mutated
- Replace "quality decay → rebuild" with operational triggers (embedding rotation, schema/param retuning, batch insert, safety rebuilds)
- Add num_threads to BuildAlgorithm::PiPNN with bounded RAM-impact note
- Scope feature-gated deserialization restriction to JSON config only; index files are byte-identical and load without the pipnn feature
- Move checkpoint/resume (previously M2) to Stage 2 with rationale — streaming checkpoint doesn't fit PiPNN's batch phases
- Use full hyperlinked arXiv URL throughout

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 rfcs/01049-pipnn-integration.md | 72 +++++++++++++++++++++++----------
 1 file changed, 51 insertions(+), 21 deletions(-)

diff --git a/rfcs/01049-pipnn-integration.md b/rfcs/01049-pipnn-integration.md
index 0a0ebc6cc..3ca747466 100644
--- a/rfcs/01049-pipnn-integration.md
+++ b/rfcs/01049-pipnn-integration.md
@@ -5,11 +5,11 @@
 | **Authors** | Weiyao Luo |
 | **Contributors** | DiskANN team |
 | **Created** | 2026-05-11 |
-| **Updated** | 2026-05-11 |
+| **Updated** | 2026-05-14 |

 ## Summary

-Add **PiPNN** (Pick-in-Partitions Nearest Neighbors, arXiv:2602.21247) as a second graph-construction algorithm for DiskANN's disk index. PiPNN produces a graph byte-compatible with Vamana's disk format and search API, at **up to 6.3× lower build time** on the workloads we have measured. Vamana remains the default and the only algorithm supported for incremental inserts; PiPNN is the proposed faster path for full rebuilds.
+Add **PiPNN** (Pick-in-Partitions Nearest Neighbors, [arXiv:2602.21247](https://arxiv.org/abs/2602.21247)) as a second graph-construction algorithm for DiskANN's disk index. PiPNN produces a graph byte-compatible with Vamana's disk format and search API, at **up to 6.3× lower build time** on the workloads we have measured. Vamana remains the default and the only algorithm supported for incremental inserts; PiPNN is the proposed faster path for full rebuilds.

 ## Motivation

 ### Background

 DiskANN currently builds the disk index with a single algorithm — **Vamana** (`diskann-disk/src/build/builder/`). Vamana incrementally inserts each point into a graph, running a greedy search + `RobustPrune` for each insertion, producing the on-disk format documented in `diskann-disk/src/storage/`.

+Clients today update indexes in three main ways:
+
+1. 
**Incremental** — continuously insert and delete vectors in an existing in-memory graph (Vamana's per-point greedy-search + `RobustPrune` path). The disk index itself is not mutated in place. +2. **Full rebuild** — rebuild the entire graph from scratch on a static snapshot, producing an immutable disk index. +3. **Partitioned full rebuild** — split points into N clusters, build N separate graphs in parallel, then stitch them together with a lightweight merge step to bound peak build-time memory (Vamana's `build_merged_vamana_index` path). + +PiPNN, as proposed here, is a faster substitute for paths (2) and (3). Path (1) remains Vamana's responsibility for the foreseeable future (see "PiPNN is algorithmically batch-only" below). + +**PiPNN** (Pick-in-Partitions Nearest Neighbors, [arXiv:2602.21247](https://arxiv.org/abs/2602.21247)) is a partition-based **batch** graph builder, in contrast to Vamana's **incremental** insert + prune. The construction has four phases: 1. **Partition** — Randomized Ball Carving (RBC) recursively splits the dataset into small *overlapping* leaf clusters. Each point lands in `fanout` of its nearest cluster leaders at every recursion level, so every point appears in multiple leaves. Recursion stops when a cluster fits a configured leaf-size cap (`c_max`, typically 256–1024 points). -2. **Local k-NN per leaf** — For each leaf, compute the full pairwise distance matrix in one batched GEMM call, then extract each point's `leaf_k` nearest neighbors inside the leaf. GEMM batching is the source of most of PiPNN's wall-clock advantage over per-point greedy search. -3. **HashPrune merge** — Edges from all leaves are merged into a per-point reservoir of bounded size (`l_max`, ~64–128). The pruner is keyed by an LSH **angular bucket** of each candidate neighbor: at most one candidate per bucket is retained, and on collision the closer candidate wins. This produces a diverse short-list per point using O(`l_max`) memory per node and O(1) amortized insert work. +2. **Local k-NN per leaf** — For each leaf, compute the full pairwise distance matrix as a single batched GEMM (an N×N intra-leaf computation, where N ≈ `c_max`), then extract each point's `leaf_k` nearest neighbors inside the leaf. This is structurally different from a flat scan (1×N query against the whole dataset, e.g. work item [#1036](https://github.com/microsoft/DiskANN/issues/1036)) — every column of the GEMM contributes to every row's top-k, so the cost is amortized across `c_max²` distance evaluations. GEMM batching is the source of most of PiPNN's wall-clock advantage over per-point greedy search. +3. **HashPrune merge** — Edges from all leaves are merged into a per-point reservoir of bounded size (`l_max`, ~64–128). The pruner is keyed by an LSH **angular bucket** of each candidate neighbor: at most one candidate per bucket is retained, and on collision the closer candidate wins. This produces a diverse short-list per point using O(`l_max`) memory per node and O(1) amortized insert work. The merge stage is naturally streamable — edges can be fed in chunks (either generated all at once and replayed from disk, or generated leaf-batch-by-leaf-batch interleaved with HashPrune inserts) to bound peak RAM; see M5 below. 4. **Optional final prune** — A single RobustPrune-style pass (same algorithm Vamana uses, with a configurable `alpha`) applies geometric occlusion to the HashPrune candidates. Used when the workload benefits from explicit graph diversification. 
The output is `Vec>` adjacency lists in the same shape Vamana produces, then handed to the existing disk-layout writer. PQ training and search-side data structures are unchanged. @@ -40,15 +48,37 @@ Vamana's incremental design scales linearly in points × per-insert search cost, Frequent rebuilds (driven by data churn or parameter sweeps) and full rebuilds at 10M-scale and above are the bottleneck. PiPNN's offline benchmarks at matching recall budgets complete the same builds **up to 6.3× faster** while writing the same disk format (full numbers in the Benchmark Results section). This RFC proposes landing PiPNN so teams can opt into faster builds and so we can collect production-relevant signal on whether PiPNN can eventually replace Vamana's full-rebuild path. +#### Concrete trade-off hypothesis + +To make the comparison precise rather than headline-only, we frame Stage-1 validation around a fixed-resource hypothesis: + +> Given a worker with fixed CPU cores, RAM budget, and SSD throughput, PiPNN delivers higher index-build throughput (vectors per minute per worker) than Vamana at matching recall, *provided* its working set fits within the RAM budget. When the RAM budget is below PiPNN's one-shot working set, the three-tier dispatch (disk-edges, then merged-shards) keeps PiPNN within or below Vamana's RAM footprint at a documented build-time cost. + +Concretely, on BigANN 10M with the same 16-thread / NVMe worker: + +| RAM budget | PiPNN strategy | PiPNN build | Vamana build | +|---:|---|---:|---:| +| ≥ ~12 GB | one-shot | 80–133s | 358s | +| 6–12 GB | disk-edges | ~126s | 358s | +| 3–6 GB | merged-shards | ~332s | 358s (partitioned: similar) | +| < 3 GB | merged-shards w/ smaller shards | further degrades | further degrades | + +Two important things this table is **not** claiming: + +- **PiPNN does not auto-scale build time downward when given more RAM than its working set needs.** PiPNN's wall-clock is dominated by HashPrune inserts + leaf-build GEMM. Once the dataset, HashPrune reservoir, and per-thread buffers fit comfortably in RAM, additional RAM headroom does not buy faster builds. (More memory *channels* / higher bandwidth do help, but that is a hardware property, not a budget knob.) +- **Vamana also does not have a "use more RAM to build faster" mode** — its peak RSS is largely set by the dataset + working graph, and giving it more RAM headroom past that does not accelerate the per-insert greedy search. + +So the honest framing is: PiPNN trades a higher minimum RAM budget for a substantially faster build at that budget. Neither algorithm currently converts surplus RAM into faster builds; both convert surplus RAM into "no pressure to use the chunked/shard fallbacks." + #### Hybrid update model (Stage 2 direction) -Vamana and PiPNN write the same on-disk graph format, so a graph built by either algorithm can be *read* (and incrementally edited) by either. We exploit this for the production update story: +Vamana and PiPNN write the same on-disk graph format, so a graph built by either algorithm can be loaded by the same search code and, once loaded into memory, can be incrementally edited by Vamana. We exploit this for the production update story: -- **Bulk / full rebuild → PiPNN.** When data churn is large enough to justify a full rebuild, PiPNN is used because it is several times faster than Vamana at this job. -- **Incremental insert → Vamana.** Between full rebuilds, individual inserts use Vamana's existing greedy-search + RobustPrune insert path. 
PiPNN's batch design has no natural single-point-insert API and we do not plan to build one. -- **Quality decay → trigger PiPNN rebuild.** When recall on the live graph degrades past a configured threshold (driven by accumulated incremental inserts), the system schedules a PiPNN full rebuild from the current dataset snapshot. +- **Bulk / full rebuild → PiPNN.** When a full rebuild is needed, PiPNN is used because it is several times faster than Vamana at this job. +- **Incremental insert → in-memory Vamana.** Between full rebuilds, individual inserts use Vamana's existing greedy-search + RobustPrune insert path **on the in-memory graph** (`diskann::graph::index::DiskANNIndex`). The on-disk index file is not mutated in place — the standing convention is that a refreshed disk index is produced by a full rebuild from the current dataset snapshot. PiPNN's batch design has no natural single-point-insert API and we do not plan to build one. +- **Triggers for a full PiPNN rebuild.** A rebuild is scheduled in response to operationally meaningful events, not just gradual recall drift. The expected triggers include: (a) embedding-model rotation (vectors are no longer comparable to existing ones), (b) schema/parameter retuning (`R`, `L`, `pq_chunks`, distance metric, quantization), (c) large batch inserts that exceed what the in-memory incremental path is sized for, and (d) periodic safety rebuilds on a cadence that depends on observed graph health. DiskANN's existing claim that incremental updates keep recall healthy still holds; PiPNN does not change that, it just makes the eventual rebuild cheaper. -Because both algorithms produce the same disk format, switching between "fresh PiPNN build" and "Vamana-edited delta" is transparent to search-side consumers. This answers "should PiPNN implement incremental inserts?" — no, we keep Vamana for that, and use the disk index format as the integration point. +Because both algorithms produce the same disk format, switching between "fresh PiPNN build" and "Vamana-edited in-mem graph reloaded from a fresh disk build" is transparent to search-side consumers. This answers "should PiPNN implement incremental inserts?" — no, we keep Vamana's in-memory insert path for that, and use the disk index format as the integration point between rebuilds. #### Two-stage rollout @@ -104,6 +134,7 @@ pub enum BuildAlgorithm { final_prune: bool, // optional RobustPrune final pass leader_cap: usize, // hard cap on leaders per level saturate_after_prune: bool, + num_threads: usize, // 0 = all logical CPUs (matches Vamana) }, } ``` @@ -112,7 +143,9 @@ pub enum BuildAlgorithm { `DiskIndexBuildParameters` gains a `build_algorithm: BuildAlgorithm` field and a constructor pair: `new` (defaults to Vamana, no PiPNN dep) and `new_with_algorithm` (explicit). The JSON schema for benchmark configs gains an optional `build_algorithm` block that, when present, deserializes via `#[serde(tag = "algorithm")]` into one of the variants above. -**Deserialization behavior when the `pipnn` feature is disabled**: because `BuildAlgorithm::PiPNN` is gated by `#[cfg(feature = "pipnn")]`, a binary built without the feature does not see that variant. A JSON config containing `"algorithm": "PiPNN"` fed to such a binary fails at parse time with a serde error along the lines of `unknown variant 'PiPNN', expected 'Vamana'`. This is a clear, fail-fast diagnostic — not a backward-compatibility regression. 
Configs that omit `build_algorithm` (or set `"algorithm": "Vamana"`) parse identically across feature combinations. Documentation alongside the config schema will call this out so users know that PiPNN configs require a PiPNN-enabled build. +**`num_threads`.** Like Vamana, PiPNN accepts `num_threads` as a build-time parameter (default `0` = all logical CPUs). Thread count has a small, bounded effect on peak RSS: each worker holds thread-local stripe buffers in the partition phase (~`stripe_kb`, typically 16 MB) and thread-local leaf-build scratch (~`c_max² × 4 B`, ≈ 256 KB at `c_max=256`). Total per-thread overhead is ~16–20 MB; at 48 threads this is ~960 MB of incremental resident set on top of the dataset and HashPrune reservoir, which dominate the peak. We do not consider `num_threads` a memory-budget knob — to bound RAM, use `build_ram_limit_gb` (see Memory mitigation). + +**Deserialization behavior when the `pipnn` feature is disabled — scope:** this affects only **JSON configs**, not the index files themselves. Because PiPNN and Vamana write byte-identical disk formats, an index *file* built by either algorithm is loaded by the same search code and does not require the `pipnn` feature at load time. The restriction below applies to the *build-time configuration* that selects which algorithm to invoke. Because `BuildAlgorithm::PiPNN` is gated by `#[cfg(feature = "pipnn")]`, a binary built without the feature does not see that variant. A JSON config containing `"algorithm": "PiPNN"` fed to such a binary fails at parse time with a serde error along the lines of `unknown variant 'PiPNN', expected 'Vamana'`. This is a clear, fail-fast diagnostic — not a backward-compatibility regression. Configs that omit `build_algorithm` (or set `"algorithm": "Vamana"`) parse identically across feature combinations. Documentation alongside the config schema will call this out so users know that PiPNN configs require a PiPNN-enabled build. ### Builder dispatch @@ -160,7 +193,7 @@ Since the produced graph and PQ/SQ artifacts are byte-identical in format, a sea ### PiPNN is algorithmically batch-only -This is a property of the algorithm, not of our implementation. The PiPNN paper (arXiv:2602.21247) is explicit that the design departs from incremental methods by "eliminating search from the graph-building process altogether": instead of running a greedy search for each new point's neighbors, PiPNN partitions the dataset, then computes neighbors for all points within each leaf as a single batched operation. The paper describes no per-point insertion algorithm and reports no streaming results. The framing throughout is "fast one-shot construction on a static dataset." +This is a property of the algorithm, not of our implementation. The PiPNN paper ([arXiv:2602.21247](https://arxiv.org/abs/2602.21247)) is explicit that the design departs from incremental methods by "eliminating search from the graph-building process altogether": instead of running a greedy search for each new point's neighbors, PiPNN partitions the dataset, then computes neighbors for all points within each leaf as a single batched operation. The paper describes no per-point insertion algorithm and reports no streaming results. The framing throughout is "fast one-shot construction on a static dataset." Where this batch assumption is load-bearing: @@ -268,7 +301,7 @@ The foundation that ships first. 
- **Compatibility:** PiPNN-built indexes are read by the existing search pipeline unchanged (the on-disk format is identical) and produce recall numbers within the tolerances the existing disk-index test suite enforces. - **CI:** benchmark binary runs with `--features pipnn` on a small smoke test (SIFT-1M). -M1–M5 close the feature-parity gaps; M6–M8 are validation and operational readiness. +M1, M3–M5 close the feature-parity gaps in Stage 1; M6–M8 are validation and operational readiness. Checkpoint/resume (previously M2) is deferred to Stage 2 — see "Deferred to Stage 2" below for the rationale. ### M1 — Feature parity: in-memory build / search @@ -278,13 +311,6 @@ Vamana supports both a **disk-resident** build/search path (via `diskann-disk`) - **API:** add `diskann_pipnn::build_into_inmem_index(...)` returning an in-memory index that is read by the existing `DiskANNIndex::search` path unchanged. - **Validation:** in-mem search recall on Enron 1M with PiPNN-built graph matches the disk-build + load round-trip recall within noise. -### M2 — Feature parity: checkpoint / resume - -- **Scope:** add checkpoint/resume to the PiPNN build pipeline using the existing `CheckpointManager` / `ChunkingConfig` infrastructure in `diskann-disk/src/build/chunking/`. -- **Boundaries:** natural checkpoint points are partition output (`Vec`), per-leaf HashPrune flush, post-extract graph. -- **Behavior:** matches Vamana's — a killed build resumes from the last checkpoint instead of starting over. -- **Validation:** kill-and-resume test on BigANN 10M at three different checkpoint phases; final graph byte-identical to a non-interrupted build given the same seeds. - ### M3 — Feature parity: quantized vector support PiPNN currently has only a `SQ1` (1-bit) build path. @@ -308,7 +334,7 @@ PiPNN-built graphs already work with the existing search-time filter pipeline (` Implement two memory-constrained PiPNN paths and select among them via the existing `build_ram_limit_gb` knob. -- **Disk-edges:** HashPrune reservoirs spill to disk between leaf batches when `MemoryBudget` is below a threshold (currently ~8 GB for 10M-scale workloads). +- **Disk-edges:** today's prototype generates all leaf edges first, spills them to disk, then streams chunks back into HashPrune. An alternative we plan to evaluate is to interleave the two — write partition metadata to disk and run leaf-build + HashPrune in chunks (build edges for the first N leaves' points, flush their adjacency lists, then move on). Both variants bound the resident HashPrune reservoir; the second avoids the full edge-set materialization at the cost of a second pass over the partition. - **Merged-shards:** per-shard graphs built independently then merged, mirroring Vamana's `build_merged_vamana_index` pipeline at `diskann-disk/src/build/builder/build.rs:327`. The existing shard merger is reused. - **Dispatch:** inside `build_inmem_pipnn_index()` — no new public parameter. - **Validation:** at `build_ram_limit_gb=4`, the PiPNN-merged path on BigANN 10M produces peak RSS ≤ 4 GB and recall within 1% of one-shot PiPNN. @@ -338,6 +364,10 @@ Validate the Stage-2 hybrid loop end-to-end. - **Runbook:** failure modes (OOM under one-shot, partition timeout, `l_max` saturation), diagnosis, recovery. - **Defaults:** parameter recommendations baked into the JSON config builder so users don't hand-tune for common cases. 
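+
+To make the opt-in shape concrete, a minimal sketch of selecting PiPNN at a build site — field values below are illustrative placeholders, not the tuned defaults this milestone will produce, and the sketch assumes the `BuildAlgorithm` enum and `new_with_algorithm` constructor described earlier are in scope:
+
+```rust
+// Sketch only: placeholder values, not recommended defaults.
+// Field names are those of the `BuildAlgorithm::PiPNN` variant in this RFC.
+let algo = BuildAlgorithm::PiPNN {
+    c_max: 512,                 // leaf-size cap
+    c_min: 64,                  // minimum cluster size before merging
+    p_samp: 0.01,               // RBC leader sampling fraction
+    fanout: vec![2, 2],         // per-level fanout
+    leaf_k: 32,                 // k-NN within each leaf
+    replicas: 2,                // independent partitioning passes
+    l_max: 96,                  // HashPrune reservoir cap
+    num_hash_planes: 8,         // LSH hyperplane count
+    final_prune: true,          // optional RobustPrune pass
+    leader_cap: 4096,           // hard cap on leaders per level
+    saturate_after_prune: false,
+    num_threads: 0,             // 0 = all logical CPUs
+};
+// Remaining constructor arguments elided — they are unchanged from Vamana builds:
+// let params = DiskIndexBuildParameters::new_with_algorithm(/* existing args */, algo);
+```
+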
+### Deferred to Stage 2 + +- **Checkpoint / resume (was M2).** Vamana's checkpoint/resume is a *streaming* mechanism — it relies on the per-point incremental insert order to define natural checkpoint boundaries. PiPNN's batch design has no equivalent monotonic insertion sequence: partition output, per-leaf GEMM, and HashPrune merge are all coarse-grained whole-phase artifacts rather than fine-grained incremental progress. A useful PiPNN checkpoint scheme would therefore *not* mirror Vamana's; it would need new design choices about which phase boundaries to materialize, at what granularity, and whether the cost-benefit justifies the extra disk I/O. Empirically, PiPNN's full BigANN-10M build runs in ~80 s, so the operational value of resuming a partially completed build is materially lower than for Vamana's multi-hour rebuilds. We defer checkpoint design until Stage 2, when the production rebuild cadence and observed failure modes will tell us whether it is needed and what shape it should take. + ### Out of scope (intentionally not on this list) - **Build-time PQ distance kernel.** Not used by Vamana in production paths today; deferred indefinitely. From 018dbe98a01765a5757e5928d680ffd6f5fea1b8 Mon Sep 17 00:00:00 2001 From: Weiyao Luo <9347182+SeliMeli@users.noreply.github.com> Date: Thu, 14 May 2026 03:00:10 +0000 Subject: [PATCH 06/14] docs(rfc): add fixed-resource trade-off validation experiment (M6) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Address arrayka's "concrete hypothesis" comment on PR #1049 by adding an explicit Stage-1 experiment plan that validates the trade-off framing from the Problem Statement. - New M6 — Fixed-resource trade-off validation: cgroup-locked RAM budget sweep on BigANN 10M / Enron 10M / 100M-scale, with explicit cells for Vamana one-shot, Vamana partitioned, PiPNN one-shot, PiPNN disk-edges, PiPNN merged-shards. Captures wall-clock, peak RSS, CPU util, SSD bytes, recall, and QPS — reported as vectors/min/worker so different worker shapes compare directly. - Four explicit hypotheses to confirm or falsify, including the "surplus RAM doesn't buy speed" claim from the Problem Statement. - Pass criterion: documented matrix with a clearly-better algorithm (or tie) per budget at matching recall; surprises are Stage-1 blockers. - Renumber subsequent milestones: production-matrix → M7, hybrid-update → M8, operational readiness → M9. - Forward-link the Problem Statement hypothesis to M6. Co-Authored-By: Claude Opus 4.7 (1M context) --- rfcs/01049-pipnn-integration.md | 29 +++++++++++++++++++++++------ 1 file changed, 23 insertions(+), 6 deletions(-) diff --git a/rfcs/01049-pipnn-integration.md b/rfcs/01049-pipnn-integration.md index 3ca747466..7f352f79e 100644 --- a/rfcs/01049-pipnn-integration.md +++ b/rfcs/01049-pipnn-integration.md @@ -70,6 +70,8 @@ Two important things this table is **not** claiming: So the honest framing is: PiPNN trades a higher minimum RAM budget for a substantially faster build at that budget. Neither algorithm currently converts surplus RAM into faster builds; both convert surplus RAM into "no pressure to use the chunked/shard fallbacks." +The numbers above are from initial benchmarks on a single workload and configuration. A dedicated experiment to validate this hypothesis across RAM budgets and worker shapes is part of Stage 1 — see **M6 — Fixed-resource trade-off validation** in Future Work below. 
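+
+As a sanity check on the one-shot row, a back-of-envelope sketch of where a ≥ ~12 GB working set can come from on BigANN 10M — the constants here are assumptions for illustration (`l_max = 96`, `u32` neighbor ids, `f32` distances), not measurements:
+
+```rust
+// Illustrative arithmetic only; ignores fanout/replica overlap, per-thread
+// scratch, and allocator overhead, all of which push the real peak higher.
+const POINTS: u64 = 10_000_000;
+const DIM: u64 = 128;      // BigANN dimensionality
+const BYTES_FP16: u64 = 2;
+const L_MAX: u64 = 96;     // HashPrune reservoir cap (assumed value)
+
+fn main() {
+    let dataset = POINTS * DIM * BYTES_FP16;  // ≈ 2.6 GB resident vectors
+    let reservoir = POINTS * L_MAX * (4 + 4); // ids + distances, ≈ 7.7 GB
+    let total_gb = (dataset + reservoir) as f64 / 1e9;
+    println!("≈ {total_gb:.1} GB before overlap and scratch"); // ≈ 10.2 GB
+}
+```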
+ #### Hybrid update model (Stage 2 direction) Vamana and PiPNN write the same on-disk graph format, so a graph built by either algorithm can be loaded by the same search code and, once loaded into memory, can be incrementally edited by Vamana. We exploit this for the production update story: @@ -290,7 +292,7 @@ PiPNN beats Vamana on recall at every L on the 384d Enron 10M workload, at parit ## Future Work -The Stage 1 milestones below are gating items for Stage 2 (retiring Vamana's full-rebuild path). Each must be addressed before that proposal is credible. M0 is the foundation shipped by this RFC; M1–M8 are deferred to follow-on work and ordered by dependency, not strict calendar sequence — some can run in parallel. +The Stage 1 milestones below are gating items for Stage 2 (retiring Vamana's full-rebuild path). Each must be addressed before that proposal is credible. M0 is the foundation shipped by this RFC; M1, M3–M9 are deferred to follow-on work and ordered by dependency, not strict calendar sequence — some can run in parallel. M2 (checkpoint/resume) is intentionally absent; see "Deferred to Stage 2" below. ### M0 — Skeleton integration @@ -301,7 +303,7 @@ The foundation that ships first. - **Compatibility:** PiPNN-built indexes are read by the existing search pipeline unchanged (the on-disk format is identical) and produce recall numbers within the tolerances the existing disk-index test suite enforces. - **CI:** benchmark binary runs with `--features pipnn` on a small smoke test (SIFT-1M). -M1, M3–M5 close the feature-parity gaps in Stage 1; M6–M8 are validation and operational readiness. Checkpoint/resume (previously M2) is deferred to Stage 2 — see "Deferred to Stage 2" below for the rationale. +M1, M3–M5 close the feature-parity gaps in Stage 1; M6–M9 are validation and operational readiness. Checkpoint/resume (previously M2) is deferred to Stage 2 — see "Deferred to Stage 2" below for the rationale. ### M1 — Feature parity: in-memory build / search @@ -339,9 +341,24 @@ Implement two memory-constrained PiPNN paths and select among them via the exist - **Dispatch:** inside `build_inmem_pipnn_index()` — no new public parameter. - **Validation:** at `build_ram_limit_gb=4`, the PiPNN-merged path on BigANN 10M produces peak RSS ≤ 4 GB and recall within 1% of one-shot PiPNN. -### M6 — Production validation: recall × QPS × dimensionality matrix +### M6 — Fixed-resource trade-off validation + +This milestone validates the **concrete trade-off hypothesis** stated in the Problem Statement: under a fixed worker shape (CPU cores, RAM budget, SSD throughput), PiPNN delivers higher build throughput than Vamana at matching recall when its working set fits, and remains competitive (via the three-tier dispatch in M5) when it does not. The output of this milestone is the evidence behind the per-budget recommendation in the Stage-1 deployment guide. + +- **Fixed worker shape per run.** Lock CPU cores (e.g. 16), SSD model/throughput, and a RAM ceiling enforced via cgroups (`memory.max`) so the build *cannot* exceed it. RAM-budget sweep on BigANN 10M: `{3, 6, 8, 12, 16, 24, 32}` GB at minimum. Include at least one row each for Enron 10M (higher dim, larger reservoir) and a 100M-scale dataset (one budget per algorithm sufficient to fit). +- **Algorithm × strategy cells.** For each RAM budget, run: Vamana one-shot, Vamana partitioned, PiPNN one-shot (if fits), PiPNN disk-edges, PiPNN merged-shards. 
Skip cells whose minimum working set exceeds the budget — those count as "OOM, not supported at this budget" and are part of the result, not a gap. +- **Metrics captured per cell.** Wall-clock build time, peak RSS (via heaptrack or `/usr/bin/time -v`), CPU utilization (`pidstat`), SSD bytes read/written, recall@K at L=50/100/L_target, and queries-per-second at matching recall. Throughput reported as **vectors per minute per worker** so different worker shapes compare directly. +- **Hypotheses to confirm or falsify.** + 1. PiPNN's wall-clock advantage over Vamana persists across all RAM budgets where its working set fits (one-shot or disk-edges variant). + 2. PiPNN's merged-shards path matches or beats Vamana's partitioned-rebuild at the same RAM ceiling on build time *and* recall. + 3. Neither algorithm reduces build time when given RAM headroom past its working-set requirement (validates the "surplus RAM doesn't buy speed" claim). + 4. PiPNN's per-thread overhead is bounded as stated (~16–20 MB/thread) and `num_threads` is not a hidden RAM knob. +- **Out-of-budget behavior.** Each (algorithm × budget) cell that cannot complete is recorded as such — explicit "PiPNN one-shot not supported at 6 GB on BigANN 10M" is a valid result, not a failed experiment. +- **Pass criterion for Stage 2 readiness.** A documented matrix where each budget has a clearly-better algorithm (or "tie") at matching recall, with no surprise cells that contradict the Problem Statement's hypothesis. Surprises must be either reproduced and explained, or treated as Stage-1 blockers. + +### M7 — Production validation: recall × QPS × dimensionality matrix -End-to-end validation on the full production workload mix. +End-to-end validation on the full production workload mix (independent of the resource matrix in M6). - **Datasets:** at minimum three families (BigANN, Enron, plus one production-representative). - **Scales:** 10M and 100M; one billion-scale sample if hardware permits. @@ -349,7 +366,7 @@ End-to-end validation on the full production workload mix. - **Pass criterion:** for each (dataset, scale, metric) cell, PiPNN recall@K is within Vamana's recall ±1% at matching QPS, *or* higher QPS at matching recall. - **Out-of-band cells** are documented as "PiPNN not yet recommended for X" rather than blocking Stage 2 entirely. -### M7 — Production validation: hybrid update model +### M8 — Production validation: hybrid update model Validate the Stage-2 hybrid loop end-to-end. @@ -357,7 +374,7 @@ Validate the Stage-2 hybrid loop end-to-end. - **Output:** a recommended "quality decay threshold" for production rebuild triggers, derived from the measured decay curve. - **Disk-format compatibility test:** confirm Vamana's incremental-insert path reads PiPNN-produced graphs correctly. This is the load-bearing compatibility check for the hybrid model. -### M8 — Operational readiness +### M9 — Operational readiness - **Telemetry:** emit per-phase timing and peak RSS via the existing OpenTelemetry tracer, comparable to Vamana's spans. - **Documentation:** replace experimental notes in `CLAUDE.md` with a permanent doc covering recommended parameters per workload class (dim × scale × metric). 
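+
+A hedged sketch of what the per-phase telemetry could look like, assuming the existing OpenTelemetry tracer is bridged through the `tracing` crate; the span names are placeholders, not a final schema:
+
+```rust
+use tracing::info_span;
+
+// Sketch only: wraps each PiPNN phase in a span so per-phase wall-clock
+// lands in the same traces as Vamana's existing build spans.
+fn build_pipnn_with_telemetry() {
+    info_span!("pipnn.partition").in_scope(|| {
+        // RBC partitioning
+    });
+    info_span!("pipnn.leaf_knn").in_scope(|| {
+        // batched GEMM + per-leaf top-k
+    });
+    info_span!("pipnn.hashprune").in_scope(|| {
+        // reservoir merge (+ optional final RobustPrune)
+    });
+    // Peak RSS would be sampled once at the end (e.g. /proc/self/status)
+    // and emitted as an event alongside the spans.
+}
+```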
From 612111909d004ac32bfb8859a3d6506c0e9d37ce Mon Sep 17 00:00:00 2001
From: Weiyao Luo <9347182+SeliMeli@users.noreply.github.com>
Date: Thu, 14 May 2026 03:19:05 +0000
Subject: [PATCH 07/14] docs(rfc): address wuw92's cycle + determinism comments
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Fix workspace structure: drop the spurious diskann-pipnn → diskann-disk
  dependency from the description; PiPNN produces Vec<Vec<u32>> consumed by
  diskann-disk behind its pipnn feature. The actual Cargo.toml never had
  this edge — the RFC text was wrong and described a dependency cycle.
- Redraw the dep diagram showing the one-way produces/consumes flow.
- Add a determinism note to the deferred checkpoint/resume section: PiPNN
  is rayon-parallel, so byte-identical output is not a free property and
  would require extra determinism work. Recall parity is the right
  validation criterion for any future resumed-build test, not byte-identity.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 rfcs/01049-pipnn-integration.md | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/rfcs/01049-pipnn-integration.md b/rfcs/01049-pipnn-integration.md
index 7f352f79e..7596ffcd4 100644
--- a/rfcs/01049-pipnn-integration.md
+++ b/rfcs/01049-pipnn-integration.md
@@ -99,15 +99,21 @@ Because both algorithms produce the same disk format, switching between "fresh P
### Workspace structure

-Add a new crate, `diskann-pipnn`, that depends on the existing `diskann`, `diskann-disk`, `diskann-linalg`, `diskann-vector`, `diskann-quantization`, and `diskann-utils` crates. PiPNN lives outside `diskann-disk` so the core disk path has no compile-time dependency on PiPNN; the disk builder takes a typed `BuildAlgorithm` and only depends on PiPNN behind a feature flag.
+Add a new crate, `diskann-pipnn`, that depends on the existing `diskann`, `diskann-linalg`, `diskann-vector`, `diskann-quantization`, and `diskann-utils` crates. **`diskann-pipnn` does not depend on `diskann-disk`.** The PiPNN builder produces a plain `Vec<Vec<u32>>` adjacency list (defined in terms of core types from `diskann`), and `diskann-disk` consumes that output behind its own `pipnn` Cargo feature. This is intentional: a `diskann-pipnn → diskann-disk → [feature] diskann-pipnn` edge would form a dependency cycle. Keeping the data-flow direction one-way (PiPNN produces, disk consumes) means PiPNN never imports any disk-layout symbols and the feature gate sits cleanly on the consumer side.

```text
-diskann/              # core types, traits, search
-diskann-disk/         # disk index layout, builder, search
-  └── feature "pipnn"    # opt-in dependency on diskann-pipnn
+diskann/              # core types, traits, search       ←┐ used by both
+diskann-linalg/       # GEMM/SVD                          ├─ shared deps
+diskann-quantization/ # PQ/SQ training                    ├─ (no edges
+diskann-vector/       # vector representations            ├─ to either
+diskann-utils/        # threading, file I/O              ←┘ builder)
+
+diskann-pipnn/        # new: PiPNN builder
+   ↑ produces Vec<Vec<u32>>
+   │
+diskann-disk/         # disk index layout, builder, search
+  └── feature "pipnn"    # opt-in: takes Vec<Vec<u32>> from diskann-pipnn
+                         # and hands it to DiskIndexWriter
-diskann-linalg/       # GEMM/SVD (used by both Vamana and PiPNN)
-diskann-quantization/ # PQ/SQ training (used by both)
```

### `BuildAlgorithm` enum
@@ -385,6 +391,8 @@ Validate the Stage-2 hybrid loop end-to-end.

- **Checkpoint / resume (was M2).** Vamana's checkpoint/resume is a *streaming* mechanism — it relies on the per-point incremental insert order to define natural checkpoint boundaries. 
PiPNN's batch design has no equivalent monotonic insertion sequence: partition output, per-leaf GEMM, and HashPrune merge are all coarse-grained whole-phase artifacts rather than fine-grained incremental progress. A useful PiPNN checkpoint scheme would therefore *not* mirror Vamana's; it would need new design choices about which phase boundaries to materialize, at what granularity, and whether the cost-benefit justifies the extra disk I/O. Empirically, PiPNN's full BigANN-10M build runs in ~80 s, so the operational value of resuming a partially completed build is materially lower than for Vamana's multi-hour rebuilds. We defer checkpoint design until Stage 2, when the production rebuild cadence and observed failure modes will tell us whether it is needed and what shape it should take. + *Note on determinism for any future checkpoint validation:* PiPNN is a parallel algorithm (rayon-parallel partition, leaf-build GEMM, and HashPrune merge), so byte-identical output across runs — and therefore across "resumed vs. never-interrupted" runs — is **not** a free property. It would require extra determinism work (fixed thread schedule, deterministic reduction order in the HashPrune reservoir, seeded LSH hyperplanes). The right validation criterion for a resumed build is **recall parity with a non-resumed build**, not byte-identical adjacency lists. + ### Out of scope (intentionally not on this list) - **Build-time PQ distance kernel.** Not used by Vamana in production paths today; deferred indefinitely. From 61f1ea55f9eeb0c95731cb3555b870cfdfad6ab1 Mon Sep 17 00:00:00 2001 From: Weiyao Luo <9347182+SeliMeli@users.noreply.github.com> Date: Thu, 14 May 2026 03:33:25 +0000 Subject: [PATCH 08/14] docs(rfc): narrow Stage 1 scope to disk-index full-rebuild only MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Per reviewer suggestion, narrow the RFC's Stage-1 commitment so it covers only the disk index build path. The in-memory DiskANNIndex builder exists primarily to support streaming per-point construction, which is exactly what PiPNN's batch algorithm cannot do efficiently — adding a PiPNN in-mem builder would offer no incremental capability and duplicate the disk path's value. - Summary: PiPNN is "for the disk index full-rebuild path"; in-mem build explicitly out of scope. - Two-stage rollout: Stage 1 = disk-index full-rebuild only; in-memory PiPNN is not part of any stage. - Goals: scoped to the disk index full-rebuild path; in-memory construction is out of scope. - Future Work: remove M1 (in-memory build / search) from Stage 1 milestones; renumber introduction accordingly. - Out of scope: add an explicit "In-memory PiPNN build (was M1)" entry with the rationale (streaming use-case mismatch, no incremental capability, would force a runtime dep on the in-mem graph crate). - Rename "Out of scope (intentionally not on this list)" to "Out of scope: not part of any stage" for clarity. Co-Authored-By: Claude Opus 4.7 (1M context) --- rfcs/01049-pipnn-integration.md | 33 ++++++++++++++++----------------- 1 file changed, 16 insertions(+), 17 deletions(-) diff --git a/rfcs/01049-pipnn-integration.md b/rfcs/01049-pipnn-integration.md index 7596ffcd4..d51dc7041 100644 --- a/rfcs/01049-pipnn-integration.md +++ b/rfcs/01049-pipnn-integration.md @@ -9,7 +9,7 @@ ## Summary -Add **PiPNN** (Pick-in-Partitions Nearest Neighbors, [arXiv:2602.21247](https://arxiv.org/abs/2602.21247)) as a second graph-construction algorithm for DiskANN's disk index. 
PiPNN produces a graph byte-compatible with Vamana's disk format and search API, at **up to 6.3× lower build time** on the workloads we have measured. Vamana remains the default and the only algorithm supported for incremental inserts; PiPNN is the proposed faster path for full rebuilds. +Add **PiPNN** (Pick-in-Partitions Nearest Neighbors, [arXiv:2602.21247](https://arxiv.org/abs/2602.21247)) as a second graph-construction algorithm for DiskANN's **disk index full-rebuild path**. PiPNN produces a graph byte-compatible with Vamana's disk format and search API, at **up to 6.3× lower build time** on the workloads we have measured. Vamana remains the default for disk builds and the only algorithm supported for in-memory incremental inserts. In-memory PiPNN build is explicitly out of scope: DiskANN's in-mem path exists to support streaming construction, which PiPNN's batch algorithm cannot do efficiently. ## Motivation @@ -84,15 +84,19 @@ Because both algorithms produce the same disk format, switching between "fresh P #### Two-stage rollout -- **Stage 1 (this RFC):** Land PiPNN behind a build-algorithm selector. Vamana stays default; PiPNN is opt-in. Stage 1 has explicit milestones (in Future Work) that gate readiness for Stage 2. -- **Stage 2 (separate proposal, conditional on Stage 1 milestones):** Retire the Vamana **full-rebuild** path. Vamana remains the implementation for incremental inserts via the hybrid model above. +- **Stage 1 (this RFC):** Land PiPNN as an alternative builder for the **disk index full-rebuild path only**, behind a build-algorithm selector. Vamana stays default; PiPNN is opt-in. Stage 1 has explicit milestones (in Future Work) that gate readiness for Stage 2. +- **Stage 2 (separate proposal, conditional on Stage 1 milestones):** Retire the Vamana **disk-index full-rebuild** path. Vamana remains the implementation for incremental inserts on the in-memory graph via the hybrid model above. -### Goals +In-memory PiPNN build/search is **not part of any stage**. DiskANN's in-memory `DiskANNIndex` path exists primarily to support streaming (per-point) index construction, which is exactly the use case PiPNN's batch algorithm does not address (see "PiPNN is algorithmically batch-only"). Replacing or extending the in-memory builder with PiPNN would offer no incremental capability and duplicate the disk path's value. We therefore list it under unstaged future work rather than as a Stage 1 / Stage 2 milestone. -1. **Algorithm-level pluggability**: introduce a build-algorithm selector to the build pipeline that routes between Vamana (existing) and PiPNN (new). Existing build sites continue to default to Vamana with no behavior change. +### Goals (Stage 1) + +Stage 1 is scoped to the **disk index full-rebuild path**. In-memory index construction is explicitly out of scope. + +1. **Algorithm-level pluggability for the disk builder**: introduce a build-algorithm selector to the disk-index build pipeline that routes between Vamana (existing) and PiPNN (new). Existing build sites continue to default to Vamana with no behavior change. 2. **Disk format compatibility**: the PiPNN-built index is byte-compatible with Vamana-built indexes on disk — search, PQ, and storage layouts are unchanged. This is the foundation for the hybrid update model. 3. **Public API compatibility**: the disk-index public API surface (`DiskIndexBuilder::new`, `IndexConfiguration`, `DiskIndexWriter`, JSON config schema) remains backward-compatible. PiPNN configuration is added under a new tagged enum variant. -4. 
**Feature-parity milestones**: deliver the Vamana capabilities PiPNN needs for a full-rebuild role in production (see Future Work below). +4. **Feature-parity milestones (disk path only)**: deliver the Vamana disk-build capabilities PiPNN needs for a full-rebuild role in production (see Future Work below). 5. **Documented memory mitigation**: provide a configuration knob (three-tier build) that brings PiPNN's peak RSS to or below Vamana's at the cost of build time. ## Proposal @@ -298,7 +302,7 @@ PiPNN beats Vamana on recall at every L on the 384d Enron 10M workload, at parit ## Future Work -The Stage 1 milestones below are gating items for Stage 2 (retiring Vamana's full-rebuild path). Each must be addressed before that proposal is credible. M0 is the foundation shipped by this RFC; M1, M3–M9 are deferred to follow-on work and ordered by dependency, not strict calendar sequence — some can run in parallel. M2 (checkpoint/resume) is intentionally absent; see "Deferred to Stage 2" below. +The Stage 1 milestones below are gating items for Stage 2 (retiring Vamana's disk-index full-rebuild path). Each must be addressed before that proposal is credible. M0 is the foundation shipped by this RFC; M3–M9 are deferred to follow-on work and ordered by dependency, not strict calendar sequence — some can run in parallel. M1 (in-memory build/search) and M2 (checkpoint/resume) are intentionally absent — see "Out of scope: not part of any stage" and "Deferred to Stage 2" below. ### M0 — Skeleton integration @@ -311,14 +315,6 @@ The foundation that ships first. M1, M3–M5 close the feature-parity gaps in Stage 1; M6–M9 are validation and operational readiness. Checkpoint/resume (previously M2) is deferred to Stage 2 — see "Deferred to Stage 2" below for the rationale. -### M1 — Feature parity: in-memory build / search - -Vamana supports both a **disk-resident** build/search path (via `diskann-disk`) and an **in-memory only** path (via `diskann::graph::index::DiskANNIndex`). PiPNN today only produces graphs handed to `DiskIndexWriter`; an in-mem-only consumer that wants PiPNN's speed has no entry point. - -- **Scope:** expose `diskann_pipnn::build_typed` output (`Vec>`) as a populated in-memory `DiskANNIndex` so callers can build + search without touching disk. -- **API:** add `diskann_pipnn::build_into_inmem_index(...)` returning an in-memory index that is read by the existing `DiskANNIndex::search` path unchanged. -- **Validation:** in-mem search recall on Enron 1M with PiPNN-built graph matches the disk-build + load round-trip recall within noise. - ### M3 — Feature parity: quantized vector support PiPNN currently has only a `SQ1` (1-bit) build path. @@ -393,13 +389,16 @@ Validate the Stage-2 hybrid loop end-to-end. *Note on determinism for any future checkpoint validation:* PiPNN is a parallel algorithm (rayon-parallel partition, leaf-build GEMM, and HashPrune merge), so byte-identical output across runs — and therefore across "resumed vs. never-interrupted" runs — is **not** a free property. It would require extra determinism work (fixed thread schedule, deterministic reduction order in the HashPrune reservoir, seeded LSH hyperplanes). The right validation criterion for a resumed build is **recall parity with a non-resumed build**, not byte-identical adjacency lists. -### Out of scope (intentionally not on this list) +### Out of scope: not part of any stage + +These are explicitly *not* on a Stage 1 or Stage 2 roadmap. 
They may be revisited if a future workload demands them, but they are not gating items for either stage. +- **In-memory PiPNN build / in-memory index population (was M1).** DiskANN's in-memory `DiskANNIndex` exists primarily to support streaming per-point construction and online inserts — which is exactly what PiPNN's batch design cannot do efficiently (see "PiPNN is algorithmically batch-only"). Building a `DiskANNIndex` from PiPNN-produced adjacency lists is mechanically possible (the data structures are compatible) but offers no incremental capability, duplicates the disk path's value, and would force `diskann-pipnn` to take a runtime dependency on the in-mem graph crate. We defer indefinitely; if a non-streaming in-mem consumer ever needs PiPNN's build speed, the simpler answer is "build to disk, then load." - **Build-time PQ distance kernel.** Not used by Vamana in production paths today; deferred indefinitely. - **PiPNN incremental insert API.** The hybrid model (PiPNN rebuild + Vamana inserts) removes the need. - **PiPNN incremental delete API.** Same reason. - **Frozen-point semantics differences.** PiPNN writes the dataset medoid as the single frozen start point, same as Vamana's default. Already byte-compatible; no work required. -- **Multi-vector index support.** Out of scope for Stage 1; revisit only if a production workload requires it. +- **Multi-vector index support.** Revisit only if a production workload requires it. ## References From d090895dc0a4870aa02d9801cd4fb9d26a1f2985 Mon Sep 17 00:00:00 2001 From: Weiyao Luo <9347182+SeliMeli@users.noreply.github.com> Date: Thu, 14 May 2026 03:35:32 +0000 Subject: [PATCH 09/14] docs(rfc): clarify PiPNN covers both initial builds and rebuilds MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Previous wording said "full-rebuild path", which is misleading — PiPNN serves both initial builds (no prior index on disk) and full rebuilds (replacing an existing index). The algorithm doesn't care whether an older index exists; both produce a fresh disk graph from a dataset snapshot. Reword "full-rebuild path" to "full-build path" in the algorithm- capability statements (Summary, Two-stage rollout, Goals intro, Goal 4, Stage-1 milestones intro). Where the text refers to operational reruns of an existing index (hybrid update model triggers, M8 hybrid-loop validation), keep "rebuild". Co-Authored-By: Claude Opus 4.7 (1M context) --- rfcs/01049-pipnn-integration.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/rfcs/01049-pipnn-integration.md b/rfcs/01049-pipnn-integration.md index d51dc7041..2eb8b4cb8 100644 --- a/rfcs/01049-pipnn-integration.md +++ b/rfcs/01049-pipnn-integration.md @@ -9,7 +9,7 @@ ## Summary -Add **PiPNN** (Pick-in-Partitions Nearest Neighbors, [arXiv:2602.21247](https://arxiv.org/abs/2602.21247)) as a second graph-construction algorithm for DiskANN's **disk index full-rebuild path**. PiPNN produces a graph byte-compatible with Vamana's disk format and search API, at **up to 6.3× lower build time** on the workloads we have measured. Vamana remains the default for disk builds and the only algorithm supported for in-memory incremental inserts. In-memory PiPNN build is explicitly out of scope: DiskANN's in-mem path exists to support streaming construction, which PiPNN's batch algorithm cannot do efficiently. 
+Add **PiPNN** (Pick-in-Partitions Nearest Neighbors, [arXiv:2602.21247](https://arxiv.org/abs/2602.21247)) as a second graph-construction algorithm for **DiskANN's disk-index full-build path** — both initial builds from a fresh dataset and full rebuilds that replace an existing index. PiPNN produces a graph byte-compatible with Vamana's disk format and search API, at **up to 6.3× lower build time** on the workloads we have measured. Vamana remains the default for disk builds and the only algorithm supported for in-memory incremental inserts. In-memory PiPNN build is explicitly out of scope: DiskANN's in-mem path exists to support streaming construction, which PiPNN's batch algorithm cannot do efficiently. ## Motivation @@ -46,7 +46,7 @@ Vamana's incremental design scales linearly in points × per-insert search cost, | BigANN 10M (10M × 128, fp16, squared_l2) | 358s | | Enron 10M (10M × 384, fp16, cosine_normalized) | 844s | -Frequent rebuilds (driven by data churn or parameter sweeps) and full rebuilds at 10M-scale and above are the bottleneck. PiPNN's offline benchmarks at matching recall budgets complete the same builds **up to 6.3× faster** while writing the same disk format (full numbers in the Benchmark Results section). This RFC proposes landing PiPNN so teams can opt into faster builds and so we can collect production-relevant signal on whether PiPNN can eventually replace Vamana's full-rebuild path. +Initial builds at 10M-scale and above, and the frequent full rebuilds that follow them (driven by data churn or parameter sweeps), are the bottleneck. PiPNN's offline benchmarks at matching recall budgets complete the same builds **up to 6.3× faster** while writing the same disk format (full numbers in the Benchmark Results section). This RFC proposes landing PiPNN so teams can opt into faster builds and so we can collect production-relevant signal on whether PiPNN can eventually replace Vamana's full-build path (initial builds and rebuilds alike). #### Concrete trade-off hypothesis @@ -84,19 +84,19 @@ Because both algorithms produce the same disk format, switching between "fresh P #### Two-stage rollout -- **Stage 1 (this RFC):** Land PiPNN as an alternative builder for the **disk index full-rebuild path only**, behind a build-algorithm selector. Vamana stays default; PiPNN is opt-in. Stage 1 has explicit milestones (in Future Work) that gate readiness for Stage 2. -- **Stage 2 (separate proposal, conditional on Stage 1 milestones):** Retire the Vamana **disk-index full-rebuild** path. Vamana remains the implementation for incremental inserts on the in-memory graph via the hybrid model above. +- **Stage 1 (this RFC):** Land PiPNN as an alternative builder for the **disk-index full-build path** — covering both initial builds (no prior index) and full rebuilds (replacing an existing index) — behind a build-algorithm selector. Vamana stays default; PiPNN is opt-in. Stage 1 has explicit milestones (in Future Work) that gate readiness for Stage 2. +- **Stage 2 (separate proposal, conditional on Stage 1 milestones):** Retire the Vamana **disk-index full-build** path (initial builds and rebuilds). Vamana remains the implementation for incremental inserts on the in-memory graph via the hybrid model above. In-memory PiPNN build/search is **not part of any stage**. DiskANN's in-memory `DiskANNIndex` path exists primarily to support streaming (per-point) index construction, which is exactly the use case PiPNN's batch algorithm does not address (see "PiPNN is algorithmically batch-only"). 
Replacing or extending the in-memory builder with PiPNN would offer no incremental capability and duplicate the disk path's value. We therefore list it under unstaged future work rather than as a Stage 1 / Stage 2 milestone. ### Goals (Stage 1) -Stage 1 is scoped to the **disk index full-rebuild path**. In-memory index construction is explicitly out of scope. +Stage 1 is scoped to the **disk-index full-build path** — both initial builds (no prior index) and full rebuilds (replacing an existing index). In-memory index construction is explicitly out of scope. 1. **Algorithm-level pluggability for the disk builder**: introduce a build-algorithm selector to the disk-index build pipeline that routes between Vamana (existing) and PiPNN (new). Existing build sites continue to default to Vamana with no behavior change. 2. **Disk format compatibility**: the PiPNN-built index is byte-compatible with Vamana-built indexes on disk — search, PQ, and storage layouts are unchanged. This is the foundation for the hybrid update model. 3. **Public API compatibility**: the disk-index public API surface (`DiskIndexBuilder::new`, `IndexConfiguration`, `DiskIndexWriter`, JSON config schema) remains backward-compatible. PiPNN configuration is added under a new tagged enum variant. -4. **Feature-parity milestones (disk path only)**: deliver the Vamana disk-build capabilities PiPNN needs for a full-rebuild role in production (see Future Work below). +4. **Feature-parity milestones (disk path only)**: deliver the Vamana disk-build capabilities PiPNN needs to take over both initial builds and full rebuilds in production (see Future Work below). 5. **Documented memory mitigation**: provide a configuration knob (three-tier build) that brings PiPNN's peak RSS to or below Vamana's at the cost of build time. ## Proposal @@ -302,7 +302,7 @@ PiPNN beats Vamana on recall at every L on the 384d Enron 10M workload, at parit ## Future Work -The Stage 1 milestones below are gating items for Stage 2 (retiring Vamana's disk-index full-rebuild path). Each must be addressed before that proposal is credible. M0 is the foundation shipped by this RFC; M3–M9 are deferred to follow-on work and ordered by dependency, not strict calendar sequence — some can run in parallel. M1 (in-memory build/search) and M2 (checkpoint/resume) are intentionally absent — see "Out of scope: not part of any stage" and "Deferred to Stage 2" below. +The Stage 1 milestones below are gating items for Stage 2 (retiring Vamana's disk-index full-build path — initial builds and rebuilds). Each must be addressed before that proposal is credible. M0 is the foundation shipped by this RFC; M3–M9 are deferred to follow-on work and ordered by dependency, not strict calendar sequence — some can run in parallel. M1 (in-memory build/search) and M2 (checkpoint/resume) are intentionally absent — see "Out of scope: not part of any stage" and "Deferred to Stage 2" below. ### M0 — Skeleton integration From 7de117e20023198486bf838747fd86927b4a1a00 Mon Sep 17 00:00:00 2001 From: Weiyao Luo <9347182+SeliMeli@users.noreply.github.com> Date: Thu, 14 May 2026 03:40:03 +0000 Subject: [PATCH 10/14] docs(rfc): tighten prose, lift key points into tables MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Restructure the RFC for readability without losing substance. Every reviewer-comment fix from prior iterations is preserved. 
Changes: - Front-load the value proposition; replace verbose paragraphs with tables (update modes, RAM-budget trade-off, alternatives). - Compress the four-phase algorithm description from long bullets to one-line essences while keeping the GEMM-vs-flat-scan distinction and HashPrune-streamable note. - Collapse the "batch-only" trade-off section from a bullet list of per-phase observations into a single tight paragraph that keeps the load-bearing claim (batching needs leaf membership; batch-size-1 PiPNN ≈ Vamana, so Vamana keeps the incremental role). - Trim repetition where the same point is made in Summary, Background, Two-stage rollout, and Trade-offs — keep one canonical statement and reference it from the other sections. - Tighten the milestone section: each milestone is now a short paragraph or compact list rather than a Scope/Boundaries/Behavior/ Validation matrix. - Drop overly granular implementation notes (per-thread KB breakdowns, enumeration of every PQ training reuse point, etc.) — these belong in code comments, not the RFC body. Net effect: ~408 → ~287 lines. Co-Authored-By: Claude Opus 4.7 (1M context) --- rfcs/01049-pipnn-integration.md | 377 +++++++++++--------------------- 1 file changed, 128 insertions(+), 249 deletions(-) diff --git a/rfcs/01049-pipnn-integration.md b/rfcs/01049-pipnn-integration.md index 2eb8b4cb8..8994f6934 100644 --- a/rfcs/01049-pipnn-integration.md +++ b/rfcs/01049-pipnn-integration.md @@ -9,395 +9,274 @@ ## Summary -Add **PiPNN** (Pick-in-Partitions Nearest Neighbors, [arXiv:2602.21247](https://arxiv.org/abs/2602.21247)) as a second graph-construction algorithm for **DiskANN's disk-index full-build path** — both initial builds from a fresh dataset and full rebuilds that replace an existing index. PiPNN produces a graph byte-compatible with Vamana's disk format and search API, at **up to 6.3× lower build time** on the workloads we have measured. Vamana remains the default for disk builds and the only algorithm supported for in-memory incremental inserts. In-memory PiPNN build is explicitly out of scope: DiskANN's in-mem path exists to support streaming construction, which PiPNN's batch algorithm cannot do efficiently. +Add **PiPNN** ([Pick-in-Partitions Nearest Neighbors](https://arxiv.org/abs/2602.21247)) as a second algorithm for **DiskANN's disk-index full-build path** — both initial builds and full rebuilds. PiPNN writes a graph byte-compatible with Vamana's disk format and search API, at **up to 6.3× lower build time** on measured workloads. Vamana stays the default; PiPNN is opt-in behind a Cargo feature. In-memory PiPNN build is intentionally out of scope — DiskANN's in-mem path is for streaming per-point construction, which PiPNN's batch algorithm cannot do. ## Motivation -### Background +### How DiskANN builds today -DiskANN currently builds the disk index with a single algorithm — **Vamana** (`diskann-disk/src/build/builder/`). Vamana incrementally inserts each point into a graph, running a greedy search + `RobustPrune` for each insertion, producing the on-disk format documented in `diskann-disk/src/storage/`. +DiskANN uses one algorithm — **Vamana** — which inserts points one-by-one with greedy search + `RobustPrune`. Clients use it in three modes: -Clients today update indexes in three main ways: - -1. **Incremental** — continuously insert and delete vectors in an existing in-memory graph (Vamana's per-point greedy-search + `RobustPrune` path). The disk index itself is not mutated in place. -2. 
**Full rebuild** — rebuild the entire graph from scratch on a static snapshot, producing an immutable disk index. -3. **Partitioned full rebuild** — split points into N clusters, build N separate graphs in parallel, then stitch them together with a lightweight merge step to bound peak build-time memory (Vamana's `build_merged_vamana_index` path). +| Mode | Description | +|---|---| +| Incremental | Per-point inserts on an in-memory graph (Vamana). Disk index not mutated. | +| Full rebuild | Rebuild the entire graph from a snapshot; produces an immutable disk index. | +| Partitioned full rebuild | Shard, build, merge — bounds peak RAM (`build_merged_vamana_index`). | -PiPNN, as proposed here, is a faster substitute for paths (2) and (3). Path (1) remains Vamana's responsibility for the foreseeable future (see "PiPNN is algorithmically batch-only" below). +PiPNN is a faster substitute for modes 2 and 3. Mode 1 stays with Vamana — PiPNN's batch design has no efficient per-point insert (see "Batch-only" below). -**PiPNN** (Pick-in-Partitions Nearest Neighbors, [arXiv:2602.21247](https://arxiv.org/abs/2602.21247)) is a partition-based **batch** graph builder, in contrast to Vamana's **incremental** insert + prune. The construction has four phases: +### What PiPNN does -1. **Partition** — Randomized Ball Carving (RBC) recursively splits the dataset into small *overlapping* leaf clusters. Each point lands in `fanout` of its nearest cluster leaders at every recursion level, so every point appears in multiple leaves. Recursion stops when a cluster fits a configured leaf-size cap (`c_max`, typically 256–1024 points). -2. **Local k-NN per leaf** — For each leaf, compute the full pairwise distance matrix as a single batched GEMM (an N×N intra-leaf computation, where N ≈ `c_max`), then extract each point's `leaf_k` nearest neighbors inside the leaf. This is structurally different from a flat scan (1×N query against the whole dataset, e.g. work item [#1036](https://github.com/microsoft/DiskANN/issues/1036)) — every column of the GEMM contributes to every row's top-k, so the cost is amortized across `c_max²` distance evaluations. GEMM batching is the source of most of PiPNN's wall-clock advantage over per-point greedy search. -3. **HashPrune merge** — Edges from all leaves are merged into a per-point reservoir of bounded size (`l_max`, ~64–128). The pruner is keyed by an LSH **angular bucket** of each candidate neighbor: at most one candidate per bucket is retained, and on collision the closer candidate wins. This produces a diverse short-list per point using O(`l_max`) memory per node and O(1) amortized insert work. The merge stage is naturally streamable — edges can be fed in chunks (either generated all at once and replayed from disk, or generated leaf-batch-by-leaf-batch interleaved with HashPrune inserts) to bound peak RAM; see M5 below. -4. **Optional final prune** — A single RobustPrune-style pass (same algorithm Vamana uses, with a configurable `alpha`) applies geometric occlusion to the HashPrune candidates. Used when the workload benefits from explicit graph diversification. +A four-phase **batch** builder: -The output is `Vec>` adjacency lists in the same shape Vamana produces, then handed to the existing disk-layout writer. PQ training and search-side data structures are unchanged. +1. **Partition (RBC).** Randomized Ball Carving recursively splits the dataset into small *overlapping* leaf clusters; each point lands in `fanout` of its nearest leaders at every level. 
Recursion stops at a leaf-size cap (`c_max`, ~256–1024).
+2. **Local k-NN per leaf (GEMM).** For each leaf, compute the full pairwise distance matrix as one batched GEMM (intra-leaf N×N where N ≈ `c_max`), then extract per-point top-`leaf_k`. This is structurally different from a 1×N flat scan ([#1036](https://github.com/microsoft/DiskANN/issues/1036)) — the batching across `c_max²` evaluations is where PiPNN's wall-clock advantage comes from.
+3. **HashPrune merge.** Merge edges from all leaves into a per-point reservoir of bounded size (`l_max` ~64–128), keyed by an LSH angular bucket for diversity. Naturally streamable — see memory mitigation.
+4. **Optional final RobustPrune.** Same algorithm Vamana uses, applied as a single pass when the workload wants more geometric diversification.

-The structural trade-off: Vamana is sequential per insert with fine-grained parallelism and memory-efficient; PiPNN is batch-parallel across leaves with higher peak working memory in exchange for far shorter wall-clock builds.
+Output: `Vec<Vec<u32>>` adjacency lists, handed to the existing disk writer. PQ training and search are unchanged.

-### Problem Statement
+### Problem statement

-Vamana's incremental design scales linearly in points × per-insert search cost, which makes full rebuilds expensive at the scales we operate. Measured baselines:
+Vamana's per-point cost scales linearly with point count, making 10M+ full builds the bottleneck:

-| Dataset | Vamana build time |
+| Dataset | Vamana build |
|---|---:|
-| Enron 1M (1.087M × 384, fp16, cosine_normalized) | 70s |
-| BigANN 10M (10M × 128, fp16, squared_l2) | 358s |
-| Enron 10M (10M × 384, fp16, cosine_normalized) | 844s |
-
-Initial builds at 10M-scale and above, and the frequent full rebuilds that follow them (driven by data churn or parameter sweeps), are the bottleneck. PiPNN's offline benchmarks at matching recall budgets complete the same builds **up to 6.3× faster** while writing the same disk format (full numbers in the Benchmark Results section). This RFC proposes landing PiPNN so teams can opt into faster builds and so we can collect production-relevant signal on whether PiPNN can eventually replace Vamana's full-build path (initial builds and rebuilds alike).
+| Enron 1M (384d) | 70s |
+| BigANN 10M (128d) | 358s |
+| Enron 10M (384d) | 844s |

-#### Concrete trade-off hypothesis
+PiPNN completes the same builds **up to 6.3× faster** at matching recall (numbers below).

-To make the comparison precise rather than headline-only, we frame Stage-1 validation around a fixed-resource hypothesis:
+#### Trade-off hypothesis

+> Given a fixed worker (CPU/RAM/SSD), PiPNN delivers higher build throughput than Vamana at matching recall *when its working set fits*. Below that threshold, the three-tier dispatch (one-shot → disk-edges → merged-shards) keeps PiPNN at or under Vamana's RAM footprint. 
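+
+A minimal sketch of that three-tier selection — function and threshold names here are assumptions for illustration; the real dispatch is internal to `build_inmem_pipnn_index()` (see the memory-mitigation milestone) and takes no new public parameter:
+
+```rust
+// Sketch only: illustrates the dispatch order, not the real estimator.
+enum PipnnStrategy {
+    OneShot,      // fastest; whole working set resident
+    DiskEdges,    // spill leaf edges to disk, stream HashPrune
+    MergedShards, // per-shard builds + existing shard merger
+}
+
+fn pick_strategy(budget_gb: f64, one_shot_gb: f64, disk_edges_gb: f64) -> PipnnStrategy {
+    if budget_gb >= one_shot_gb {
+        PipnnStrategy::OneShot
+    } else if budget_gb >= disk_edges_gb {
+        PipnnStrategy::DiskEdges
+    } else {
+        PipnnStrategy::MergedShards
+    }
+}
+```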
-Concretely, on BigANN 10M with the same 16-thread / NVMe worker: +On BigANN 10M, 16 threads: | RAM budget | PiPNN strategy | PiPNN build | Vamana build | |---:|---|---:|---:| -| ≥ ~12 GB | one-shot | 80–133s | 358s | +| ≥ 12 GB | one-shot | 80–133s | 358s | | 6–12 GB | disk-edges | ~126s | 358s | -| 3–6 GB | merged-shards | ~332s | 358s (partitioned: similar) | -| < 3 GB | merged-shards w/ smaller shards | further degrades | further degrades | - -Two important things this table is **not** claiming: - -- **PiPNN does not auto-scale build time downward when given more RAM than its working set needs.** PiPNN's wall-clock is dominated by HashPrune inserts + leaf-build GEMM. Once the dataset, HashPrune reservoir, and per-thread buffers fit comfortably in RAM, additional RAM headroom does not buy faster builds. (More memory *channels* / higher bandwidth do help, but that is a hardware property, not a budget knob.) -- **Vamana also does not have a "use more RAM to build faster" mode** — its peak RSS is largely set by the dataset + working graph, and giving it more RAM headroom past that does not accelerate the per-insert greedy search. - -So the honest framing is: PiPNN trades a higher minimum RAM budget for a substantially faster build at that budget. Neither algorithm currently converts surplus RAM into faster builds; both convert surplus RAM into "no pressure to use the chunked/shard fallbacks." +| 3–6 GB | merged-shards | ~332s | ~358s (partitioned) | -The numbers above are from initial benchmarks on a single workload and configuration. A dedicated experiment to validate this hypothesis across RAM budgets and worker shapes is part of Stage 1 — see **M6 — Fixed-resource trade-off validation** in Future Work below. +**Neither algorithm uses surplus RAM to build faster.** PiPNN's wall-clock is bottlenecked by HashPrune + GEMM; extra RAM headroom past the working set doesn't help (more memory *channels* / bandwidth does, but that's a hardware property, not a budget knob). The honest framing: PiPNN trades a higher *minimum* RAM budget for a substantially faster build at that budget. Validation: see **M6**. -#### Hybrid update model (Stage 2 direction) +### Hybrid update model (Stage 2 direction) -Vamana and PiPNN write the same on-disk graph format, so a graph built by either algorithm can be loaded by the same search code and, once loaded into memory, can be incrementally edited by Vamana. We exploit this for the production update story: +Both algorithms write the same disk format, so a graph built by either can be loaded and (once in memory) extended by Vamana. Production update story: -- **Bulk / full rebuild → PiPNN.** When a full rebuild is needed, PiPNN is used because it is several times faster than Vamana at this job. -- **Incremental insert → in-memory Vamana.** Between full rebuilds, individual inserts use Vamana's existing greedy-search + RobustPrune insert path **on the in-memory graph** (`diskann::graph::index::DiskANNIndex`). The on-disk index file is not mutated in place — the standing convention is that a refreshed disk index is produced by a full rebuild from the current dataset snapshot. PiPNN's batch design has no natural single-point-insert API and we do not plan to build one. -- **Triggers for a full PiPNN rebuild.** A rebuild is scheduled in response to operationally meaningful events, not just gradual recall drift. 
The expected triggers include: (a) embedding-model rotation (vectors are no longer comparable to existing ones), (b) schema/parameter retuning (`R`, `L`, `pq_chunks`, distance metric, quantization), (c) large batch inserts that exceed what the in-memory incremental path is sized for, and (d) periodic safety rebuilds on a cadence that depends on observed graph health. DiskANN's existing claim that incremental updates keep recall healthy still holds; PiPNN does not change that, it just makes the eventual rebuild cheaper. +- **Full rebuild → PiPNN.** Several times faster than Vamana. +- **Incremental insert → in-memory Vamana.** Unchanged — applies to the in-memory graph; the disk index file is not mutated in place. +- **Rebuild triggers.** Embedding rotation, schema/parameter retuning, large batch inserts, or periodic safety rebuilds — not just gradual recall drift. DiskANN's claim that incremental updates keep recall healthy still stands; PiPNN just makes the eventual rebuild cheaper. -Because both algorithms produce the same disk format, switching between "fresh PiPNN build" and "Vamana-edited in-mem graph reloaded from a fresh disk build" is transparent to search-side consumers. This answers "should PiPNN implement incremental inserts?" — no, we keep Vamana's in-memory insert path for that, and use the disk index format as the integration point between rebuilds. +This is why we don't need PiPNN to support `insert(point)` — the disk format is the integration point between batch and incremental. -#### Two-stage rollout +### Two-stage rollout -- **Stage 1 (this RFC):** Land PiPNN as an alternative builder for the **disk-index full-build path** — covering both initial builds (no prior index) and full rebuilds (replacing an existing index) — behind a build-algorithm selector. Vamana stays default; PiPNN is opt-in. Stage 1 has explicit milestones (in Future Work) that gate readiness for Stage 2. -- **Stage 2 (separate proposal, conditional on Stage 1 milestones):** Retire the Vamana **disk-index full-build** path (initial builds and rebuilds). Vamana remains the implementation for incremental inserts on the in-memory graph via the hybrid model above. +- **Stage 1 (this RFC).** Land PiPNN as an alternative builder for the disk-index full-build path (initial builds *and* rebuilds), behind a `pipnn` Cargo feature. Vamana stays default. +- **Stage 2 (separate proposal, gated by Stage-1 milestones).** Retire Vamana's full-build path; keep Vamana for in-memory incremental inserts. -In-memory PiPNN build/search is **not part of any stage**. DiskANN's in-memory `DiskANNIndex` path exists primarily to support streaming (per-point) index construction, which is exactly the use case PiPNN's batch algorithm does not address (see "PiPNN is algorithmically batch-only"). Replacing or extending the in-memory builder with PiPNN would offer no incremental capability and duplicate the disk path's value. We therefore list it under unstaged future work rather than as a Stage 1 / Stage 2 milestone. +**In-memory PiPNN build is not in any stage.** The in-mem path exists for streaming construction — exactly what PiPNN's batch design cannot do. See "Out of scope" below. ### Goals (Stage 1) -Stage 1 is scoped to the **disk-index full-build path** — both initial builds (no prior index) and full rebuilds (replacing an existing index). In-memory index construction is explicitly out of scope. - -1. 
**Algorithm-level pluggability for the disk builder**: introduce a build-algorithm selector to the disk-index build pipeline that routes between Vamana (existing) and PiPNN (new). Existing build sites continue to default to Vamana with no behavior change.
-2. **Disk format compatibility**: the PiPNN-built index is byte-compatible with Vamana-built indexes on disk — search, PQ, and storage layouts are unchanged. This is the foundation for the hybrid update model.
-3. **Public API compatibility**: the disk-index public API surface (`DiskIndexBuilder::new`, `IndexConfiguration`, `DiskIndexWriter`, JSON config schema) remains backward-compatible. PiPNN configuration is added under a new tagged enum variant.
-4. **Feature-parity milestones (disk path only)**: deliver the Vamana disk-build capabilities PiPNN needs to take over both initial builds and full rebuilds in production (see Future Work below).
-5. **Documented memory mitigation**: provide a configuration knob (three-tier build) that brings PiPNN's peak RSS to or below Vamana's at the cost of build time.
+1. **Pluggable disk builder** — a selector that routes Vamana (default) vs. PiPNN (opt-in). No behavior change at existing call sites.
+2. **Disk-format compatibility** — byte-identical to Vamana's output; search/PQ/storage layouts unchanged.
+3. **API backward compatibility** — `DiskIndexBuilder`, `IndexConfiguration`, JSON schema all stay additive.
+4. **Feature parity for the full-build role** — deliver the Vamana disk-build capabilities PiPNN still lacks (quantization, label filters).
+5. **Documented memory mitigation** — a three-tier build path that brings PiPNN's peak RSS to or under Vamana's at a documented build-time cost.
 
 ## Proposal
 
 ### Workspace structure
 
-Add a new crate, `diskann-pipnn`, that depends on the existing `diskann`, `diskann-linalg`, `diskann-vector`, `diskann-quantization`, and `diskann-utils` crates. **`diskann-pipnn` does not depend on `diskann-disk`.** The PiPNN builder produces a plain `Vec<Vec<u32>>` adjacency list (defined in terms of core types from `diskann`), and `diskann-disk` consumes that output behind its own `pipnn` Cargo feature. This is intentional: a `diskann-pipnn → diskann-disk → [feature] diskann-pipnn` edge would form a dependency cycle. Keeping the data-flow direction one-way (PiPNN produces, disk consumes) means PiPNN never imports any disk-layout symbols and the feature gate sits cleanly on the consumer side.
+Add a crate `diskann-pipnn` depending on `diskann`, `diskann-linalg`, `diskann-vector`, `diskann-quantization`, `diskann-utils`. **It does NOT depend on `diskann-disk`** — that would form a cycle with the consumer-side feature gate. Data flows one-way: PiPNN produces `Vec<Vec<u32>>`, `diskann-disk` consumes it behind its `pipnn` feature.
 ```text
-diskann/              # core types, traits, search  ←┐ used by both
-diskann-linalg/       # GEMM/SVD                     ├─ shared deps
-diskann-quantization/ # PQ/SQ training               ├─ (no edges
-diskann-vector/       # vector representations       ├─ to either
-diskann-utils/        # threading, file I/O         ←┘ builder)
-
-diskann-pipnn/        # new: PiPNN builder
-    ↑ produces Vec<Vec<u32>>
-    │
-diskann-disk/         # disk index layout, builder, search
-    └── feature "pipnn"   # opt-in: takes Vec<Vec<u32>> from diskann-pipnn
-                          # and hands it to DiskIndexWriter
+diskann, diskann-linalg, diskann-quantization, diskann-vector, diskann-utils
+       │  (shared deps, no edges to builders)
+   ┌───────────┴────────────┐
+   diskann-pipnn        diskann-disk
+       │                     ↑ feature "pipnn"
+       └───→ Vec<Vec<u32>> ────┘
 ```
 
 ### `BuildAlgorithm` enum
 
-Introduce a tagged enum in `diskann-disk/src/build/configuration/build_algorithm.rs`:
-
 ```rust
 #[derive(Debug, Clone, Default, PartialEq, Serialize, Deserialize)]
 #[serde(tag = "algorithm")]
 pub enum BuildAlgorithm {
-    /// Default Vamana graph construction.
     #[default]
     Vamana,
-    /// PiPNN: Pick-in-Partitions Nearest Neighbors.
     #[cfg(feature = "pipnn")]
     PiPNN {
-        c_max: usize,            // maximum leaf partition size
-        c_min: usize,            // minimum cluster size before merging
-        p_samp: f64,             // RBC leader sampling fraction
-        fanout: Vec<usize>,      // per-level fanout
-        leaf_k: usize,           // k-NN within each leaf
-        replicas: usize,         // independent partitioning passes
-        l_max: usize,            // HashPrune reservoir cap
-        num_hash_planes: usize,  // LSH hyperplane count
-        final_prune: bool,       // optional RobustPrune final pass
-        leader_cap: usize,       // hard cap on leaders per level
+        c_max: usize, c_min: usize, p_samp: f64,
+        fanout: Vec<usize>, leaf_k: usize, replicas: usize,
+        l_max: usize, num_hash_planes: usize,
+        final_prune: bool, leader_cap: usize,
         saturate_after_prune: bool,
-        num_threads: usize,      // 0 = all logical CPUs (matches Vamana)
+        num_threads: usize,      // 0 = all logical CPUs
     },
 }
 ```
 
-`Vamana` is the `Default` so every existing call site that constructs `DiskIndexBuildParameters` without specifying an algorithm keeps the existing behavior.
-
-`DiskIndexBuildParameters` gains a `build_algorithm: BuildAlgorithm` field and a constructor pair: `new` (defaults to Vamana, no PiPNN dep) and `new_with_algorithm` (explicit). The JSON schema for benchmark configs gains an optional `build_algorithm` block that, when present, deserializes via `#[serde(tag = "algorithm")]` into one of the variants above.
+`DiskIndexBuildParameters` gains a `build_algorithm` field defaulting to `Vamana`. The JSON config gains an optional `build_algorithm` block.
 
-**`num_threads`.** Like Vamana, PiPNN accepts `num_threads` as a build-time parameter (default `0` = all logical CPUs). Thread count has a small, bounded effect on peak RSS: each worker holds thread-local stripe buffers in the partition phase (~`stripe_kb`, typically 16 MB) and thread-local leaf-build scratch (~`c_max² × 4 B`, ≈ 256 KB at `c_max=256`). Total per-thread overhead is ~16–20 MB; at 48 threads this is ~960 MB of incremental resident set on top of the dataset and HashPrune reservoir, which dominate the peak. We do not consider `num_threads` a memory-budget knob — to bound RAM, use `build_ram_limit_gb` (see Memory mitigation).
+**`num_threads` is not a RAM knob.** Per-thread overhead is small (~16–20 MB/thread: stripe buffers + leaf-build scratch). Use `build_ram_limit_gb` to bound RAM.
 
-**Deserialization behavior when the `pipnn` feature is disabled — scope:** this affects only **JSON configs**, not the index files themselves.
Because PiPNN and Vamana write byte-identical disk formats, an index *file* built by either algorithm is loaded by the same search code and does not require the `pipnn` feature at load time. The restriction below applies to the *build-time configuration* that selects which algorithm to invoke. Because `BuildAlgorithm::PiPNN` is gated by `#[cfg(feature = "pipnn")]`, a binary built without the feature does not see that variant. A JSON config containing `"algorithm": "PiPNN"` fed to such a binary fails at parse time with a serde error along the lines of `unknown variant 'PiPNN', expected 'Vamana'`. This is a clear, fail-fast diagnostic — not a backward-compatibility regression. Configs that omit `build_algorithm` (or set `"algorithm": "Vamana"`) parse identically across feature combinations. Documentation alongside the config schema will call this out so users know that PiPNN configs require a PiPNN-enabled build.
+**Feature-flag deserialization applies to JSON configs only, not index files.** An index file built by either algorithm loads with or without the `pipnn` feature. A JSON config with `"algorithm": "PiPNN"` fed to a binary built without the feature fails fast with `unknown variant 'PiPNN'`. Configs that omit `build_algorithm` parse identically across feature builds — not a backward-compat regression.
 
 ### Builder dispatch
 
-In `DiskIndexBuilder::build()` (or the new equivalent), dispatch on `BuildAlgorithm`:
-
 ```rust
 match build_parameters.build_algorithm() {
-    BuildAlgorithm::Vamana =>
-        self.build_inmem_vamana_index().await,
+    BuildAlgorithm::Vamana => self.build_inmem_vamana_index().await,
     #[cfg(feature = "pipnn")]
-    BuildAlgorithm::PiPNN { .. } =>
-        self.build_inmem_pipnn_index().await,
+    BuildAlgorithm::PiPNN { .. } => self.build_inmem_pipnn_index().await,
 }
 ```
 
-The PiPNN path produces a `Vec<Vec<u32>>` adjacency list using `diskann_pipnn::builder::build_typed`, then hands it to the existing disk-layout writer (`DiskIndexWriter`) which emits the same format Vamana does (header, per-node adjacency, frozen start-point block). PQ training and disk-sector layout are reused unchanged.
+The PiPNN branch produces `Vec<Vec<u32>>` via `diskann_pipnn::builder::build_typed`, then hands it to the existing `DiskIndexWriter`. PQ training and disk-sector layout are reused.
 
-### Compatibility surface
+### Compatibility
 
 | Surface | Status |
 |---|---|
-| On-disk graph format (header + adjacency + frozen start point) | unchanged |
-| PQ codes / SQ codes on disk | unchanged (trained the same way) |
-| Search API (`DiskANNIndex::search`, beam_width, search_list, recall_at, num_nodes_to_cache, search_io_limit, filters API) | unchanged |
-| Public Rust types (`IndexConfiguration`, `DiskIndexWriter`, `DiskIndexBuildParameters`) | additive only (new field with default) |
-| Benchmark JSON config | additive only (new optional `build_algorithm` field) |
-| C/C++ FFI (if any) | unchanged |
+| On-disk graph format | unchanged |
+| PQ/SQ codes on disk | unchanged |
+| Search API | unchanged |
+| Public Rust types | additive only (new field with default) |
+| Benchmark JSON config | additive only (new optional field) |
 
-Since the produced graph and PQ/SQ artifacts are byte-identical in format, a search-only consumer cannot tell which builder wrote the index.
+A search-only consumer cannot tell which builder produced the index.
 
 ### Feature gating
 
-- The `diskann-disk` crate gains a `pipnn` Cargo feature.
With it disabled, `BuildAlgorithm::PiPNN` does not exist at the type level — no runtime branch, no extra binary size, no dependency on `diskann-pipnn`. -- The benchmark binary and any production binary that wants PiPNN must enable the `pipnn` feature on `diskann-disk` (or transitively). -- The default features set continues to not include `pipnn`, matching the principle that the existing Vamana path is what ships unchanged. - -### What this RFC does *not* change - -- Distance metrics, vector representations, storage layouts. -- The greedy-search / RobustPrune logic used by Vamana — both stay as-is for the Vamana path. PiPNN brings its own equivalents internally (HashPrune + optional final RobustPrune). -- PQ training, search-time decoders, and the disk layout. -- Public traits, types, or method signatures outside the new optional fields/variants described above. +- `diskann-disk` gains a `pipnn` Cargo feature. Default features do **not** include it. +- With the feature off: `BuildAlgorithm::PiPNN` does not exist at the type level. No runtime branch, no extra binary size, no `diskann-pipnn` dependency. ## Trade-offs -### PiPNN is algorithmically batch-only - -This is a property of the algorithm, not of our implementation. The PiPNN paper ([arXiv:2602.21247](https://arxiv.org/abs/2602.21247)) is explicit that the design departs from incremental methods by "eliminating search from the graph-building process altogether": instead of running a greedy search for each new point's neighbors, PiPNN partitions the dataset, then computes neighbors for all points within each leaf as a single batched operation. The paper describes no per-point insertion algorithm and reports no streaming results. The framing throughout is "fast one-shot construction on a static dataset." - -Where this batch assumption is load-bearing: - -- **Partition (RBC)** samples leaders from the global dataset distribution and recursively splits into overlapping leaves. Leader quality depends on representativeness of the full data. Adding new points to an existing partition works mechanically (assign to fanout nearest existing leaders), but the *partition itself* is a one-shot decision — the cluster structure can drift as the data distribution shifts. -- **Leaf k-NN via GEMM** is where PiPNN gets its speed. A leaf's pairwise distance matrix is computed in one batched matrix multiplication and amortizes per-leaf overhead across `c_max²` distance evaluations. **This is the algorithm's central optimization, and it requires knowing the leaf membership before computing distances.** Inserting one point against an existing leaf reduces to `c_max` individual distance computations, which is no faster than what Vamana already does per insert — the batching advantage evaporates at batch size 1. -- **HashPrune** is the one PiPNN component that *is* online — it accepts an arbitrary stream of `(point, neighbor, distance)` edges and maintains a bounded reservoir per point. So the merge stage doesn't structurally object to incremental updates. But by the time you have edges to feed it, you've already paid for the partition assignment and the per-leaf distance work. -- **Final RobustPrune** is per-point and naturally re-runnable. - -In other words: of PiPNN's four phases, two (partition, leaf k-NN) are batch-by-design and would need to be replaced for true incremental construction. Replacing them defeats the purpose — the algorithm degenerates into something more like Vamana but without Vamana's online-friendly graph-search structure. 
+### Batch-only (algorithmic, not implementation) -The realistic alternatives for "PiPNN-like incremental" are all mini-batch variants (accumulate N new points → run a partial partition + leaf-build), which works fine but isn't really an incremental algorithm. Vamana already does per-point online inserts correctly; we keep it for that role. +The PiPNN paper "eliminates search from graph-building" — partition first, then one batched GEMM per leaf. The batching advantage **requires knowing leaf membership before computing distances**; at batch size 1, PiPNN reduces to per-point distance work no faster than Vamana's greedy insert. Two phases (partition, leaf k-NN) are batch-by-design; HashPrune and final RobustPrune happen to be online, but you've already paid for the batch phases by the time you reach them. This is why Vamana keeps the incremental role. -This is why the Motivation section's hybrid update model exists: **PiPNN for full rebuilds, Vamana for inserts**, with the disk format as the integration point. PiPNN is not a drop-in replacement for code paths that rely on `insert(point)` semantics — and the limitation is the algorithm, not just our crate's API surface. +### Memory vs. build speed -### Memory vs build speed - -PiPNN's batch design holds more working memory during build than Vamana's incremental design. The dominant overhead is the **HashPrune reservoir** — a bounded per-point candidate list (`l_max × 8 bytes` per point) that PiPNN needs to merge edges from overlapping leaves. Vamana has no equivalent: it writes neighbors directly into the final adjacency list as it inserts each point. - -For example, on BigANN 10M (10M × 128 fp16, `c_max=256, fanout=[10,3], leaf_k=3, l_max=64`): +PiPNN holds more working memory than Vamana — dominated by the **HashPrune reservoir** (`l_max × 8 B` per point). On BigANN 10M (`c_max=256, fanout=[10,3], leaf_k=3, l_max=64`): | | PiPNN one-shot | Vamana | |---|---:|---:| | Peak RSS | 10.8 GB | 6.3 GB | -That delta — roughly **+4.5 GB**, dominated by HashPrune (`10M × 64 × 8 ≈ 5 GB`) plus smaller PiPNN-only working buffers (LSH sketches, partition leaf indices) — is the cost of the batch design and not a bug. It is the working set the algorithm explicitly needs. The next subsection describes the mitigation. - -### Memory mitigation: three-tier build +The +4.5 GB delta is the working set the algorithm needs, not a bug. Mitigation via the three-tier build (dispatched by the existing `build_ram_limit_gb` knob): -For deployments that need PiPNN's build speed but cannot afford its working memory, we reuse the same **`MemoryBudget`** parameter Vamana already uses for sharded builds. When `build_ram_limit_gb` is below a threshold, PiPNN switches to a chunked path that spills HashPrune reservoirs to disk between leaf batches. Measurements on the same dataset as the table above (BigANN 10M): - -| Strategy | Peak RSS | Build time | Recall@10 L=50 | Trigger | +| Strategy | Peak RSS | Build | Recall@10 L=50 | Trigger | |---|---:|---:|---:|---| -| **One-shot** (in-memory) | 10.8 GB | 133s | 95.00% | RAM ≥ ~32 GB | -| **Disk-edges** (per-batch reservoir flush) | 6.4 GB | 126s | 95.00% | RAM 8-32 GB | -| **Merged shards** (per-shard graph, then merge) | 3.3 GB | 332s | 95.31% | RAM 4-8 GB | - -Note on disk-edges build time (~126s vs one-shot's ~133s): the disk-edges path is not slower despite the extra I/O. 
The smaller resident working set means HashPrune inserts touch fewer cache lines per operation, and the spill to disk is sequential append-only and overlaps with leaf-build compute. Net: roughly the same wall-clock as one-shot in this benchmark, with significantly lower peak RSS. +| One-shot | 10.8 GB | 133s | 95.00% | RAM ≥ ~32 GB | +| Disk-edges | 6.4 GB | 126s | 95.00% | RAM 8–32 GB | +| Merged shards | 3.3 GB | 332s | 95.31% | RAM 4–8 GB | -The merged-shards path **uses less peak RSS than Vamana** (3.3 GB vs Vamana's 6.3 GB on this same dataset) at a 2.5× build-time cost. The disk-edges path matches Vamana on RAM at 3× the build speed. +Disk-edges matches Vamana's RAM at ~3× the build speed. Merged-shards uses *less* RAM than Vamana (3.3 vs. 6.3 GB) at a 2.5× build-time cost. -The control knob is the existing `build_ram_limit_gb` config; no new parameter is introduced. The dispatch happens inside `build_inmem_pipnn_index()`. +### Alternatives considered -### Stage-1 separate path vs immediate-replace - -We considered three options: - -**A. (Chosen) Add PiPNN as an alternative behind a feature flag.** Default is Vamana, opt-in for PiPNN. Existing users see no change. Lets us collect production validation signal without risk. - -**B. Replace Vamana with PiPNN immediately.** Cleaner code, smaller binary. Rejected because: (1) PiPNN lacks checkpoint, full quantization, and label-filtered search support today — replacing now is a regression; (2) we have not validated PiPNN under the full production workload mix; (3) recall behavior on edge-case datasets is not yet characterized at production scale. - -**C. Maintain PiPNN as a fully separate top-level binary/crate.** Rejected because it would duplicate the PQ training, disk-layout writer, search pipeline, and benchmark harness — adding maintenance burden with no compatibility benefit. +| | Choice | Rejected because | +|---|---|---| +| A | (Chosen) Add PiPNN behind a feature flag | — | +| B | Replace Vamana immediately | PiPNN lacks checkpoint / full quantization / label-filter parity; production-validation gap | +| C | Separate top-level crate/binary | Duplicates PQ training, disk writer, search pipeline — maintenance burden, no compatibility benefit | ### Algorithm risks -PiPNN's recall depends on partition overlap (controlled by `fanout`) and reservoir size (`l_max`). On the workloads in the benchmark section recall matches or beats Vamana at the chosen settings, but the parameter space is larger than Vamana's `R`/`L_build`. Stage-1 mitigates by keeping Vamana as the default and providing reference parameter sets in code comments and benchmark configs. +Recall depends on partition overlap (`fanout`) and reservoir size (`l_max`). Parameter space is larger than Vamana's `R`/`L_build`. Stage 1 mitigates by keeping Vamana as default and shipping reference parameter sets per workload class. ## Benchmark Results -All benchmarks run on Azure `Standard_L16s_v3` (Intel Xeon Platinum 8370C, 16 threads, NVMe), with `RUSTFLAGS=-C target-cpu=native`. +Azure `Standard_L16s_v3`, 16 threads, NVMe, `RUSTFLAGS=-C target-cpu=native`. 
### Build time -| Dataset | Vamana | PiPNN (one-shot) | Speedup | +| Dataset | Vamana | PiPNN | Speedup | |---|---:|---:|---:| -| Enron 1M (1.087M × 384, fp16, cosine_normalized) | 70s | 13s | 5.4× | -| BigANN 10M (10M × 128, fp16, squared_l2) | 358s | 80.2s | 4.5× | -| Enron 10M (10M × 384, fp16, cosine_normalized) | 844s | 133s | 6.3× | +| Enron 1M (384d) | 70s | 13s | 5.4× | +| BigANN 10M (128d) | 358s | 80s | 4.5× | +| Enron 10M (384d) | 844s | 133s | 6.3× | -### Recall / QPS — BigANN 10M +### BigANN 10M — recall × QPS -Config: PiPNN `c_max=256, fanout=[10,3], leaf_k=3, l_max=64, hp=12, pq_chunks=64, no final_prune`. Vamana `R=64, L=64, pq_chunks=64`. +Default PiPNN (`c_max=256, fanout=[10,3], leaf_k=3, l_max=64, pq_chunks=64`) vs. Vamana (`R=64, L=64, pq_chunks=64`): -| L | PiPNN Recall@10 | PiPNN QPS | Vamana Recall@10 | Vamana QPS | +| L | PiPNN R@10 | PiPNN QPS | Vamana R@10 | Vamana QPS | |---|---:|---:|---:|---:| | 10 | 77.76% | 10,670 | 79.23% | 11,618 | | 50 | 96.31% | 5,574 | 97.10% | 5,940 | | 100 | 98.61% | 3,430 | 99.01% | 3,568 | -With higher-recall PiPNN config (`c_max=512, fanout=[10,4], leaf_k=3, l_max=128, final_prune`), PiPNN exceeds Vamana on recall at L=50 (97.22% vs 97.10%) and L=100 (99.21% vs 99.01%) at the cost of 143s build time (still 2.5× faster than Vamana's 358s). +With a higher-recall config (`c_max=512, fanout=[10,4], l_max=128, final_prune`), PiPNN matches/exceeds Vamana at L=50 (97.22%) and L=100 (99.21%) at 143s (still 2.5× faster). -### Recall / QPS — Enron 10M (384d) +### Enron 10M (384d) — recall × QPS -Config: PiPNN `c_max=256, fanout=[8,3], leaf_k=2, l_max=64, hp=14, pq_chunks=192`. Vamana `R=64, L=72, pq_chunks=192`. +PiPNN (`c_max=256, fanout=[8,3], leaf_k=2, l_max=64, pq_chunks=192`) vs. Vamana (`R=64, L=72, pq_chunks=192`): -| L | PiPNN Recall@1000 | PiPNN QPS | Vamana Recall@1000 | Vamana QPS | +| L | PiPNN R@1000 | PiPNN QPS | Vamana R@1000 | Vamana QPS | |---|---:|---:|---:|---:| | 1000 | 89.99% | 378 | 89.33% | 384 | -| 1500 | 95.19% | 255 | 94.12% | 258 | | 2000 | 96.46% | 192 | 95.36% | 195 | -| 2500 | 97.23% | 154 | 96.15% | 155 | | 3000 | 97.74% | 129 | 96.68% | 130 | -PiPNN beats Vamana on recall at every L on the 384d Enron 10M workload, at parity QPS and 6.3× faster build. - -## Future Work - -The Stage 1 milestones below are gating items for Stage 2 (retiring Vamana's disk-index full-build path — initial builds and rebuilds). Each must be addressed before that proposal is credible. M0 is the foundation shipped by this RFC; M3–M9 are deferred to follow-on work and ordered by dependency, not strict calendar sequence — some can run in parallel. M1 (in-memory build/search) and M2 (checkpoint/resume) are intentionally absent — see "Out of scope: not part of any stage" and "Deferred to Stage 2" below. - -### M0 — Skeleton integration - -The foundation that ships first. - -- **Scope:** introduce the `diskann-pipnn` crate, the `BuildAlgorithm` enum, and the dispatch in `DiskIndexBuilder` behind a `pipnn` Cargo feature. -- **Config surface:** JSON config gains an optional `build_algorithm` block; default behavior unchanged. -- **Compatibility:** PiPNN-built indexes are read by the existing search pipeline unchanged (the on-disk format is identical) and produce recall numbers within the tolerances the existing disk-index test suite enforces. -- **CI:** benchmark binary runs with `--features pipnn` on a small smoke test (SIFT-1M). - -M1, M3–M5 close the feature-parity gaps in Stage 1; M6–M9 are validation and operational readiness. 
Checkpoint/resume (previously M2) is deferred to Stage 2 — see "Deferred to Stage 2" below for the rationale. +PiPNN beats Vamana on recall at every L at parity QPS — and 6.3× faster build. -### M3 — Feature parity: quantized vector support +## Future Work — Stage-1 Milestones -PiPNN currently has only a `SQ1` (1-bit) build path. +These are gating items for Stage 2. M0 ships in this RFC; M3–M9 are follow-on work, parallelizable where dependencies allow. **M1** (in-memory build) and **M2** (checkpoint/resume) are intentionally absent — see "Out of scope" and "Deferred to Stage 2". -- **Scope:** extend the build to accept `QuantizationType::SQ { nbits, standard_deviation }` for the same `nbits` values Vamana supports (`SQ_2`, `SQ_4`, `SQ_8`). -- **Reuse:** trained `ScalarQuantizer` from `diskann-quantization`; do not duplicate quantizer training. -- **Implementation:** the leaf-build distance kernel needs an `nbits`-aware path. Today the kernel is either FP (GEMM) or 1-bit Hamming. -- **Validation:** PiPNN at `SQ_8` produces recall within 0.5% of FP for BigANN 10M and Enron 10M, matching the Vamana SQ_8 baseline. +### M0 — Skeleton integration (this RFC) +Crate, `BuildAlgorithm` enum, dispatch behind `pipnn` Cargo feature. JSON config gains optional `build_algorithm`. CI smoke test (SIFT-1M) with `--features pipnn`. -*Note: build-time Product Quantization (PQ-distance during graph construction) is not currently used by Vamana in any production path and is out of scope.* +### M3 — Quantization parity +Extend PiPNN beyond `SQ1` to `SQ_2/4/8`, reusing the trained `ScalarQuantizer`. **Pass:** SQ_8 recall within 0.5% of FP on BigANN 10M and Enron 10M. -### M4 — Feature parity: label-filtered indexes +### M4 — Label-filtered indexes +Run filter benchmark configs with `BuildAlgorithm::PiPNN`; confirm filter-recall within ±1% of Vamana. Partition may need label-aware leaf assignment for high-cardinality labels. -PiPNN-built graphs already work with the existing search-time filter pipeline (`diskann-label-filter`) because the disk format is the same. The build-time flow for filter-aware indexes has not been exercised end-to-end. +### M5 — Three-tier memory dispatch +Implement and validate the disk-edges + merged-shards paths selected by `build_ram_limit_gb`. **Pass:** at `build_ram_limit_gb=4`, PiPNN-merged on BigANN 10M has peak RSS ≤ 4 GB and recall within 1% of one-shot. -- **Scope:** run the filter benchmark JSON configs with `BuildAlgorithm::PiPNN`; confirm filter-recall numbers match Vamana's. -- **Risk:** the partition phase may need label-aware leaf assignment for high-cardinality labels. -- **Validation:** filter-recall on a representative labeled dataset within ±1% of Vamana's filter-recall. - -### M5 — Memory mitigation: three-tier dispatch - -Implement two memory-constrained PiPNN paths and select among them via the existing `build_ram_limit_gb` knob. - -- **Disk-edges:** today's prototype generates all leaf edges first, spills them to disk, then streams chunks back into HashPrune. An alternative we plan to evaluate is to interleave the two — write partition metadata to disk and run leaf-build + HashPrune in chunks (build edges for the first N leaves' points, flush their adjacency lists, then move on). Both variants bound the resident HashPrune reservoir; the second avoids the full edge-set materialization at the cost of a second pass over the partition. 
-- **Merged-shards:** per-shard graphs built independently then merged, mirroring Vamana's `build_merged_vamana_index` pipeline at `diskann-disk/src/build/builder/build.rs:327`. The existing shard merger is reused. -- **Dispatch:** inside `build_inmem_pipnn_index()` — no new public parameter. -- **Validation:** at `build_ram_limit_gb=4`, the PiPNN-merged path on BigANN 10M produces peak RSS ≤ 4 GB and recall within 1% of one-shot PiPNN. +Two disk-edges variants are on the table: (i) materialize all leaf edges then stream HashPrune (current prototype), or (ii) interleave leaf-build + HashPrune in chunks. The second avoids full edge-set materialization at the cost of a second partition pass. ### M6 — Fixed-resource trade-off validation +Validates the **trade-off hypothesis** from the Problem Statement. -This milestone validates the **concrete trade-off hypothesis** stated in the Problem Statement: under a fixed worker shape (CPU cores, RAM budget, SSD throughput), PiPNN delivers higher build throughput than Vamana at matching recall when its working set fits, and remains competitive (via the three-tier dispatch in M5) when it does not. The output of this milestone is the evidence behind the per-budget recommendation in the Stage-1 deployment guide. - -- **Fixed worker shape per run.** Lock CPU cores (e.g. 16), SSD model/throughput, and a RAM ceiling enforced via cgroups (`memory.max`) so the build *cannot* exceed it. RAM-budget sweep on BigANN 10M: `{3, 6, 8, 12, 16, 24, 32}` GB at minimum. Include at least one row each for Enron 10M (higher dim, larger reservoir) and a 100M-scale dataset (one budget per algorithm sufficient to fit). -- **Algorithm × strategy cells.** For each RAM budget, run: Vamana one-shot, Vamana partitioned, PiPNN one-shot (if fits), PiPNN disk-edges, PiPNN merged-shards. Skip cells whose minimum working set exceeds the budget — those count as "OOM, not supported at this budget" and are part of the result, not a gap. -- **Metrics captured per cell.** Wall-clock build time, peak RSS (via heaptrack or `/usr/bin/time -v`), CPU utilization (`pidstat`), SSD bytes read/written, recall@K at L=50/100/L_target, and queries-per-second at matching recall. Throughput reported as **vectors per minute per worker** so different worker shapes compare directly. -- **Hypotheses to confirm or falsify.** - 1. PiPNN's wall-clock advantage over Vamana persists across all RAM budgets where its working set fits (one-shot or disk-edges variant). - 2. PiPNN's merged-shards path matches or beats Vamana's partitioned-rebuild at the same RAM ceiling on build time *and* recall. - 3. Neither algorithm reduces build time when given RAM headroom past its working-set requirement (validates the "surplus RAM doesn't buy speed" claim). - 4. PiPNN's per-thread overhead is bounded as stated (~16–20 MB/thread) and `num_threads` is not a hidden RAM knob. -- **Out-of-budget behavior.** Each (algorithm × budget) cell that cannot complete is recorded as such — explicit "PiPNN one-shot not supported at 6 GB on BigANN 10M" is a valid result, not a failed experiment. -- **Pass criterion for Stage 2 readiness.** A documented matrix where each budget has a clearly-better algorithm (or "tie") at matching recall, with no surprise cells that contradict the Problem Statement's hypothesis. Surprises must be either reproduced and explained, or treated as Stage-1 blockers. - -### M7 — Production validation: recall × QPS × dimensionality matrix +- **Setup.** Lock CPU/SSD on a fixed worker; enforce RAM via cgroups. 
Sweep RAM `{3, 6, 8, 12, 16, 24, 32}` GB on BigANN 10M; include rows for Enron 10M and a 100M-scale dataset. +- **Cells.** Vamana one-shot, Vamana partitioned, PiPNN one-shot, PiPNN disk-edges, PiPNN merged-shards. OOM cells are valid results. +- **Metrics.** Wall-clock, peak RSS, CPU util, SSD bytes, recall@K, QPS — reported as **vectors/min/worker** for cross-shape comparison. +- **Pass.** Documented matrix with a clearly-better algorithm (or tie) per budget at matching recall. Surprises are Stage-1 blockers. -End-to-end validation on the full production workload mix (independent of the resource matrix in M6). +### M7 — Production validation: recall × QPS × dimensionality +End-to-end on the full workload mix. Datasets: BigANN, Enron, plus one production-representative. Scales 10M and 100M (billion if hardware permits). Metrics `squared_l2` and `cosine_normalized`. **Pass:** per cell, PiPNN recall@K within ±1% of Vamana's at matching QPS, or higher QPS at matching recall. -- **Datasets:** at minimum three families (BigANN, Enron, plus one production-representative). -- **Scales:** 10M and 100M; one billion-scale sample if hardware permits. -- **Metrics:** `squared_l2` and `cosine_normalized`. -- **Pass criterion:** for each (dataset, scale, metric) cell, PiPNN recall@K is within Vamana's recall ±1% at matching QPS, *or* higher QPS at matching recall. -- **Out-of-band cells** are documented as "PiPNN not yet recommended for X" rather than blocking Stage 2 entirely. - -### M8 — Production validation: hybrid update model - -Validate the Stage-2 hybrid loop end-to-end. - -- **Sequence:** PiPNN build → N incremental Vamana inserts representing production churn → measure recall decay vs. graph age → trigger PiPNN rebuild from snapshot → confirm post-rebuild recall restored. -- **Output:** a recommended "quality decay threshold" for production rebuild triggers, derived from the measured decay curve. -- **Disk-format compatibility test:** confirm Vamana's incremental-insert path reads PiPNN-produced graphs correctly. This is the load-bearing compatibility check for the hybrid model. +### M8 — Hybrid update model validation +PiPNN build → N incremental Vamana inserts → measure recall decay → trigger PiPNN rebuild → confirm recall restored. Output: a recommended "quality decay threshold" for production rebuild triggers. Also confirms Vamana's in-mem insert path reads PiPNN graphs correctly. ### M9 — Operational readiness - -- **Telemetry:** emit per-phase timing and peak RSS via the existing OpenTelemetry tracer, comparable to Vamana's spans. -- **Documentation:** replace experimental notes in `CLAUDE.md` with a permanent doc covering recommended parameters per workload class (dim × scale × metric). -- **Runbook:** failure modes (OOM under one-shot, partition timeout, `l_max` saturation), diagnosis, recovery. -- **Defaults:** parameter recommendations baked into the JSON config builder so users don't hand-tune for common cases. +Telemetry (per-phase timing + RSS via existing OTel tracer), permanent docs replacing experimental `CLAUDE.md` notes, runbook (OOM, partition timeout, `l_max` saturation), default parameter recommendations per workload class. ### Deferred to Stage 2 -- **Checkpoint / resume (was M2).** Vamana's checkpoint/resume is a *streaming* mechanism — it relies on the per-point incremental insert order to define natural checkpoint boundaries. 
PiPNN's batch design has no equivalent monotonic insertion sequence: partition output, per-leaf GEMM, and HashPrune merge are all coarse-grained whole-phase artifacts rather than fine-grained incremental progress. A useful PiPNN checkpoint scheme would therefore *not* mirror Vamana's; it would need new design choices about which phase boundaries to materialize, at what granularity, and whether the cost-benefit justifies the extra disk I/O. Empirically, PiPNN's full BigANN-10M build runs in ~80 s, so the operational value of resuming a partially completed build is materially lower than for Vamana's multi-hour rebuilds. We defer checkpoint design until Stage 2, when the production rebuild cadence and observed failure modes will tell us whether it is needed and what shape it should take. +- **Checkpoint / resume.** Vamana's streaming checkpoint design doesn't fit PiPNN's batch phases. Useful boundaries (partition output, post-extract) would need a different scheme, and operational value is lower (PiPNN's BigANN-10M build is ~80s). Defer until Stage 2 reveals the production rebuild cadence. - *Note on determinism for any future checkpoint validation:* PiPNN is a parallel algorithm (rayon-parallel partition, leaf-build GEMM, and HashPrune merge), so byte-identical output across runs — and therefore across "resumed vs. never-interrupted" runs — is **not** a free property. It would require extra determinism work (fixed thread schedule, deterministic reduction order in the HashPrune reservoir, seeded LSH hyperplanes). The right validation criterion for a resumed build is **recall parity with a non-resumed build**, not byte-identical adjacency lists. + *Determinism note:* PiPNN is rayon-parallel — byte-identical output across runs is not free (would need fixed thread schedule, deterministic reductions, seeded LSH). For any future resumed-build test, the right validation criterion is **recall parity**, not byte-identity. ### Out of scope: not part of any stage -These are explicitly *not* on a Stage 1 or Stage 2 roadmap. They may be revisited if a future workload demands them, but they are not gating items for either stage. - -- **In-memory PiPNN build / in-memory index population (was M1).** DiskANN's in-memory `DiskANNIndex` exists primarily to support streaming per-point construction and online inserts — which is exactly what PiPNN's batch design cannot do efficiently (see "PiPNN is algorithmically batch-only"). Building a `DiskANNIndex` from PiPNN-produced adjacency lists is mechanically possible (the data structures are compatible) but offers no incremental capability, duplicates the disk path's value, and would force `diskann-pipnn` to take a runtime dependency on the in-mem graph crate. We defer indefinitely; if a non-streaming in-mem consumer ever needs PiPNN's build speed, the simpler answer is "build to disk, then load." -- **Build-time PQ distance kernel.** Not used by Vamana in production paths today; deferred indefinitely. -- **PiPNN incremental insert API.** The hybrid model (PiPNN rebuild + Vamana inserts) removes the need. -- **PiPNN incremental delete API.** Same reason. -- **Frozen-point semantics differences.** PiPNN writes the dataset medoid as the single frozen start point, same as Vamana's default. Already byte-compatible; no work required. +- **In-memory PiPNN build (was M1).** The in-mem `DiskANNIndex` exists for streaming construction — exactly what PiPNN can't do efficiently. 
Building one from PiPNN adjacency lists is mechanically possible but offers no incremental capability and would force `diskann-pipnn` to depend on the in-mem graph crate. If a non-streaming in-mem consumer ever needs PiPNN's speed: build to disk, then load. +- **Build-time PQ distance kernel.** Not used by Vamana in production today. +- **PiPNN incremental insert/delete API.** Hybrid model removes the need. +- **Frozen-point semantics.** PiPNN writes the medoid as the single frozen start point — already byte-compatible with Vamana's default. - **Multi-vector index support.** Revisit only if a production workload requires it. ## References From da8b2ac076bd7956cf60ecafc96af44d7b597020 Mon Sep 17 00:00:00 2001 From: Weiyao Luo <9347182+SeliMeli@users.noreply.github.com> Date: Thu, 14 May 2026 04:35:41 +0000 Subject: [PATCH 11/14] docs(rfc): remove hybrid update model validation from Stage 1 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Stage 1 covers only build-from-scratch and full rebuilds with PiPNN. Validating the Stage-2 hybrid loop (PiPNN build → incremental Vamana inserts → rebuild) doesn't belong in Stage 1: there is no in-production hybrid behavior to characterize until Stage 2 actually adopts the model. - Remove M8 (Hybrid update model validation) from the milestone list. - Move it to "Deferred to Stage 2" with the rationale; note that the one-shot disk-format-compatibility check is sufficient at Stage 2 entry. - Update the milestone-list intro to call out that M1, M2, and now M8 are intentionally absent, and add a sentence stating Stage 1 exercises no hybrid behavior. The "Hybrid update model (Stage 2 direction)" motivation section is preserved — it explains why PiPNN doesn't need insert(point), which remains true regardless of when hybrid validation runs. Co-Authored-By: Claude Opus 4.7 (1M context) --- rfcs/01049-pipnn-integration.md | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/rfcs/01049-pipnn-integration.md b/rfcs/01049-pipnn-integration.md index 8994f6934..45e7b2d83 100644 --- a/rfcs/01049-pipnn-integration.md +++ b/rfcs/01049-pipnn-integration.md @@ -232,7 +232,7 @@ PiPNN beats Vamana on recall at every L at parity QPS — and 6.3× faster build ## Future Work — Stage-1 Milestones -These are gating items for Stage 2. M0 ships in this RFC; M3–M9 are follow-on work, parallelizable where dependencies allow. **M1** (in-memory build) and **M2** (checkpoint/resume) are intentionally absent — see "Out of scope" and "Deferred to Stage 2". +These are gating items for Stage 2. M0 ships in this RFC; M3–M7 and M9 are follow-on work, parallelizable where dependencies allow. **M1** (in-memory build), **M2** (checkpoint/resume), and **M8** (hybrid update model) are intentionally absent — see "Out of scope" and "Deferred to Stage 2". Stage 1 is strictly about build-from-scratch and full rebuilds with PiPNN; no hybrid behavior is exercised or validated here. ### M0 — Skeleton integration (this RFC) Crate, `BuildAlgorithm` enum, dispatch behind `pipnn` Cargo feature. JSON config gains optional `build_algorithm`. CI smoke test (SIFT-1M) with `--features pipnn`. @@ -259,14 +259,13 @@ Validates the **trade-off hypothesis** from the Problem Statement. ### M7 — Production validation: recall × QPS × dimensionality End-to-end on the full workload mix. Datasets: BigANN, Enron, plus one production-representative. Scales 10M and 100M (billion if hardware permits). Metrics `squared_l2` and `cosine_normalized`. 
**Pass:** per cell, PiPNN recall@K within ±1% of Vamana's at matching QPS, or higher QPS at matching recall. -### M8 — Hybrid update model validation -PiPNN build → N incremental Vamana inserts → measure recall decay → trigger PiPNN rebuild → confirm recall restored. Output: a recommended "quality decay threshold" for production rebuild triggers. Also confirms Vamana's in-mem insert path reads PiPNN graphs correctly. - ### M9 — Operational readiness Telemetry (per-phase timing + RSS via existing OTel tracer), permanent docs replacing experimental `CLAUDE.md` notes, runbook (OOM, partition timeout, `l_max` saturation), default parameter recommendations per workload class. ### Deferred to Stage 2 +- **Hybrid update model validation (was M8).** End-to-end validation of the Stage-2 loop — PiPNN build → incremental Vamana inserts → recall-decay curve → PiPNN rebuild — belongs with the Stage-2 proposal that actually adopts the hybrid model. Stage 1 only exercises the full-build path, so there is no in-production hybrid behavior to characterize yet. The disk-format-compatibility check (Vamana's in-mem insert path reading a PiPNN-produced graph) is a one-shot sanity test that can be performed at Stage 2 entry; it does not need to gate Stage 1. + - **Checkpoint / resume.** Vamana's streaming checkpoint design doesn't fit PiPNN's batch phases. Useful boundaries (partition output, post-extract) would need a different scheme, and operational value is lower (PiPNN's BigANN-10M build is ~80s). Defer until Stage 2 reveals the production rebuild cadence. *Determinism note:* PiPNN is rayon-parallel — byte-identical output across runs is not free (would need fixed thread schedule, deterministic reductions, seeded LSH). For any future resumed-build test, the right validation criterion is **recall parity**, not byte-identity. From d0baaba1a36b8cbaa9c57e19e8d5e57b0ae9a0d9 Mon Sep 17 00:00:00 2001 From: Weiyao Luo <9347182+SeliMeli@users.noreply.github.com> Date: Thu, 14 May 2026 04:37:05 +0000 Subject: [PATCH 12/14] docs(rfc): renumber milestones contiguously, drop edit-history phrasing MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The RFC describes the current plan, not its revision history. Strip "(was M1)" / "(was M8)" / "intentionally absent" phrasing and renumber milestones contiguously M0–M6. - M0 — Skeleton integration (this RFC) - M1 — Quantization parity (was M3) - M2 — Label-filtered indexes (was M4) - M3 — Three-tier memory dispatch (was M5) - M4 — Fixed-resource validation (was M6) - M5 — Production matrix (was M7) - M6 — Operational readiness (was M9) The "Deferred to Stage 2" and "Out of scope" sections describe what's not in the plan in their own terms, not as removed milestones. Co-Authored-By: Claude Opus 4.7 (1M context) --- rfcs/01049-pipnn-integration.md | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/rfcs/01049-pipnn-integration.md b/rfcs/01049-pipnn-integration.md index 45e7b2d83..caaa3ccc9 100644 --- a/rfcs/01049-pipnn-integration.md +++ b/rfcs/01049-pipnn-integration.md @@ -232,23 +232,23 @@ PiPNN beats Vamana on recall at every L at parity QPS — and 6.3× faster build ## Future Work — Stage-1 Milestones -These are gating items for Stage 2. M0 ships in this RFC; M3–M7 and M9 are follow-on work, parallelizable where dependencies allow. **M1** (in-memory build), **M2** (checkpoint/resume), and **M8** (hybrid update model) are intentionally absent — see "Out of scope" and "Deferred to Stage 2". 
Stage 1 is strictly about build-from-scratch and full rebuilds with PiPNN; no hybrid behavior is exercised or validated here. +Stage 1 covers build-from-scratch and full rebuilds with PiPNN. M0 ships in this RFC; M1–M6 are follow-on work, parallelizable where dependencies allow. ### M0 — Skeleton integration (this RFC) Crate, `BuildAlgorithm` enum, dispatch behind `pipnn` Cargo feature. JSON config gains optional `build_algorithm`. CI smoke test (SIFT-1M) with `--features pipnn`. -### M3 — Quantization parity +### M1 — Quantization parity Extend PiPNN beyond `SQ1` to `SQ_2/4/8`, reusing the trained `ScalarQuantizer`. **Pass:** SQ_8 recall within 0.5% of FP on BigANN 10M and Enron 10M. -### M4 — Label-filtered indexes +### M2 — Label-filtered indexes Run filter benchmark configs with `BuildAlgorithm::PiPNN`; confirm filter-recall within ±1% of Vamana. Partition may need label-aware leaf assignment for high-cardinality labels. -### M5 — Three-tier memory dispatch +### M3 — Three-tier memory dispatch Implement and validate the disk-edges + merged-shards paths selected by `build_ram_limit_gb`. **Pass:** at `build_ram_limit_gb=4`, PiPNN-merged on BigANN 10M has peak RSS ≤ 4 GB and recall within 1% of one-shot. Two disk-edges variants are on the table: (i) materialize all leaf edges then stream HashPrune (current prototype), or (ii) interleave leaf-build + HashPrune in chunks. The second avoids full edge-set materialization at the cost of a second partition pass. -### M6 — Fixed-resource trade-off validation +### M4 — Fixed-resource trade-off validation Validates the **trade-off hypothesis** from the Problem Statement. - **Setup.** Lock CPU/SSD on a fixed worker; enforce RAM via cgroups. Sweep RAM `{3, 6, 8, 12, 16, 24, 32}` GB on BigANN 10M; include rows for Enron 10M and a 100M-scale dataset. @@ -256,25 +256,25 @@ Validates the **trade-off hypothesis** from the Problem Statement. - **Metrics.** Wall-clock, peak RSS, CPU util, SSD bytes, recall@K, QPS — reported as **vectors/min/worker** for cross-shape comparison. - **Pass.** Documented matrix with a clearly-better algorithm (or tie) per budget at matching recall. Surprises are Stage-1 blockers. -### M7 — Production validation: recall × QPS × dimensionality +### M5 — Production validation: recall × QPS × dimensionality End-to-end on the full workload mix. Datasets: BigANN, Enron, plus one production-representative. Scales 10M and 100M (billion if hardware permits). Metrics `squared_l2` and `cosine_normalized`. **Pass:** per cell, PiPNN recall@K within ±1% of Vamana's at matching QPS, or higher QPS at matching recall. -### M9 — Operational readiness +### M6 — Operational readiness Telemetry (per-phase timing + RSS via existing OTel tracer), permanent docs replacing experimental `CLAUDE.md` notes, runbook (OOM, partition timeout, `l_max` saturation), default parameter recommendations per workload class. ### Deferred to Stage 2 -- **Hybrid update model validation (was M8).** End-to-end validation of the Stage-2 loop — PiPNN build → incremental Vamana inserts → recall-decay curve → PiPNN rebuild — belongs with the Stage-2 proposal that actually adopts the hybrid model. Stage 1 only exercises the full-build path, so there is no in-production hybrid behavior to characterize yet. The disk-format-compatibility check (Vamana's in-mem insert path reading a PiPNN-produced graph) is a one-shot sanity test that can be performed at Stage 2 entry; it does not need to gate Stage 1. 
+- **Hybrid update model validation.** End-to-end validation of the Stage-2 loop — PiPNN build → incremental Vamana inserts → recall-decay curve → PiPNN rebuild — belongs with the Stage-2 proposal that actually adopts the hybrid model. Stage 1 exercises only the full-build path. The disk-format-compatibility check (Vamana's in-mem insert path reading a PiPNN-produced graph) is a one-shot sanity test that can run at Stage 2 entry. - **Checkpoint / resume.** Vamana's streaming checkpoint design doesn't fit PiPNN's batch phases. Useful boundaries (partition output, post-extract) would need a different scheme, and operational value is lower (PiPNN's BigANN-10M build is ~80s). Defer until Stage 2 reveals the production rebuild cadence. - *Determinism note:* PiPNN is rayon-parallel — byte-identical output across runs is not free (would need fixed thread schedule, deterministic reductions, seeded LSH). For any future resumed-build test, the right validation criterion is **recall parity**, not byte-identity. + *Determinism note:* PiPNN is rayon-parallel — byte-identical output across runs is not free (would need fixed thread schedule, deterministic reductions, seeded LSH). The right validation criterion for any future resumed-build test is **recall parity**, not byte-identity. ### Out of scope: not part of any stage -- **In-memory PiPNN build (was M1).** The in-mem `DiskANNIndex` exists for streaming construction — exactly what PiPNN can't do efficiently. Building one from PiPNN adjacency lists is mechanically possible but offers no incremental capability and would force `diskann-pipnn` to depend on the in-mem graph crate. If a non-streaming in-mem consumer ever needs PiPNN's speed: build to disk, then load. +- **In-memory PiPNN build.** The in-mem `DiskANNIndex` exists for streaming construction — exactly what PiPNN can't do efficiently. Building one from PiPNN adjacency lists is mechanically possible but offers no incremental capability and would force `diskann-pipnn` to depend on the in-mem graph crate. If a non-streaming in-mem consumer ever needs PiPNN's speed: build to disk, then load. - **Build-time PQ distance kernel.** Not used by Vamana in production today. -- **PiPNN incremental insert/delete API.** Hybrid model removes the need. +- **PiPNN incremental insert/delete API.** The hybrid update model (Vamana inserts on the in-memory graph, PiPNN for full rebuilds) removes the need. - **Frozen-point semantics.** PiPNN writes the medoid as the single frozen start point — already byte-compatible with Vamana's default. - **Multi-vector index support.** Revisit only if a production workload requires it. From e1c2b8c29bff911f9c3185329c544dc0bfbd9266 Mon Sep 17 00:00:00 2001 From: Weiyao Luo <9347182+SeliMeli@users.noreply.github.com> Date: Thu, 14 May 2026 05:47:53 +0000 Subject: [PATCH 13/14] docs(rfc): drop obvious determinism note from checkpoint defer entry Co-Authored-By: Claude Opus 4.7 (1M context) --- rfcs/01049-pipnn-integration.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/rfcs/01049-pipnn-integration.md b/rfcs/01049-pipnn-integration.md index caaa3ccc9..046370690 100644 --- a/rfcs/01049-pipnn-integration.md +++ b/rfcs/01049-pipnn-integration.md @@ -268,8 +268,6 @@ Telemetry (per-phase timing + RSS via existing OTel tracer), permanent docs repl - **Checkpoint / resume.** Vamana's streaming checkpoint design doesn't fit PiPNN's batch phases. 
Useful boundaries (partition output, post-extract) would need a different scheme, and operational value is lower (PiPNN's BigANN-10M build is ~80s). Defer until Stage 2 reveals the production rebuild cadence.
 
-  *Determinism note:* PiPNN is rayon-parallel — byte-identical output across runs is not free (would need fixed thread schedule, deterministic reductions, seeded LSH). The right validation criterion for any future resumed-build test is **recall parity**, not byte-identity.
-
 ### Out of scope: not part of any stage
 
 - **In-memory PiPNN build.** The in-mem `DiskANNIndex` exists for streaming construction — exactly what PiPNN can't do efficiently. Building one from PiPNN adjacency lists is mechanically possible but offers no incremental capability and would force `diskann-pipnn` to depend on the in-mem graph crate. If a non-streaming in-mem consumer ever needs PiPNN's speed: build to disk, then load.

From 4de515cbb1ccfda1bef7702577146b299dc9da40 Mon Sep 17 00:00:00 2001
From: Weiyao Luo <9347182+SeliMeli@users.noreply.github.com>
Date: Thu, 14 May 2026 05:50:03 +0000
Subject: [PATCH 14/14] docs(rfc): strip obvious filler and defensive prose
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Drop "scales linearly with point count" — table speaks for itself.
- Drop the memory-channel caveat and "honest framing:" meta-tag.
- Drop "is the working set the algorithm needs, not a bug" — defensive.
- Drop "A search-only consumer cannot tell..." — restates the compat table.
- Drop "This is why Vamana keeps the incremental role" — already said.
- Drop "Data flows one-way" — diagram already shows this.
- Trim verbose elaborations on M-deferred entries to one sentence each.
- Demote "In-memory PiPNN build is not in any stage" callout to inline.

Update milestone reference (validation hypothesis now points to M4).

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 rfcs/01049-pipnn-integration.md | 35 +++++++++++++++------------------
 1 file changed, 16 insertions(+), 19 deletions(-)

diff --git a/rfcs/01049-pipnn-integration.md b/rfcs/01049-pipnn-integration.md
index 046370690..42d271a80 100644
--- a/rfcs/01049-pipnn-integration.md
+++ b/rfcs/01049-pipnn-integration.md
@@ -38,7 +38,7 @@ Output: `Vec<Vec<u32>>` adjacency lists, handed to the existing disk writer. PQ
 
 ### Problem statement
 
-Vamana's per-point cost scales linearly with point count, making 10M+ full builds the bottleneck:
+Vamana full builds at 10M+ are the bottleneck:
 
 | Dataset | Vamana build |
 |---|---:|
@@ -60,7 +60,7 @@ On BigANN 10M, 16 threads:
 | 6–12 GB | disk-edges | ~126s | 358s |
 | 3–6 GB | merged-shards | ~332s | ~358s (partitioned) |
 
-**Neither algorithm uses surplus RAM to build faster.** PiPNN's wall-clock is bottlenecked by HashPrune + GEMM; extra RAM headroom past the working set doesn't help (more memory *channels* / higher bandwidth do help, but that's a hardware property, not a budget knob). The honest framing: PiPNN trades a higher *minimum* RAM budget for a substantially faster build at that budget. Validation: see **M6**.
+**Neither algorithm converts surplus RAM into faster builds.** PiPNN's wall-clock is bottlenecked by HashPrune + GEMM; once the working set fits, more RAM doesn't help. PiPNN trades a higher *minimum* RAM budget for a faster build at that budget. Validation: see **M4**.
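+
+For intuition, a back-of-envelope sketch of where that minimum budget comes from, counting only the two dominant terms this RFC names (the fp16 dataset and the HashPrune reservoir at `l_max` × 8 B per point). LSH sketches, partition indices, and the ~16–20 MB/thread scratch come on top; the function name is illustrative, not crate API:
+
+```rust
+/// Rough lower bound on PiPNN's one-shot working set: the fp16
+/// dataset plus the HashPrune reservoir (`l_max` × 8 B per point).
+fn one_shot_floor_bytes(points: u64, dims: u64, l_max: u64) -> u64 {
+    let dataset = points * dims * 2;    // fp16 vectors, 2 B per dim
+    let reservoir = points * l_max * 8; // bounded per-point candidates
+    dataset + reservoir
+}
+
+// BigANN 10M (10M × 128, l_max = 64):
+//   2.56 GB dataset + 5.12 GB reservoir ≈ 7.7 GB floor,
+// consistent with the measured 10.8 GB one-shot peak RSS once
+// LSH sketches, leaf indices, and thread scratch are added.
+```
+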
### Hybrid update model (Stage 2 direction)
@@ -77,7 +77,7 @@ This is why we don't need PiPNN to support `insert(point)` — the disk format i
 - **Stage 1 (this RFC).** Land PiPNN as an alternative builder for the disk-index full-build path (initial builds *and* rebuilds), behind a `pipnn` Cargo feature. Vamana stays default.
 - **Stage 2 (separate proposal, gated by Stage-1 milestones).** Retire Vamana's full-build path; keep Vamana for in-memory incremental inserts.
 
-**In-memory PiPNN build is not in any stage.** The in-mem path exists for streaming construction — exactly what PiPNN's batch design cannot do. See "Out of scope" below.
+In-memory PiPNN build is not in any stage — see "Out of scope".
 
 ### Goals (Stage 1)
 
@@ -85,13 +85,13 @@
 2. **Disk-format compatibility** — byte-identical to Vamana's output; search/PQ/storage layouts unchanged.
 3. **API backward compatibility** — `DiskIndexBuilder`, `IndexConfiguration`, JSON schema all stay additive.
 4. **Feature parity for the full-build role** — deliver the Vamana disk-build capabilities PiPNN still lacks (quantization, label filters).
-5. **Documented memory mitigation** — a three-tier build path that brings PiPNN's peak RSS to or under Vamana's at a documented build-time cost.
+5. **Memory mitigation** — a three-tier build path that brings PiPNN's peak RSS to or under Vamana's at a documented build-time cost.
 
 ## Proposal
 
 ### Workspace structure
 
-Add a crate `diskann-pipnn` depending on `diskann`, `diskann-linalg`, `diskann-vector`, `diskann-quantization`, `diskann-utils`. **It does NOT depend on `diskann-disk`** — that would form a cycle with the consumer-side feature gate. Data flows one-way: PiPNN produces `Vec<Vec<u32>>`, `diskann-disk` consumes it behind its `pipnn` feature.
+Add a crate `diskann-pipnn` depending on `diskann`, `diskann-linalg`, `diskann-vector`, `diskann-quantization`, `diskann-utils`. **It does not depend on `diskann-disk`** — that would form a cycle with the consumer-side feature gate.
 
 ```text
 diskann, diskann-linalg, diskann-quantization, diskann-vector, diskann-utils
        │  (shared deps, no edges to builders)
    ┌───────────┴────────────┐
    diskann-pipnn        diskann-disk
        │                     ↑ feature "pipnn"
        └───→ Vec<Vec<u32>> ────┘
 ```
 
 ### `BuildAlgorithm` enum
 
@@ -123,11 +123,11 @@ pub enum BuildAlgorithm {
 }
 ```
 
-`DiskIndexBuildParameters` gains a `build_algorithm` field defaulting to `Vamana`. The JSON config gains an optional `build_algorithm` block.
+`DiskIndexBuildParameters` gains a `build_algorithm` field; JSON config gains an optional `build_algorithm` block.
 
-**`num_threads` is not a RAM knob.** Per-thread overhead is small (~16–20 MB/thread: stripe buffers + leaf-build scratch). Use `build_ram_limit_gb` to bound RAM.
+**`num_threads` is not a RAM knob.** Per-thread overhead is ~16–20 MB (stripe + leaf-build scratch). Use `build_ram_limit_gb` to bound RAM.
 
-**Feature-flag deserialization applies to JSON configs only, not index files.** An index file built by either algorithm loads with or without the `pipnn` feature. A JSON config with `"algorithm": "PiPNN"` fed to a binary built without the feature fails fast with `unknown variant 'PiPNN'`. Configs that omit `build_algorithm` parse identically across feature builds — not a backward-compat regression.
+**Feature-flag deserialization affects configs only, never index files.** Index files built by either algorithm load with or without the `pipnn` feature. A JSON config naming `"algorithm": "PiPNN"` fed to a binary built without the feature fails fast with `unknown variant 'PiPNN'`; configs that omit `build_algorithm` parse identically across feature builds.
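+
+To make the parse behavior concrete, a minimal sketch. The enum below is a scratch mirror with an abbreviated field set, not the crate's definition; it only illustrates the tagged-enum and feature-gate behavior described above (requires `serde` with `derive` and `serde_json`):
+
+```rust
+use serde::Deserialize;
+
+// Scratch mirror of `BuildAlgorithm`, abbreviated to two fields.
+#[derive(Debug, Default, PartialEq, Deserialize)]
+#[serde(tag = "algorithm")]
+enum BuildAlgorithm {
+    #[default]
+    Vamana,
+    #[cfg(feature = "pipnn")]
+    PiPNN { c_max: usize, l_max: usize },
+}
+
+#[derive(Debug, Deserialize)]
+struct BuildConfig {
+    #[serde(default)] // omitted block selects Vamana
+    build_algorithm: BuildAlgorithm,
+}
+
+fn main() {
+    // Omitting `build_algorithm` parses identically with or without
+    // the `pipnn` feature, and selects Vamana.
+    let cfg: BuildConfig = serde_json::from_str("{}").unwrap();
+    assert_eq!(cfg.build_algorithm, BuildAlgorithm::Vamana);
+
+    // Naming PiPNN in a binary built without `--features pipnn`
+    // fails fast: `unknown variant `PiPNN`, expected `Vamana``.
+    let json = r#"{"build_algorithm":
+        {"algorithm": "PiPNN", "c_max": 256, "l_max": 64}}"#;
+    let parsed = serde_json::from_str::<BuildConfig>(json);
+    assert_eq!(parsed.is_ok(), cfg!(feature = "pipnn"));
+}
+```
+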

 ### Builder dispatch

@@ -139,7 +139,7 @@ match build_parameters.build_algorithm() {
 }
 ```

-The PiPNN branch produces `Vec<Vec<u32>>` via `diskann_pipnn::builder::build_typed`, then hands it to the existing `DiskIndexWriter`. PQ training and disk-sector layout are reused.
+The PiPNN branch produces `Vec<Vec<u32>>` via `diskann_pipnn::builder::build_typed` and hands it to the existing `DiskIndexWriter`.

 ### Compatibility

@@ -151,18 +151,15 @@ The PiPNN branch produces `Vec<Vec<u32>>` via `diskann_pipnn::builder::build_typ
 | Public Rust types | additive only (new field with default) |
 | Benchmark JSON config | additive only (new optional field) |

-A search-only consumer cannot tell which builder produced the index.
-
 ### Feature gating

-- `diskann-disk` gains a `pipnn` Cargo feature. Default features do **not** include it.
-- With the feature off: `BuildAlgorithm::PiPNN` does not exist at the type level. No runtime branch, no extra binary size, no `diskann-pipnn` dependency.
+`diskann-disk` gains a `pipnn` Cargo feature, off by default. With it off, `BuildAlgorithm::PiPNN` does not exist at the type level — no runtime branch, no `diskann-pipnn` dependency.

 ## Trade-offs

 ### Batch-only (algorithmic, not implementation)

-The PiPNN paper "eliminates search from graph-building" — partition first, then one batched GEMM per leaf. The batching advantage **requires knowing leaf membership before computing distances**; at batch size 1, PiPNN reduces to per-point distance work no faster than Vamana's greedy insert. Two phases (partition, leaf k-NN) are batch-by-design; HashPrune and final RobustPrune happen to be online, but you've already paid for the batch phases by the time you reach them. This is why Vamana keeps the incremental role.
+The PiPNN paper "eliminates search from graph-building" — partition first, then one batched GEMM per leaf. The batching advantage **requires knowing leaf membership before computing distances**; at batch size 1, PiPNN reduces to per-point distance work no faster than Vamana's greedy insert. Two phases (partition, leaf k-NN) are batch-by-design; HashPrune and final RobustPrune happen to be online, but you've already paid for the batch phases by the time you reach them.

 ### Memory vs. build speed

@@ -172,7 +169,7 @@ PiPNN holds more working memory than Vamana — dominated by the **HashPrune res
 |---|---:|---:|
 | Peak RSS | 10.8 GB | 6.3 GB |

-The +4.5 GB delta is the working set the algorithm needs, not a bug. Mitigation via the three-tier build (dispatched by the existing `build_ram_limit_gb` knob):
+Mitigation via the three-tier build (dispatched by the existing `build_ram_limit_gb` knob):

 | Strategy | Peak RSS | Build | Recall@10 L=50 | Trigger |
 |---|---:|---:|---:|---|
@@ -192,7 +189,7 @@ Disk-edges matches Vamana's RAM at ~3× the build speed. Merged-shards uses *les

 ### Algorithm risks

-Recall depends on partition overlap (`fanout`) and reservoir size (`l_max`). Parameter space is larger than Vamana's `R`/`L_build`. Stage 1 mitigates by keeping Vamana as default and shipping reference parameter sets per workload class.
+Recall depends on partition overlap (`fanout`) and reservoir size (`l_max`) — a larger parameter space than Vamana's `R`/`L_build`. Mitigation: keep Vamana as default and ship reference parameter sets per workload class.
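+
+To make the `l_max` knob concrete, a sketch of the per-point reservoir discipline HashPrune applies (illustrative only; the names and the drop-when-full policy are assumptions, not `diskann-pipnn` code):
+
+```rust
+use std::collections::hash_map::{Entry, HashMap};
+
+// One reservoir per point: at most one surviving candidate per LSH
+// angular bucket, closer wins on collision, capped at `l_max` entries.
+// That is O(l_max) memory per node and O(1) amortized insert.
+struct Reservoir {
+    l_max: usize,
+    by_bucket: HashMap<u64, (u32, f32)>, // bucket -> (neighbor id, distance)
+}
+
+impl Reservoir {
+    fn offer(&mut self, bucket: u64, id: u32, dist: f32) {
+        let len = self.by_bucket.len();
+        match self.by_bucket.entry(bucket) {
+            // Bucket collision: retain the closer candidate.
+            Entry::Occupied(mut slot) => {
+                if dist < slot.get().1 {
+                    slot.insert((id, dist));
+                }
+            }
+            // New bucket: admit while under the cap; otherwise drop
+            // (one plausible full-reservoir policy).
+            Entry::Vacant(slot) => {
+                if len < self.l_max {
+                    slot.insert((id, dist));
+                }
+            }
+        }
+    }
+}
+```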

 ## Benchmark Results

@@ -264,13 +261,13 @@ Telemetry (per-phase timing + RSS via existing OTel tracer), permanent docs repl

 ### Deferred to Stage 2

-- **Hybrid update model validation.** End-to-end validation of the Stage-2 loop — PiPNN build → incremental Vamana inserts → recall-decay curve → PiPNN rebuild — belongs with the Stage-2 proposal that actually adopts the hybrid model. Stage 1 exercises only the full-build path. The disk-format-compatibility check (Vamana's in-mem insert path reading a PiPNN-produced graph) is a one-shot sanity test that can run at Stage 2 entry.
+- **Hybrid update model validation.** The Stage-2 loop (PiPNN build → incremental Vamana inserts → recall decay → rebuild) belongs with the Stage-2 proposal that adopts the model.

-- **Checkpoint / resume.** Vamana's streaming checkpoint design doesn't fit PiPNN's batch phases. Useful boundaries (partition output, post-extract) would need a different scheme, and operational value is lower (PiPNN's BigANN-10M build is ~80s). Defer until Stage 2 reveals the production rebuild cadence.
+- **Checkpoint / resume.** Vamana's streaming checkpoint design doesn't fit PiPNN's batch phases, and operational value is lower (PiPNN's BigANN-10M build is ~80s).

 ### Out of scope: not part of any stage

-- **In-memory PiPNN build.** The in-mem `DiskANNIndex` exists for streaming construction — exactly what PiPNN can't do efficiently. Building one from PiPNN adjacency lists is mechanically possible but offers no incremental capability and would force `diskann-pipnn` to depend on the in-mem graph crate. If a non-streaming in-mem consumer ever needs PiPNN's speed: build to disk, then load.
+- **In-memory PiPNN build.** The in-mem `DiskANNIndex` exists for streaming construction, which PiPNN can't do efficiently. If a non-streaming in-mem consumer ever wants PiPNN's speed: build to disk, then load.
 - **Build-time PQ distance kernel.** Not used by Vamana in production today.
 - **PiPNN incremental insert/delete API.** The hybrid update model (Vamana inserts on the in-memory graph, PiPNN for full rebuilds) removes the need.
 - **Frozen-point semantics.** PiPNN writes the medoid as the single frozen start point — already byte-compatible with Vamana's default.