RFC: Multi-Vector Distance Function Architecture #985

@suri-kumkaran

Description

This RFC proposes the architectural design for introducing multi-vector support (e.g., ColBERT/MaxSim, Chamfer distance) into the DiskANN ecosystem. This design must balance high-performance, specialized distance evaluations with strict API compatibility and standalone usability.

Goals

  • Define the core mathematical operations required for multi-vector distances, starting with Chamfer and MaxSim.
  • Maintain strict compatibility with DiskANN's DistanceFunctionMut trait.
  • Provide a clean API that enables standalone distance function usage without requiring full index integration.
  • Establish the strategy for supporting various datatypes (f32 initially, with future f16 and u8 support) and subsequent quantizations.
  • Outline proposed memory layouts optimized for a target speedup of 2x or more over a baseline SIMD implementation.
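
To make the first goal concrete, here is a minimal scalar sketch of the two operations for f32, assuming the common definitions: MaxSim sums, over query vectors, the maximum inner product against any document vector (as in ColBERT), and a directed Chamfer distance sums, over query vectors, the minimum squared L2 distance to any document vector. The `MultiVector` type, the row-major layout, and the function names are hypothetical and are not part of DiskANN. The sketch does not implement `DistanceFunctionMut` and makes no attempt at the SIMD-friendly layouts mentioned above.

```rust
/// Hypothetical borrowed view over a multi-vector: `data.len() / dim`
/// vectors of dimension `dim`, stored contiguously in row-major order.
pub struct MultiVector<'a> {
    data: &'a [f32],
    dim: usize,
}

impl<'a> MultiVector<'a> {
    /// Returns `None` if `dim` is zero or does not evenly divide `data`.
    pub fn new(data: &'a [f32], dim: usize) -> Option<Self> {
        if dim == 0 || data.len() % dim != 0 {
            return None;
        }
        Some(Self { data, dim })
    }

    /// Iterates over the individual vectors.
    pub fn rows(&self) -> std::slice::ChunksExact<'a, f32> {
        self.data.chunks_exact(self.dim)
    }
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| {
            let d = x - y;
            d * d
        })
        .sum()
}

/// MaxSim: for each query vector, take the best inner product against the
/// document, then sum. Higher means more similar. An empty document yields
/// negative infinity for each query vector.
pub fn max_sim(query: &MultiVector, doc: &MultiVector) -> f32 {
    assert_eq!(query.dim, doc.dim, "dimension mismatch");
    query
        .rows()
        .map(|q| {
            doc.rows()
                .map(|d| dot(q, d))
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum()
}

/// Directed Chamfer distance (query to document): for each query vector,
/// take the smallest squared L2 distance to the document, then sum.
/// Lower means more similar. Symmetric or averaged variants also exist.
pub fn chamfer(query: &MultiVector, doc: &MultiVector) -> f32 {
    assert_eq!(query.dim, doc.dim, "dimension mismatch");
    query
        .rows()
        .map(|q| {
            doc.rows()
                .map(|d| squared_l2(q, d))
                .fold(f32::INFINITY, f32::min)
        })
        .sum()
}
```

Both functions are asymmetric in their arguments and run in time proportional to the number of query vectors times the number of document vectors times the dimension, which is the cost a specialized memory layout would aim to reduce.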

Action Item: Solicit feedback from maintainers on the DistanceFunctionMut implementation and standalone API boundaries before committing to implementation.

Labels: enhancement
