JVector Release Notes
Version: 4.0.0-rc.7 - 4.0.0-rc.8
New Features and Enhancements
Fused Product Quantization (Fused PQ)
Description
Fused Product Quantization (Fused PQ) is a performance optimization that embeds compressed Product Quantization (PQ) codes directly into the graph index structure, alongside each node's neighbor list. Storing PQ-encoded neighbor vectors inline with the graph edges eliminates the separate lookups otherwise required to retrieve compressed vectors during graph traversal, reducing memory access overhead and enabling faster approximate similarity scoring during search. Because the compressed neighbor vectors live in the index itself, there is no longer a separate in-memory structure of PQ-encoded vectors to maintain, which reduces heap usage while preserving fast approximate similarity search performance.
Purpose / Impact
- Reduces memory usage for large-scale vector datasets
- Improves cache locality during graph traversal
- Enables higher writer scalability for large / high-dimensional vector workloads
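To illustrate the idea of a fused record (this is a conceptual sketch, not JVector's actual on-disk format), each neighbor's ordinal can be interleaved with that neighbor's PQ codes, so a traversal that reads the adjacency list also picks up the compressed vectors in the same read:

```java
import java.nio.ByteBuffer;

public class FusedRecordSketch {
    // Hypothetical layout: [neighborCount][(neighborId, pqCodes[m]) x count]
    static ByteBuffer writeFusedRecord(int[] neighbors, byte[][] pqCodes, int m) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES
                + neighbors.length * (Integer.BYTES + m));
        buf.putInt(neighbors.length);
        for (int i = 0; i < neighbors.length; i++) {
            buf.putInt(neighbors[i]);   // neighbor ordinal
            buf.put(pqCodes[i], 0, m);  // that neighbor's PQ codes, inline
        }
        return buf.flip();
    }

    public static void main(String[] args) {
        int m = 4; // PQ subspaces per vector (hypothetical value)
        int[] neighbors = {7, 42};
        byte[][] codes = {{1, 2, 3, 4}, {5, 6, 7, 8}};
        ByteBuffer record = writeFusedRecord(neighbors, codes, m);
        System.out.println(record.remaining()); // 4 + 2 * (4 + 4) = 20 bytes
    }
}
```

The point of the layout is locality: scoring a node's neighbors touches one contiguous record instead of one adjacency read plus a random access per neighbor into a separate PQ table.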
How to Enable
To enable Fused PQ when writing an on-disk graph index:
- Create the FusedPQ feature by passing your graph's max degree and a ProductQuantization compressor to the constructor:

  var fusedPQFeature = new FusedPQ(graph.maxDegree(), pq);

- Add it to your OnDiskGraphIndexWriter builder:

  var writer = new OnDiskGraphIndexWriter.Builder(graph, outputPath)
          .with(fusedPQFeature)
          .build();

- Provide a state supplier during the write phase that includes the graph view and PQ vectors:

  Map<FeatureId, IntFunction<Feature.State>> writeSuppliers = new EnumMap<>(FeatureId.class);
  writeSuppliers.put(FeatureId.FUSED_PQ, ordinal -> new FusedPQ.State(view, pqVectors, ordinal));
  writer.write(writeSuppliers);
Notes
- Fused PQ requires a 256-cluster ProductQuantization compressor. The feature automatically embeds compressed neighbor vectors inline with the graph structure during the write operation.
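The 256-cluster requirement means each subspace code fits in exactly one unsigned byte, so a vector compresses to one byte per subspace. As a rough illustration of the footprint involved (the dimension and subspace count below are hypothetical examples, not JVector defaults):

```java
public class PqFootprint {
    // With 256 clusters per subspace, each code occupies exactly 1 byte,
    // so a PQ-compressed vector takes m bytes (one per subspace).
    static long compressedBytes(int m) {
        return m;
    }

    // A raw vector stores one float32 per dimension.
    static long rawBytes(int dimension) {
        return (long) dimension * Float.BYTES;
    }

    public static void main(String[] args) {
        int dimension = 1536; // e.g. a common embedding size
        int m = 192;          // hypothetical subspace count
        System.out.println(rawBytes(dimension));                      // 6144
        System.out.println(compressedBytes(m));                       // 192
        System.out.println(rawBytes(dimension) / compressedBytes(m)); // 32
    }
}
```

In this example the inline codes are 32x smaller than the raw vectors, which is what makes embedding them next to the neighbor lists affordable.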
- Supporting the FUSED_PQ feature required introducing version 6 of the graph index file format.
Parallel Graph Index Construction
Description
OnDiskParallelGraphIndexWriter significantly accelerates graph index construction by addressing the disk I/O bottleneck that limits the serial OnDiskGraphIndexWriter. This implementation uses asynchronous file I/O with multiple worker threads to write graph records in parallel, with parallelism automatically determined by available system resources (or configurable via builder options). By parallelizing both record building and disk writes while maintaining correct ordering, this approach dramatically reduces the time required to persist large graph indexes to disk.
Purpose / Impact
- Eliminates the I/O bottleneck in on-disk graph construction
- Maintains backwards compatibility for existing clients of the JVector library
How to Enable
To enable parallel graph index writes, simply use OnDiskParallelGraphIndexWriter.Builder instead of OnDiskGraphIndexWriter.Builder:
Basic usage (uses default parallelism based on available processors):
try (var writer = new OnDiskParallelGraphIndexWriter.Builder(graph, outputPath)
.with(features...)
.build()) {
writer.write(featureSuppliers);
}

Advanced configuration:
try (var writer = new OnDiskParallelGraphIndexWriter.Builder(graph, outputPath)
.with(features...)
.withParallelWorkerThreads(8) // Optional: specify thread count (0 = auto)
.withParallelDirectBuffers(true) // Optional: use direct ByteBuffers for better performance
.build()) {
writer.write(featureSuppliers);
}

The parallel writer is a drop-in replacement for the standard writer with the same API, automatically leveraging multiple threads and asynchronous I/O to accelerate the write process.
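The general technique can be sketched with plain JDK primitives (this is an illustration of the approach, not JVector's implementation): records are built concurrently, and each worker writes its record at an offset computed from its ordinal, so positional writes keep the on-disk order independent of thread scheduling.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

public class ParallelRecordWriter {
    // Build fixed-size records on multiple threads, then write each one at
    // its own offset. FileChannel's positional write is safe for concurrent
    // use and does not touch the channel's shared position.
    static void writeAll(Path path, int recordCount, int recordSize) throws IOException {
        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            CompletableFuture<?>[] tasks = new CompletableFuture<?>[recordCount];
            for (int ordinal = 0; ordinal < recordCount; ordinal++) {
                final int ord = ordinal;
                tasks[ord] = CompletableFuture.runAsync(() -> {
                    ByteBuffer record = buildRecord(ord, recordSize); // CPU-bound work
                    try {
                        channel.write(record, (long) ord * recordSize); // positional write
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                });
            }
            CompletableFuture.allOf(tasks).join();
        }
    }

    // Stand-in for real record construction: fill the buffer with the ordinal.
    static ByteBuffer buildRecord(int ordinal, int size) {
        ByteBuffer buf = ByteBuffer.allocate(size);
        while (buf.hasRemaining()) {
            buf.put((byte) ordinal);
        }
        return buf.flip();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("records", ".bin");
        writeAll(tmp, 8, 16);
        byte[] bytes = Files.readAllBytes(tmp);
        System.out.println(bytes.length); // 128: 8 records of 16 bytes each
        System.out.println(bytes[16]);    // 1: record 1 starts at offset 16
        Files.delete(tmp);
    }
}
```

The key property, which the real writer also maintains, is that parallel scheduling never changes where a record lands in the file.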
Notes
- Currently marked as @Experimental
- Deprecates the method

  public synchronized void writeInline(int ordinal, Map<FeatureId, Feature.State> stateMap)

  in favor of the more descriptive

  public synchronized void writeFeaturesInline(int ordinal, Map<FeatureId, Feature.State> stateMap)

Related Issues
Documentation and Tutorials
Description
Detailed Javadoc has been added for all JVector components, and quickstart tutorials, including documentation and sample code, have been added to jvector-examples.
Purpose / Impact
- Provide better documentation for JVector and its components
- Give new users an entry point showing how to use JVector
Improved DataSet Handling
Description
Revision of how datasets are acquired and represented internally within JVector. Includes an overhaul of the loading process, improved logging and error handling, and new virtualization and metadata handling.
Purpose / Impact
- Makes datasets easier to find, work with, and define through metadata for internal testing
- Resolves several issues with regression and inter-release comparisons
Notes
- Used internally by JVector Bench classes; no client impact
Testing Enhancements
Description
Enhancements to the JVector testing infrastructure:
- On-disk index cache added for the Grid benchmark harness
- Logging subsystem overhaul
- New JMH tests
- Test results now include metrics for nodes visited, heap usage, disk usage, and PQ distance
Purpose / Impact
- Faster testing cycles
- Better comprehension of test results
- New metrics for inter-release comparisons
Notes
- Used internally by JVector; no client impact
Related Issues
Bug Fixes and Issue Resolutions
Fix: NullPointerException in OnDiskGraphIndex#ramBytesUsed
Problem
inMemoryNeighbors and inMemoryFeatures are now loaded lazily, so OnDiskGraphIndex#ramBytesUsed() could be invoked while they, or entries within them, were still null, resulting in a NullPointerException.
Resolution
Added appropriate null checks and safeguards to ensure ramBytesUsed() can be safely invoked in all valid states.
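The defensive pattern can be sketched as follows (field names and structure are hypothetical, not JVector's actual implementation): a lazily initialized cache, or any null entry within it, simply contributes zero bytes to the total.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

public class LazyRamAccounting {
    private final int size;
    // Hypothetical lazily-created cache: stays null until the first load,
    // and individual entries stay null until their node is loaded.
    private volatile AtomicReferenceArray<int[]> inMemoryNeighbors;

    LazyRamAccounting(int size) {
        this.size = size;
    }

    void load(int node, int[] neighbors) {
        if (inMemoryNeighbors == null) {
            inMemoryNeighbors = new AtomicReferenceArray<>(size);
        }
        inMemoryNeighbors.set(node, neighbors);
    }

    // Safe in all states: a null cache or a null entry counts as zero bytes.
    long ramBytesUsed() {
        AtomicReferenceArray<int[]> cache = inMemoryNeighbors;
        if (cache == null) {
            return 0;
        }
        long bytes = 0;
        for (int i = 0; i < cache.length(); i++) {
            int[] entry = cache.get(i);
            if (entry != null) {
                bytes += (long) entry.length * Integer.BYTES;
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        LazyRamAccounting index = new LazyRamAccounting(4);
        System.out.println(index.ramBytesUsed()); // 0: nothing loaded yet
        index.load(2, new int[]{1, 2, 3});
        System.out.println(index.ramBytesUsed()); // 12: one 3-element entry
    }
}
```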
Related Issues
Fix: Protection Against Invalid Ordinal Mappings
Problem
JVector relies on calling code to pass in ordinal maps constructed outside the JVector library. Improper or inconsistent ordinal mappings can lead to failures during graph construction, or to incorrect indexing and search results.
Resolution
Added safeguards to detect invalid ordinal mappings.
Notes
Full validation of an ordinal mapping requires iterating over the entire set of ordinals and can be a costly operation. The safeguard is therefore only activated when debug logging is enabled or when System.getProperties().containsKey("VECTOR_DEBUG") returns true.
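As an illustration of the kind of check involved (class and method names here are hypothetical, not JVector's API): a valid mapping of n nodes must send each original ordinal to a distinct target in [0, n), and the O(n) check is only performed when the debug property is present.

```java
import java.util.BitSet;

public class OrdinalMapValidator {
    // A valid mapping is a bijection on [0, n): every target ordinal is
    // in range and no two sources map to the same target.
    static boolean isValidMapping(int[] oldToNew) {
        BitSet seen = new BitSet(oldToNew.length);
        for (int target : oldToNew) {
            if (target < 0 || target >= oldToNew.length || seen.get(target)) {
                return false; // out of range, or duplicate target
            }
            seen.set(target);
        }
        return true;
    }

    // Mirrors the gating described above: only pay the O(n) validation
    // cost when the VECTOR_DEBUG system property is set.
    static void maybeValidate(int[] oldToNew) {
        if (System.getProperties().containsKey("VECTOR_DEBUG")
                && !isValidMapping(oldToNew)) {
            throw new IllegalArgumentException("invalid ordinal mapping");
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidMapping(new int[]{2, 0, 1})); // true
        System.out.println(isValidMapping(new int[]{0, 0, 1})); // false: duplicate 0
    }
}
```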
Related Issues
Fix: extractTrainingVectors may produce more than MAX_PQ_TRAINING_SET_SIZE vectors
Problem
extractTrainingVectors could return more vectors than the intended maximum (MAX_PQ_TRAINING_SET_SIZE), leading to excessive memory usage during PQ training.
Resolution
extractTrainingVectors now uses Floyd's random sampling algorithm to select random training vectors from the RandomAccessVectorValues. The fix works in two phases: first it selects MAX_PQ_TRAINING_SET_SIZE random ordinals, then it maps those ordinals to vectors.
Related Issues