Hemant/rebuild updated by hemant-endee · Pull Request #172 · endee-io/endee

hemant-endee · 2026-04-09T05:24:27Z

No description provided.

* Rebuild index with new config
* fix 1
* index name in get status api correction
* docs changes
* Rebuild Status Persistence

github-actions · 2026-04-09T05:25:24Z

VectorDB Benchmark — Dense — Passed

Triggered by @omnish-endee · Commit 4a3982f

| Step | Status |
| --- | --- |
| Provision Servers | Up |
| Deploy Endee Server | Done |
| Run Benchmark | Passed |
| Results | See below |
| Teardown | Done ✓ |

No result file found on benchmark server.


hemant-endee · 2026-04-09T08:51:42Z

/correctness_benchmarking dense

omnish-endee · 2026-04-10T07:21:43Z

/correctness_benchmarking dense

omnish-endee · 2026-04-10T08:41:21Z

/correctness_benchmarking dense

omnish-endee · 2026-04-10T09:47:43Z

/correctness_benchmarking dense

shaleenji · 2026-04-21T04:28:26Z

+                }
+
+                // Get current metadata for defaults
+                auto meta = index_manager.getMetadata(index_id);


There should be a check here for whether the index contains any vectors.

Fixed. Uses live count from getElementCount (not meta->total_elements which can be stale before first save). Check is placed after getElementCount so the same count is reused in the response.
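A minimal sketch of the check described in this reply, not the code from the PR. `RouteResponse`, `checkRebuildPreconditions`, and the `live_count` callback (standing in for `index_manager.getElementCount(index_id)`) are hypothetical names, and the 400 status for an empty index is an assumption, since the thread does not say which code is returned.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical response type; the real route response is not shown in this thread.
struct RouteResponse {
    int http_code;
    std::string message;
    std::size_t total_vectors;
};

// `live_count` stands in for index_manager.getElementCount(index_id).
// The count is read once and reused for both the emptiness check and the response.
RouteResponse checkRebuildPreconditions(const std::function<std::size_t()>& live_count) {
    const std::size_t actual_element_count = live_count();
    if (actual_element_count == 0) {
        // Assumption: 400 is used here only for illustration.
        return {400, "index has no vectors to rebuild", 0};
    }
    return {202, "rebuild accepted", actual_element_count};
}
```

Reading the count once keeps the check and the reported `total_vectors` consistent even if vectors are added between the two steps.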

shaleenji · 2026-04-21T04:29:07Z

+                // Get actual vector count for response
+                size_t actual_element_count = 0;
+                try {
+                    actual_element_count = index_manager.getElementCount(index_id);
+                } catch (...) {}


This ignores all exceptions, which it shouldn't. Use the docs/logs.md format to throw an error.

If getIndexEntry throws (index not loaded), the response will report total_vectors: 0 which is misleading in the next code block.

Fixed. Removed the try/catch — index existence is already validated by getMetadata above, so any load failure here is a genuine error that should propagate to the outer handler and return a 500.
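To illustrate the propagation this reply relies on, here is a small sketch under stated assumptions: `HandlerResult`, `runRoute`, and `rebuildInfoRoute` are hypothetical names, and the outer handler's real shape is not shown in this thread. The route body has no local try/catch, so a failing count lookup surfaces as a 500 instead of a misleading `total_vectors: 0`.

```cpp
#include <cstddef>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical result type for this sketch.
struct HandlerResult {
    int http_code;
    std::string body;
};

// Outer handler: the single place where exceptions become a 500 response.
HandlerResult runRoute(const std::function<HandlerResult()>& route_body) {
    try {
        return route_body();
    } catch (const std::exception& e) {
        return {500, std::string("internal error: ") + e.what()};
    }
}

// Route body without a local try/catch: if the count lookup throws,
// the exception propagates to runRoute instead of being swallowed.
HandlerResult rebuildInfoRoute(const std::function<std::size_t()>& get_element_count) {
    const std::size_t actual_element_count = get_element_count();
    return {200, "total_vectors=" + std::to_string(actual_element_count)};
}
```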

shaleenji · 2026-04-21T04:30:10Z

+                        index_id, new_M, new_ef_con);
+
+                    if (!success) {
+                        int code = (message.find("already in progress") != std::string::npos) ? 409 : 400;


Error messages should not be used to determine the return code. An explicit error code should be used here instead.

Fixed. Replaced std::pair<bool, std::string> with a RebuildResult struct carrying an explicit http_code field. rebuildIndexAsync now returns typed codes (404, 409, 400, 202) directly — no more string matching in the route.
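A sketch of what a typed result could look like, assuming field and function names that are not confirmed by the thread. Only `RebuildResult` and its `http_code` field are named in the reply; `message`, `RebuildRequestState`, `classifyRebuildRequest`, and the mapping of conditions to 404/409/400/202 are illustrative guesses.

```cpp
#include <string>

// Only http_code is named in the thread; the message field is an assumption.
struct RebuildResult {
    int http_code;
    std::string message;
};

// Hypothetical inputs standing in for the checks rebuildIndexAsync performs.
struct RebuildRequestState {
    bool index_exists;
    bool rebuild_in_progress;
    bool params_valid;
};

// The status code is chosen where the condition is detected,
// so the route never has to inspect the message text.
RebuildResult classifyRebuildRequest(const RebuildRequestState& s) {
    if (!s.index_exists)       return {404, "index not found"};
    if (s.rebuild_in_progress) return {409, "rebuild already in progress"};
    if (!s.params_valid)       return {400, "invalid rebuild parameters"};
    return {202, "rebuild started"};
}
```

With this shape the route can forward `http_code` directly, and rewording a message no longer changes the HTTP status.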

shaleenji · 2026-04-21T04:36:57Z

+        return metadata_manager_->getMetadata(index_id);
+    }
+
+    // Index stats (safe to call from routes)


What does this comment mean?

Reads live count from the in-memory HNSW graph; meta->total_elements can be stale between saves.

shaleenji · 2026-04-21T04:38:12Z

+    nlohmann::json getRebuildProgress(const std::string& username,
+                                      const std::string& index_id) const {
+        auto state = rebuild_.getActiveRebuild(username);
+        if (state && state->index_id == index_id) {
+            size_t processed = state->vectors_processed.load();
+            size_t total = state->total_vectors.load();
+            double percent = total > 0 ? (100.0 * processed / total) : 0.0;
+            nlohmann::json result = {
+                {"status", state->status},
+                {"vectors_processed", processed},
+                {"total_vectors", total},
+                {"percent_complete", percent},
+                {"started_at", Rebuild::formatTime(state->started_at)}
+            };
+            if (state->status == "completed" || state->status == "failed") {
+                result["completed_at"] = Rebuild::formatTime(state->completed_at);
+            }
+            if (state->status == "failed" && !state->error_message.empty()) {
+                result["error"] = state->error_message;
+            }
+            return result;
+        }
+        return {{"status", "idle"}};
+    }


This function should be in the Rebuild class to keep every subsystem separate.

Fixed. Moved the full method body to Rebuild::getProgress(username, index_id). IndexManager::getRebuildProgress is now a one-liner delegate: return rebuild_.getProgress(username, index_id).
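The delegation described here could be sketched as follows. This is a simplified illustration, not the PR's code: it returns a plain `ProgressSnapshot` struct instead of `nlohmann::json` so it builds without third-party headers, and `record`, `rebuild()`, and the internal `State` layout are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Simplified stand-in for the JSON object the real method returns.
struct ProgressSnapshot {
    std::string status = "idle";
    std::size_t vectors_processed = 0;
    std::size_t total_vectors = 0;
    double percent_complete = 0.0;
};

class Rebuild {
public:
    // Hypothetical setter so the sketch has state to report.
    void record(const std::string& username, const std::string& index_id,
                const std::string& status, std::size_t processed, std::size_t total) {
        states_[username] = State{index_id, status, processed, total};
    }

    // All progress logic lives in the rebuild subsystem.
    ProgressSnapshot getProgress(const std::string& username,
                                 const std::string& index_id) const {
        auto it = states_.find(username);
        if (it == states_.end() || it->second.index_id != index_id) {
            return {};  // no matching rebuild: report "idle"
        }
        const State& s = it->second;
        ProgressSnapshot out;
        out.status = s.status;
        out.vectors_processed = s.processed;
        out.total_vectors = s.total;
        out.percent_complete = s.total > 0 ? (100.0 * s.processed / s.total) : 0.0;
        return out;
    }

private:
    struct State {
        std::string index_id;
        std::string status;
        std::size_t processed = 0;
        std::size_t total = 0;
    };
    std::map<std::string, State> states_;
};

class IndexManager {
public:
    Rebuild& rebuild() { return rebuild_; }  // hypothetical accessor for the sketch

    // One-line delegate, as described in the reply.
    ProgressSnapshot getRebuildProgress(const std::string& username,
                                        const std::string& index_id) const {
        return rebuild_.getProgress(username, index_id);
    }

private:
    Rebuild rebuild_;
};
```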

shaleenji · 2026-04-21T04:58:37Z

+        executeRebuildJob(index_id, username, new_M, new_ef_con, st);
+    });
+    rebuild_.setActiveRebuild(username, index_id, current_count, std::move(t));
+


setActiveRebuild should be called before executeRebuildJob. This way two rebuild jobs will not happen at once.

Fixed. Now calls setActiveRebuild with std::jthread{} (empty) first — state is registered immediately, blocking any concurrent rebuild request before the thread is even spawned. The real thread is then spawned and moved in via the new attachRebuildThread method.
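A sketch of the register-first ordering, under several assumptions. `RebuildRegistry`, `tryRegister`, and `finish` are hypothetical names; only `attachRebuildThread` and `rebuild_state_mutex_` appear in the thread. The sketch uses `std::thread` in place of `std::jthread` so it builds as C++17, which means it joins explicitly where `std::jthread` would join on destruction.

```cpp
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

class RebuildRegistry {
public:
    ~RebuildRegistry() {
        // Join any worker still attached so destruction never terminates the program.
        for (auto& kv : active_) {
            if (kv.second.worker.joinable()) kv.second.worker.join();
        }
    }

    // Registers the rebuild before any worker exists.
    // Returns false if this user already has an active rebuild.
    bool tryRegister(const std::string& username, const std::string& index_id) {
        std::lock_guard<std::mutex> lock(rebuild_state_mutex_);
        if (active_.count(username) != 0) return false;
        active_[username].index_id = index_id;  // worker stays empty for now
        return true;
    }

    // Moves the real worker into the already-registered entry.
    void attachRebuildThread(const std::string& username, std::thread t) {
        std::lock_guard<std::mutex> lock(rebuild_state_mutex_);
        active_.at(username).worker = std::move(t);
    }

    // Removes the entry and joins its worker outside the lock.
    void finish(const std::string& username) {
        std::thread worker;
        {
            std::lock_guard<std::mutex> lock(rebuild_state_mutex_);
            auto it = active_.find(username);
            if (it == active_.end()) return;
            worker = std::move(it->second.worker);
            active_.erase(it);
        }
        if (worker.joinable()) worker.join();
    }

private:
    struct Entry {
        std::string index_id;
        std::thread worker;
    };
    std::mutex rebuild_state_mutex_;
    std::map<std::string, Entry> active_;
};
```

Because the entry exists before the thread is created, a second request for the same user is rejected even if it arrives in the window between registration and spawn.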

shaleenji · 2026-04-21T05:02:47Z

+                }
+            }
+        } catch (const std::exception&) {
+            // Silently ignore cleanup errors on startup


Don't ignore errors here.

Fixed. Changed to LOG_WARN(2053, "rebuild", "Failed to cleanup temp files on startup: " << e.what()).
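The pattern in this fix (keep starting up, but record the failure) can be sketched like this. `WarningSink` is a hypothetical stand-in for the project's `LOG_WARN` macro, and its output format is invented for the example; the real format lives in docs/logs.md, which is not shown in this thread.

```cpp
#include <exception>
#include <functional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for LOG_WARN; the line format here is illustrative only.
struct WarningSink {
    std::vector<std::string> lines;

    void warn(int code, const std::string& subsystem, const std::string& message) {
        std::ostringstream os;
        os << "WARN " << code << " [" << subsystem << "] " << message;
        lines.push_back(os.str());
    }
};

// Startup continues after a cleanup failure, but the failure is recorded
// rather than silently dropped.
void cleanupTempFilesOnStartup(const std::function<void()>& remove_temp_files,
                               WarningSink& sink) {
    try {
        remove_temp_files();
    } catch (const std::exception& e) {
        sink.warn(2053, "rebuild",
                  std::string("Failed to cleanup temp files on startup: ") + e.what());
    }
}
```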

shaleenji · 2026-04-21T05:07:34Z

+
+    // State tracking — per user
+
+    void setActiveRebuild(const std::string& username, const std::string& index_id,


This should use rebuild_state_mutex_.

Already fixed — setActiveRebuild holds rebuild_state_mutex_ via lock_guard on entry.

shaleenji · 2026-04-21T05:19:34Z

+        new_alg->setVectorFetcher([vs = entry->vector_storage](ndd::idInt label, uint8_t* buffer) {
+            return vs->get_vector(label, buffer);
+        });
+
+        new_alg->setVectorFetcherBatch([vs = entry->vector_storage](const ndd::idInt* labels,
+                                                                     uint8_t* buffers,
+                                                                     bool* success,
+                                                                     size_t count) -> size_t {
+            return vs->get_vectors_batch_into(labels, buffers, success, count);
+        });


This code duplicates the block at the end of the function. Deduplicate it.

Fixed. Extracted wireVectorFetchers(alg, vector_storage) as a private static helper on IndexManager. Both duplicate blocks replaced with a single call each. Note: the first call (on new_alg) must happen before any addPoint — searchBaseLayer during graph construction needs the fetcher for base-layer-only nodes that don't store data inline. Comment preserved at the call site.
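A sketch of the extracted helper, with stand-in types so it compiles on its own. `VectorStorage` and `HnswGraph` are hypothetical substitutes for the project's storage class and its HNSW class with fetcher hooks, `idInt` stands in for `ndd::idInt`, and holding the storage through `std::shared_ptr` is an assumption. The stand-in storage writes one byte per vector purely for demonstration.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>

using idInt = std::uint32_t;  // assumption: stands in for ndd::idInt

// Hypothetical storage stand-in: each "vector" is a single byte equal to its label.
struct VectorStorage {
    bool get_vector(idInt label, std::uint8_t* buffer) const {
        buffer[0] = static_cast<std::uint8_t>(label);
        return true;
    }
    std::size_t get_vectors_batch_into(const idInt* labels, std::uint8_t* buffers,
                                       bool* success, std::size_t count) const {
        for (std::size_t i = 0; i < count; ++i) {
            buffers[i] = static_cast<std::uint8_t>(labels[i]);
            success[i] = true;
        }
        return count;
    }
};

// Hypothetical stand-in for the HNSW class that exposes the two fetcher hooks.
struct HnswGraph {
    std::function<bool(idInt, std::uint8_t*)> fetcher;
    std::function<std::size_t(const idInt*, std::uint8_t*, bool*, std::size_t)> batch_fetcher;

    void setVectorFetcher(std::function<bool(idInt, std::uint8_t*)> f) {
        fetcher = std::move(f);
    }
    void setVectorFetcherBatch(
        std::function<std::size_t(const idInt*, std::uint8_t*, bool*, std::size_t)> f) {
        batch_fetcher = std::move(f);
    }
};

// Single place that wires both fetchers; each former duplicate block becomes one call.
void wireVectorFetchers(HnswGraph& alg, std::shared_ptr<VectorStorage> vector_storage) {
    alg.setVectorFetcher([vs = vector_storage](idInt label, std::uint8_t* buffer) {
        return vs->get_vector(label, buffer);
    });
    alg.setVectorFetcherBatch([vs = std::move(vector_storage)](const idInt* labels,
                                                               std::uint8_t* buffers,
                                                               bool* success,
                                                               std::size_t count) -> std::size_t {
        return vs->get_vectors_batch_into(labels, buffers, success, count);
    });
}
```

Each lambda captures its own copy of the storage handle, so the graph keeps the storage alive for as long as the fetchers are installed.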

shaleenji · 2026-04-21T05:21:08Z

+        auto fresh_alg = std::make_unique<hnswlib::HierarchicalNSW<float>>(index_path, 0);
+
+        fresh_alg->setVectorFetcher([vs = entry->vector_storage](ndd::idInt label, uint8_t* buffer) {
+            return vs->get_vector(label, buffer);
+        });
+
+        fresh_alg->setVectorFetcherBatch([vs = entry->vector_storage](const ndd::idInt* labels,
+                                                                       uint8_t* buffers,
+                                                                       bool* success,
+                                                                       size_t count) -> size_t {
+            return vs->get_vectors_batch_into(labels, buffers, success, count);


This duplicates reloadIndex. Deduplicate it.

Cannot call reloadIndex here: executeRebuildJob holds operation_mutex while reloadIndex acquires indices_mutex_, but deleteIndex holds indices_mutex_ and then acquires operation_mutex. Calling reloadIndex from inside executeRebuildJob would deadlock with a concurrent delete on the same index. Kept the inline block and added a comment explaining the constraint.
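The lock-order inversion described in this reply can be made concrete with a small bookkeeping sketch. `LockOrderLog` is a hypothetical helper, not part of the codebase: it records which mutex is acquired while another is held and reports when a pair is taken in both orders. It only detects two-mutex inversions, which is all this case needs.

```cpp
#include <set>
#include <string>
#include <utility>

// Records "acquired B while holding A" edges by mutex name.
class LockOrderLog {
public:
    void acquiredWhileHolding(const std::string& held, const std::string& acquired) {
        edges_.emplace(held, acquired);
    }

    // True if some pair of mutexes is taken in both orders,
    // which allows two threads to block each other forever.
    bool hasInversion() const {
        for (const auto& e : edges_) {
            if (edges_.count(std::make_pair(e.second, e.first)) != 0) return true;
        }
        return false;
    }

private:
    std::set<std::pair<std::string, std::string>> edges_;
};
```

Recording the deleteIndex order (`indices_mutex_` then `operation_mutex`) alone reports no inversion; adding the order that calling reloadIndex from executeRebuildJob would introduce (`operation_mutex` then `indices_mutex_`) does.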

github-actions · 2026-04-21T09:59:03Z

VectorDB Benchmark - Ready To Run

CI Passed ([lint + unit tests](https://github.com/endee-io/endee/actions/runs/24716159797)) - benchmark options unlocked.

Post one of the commands below. Only members with write access can trigger runs.

Available Modes

| Mode | Command | What runs |
| --- | --- | --- |
| Dense | `/correctness_benchmarking dense` | HNSW insert throughput · query P50/P95/P99 · recall@10 · concurrent QPS |
| Hybrid | `/correctness_benchmarking hybrid` | Dense + sparse BM25 fusion · same suite + fusion latency overhead |

Infrastructure

| Server | Role | Instance |
| --- | --- | --- |
| Endee Server | Endee VectorDB, code from this branch | `t2.large` |
| Benchmark Server | Benchmark runner | `t3a.large` |

Both servers start on demand and are always terminated after the run — pass or fail.

How Correctness Benchmarking Works

1. Post /correctness_benchmarking <mode>
2. Endee Server Create  →  this branch's code deployed  →  Endee starts in chosen mode
3. Benchmark Server Create  →  benchmark suite transferred
4. Benchmark Server runs correctness benchmarking against Endee Server
5. Results posted back here  →  pass/fail + full metrics table
6. Both servers terminated   →  always, even on failure

After a new push, CI must pass again before this menu reappears.

hemant-endee · 2026-04-21T10:18:36Z

/correctness_benchmarking dense

github-actions · 2026-04-21T10:18:55Z

VectorDB Benchmark — Dense — Failed

Triggered by @hemant-endee · Commit 117a81f

| Step | Status |
| --- | --- |
| Provision Servers | Up |
| Deploy Endee Server | Done |
| Run Benchmark | Failed |
| Results | See reason below |
| Teardown | Done ✓ |

Benchmark job status: skipped

hemant-endee added 2 commits April 7, 2026 15:33

Rebuild index with new config (#136)

e66b946

* Rebuild index with new config
* fix 1
* index name in get status api correction
* docs changes
* Rebuild Status Persistence

using jthread with stop token

f69cc10

Shared parallel addPoint utility function — static chunk partition(sa…

4a3982f

…me as addVectors)

shaleenji requested changes Apr 21, 2026


comments resolved

117a81f

