Skip to content

Hemant/rebuild updated#172

Open
hemant-endee wants to merge 4 commits intomasterfrom
hemant/rebuild_updated
Open

Hemant/rebuild updated#172
hemant-endee wants to merge 4 commits intomasterfrom
hemant/rebuild_updated

Conversation

@hemant-endee
Copy link
Copy Markdown
Collaborator

No description provided.

* Rebuild index with new config

* fix 1

* index name in get stattu api correction

* docs changes

* Rebuild Status Persistence
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 9, 2026

VectorDB Benchmark — Dense — Passed

Triggered by @omnish-endee · Commit 4a3982f

Step Status
Provision Servers Up
Deploy Endee Server Done
Run Benchmark Passed
Results See below
Teardown Done ✓

No result file found on benchmark server.

@hemant-endee
Copy link
Copy Markdown
Collaborator Author

/correctness_benchmarking dense

3 similar comments
@omnish-endee
Copy link
Copy Markdown
Contributor

/correctness_benchmarking dense

@omnish-endee
Copy link
Copy Markdown
Contributor

/correctness_benchmarking dense

@omnish-endee
Copy link
Copy Markdown
Contributor

/correctness_benchmarking dense

Comment thread src/main.cpp
}

// Get current metadata for defaults
auto meta = index_manager.getMetadata(index_id);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here there should be a check if there are any vectors in the index

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Uses live count from getElementCount (not meta->total_elements which can be stale before first save). Check is placed after getElementCount so the same count is reused in the response.

Comment thread src/main.cpp Outdated
Comment on lines +763 to +767
// Get actual vector count for response
size_t actual_element_count = 0;
try {
actual_element_count = index_manager.getElementCount(index_id);
} catch (...) {}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is ignoring any exceptions. This shouldnt be the case. use docs/logs.md format to throw an error.

If getIndexEntry throws (index not loaded), the response will report total_vectors: 0 which is misleading in the next code block.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Removed the try/catch — index existence is already validated by getMetadata above, so any load failure here is a genuine error that should propagate to the outer handler and return a 500.

Comment thread src/main.cpp Outdated
index_id, new_M, new_ef_con);

if (!success) {
int code = (message.find("already in progress") != std::string::npos) ? 409 : 400;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error messages should not be used to determine the return code. some error code should be used here instead.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Replaced std::pair<bool, std::string> with a RebuildResult struct carrying an explicit http_code field. rebuildIndexAsync now returns typed codes (404, 409, 400, 202) directly — no more string matching in the route.

Comment thread src/core/ndd.hpp Outdated
return metadata_manager_->getMetadata(index_id);
}

// Index stats (safe to call from routes)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this comment mean ?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reads live count from the in-memory HNSW graph; meta->total_elements can be stale between saves.

Comment thread src/core/ndd.hpp
Comment on lines +1949 to +1972
nlohmann::json getRebuildProgress(const std::string& username,
const std::string& index_id) const {
auto state = rebuild_.getActiveRebuild(username);
if (state && state->index_id == index_id) {
size_t processed = state->vectors_processed.load();
size_t total = state->total_vectors.load();
double percent = total > 0 ? (100.0 * processed / total) : 0.0;
nlohmann::json result = {
{"status", state->status},
{"vectors_processed", processed},
{"total_vectors", total},
{"percent_complete", percent},
{"started_at", Rebuild::formatTime(state->started_at)}
};
if (state->status == "completed" || state->status == "failed") {
result["completed_at"] = Rebuild::formatTime(state->completed_at);
}
if (state->status == "failed" && !state->error_message.empty()) {
result["error"] = state->error_message;
}
return result;
}
return {{"status", "idle"}};
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should be in rebuild class to keep every subsystem separate

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Moved the full method body to Rebuild::getProgress(username, index_id). IndexManager::getRebuildProgress is now a one-liner delegate: return rebuild_.getProgress(username, index_id).

Comment thread src/core/ndd.hpp
executeRebuildJob(index_id, username, new_M, new_ef_con, st);
});
rebuild_.setActiveRebuild(username, index_id, current_count, std::move(t));

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setActiveRebuild should be called before executeRebuildJob. This way two rebuild jobs will not happen at once.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Now calls setActiveRebuild with std::jthread{} (empty) first — state is registered immediately, blocking any concurrent rebuild request before the thread is even spawned. The real thread is then spawned and moved in via the new attachRebuildThread method.

Comment thread src/core/rebuild.hpp Outdated
}
}
} catch (const std::exception&) {
// Silently ignore cleanup errors on startup
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont ignore errors here

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Changed to LOG_WARN(2053, "rebuild", "Failed to cleanup temp files on startup: " << e.what()).

Comment thread src/core/rebuild.hpp

// State tracking — per user

void setActiveRebuild(const std::string& username, const std::string& index_id,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should use rebuild_state_mutex_

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already fixed — setActiveRebuild holds rebuild_state_mutex_ via lock_guard on entry.

Comment thread src/core/ndd.hpp Outdated
Comment on lines +2349 to +2358
new_alg->setVectorFetcher([vs = entry->vector_storage](ndd::idInt label, uint8_t* buffer) {
return vs->get_vector(label, buffer);
});

new_alg->setVectorFetcherBatch([vs = entry->vector_storage](const ndd::idInt* labels,
uint8_t* buffers,
bool* success,
size_t count) -> size_t {
return vs->get_vectors_batch_into(labels, buffers, success, count);
});
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code piece is duplicate from the same at the end of the function. deduplicate it

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Extracted wireVectorFetchers(alg, vector_storage) as a private static helper on IndexManager. Both duplicate blocks replaced with a single call each. Note: the first call (on new_alg) must happen before any addPoint — searchBaseLayer during graph construction needs the fetcher for base-layer-only nodes that don't store data inline. Comment preserved at the call site.

Comment thread src/core/ndd.hpp Outdated
Comment on lines +2423 to +2433
auto fresh_alg = std::make_unique<hnswlib::HierarchicalNSW<float>>(index_path, 0);

fresh_alg->setVectorFetcher([vs = entry->vector_storage](ndd::idInt label, uint8_t* buffer) {
return vs->get_vector(label, buffer);
});

fresh_alg->setVectorFetcherBatch([vs = entry->vector_storage](const ndd::idInt* labels,
uint8_t* buffers,
bool* success,
size_t count) -> size_t {
return vs->get_vectors_batch_into(labels, buffers, success, count);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is duplicate from reloadIndex. deduplicate

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot call reloadIndex here — executeRebuildJob holds operation_mutex while reloadIndex acquires indices_mutex_, but deleteIndex holds indices_mutex_ and then acquires operation_mutex. Calling reloadIndex from inside executeRebuildJob would deadlock with a concurrent delete on the same index. Keeping the inline block and added a comment explaining the constraint.

@github-actions
Copy link
Copy Markdown

VectorDB Benchmark - Ready To Run

CI Passed ([lint + unit tests] (https://github.com/endee-io/endee/actions/runs/24716159797)) - benchmark options unlocked.

Post one of the command below. Only members with write access can trigger runs.


Available Modes

Mode Command What runs
Dense /correctness_benchmarking dense HNSW insert throughput · query P50/P95/P99 · recall@10 · concurrent QPS
Hybrid /correctness_benchmarking hybrid Dense + sparse BM25 fusion · same suite + fusion latency overhead

Infrastructure

Server Role Instance
Endee Server Endee VectorDB — code from this branch t2.large
Benchmark Server Benchmark runner t3a.large

Both servers start on demand and are always terminated after the run — pass or fail.


How Correctness Benchmarking Works

1. Post /correctness_benchmarking <mode>
2. Endee Server Create  →  this branch's code deployed  →  Endee starts in chosen mode
3. Benchmark Server Create  →  benchmark suite transferred
4. Benchmark Server runs correctness benchmarking against Endee Server
5. Results posted back here  →  pass/fail + full metrics table
6. Both servers terminated   →  always, even on failure

After a new push, CI must pass again before this menu reappears.

@hemant-endee
Copy link
Copy Markdown
Collaborator Author

/correctness_benchmarking dense

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 21, 2026

VectorDB Benchmark — Dense — Failed

Triggered by @hemant-endee · Commit 117a81f

Step Status
Provision Servers Up
Deploy Endee Server Done
Run Benchmark Failed
Results See reason below
Teardown Done ✓

Benchmark job status: skipped

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants