feat: add estimate-size subcommand for network size estimation #88
Closed
jacderida wants to merge 1 commit into
c3f5d28 to ab5a3a3
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a new ant-node estimate-size subcommand to estimate live network size by bootstrapping a client-mode P2PNode, running multiple random-key FIND_NODE lookups, and aggregating per-sample Kademlia density estimates.
Changes:
- Introduces a new `estimator` module with the sampling/aggregation logic and unit tests.
- Extends the `ant-node` CLI with an `estimate-size` subcommand and routes execution in `main.rs`.
- Exposes the new module from `src/lib.rs`.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/lib.rs | Exports the new estimator module from the crate. |
| src/estimator.rs | Implements client-mode bootstrap, lookup sampling, and estimate aggregation (+unit tests). |
| src/bin/ant-node/main.rs | Dispatches between “run node” vs “estimate-size” subcommand execution paths. |
| src/bin/ant-node/cli.rs | Adds Command::EstimateSize and associated CLI flags/help text. |
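The dispatch between "run node" and "estimate-size" described in the table can be illustrated with a small std-only sketch. This is a stand-in for clap's `Option<Subcommand>` pattern, not clap itself and not the PR's code: `Action`, `parse_subcommand`, and `dispatch` are hypothetical names, and only the first argument is inspected, which a real parser would not be limited to.

```rust
/// Subcommands known to the binary. `EstimateSize` mirrors the variant
/// named in the PR; everything else here is hypothetical.
#[derive(Debug, PartialEq)]
enum Command {
    EstimateSize,
}

/// What the binary ends up doing after argument parsing.
#[derive(Debug, PartialEq)]
enum Action {
    RunNode,
    Run(Command),
}

/// Simplified parser: only the first argument can select a subcommand.
/// Returning `None` models "no subcommand given".
fn parse_subcommand(args: &[&str]) -> Option<Command> {
    match args.first().copied() {
        Some("estimate-size") => Some(Command::EstimateSize),
        _ => None,
    }
}

/// `None` falls through to the pre-existing behaviour (launch a node),
/// which is why adding an optional subcommand does not break old invocations.
fn dispatch(args: &[&str]) -> Action {
    match parse_subcommand(args) {
        None => Action::RunNode,
        Some(cmd) => Action::Run(cmd),
    }
}
```

Under these assumptions, an empty argument list or a list of plain flags still yields `Action::RunNode`, while a leading `estimate-size` yields `Action::Run(Command::EstimateSize)`.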
ab5a3a3 to 91b17ed
Operators have no easy way to gauge how many nodes are participating in the live Autonomi network. ant-node does not emit node-count metrics, and there is no purpose-built crawler, yet a rough size estimate is useful for capacity planning, anomaly detection, and release sanity checks.

Add a non-breaking `ant-node estimate-size` subcommand that bootstraps a saorsa-core P2PNode in NodeMode::Client (no listen socket, no DHT routing participation), runs many random-key iterative FIND_NODE lookups, and infers the network size from the XOR distance to the k-th closest peer in each lookup (standard Kademlia density estimator: N̂ = k · 2^256 / d_k). Per-sample estimates are aggregated into a mean, median, and 95% confidence interval.

The subcommand reuses the existing bootstrap-resolution cascade (CLI/env → config file → auto-discovered bootstrap_peers.toml), so users get the same flags they're already used to. Invocations without a subcommand continue to launch a node exactly as before: clap's `Option<Subcommand>` pattern is non-breaking.

Progress is reported to stderr (connecting, bootstrap time, routing-table size, per-sample status, sampling completion) so an operator running the command can tell whether work is happening or it is stuck.

The default per-lookup timeout is 90s, because saorsa-core's iterative lookup can take this long when a dead peer's dial cascade drags out an early iteration.

Verification:
- cargo fmt clean
- cargo clippy --all-features ... -D clippy::panic -D clippy::unwrap_used -D clippy::expect_used clean
- cargo test --lib estimator: 8/8 passing
- ant-node --help shows top-level options and the new subcommand
- ant-node estimate-size --help shows estimator-specific flags
- live testnet smoke run produced consistent estimates across successful samples (mean ~3119, median ~3035, 95% CI ±10%)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
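The density estimator quoted in the commit message can be motivated as follows. This is a sketch that assumes node IDs are spread uniformly over the 256-bit key space; the numbers in the worked line (k = 20, d_k = 2^256 / 150) are hypothetical and not taken from the PR or the testnet run.

```latex
% XOR with a fixed target is a bijection on the key space, so exactly d of the
% 2^{256} possible IDs lie at distance < d from the target. With N uniform IDs:
\mathbb{E}\bigl[\#\{\text{nodes at distance} < d\}\bigr] = N \cdot \frac{d}{2^{256}}

% Setting the expected count equal to k at the observed k-th distance d_k:
k \approx N \cdot \frac{d_k}{2^{256}}
\quad\Longrightarrow\quad
\hat{N} = \frac{k \cdot 2^{256}}{d_k}

% Worked example (hypothetical values):
k = 20,\quad d_k = \frac{2^{256}}{150}
\quad\Longrightarrow\quad
\hat{N} = 20 \cdot 150 = 3000
```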
91b17ed to 5b2cde7
Comment on lines +339 to +347

    fn sample_estimate(target: &[u8; 32], peers: &[saorsa_core::DHTNode], k: usize) -> Option<f64> {
        if peers.len() < k {
            return None;
        }

        // The lookup returns peers sorted by distance to the target (closest first).
        // We want the XOR distance to the k-th closest, i.e. the (k-1)th element.
        let kth = peers.get(k - 1)?;
        let kth_bytes = kth.peer_id.to_bytes();
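The excerpt stops before the distance computation, so here is a minimal, self-contained sketch of how a per-sample density estimate N̂ = k · 2^256 / d_k could be computed. It is not the PR's code: it works on raw 32-byte IDs instead of `saorsa_core::DHTNode`, converts the whole 256-bit distance to `f64` rather than using a leading-u64 extraction, and `xor_distance`, `distance_as_f64`, and `density_estimate` are hypothetical names.

```rust
/// Byte-wise XOR distance between two 32-byte identifiers (big-endian).
fn xor_distance(a: &[u8; 32], b: &[u8; 32]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for i in 0..32 {
        out[i] = a[i] ^ b[i];
    }
    out
}

/// Approximate a 256-bit big-endian integer as an f64.
/// Precision is limited to the f64 mantissa, which is enough for an estimate.
fn distance_as_f64(d: &[u8; 32]) -> f64 {
    d.iter()
        .fold(0.0_f64, |acc, &byte| acc * 256.0 + f64::from(byte))
}

/// Density estimate from one lookup. `sorted_peer_ids` is assumed to be
/// ordered closest-first relative to `target`.
fn density_estimate(target: &[u8; 32], sorted_peer_ids: &[[u8; 32]], k: usize) -> Option<f64> {
    // Guard k == 0 explicitly so that `k - 1` cannot underflow.
    if k == 0 {
        return None;
    }
    // `get` returns None when fewer than k peers were found.
    let kth = sorted_peer_ids.get(k - 1)?;
    let d_k = distance_as_f64(&xor_distance(target, kth));
    if d_k == 0.0 {
        // A zero distance would make the estimate infinite.
        return None;
    }
    Some(k as f64 * 2.0_f64.powi(256) / d_k)
}
```

With an all-zero target and a second-closest peer whose only set bit is the lowest bit of the first byte, the distance is 2^248, so `k = 2` gives an estimate of 2 · 2^8 = 512.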
Comment on lines +174 to +181

    if per_sample.is_empty() {
        return Err(Error::Startup(
            "no samples produced a usable density estimate (all lookups failed or returned too few peers)"
                .to_string(),
        ));
    }

    Ok(aggregate(per_sample, params.samples, k_used))
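The body of `aggregate` is not shown in this excerpt. As a rough illustration of reducing per-sample estimates to a mean, median, and 95% confidence interval, the sketch below assumes a normal-approximation interval (mean ± 1.96 · s / √n) with the lower bound clamped at zero. `Summary` and `summarize` are hypothetical names, and the PR's actual interval method may differ.

```rust
/// Aggregated view of the per-sample estimates (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Summary {
    mean: f64,
    median: f64,
    ci_low: f64,
    ci_high: f64,
}

/// Reduce per-sample estimates to mean, median, and an approximate 95% CI.
/// Returns None when there is nothing to aggregate.
fn summarize(mut samples: Vec<f64>) -> Option<Summary> {
    if samples.is_empty() {
        return None;
    }
    samples.sort_by(|a, b| a.total_cmp(b));
    let n = samples.len();
    let nf = n as f64;

    let mean = samples.iter().sum::<f64>() / nf;
    let median = if n % 2 == 1 {
        samples[n / 2]
    } else {
        (samples[n / 2 - 1] + samples[n / 2]) / 2.0
    };

    // Half-width from the sample standard deviation; zero for a single sample.
    let half_width = if n > 1 {
        let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (nf - 1.0);
        1.96 * (var / nf).sqrt()
    } else {
        0.0
    };

    Some(Summary {
        mean,
        median,
        // A network size cannot be negative, so clamp the lower bound.
        ci_low: (mean - half_width).max(0.0),
        ci_high: mean + half_width,
    })
}
```

Returning `None` for an empty input plays the same role as the `per_sample.is_empty()` check in the excerpt: the caller decides how to report that no lookup produced a usable estimate.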
Comment on lines +105 to +112

    eprintln!(
        "Connecting to bootstrap peers ({} configured)... [this can take 30\u{2013}60s]",
        config.bootstrap.len()
    );
    let p2p_node = P2PNode::new(core_config)
        .await
        .map_err(|e| Error::Startup(format!("Failed to create client P2P node: {e}")))?;
    let p2p = Arc::new(p2p_node);
Summary
Adds a non-breaking `ant-node estimate-size` subcommand that estimates the size of the live network.
- Bootstraps a saorsa-core `P2PNode` in `NodeMode::Client` (no listen socket, no DHT routing participation), runs N random-key iterative `FIND_NODE` lookups, and infers the network size from the XOR distance to the k-th closest peer in each lookup. Standard Kademlia density estimator: `N̂ = k · 2^256 / d_k`. Per-sample values are aggregated into a mean, median, and 95% confidence interval.
- Reuses the existing bootstrap-resolution cascade (CLI/env → config file → auto-discovered `bootstrap_peers.toml`), so flags users already know carry over.
- `clap`'s `Option<Subcommand>` pattern means invocations without a subcommand continue to launch a node exactly as before. No behavior change for existing callers.

Why
Operators currently have no way to gauge how many nodes are participating in the live network: `ant-node` doesn't emit node-count metrics and there's no separate crawler. A rough estimate is useful for capacity planning, anomaly detection, and release sanity checks.

The subcommand approach (rather than a separate binary or new crate) was chosen because it lets the estimator share `ant-node`'s bootstrap configuration, `bootstrap_peers.toml` discovery, and dependency on `saorsa-core`, without inflating the runtime node binary in any user-visible way when not invoked.

Test plan
- `cargo fmt --all` clean
- `cargo clippy --all-features -- -D clippy::panic -D clippy::unwrap_used -D clippy::expect_used` clean
- `cargo test --lib estimator`: 8/8 unit tests passing (sample density formula, aggregator edge cases, XOR distance, leading-u64 extraction, CI clamping)
- `cargo build` clean
- `ant-node --help` shows existing top-level flags plus the new subcommand (non-breaking)
- `ant-node estimate-size --help` shows estimator-specific flags
- `ant-node estimate-size` against the live network completes without timeouts at the new 90s default

🤖 Generated with Claude Code