fix: make signature aggregation asynchronous#299
Conversation
🤖 Claude Code ReviewNow I have a full picture of the PR. Let me write up the review. PR #299 —
|
🤖 Codex Code ReviewFindings
No other obvious consensus or memory-safety regressions stood out in the diff. I couldn’t run Automated review by OpenAI Codex · gpt-5.4 · custom prompt |
Greptile SummaryThis PR moves committee-signature aggregation off the Confidence Score: 5/5Safe to merge; the only new inline finding is a minor Cargo feature hygiene issue. All new findings are P2. The prior-thread P1 concern (AggregationDone session fence) is not re-flagged here. The async architecture is sound, devnet validation is thorough, and the test plan is complete. crates/blockchain/Cargo.toml — missing explicit features = ["sync"] for tokio-util
|
| Filename | Overview |
|---|---|
| crates/blockchain/src/aggregation.rs | New module: defines all aggregation types, pure functions, and the run_aggregation_worker loop. Session-id fencing is handled for AggregateProduced but not AggregationDone. |
| crates/blockchain/src/lib.rs | Actor orchestration with Handler impls for AggregateProduced (fenced), AggregationDone (no session fence — prior thread), and AggregationDeadline; #[stopped] lifecycle hook joins worker cleanly. |
| crates/blockchain/src/store.rs | on_tick interval-2 branch is now a no-op; aggregation fully removed from the store tick. |
| crates/blockchain/Cargo.toml | Adds tokio-util with default-features = false but without features = ["sync"], yet CancellationToken from tokio_util::sync is used — works today via Cargo feature unification but not self-contained. |
| crates/blockchain/src/metrics.rs | No functional changes; metrics are queried by finalize_aggregation_session instead of being updated inline per-aggregate. |
| crates/blockchain/tests/signature_spectests.rs | New test harness for verify_signatures spec fixtures; straightforward pass/fail matching, no issues. |
Sequence Diagram
sequenceDiagram
participant T as Tick (interval 2)
participant A as BlockChainServer (actor)
participant W as spawn_blocking worker
participant M as Actor mailbox
T->>A: handle_tick → on_tick(slot, interval=2)
A->>A: start_aggregation_session(slot)
A->>A: join prior worker (if any, ≤2s)
A->>A: snapshot_aggregation_inputs(&store)
A->>W: tokio::task::spawn_blocking(run_aggregation_worker)
A->>M: send_after(750ms, AggregationDeadline{session_id})
A->>A: store current_aggregation = Some(session)
loop For each AggregationJob
W->>W: check cancel.is_cancelled() → break if true
W->>W: aggregate_job(job) [XMSS proofs]
W->>M: send(AggregateProduced{session_id, output})
end
M->>A: AggregateProduced{session_id}
A->>A: fence: current session_id == msg.session_id?
A->>A: apply_aggregated_group (insert payload, delete gossip sigs)
A->>A: publish via gossipsub p2p
alt Deadline fires (750ms)
M->>A: AggregationDeadline{session_id}
A->>A: session.cancel.cancel()
end
W->>M: send(AggregationDone{session_id, stats, cancelled})
M->>A: AggregationDone
A->>A: finalize_aggregation_session (gauge refresh)
A->>A: log summary metrics
alt Actor stopping
A->>A: on_stopped()
A->>W: cancel.cancel()
A->>W: join (≤2s timeout)
end
Prompt To Fix All With AI
This is a comment left during a code review.
Path: crates/blockchain/Cargo.toml
Line: 25
Comment:
**`tokio-util/sync` feature not explicitly declared**
`default-features = false` is set but `features = ["sync"]` is absent. `CancellationToken` lives in `tokio_util::sync`, which is only compiled when that feature is enabled. The build currently passes via Cargo's feature unification (some transitive dependency in the workspace enables `sync`), but this makes the crate's compilation implicitly dependent on the rest of the dependency graph. A future dep-graph change could silently break this crate in isolation.
```suggestion
tokio-util = { version = "0.7", default-features = false, features = ["sync"] }
```
How can I resolve this? If you propose a fix, please make it concise.Reviews (2): Last reviewed commit: "refactor: appease clippy" | Re-trigger Greptile
Isolate committee-signature aggregation code in its own module so the async worker orchestration, job types, and pure aggregation helpers live together. The store now focuses on attestation/block state, and the actor (lib.rs) only handles session lifecycle hooks on BlockChainServer. No behavior change.
|
Blocked by #304 |
…e-aggregation # Conflicts: # crates/blockchain/tests/forkchoice_spectests.rs
🤖 Codex Code Review
I couldn’t run Automated review by OpenAI Codex · gpt-5.4 · custom prompt |
🤖 Claude Code ReviewGood — I have the full diff and context. Let me write the review. PR 299 —
|
Summary
BlockChainServeractor thread onto atokio::task::spawn_blockingworker. The actor no longer blocks for the duration of the XMSS proofs (previously 400-1200 ms per slot).CancellationTokenfired by a 750 ms deadline (self-message viasend_after). Aggregates produced inside the window are streamed back viaAggregateProducedactor messages and published immediately; post-deadline aggregates still get applied locally and published.#[stopped]lifecycle hook cancels the in-flight worker and joins it (bounded at 2 s) so node shutdown is clean; new-session start joins any straggler from the previous slot and warns if it hadn't finished yet.Architecture
crates/blockchain/src/store.rsnow exposes pure helpers:snapshot_aggregation_inputs,aggregate_job,apply_aggregated_group,finalize_aggregation_session. The oldaggregate_committee_signatures/try_aggregate(single inline function) are gone.store::on_tickno longer calls into aggregation — the actor drives it.pub(crate)):AggregateProduced,AggregationDone,AggregationDeadline. Late messages are fenced by a session id (the slot number).new_payloads, delete consumed gossip sigs, publish via gossipsub. End-of-session work (gauge refresh) is batched inAggregationDoneto avoid 2N lock acquisitions per slot.Observations from devnet runs
Under a 4-node devnet with three induced partitions (via
docker pause) across 200 slots:groups_considered > 1Prior aggregation worker still runningwarnings — worker wind-down is cleanTest plan
cargo check --workspacemake fmt,make lint(clippy clean)cargo test -p ethlambda-blockchain --test forkchoice_spectests— 77/77 pass