SIMD-accelerate target-sum grinding during signing #51

Open
dicethedev wants to merge 1 commit into leanEthereum:devnet4 from dicethedev:feat/simd-signing-acceleration

dicethedev commented Apr 24, 2026

🗒️ Description

This PR speeds up signing for the target-sum encoding path by adding a SIMD-accelerated grinding flow for Poseidon-based message hashing.

The signing bottleneck was the deterministic retry loop that keeps sampling encoding randomness until the chunk sum matches the target sum. Previously, the loop checked one candidate per hash invocation. This PR keeps the same deterministic behavior, but evaluates several candidate randomness values per packed Poseidon permutation call, one per SIMD lane.

What Changed

MessageHash - new grind_target_sum hook

Added a default method to MessageHash for deterministically searching for the first valid randomness value. The default falls back to scalar behavior, so non-Poseidon hashes are unaffected.

src/symmetric/message_hash.rs
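
As an illustration of the default-method pattern described above, here is a minimal sketch. It is not the crate's actual API: `ToyMessageHash`, `ToyHash`, the `derive_rho` closure (standing in for the PRF), and every signature are assumptions made for this example, and the toy hash is deliberately non-cryptographic so the result can be checked by hand.

```rust
/// Hypothetical, simplified stand-in for the crate's `MessageHash` trait.
pub trait ToyMessageHash {
    /// Hashes `message` under randomness `rho` into a vector of chunks.
    fn apply(rho: u64, message: &[u8]) -> Vec<u8>;

    /// Default scalar search: tries counters 0, 1, 2, ... in order and
    /// returns the first (counter, rho, chunks) whose chunk sum equals
    /// `target_sum`, or `None` once `max_attempts` counters are exhausted.
    fn grind_target_sum(
        derive_rho: impl Fn(u64) -> u64,
        message: &[u8],
        target_sum: u32,
        max_attempts: u64,
    ) -> Option<(u64, u64, Vec<u8>)> {
        for counter in 0..max_attempts {
            let rho = derive_rho(counter);
            let chunks = Self::apply(rho, message);
            let sum: u32 = chunks.iter().map(|&c| u32::from(c)).sum();
            if sum == target_sum {
                return Some((counter, rho, chunks));
            }
        }
        None
    }
}

/// Toy hash, NOT cryptographic: the two lowest base-4 digits of
/// `rho + message.len()`. It does not override `grind_target_sum`,
/// so it gets the scalar default.
pub struct ToyHash;

impl ToyMessageHash for ToyHash {
    fn apply(rho: u64, message: &[u8]) -> Vec<u8> {
        let x = rho.wrapping_add(message.len() as u64);
        vec![(x & 3) as u8, ((x >> 2) & 3) as u8]
    }
}
```

With the identity function standing in for the PRF, `ToyHash::grind_target_sum(|c| c, b"abc", 4, 100)` returns counter 4 with chunks `[3, 1]`, while a limit of 4 attempts returns `None`. A hash with a packed implementation would override only `grind_target_sum` and leave callers unchanged.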

IncomparableEncoding - new grind hook

Added a default grind method so signing can delegate the retry loop to the encoding itself, allowing specific encodings to override the search strategy.

src/inc_encoding.rs

TargetSumEncoding - overrides grind

Now forwards the deterministic search to MH::grind_target_sum, keeping target-sum logic centralized while enabling message-hash-specific acceleration.

src/inc_encoding/target_sum.rs
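
The delegation between the encoding and the message hash could look roughly like the sketch below. All names (`ToyEncoding`, `ToyTargetSum`, `ToyMessageHash`, `MAX_ATTEMPTS`) are hypothetical, and the PRF is omitted so that the counter doubles as the randomness; the real traits in this repository carry more parameters.

```rust
use std::marker::PhantomData;

/// Hypothetical, simplified message-hash trait (not the crate's real one).
pub trait ToyMessageHash {
    fn apply(rho: u64, message: &[u8]) -> Vec<u8>;

    /// Scalar default; a hash with a packed implementation would override it.
    fn grind_target_sum(message: &[u8], target_sum: u32, max_attempts: u64) -> Option<(u64, Vec<u8>)> {
        (0..max_attempts).find_map(|rho| {
            let chunks = Self::apply(rho, message);
            let sum: u32 = chunks.iter().map(|&c| u32::from(c)).sum();
            (sum == target_sum).then_some((rho, chunks))
        })
    }
}

/// Hypothetical, simplified encoding trait.
pub trait ToyEncoding {
    const MAX_ATTEMPTS: u64;

    /// One attempt: `Some(chunks)` if `rho` yields a valid codeword.
    fn encode(message: &[u8], rho: u64) -> Option<Vec<u8>>;

    /// Default retry loop, owned by the encoding so that specific
    /// encodings can replace the search strategy.
    fn grind(message: &[u8]) -> Option<(u64, Vec<u8>)> {
        (0..Self::MAX_ATTEMPTS).find_map(|rho| Self::encode(message, rho).map(|c| (rho, c)))
    }
}

/// Target-sum encoding over a message hash `MH` with target `TARGET`.
pub struct ToyTargetSum<MH, const TARGET: u32>(PhantomData<MH>);

impl<MH: ToyMessageHash, const TARGET: u32> ToyEncoding for ToyTargetSum<MH, TARGET> {
    const MAX_ATTEMPTS: u64 = 1_000;

    fn encode(message: &[u8], rho: u64) -> Option<Vec<u8>> {
        let chunks = MH::apply(rho, message);
        let sum: u32 = chunks.iter().map(|&c| u32::from(c)).sum();
        (sum == TARGET).then_some(chunks)
    }

    /// Override: forward the whole search to the message hash, which may
    /// have a faster (for example packed) implementation.
    fn grind(message: &[u8]) -> Option<(u64, Vec<u8>)> {
        MH::grind_target_sum(message, TARGET, Self::MAX_ATTEMPTS)
    }
}

/// Toy hash, NOT cryptographic: two base-4 digits of `rho + message.len()`.
pub struct ToyHash;

impl ToyMessageHash for ToyHash {
    fn apply(rho: u64, message: &[u8]) -> Vec<u8> {
        let x = rho.wrapping_add(message.len() as u64);
        vec![(x & 3) as u8, ((x >> 2) & 3) as u8]
    }
}
```

The signing code only ever calls `grind` on the encoding, so swapping the search strategy stays invisible to it as long as the override returns the same first success as the default loop.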

PoseidonMessageHash - SIMD grind_target_sum

The core optimization. The new implementation:

  • Packs message, parameter, and epoch once
  • Generates a SIMD-width batch of candidate rho values from the PRF
  • Hashes all candidates together with packed Poseidon
  • Decodes each lane independently and returns the first success in counter order

src/symmetric/message_hash/poseidon.rs
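
The four steps above can be sketched with plain arrays standing in for packed field elements. This is not the PR's implementation: `WIDTH`, `toy_prf`, `toy_hash_packed`, and `grind_batched` are hypothetical names, the hash is a non-cryptographic toy, and a real packed Poseidon would operate on SIMD registers rather than `[u64; WIDTH]`.

```rust
use std::array::from_fn;

/// Assumed lane count; the real width depends on the target instruction set.
pub const WIDTH: usize = 4;

/// Toy PRF stand-in: derives candidate randomness from a counter.
pub fn toy_prf(key: u64, counter: u64) -> u64 {
    key.wrapping_add(counter)
}

/// Toy "packed hash": processes all lanes in one call. Each lane yields the
/// two lowest base-4 digits of `rho + fixed` (NOT cryptographic).
pub fn toy_hash_packed(fixed: u64, rhos: [u64; WIDTH]) -> [[u8; 2]; WIDTH] {
    rhos.map(|rho| {
        let x = rho.wrapping_add(fixed);
        [(x & 3) as u8, ((x >> 2) & 3) as u8]
    })
}

/// Returns the first (counter, chunks) in counter order whose chunk sum
/// equals `target_sum`, checking `WIDTH` candidates per packed hash call.
pub fn grind_batched(
    key: u64,
    message: &[u8],
    target_sum: u32,
    max_attempts: u64,
) -> Option<(u64, [u8; 2])> {
    // Step 1: prepare the inputs shared by every lane once, outside the loop.
    let fixed = message.len() as u64;

    let mut base = 0u64;
    while base < max_attempts {
        // Step 2: one candidate per lane, taken from consecutive counters.
        let rhos: [u64; WIDTH] = from_fn(|lane| toy_prf(key, base + lane as u64));

        // Step 3: hash all candidates together.
        let digests = toy_hash_packed(fixed, rhos);

        // Step 4: decode lanes in counter order so the earliest success wins.
        // Lanes at or beyond `max_attempts` are ignored, which keeps the
        // failure condition identical to the scalar loop.
        for (lane, chunks) in digests.iter().enumerate() {
            let counter = base + lane as u64;
            if counter >= max_attempts {
                break;
            }
            let sum: u32 = chunks.iter().map(|&c| u32::from(c)).sum();
            if sum == target_sum {
                return Some((counter, *chunks));
            }
        }
        base += WIDTH as u64;
    }
    None
}
```

Two details carry the determinism guarantee in this sketch: lanes are scanned in ascending counter order, and a success in a lane past the attempt limit is discarded even though it was computed as part of the batch.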

GeneralizedXMSS::sign - uses encoding-level grinding

The manual retry loop is replaced with IE::grind::<PRF>(...). Behavior is identical from the caller's perspective.

src/signature/generalized_xmss.rs

Why This Fix Matters

The expensive part of signing for target-sum instantiations is the repeated encoding grind, not the already-optimized tree hashing code. This PR targets that hot path directly:

  • before: one PRF-derived randomness candidate checked per hash
  • after: multiple PRF-derived randomness candidates checked per hash using SIMD

This should improve signing throughput for Poseidon target-sum instantiations, especially where many retries are needed before hitting the target sum.
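
As a rough way to quantify the before/after difference, suppose the first valid candidate sits at PRF counter $k$ (counting from 0) and the packed hash evaluates $W$ lanes per call. Counting hash invocations only, and assuming (this is an assumption, not a measurement from this PR) that one packed call costs about as much as one scalar call:

$$
N_{\text{scalar}} = k + 1, \qquad N_{\text{packed}} = \left\lfloor \frac{k}{W} \right\rfloor + 1
$$

For example, with $k = 9$ and $W = 4$ the scalar loop makes 10 hash calls and the packed loop makes 3. The ratio approaches $W$ as $k$ grows. PRF evaluation and per-lane decoding still scale with the number of candidates, so the end-to-end gain is presumably smaller than this ratio.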

Correctness Guarantees

  • Deterministic signing is fully preserved
  • The first successful PRF counter is always selected
  • EncodingAttemptsExceeded failure path is unchanged
  • Non-Poseidon hashes continue using the scalar fallback

Tests

Added inc_encoding::target_sum::tests::test_grind_matches_first_successful_attempt, which verifies that the SIMD grind path returns the same randomness and chunks as the scalar search.

Passing:

  • cargo test inc_encoding::target_sum::tests::test_grind_matches_first_successful_attempt
  • cargo test test_deterministic
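
The idea behind such an equivalence test can be sketched as a differential check: run a scalar reference search and a batched search over the same attempt function and require identical results for every attempt limit. The helpers below (`first_success_scalar`, `first_success_batched`, `paths_agree`) are hypothetical and use a toy attempt function; they are not the test added in this PR.

```rust
use std::array::from_fn;

/// Scalar reference: first counter in `0..max_attempts` accepted by `try_counter`.
pub fn first_success_scalar<T, F: Fn(u64) -> Option<T>>(
    max_attempts: u64,
    try_counter: F,
) -> Option<(u64, T)> {
    (0..max_attempts).find_map(|c| try_counter(c).map(|v| (c, v)))
}

/// Batched search: evaluates `W` consecutive counters per step, as a packed
/// hash would, then scans the lanes in counter order.
pub fn first_success_batched<T, F: Fn(u64) -> Option<T>, const W: usize>(
    max_attempts: u64,
    try_counter: F,
) -> Option<(u64, T)> {
    assert!(W > 0, "lane count must be positive");
    let mut base = 0u64;
    while base < max_attempts {
        // Evaluate the whole batch before inspecting any lane.
        let mut lanes: [Option<T>; W] = from_fn(|lane| try_counter(base + lane as u64));
        for lane in 0..W {
            let counter = base + lane as u64;
            if counter >= max_attempts {
                break; // lanes past the limit must not count as successes
            }
            if let Some(v) = lanes[lane].take() {
                return Some((counter, v));
            }
        }
        base += W as u64;
    }
    None
}

/// Differential check: both searches must agree for every limit in `0..=limit`.
/// The attempt function is a toy (two base-4 digits of `counter + 3`).
pub fn paths_agree<const W: usize>(limit: u64, target_sum: u32) -> bool {
    let attempt = |c: u64| {
        let x = c + 3;
        let chunks = [(x & 3) as u8, ((x >> 2) & 3) as u8];
        let sum = u32::from(chunks[0]) + u32::from(chunks[1]);
        (sum == target_sum).then_some(chunks)
    };
    (0..=limit).all(|max| {
        first_success_scalar(max, &attempt) == first_success_batched::<_, _, W>(max, &attempt)
    })
}
```

Sweeping the attempt limit is what exercises the boundary case where a valid candidate lies inside the last batch but beyond the limit.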

Notes

This PR focuses on accelerating the encoding grind path used during signing, which matches the issue’s target and follows the same high-level idea as plonky3 grinding: batch many candidate witnesses and test them in parallel using packed field operations.

🔗 Related Issues or PRs

Closes #49

Benchmark Results

To evaluate the benefits of this PR, I compared this branch against its parent commit on a small Poseidon target-sum instance.

  • Before: 5cc7e37
  • After: 3c4d6d2
  • Scheme: SIGTargetSumLifetime18W1NoOff
  • Activation window: 8 epochs
  • Mode: cargo run --release
  • Averaging: 3 key-generation runs, 1000 signing runs

| Operation | Before | After | Change |
|---|---|---|---|
| Key generation | 1128.828 ms | 1079.075 ms | ~4.4% faster |
| Signing | 7.680 ms | 7.099 ms | ~7.6% faster |

The primary goal of this PR is to accelerate the signing path by optimizing the target-sum grinding loop. On this small instance, mean signing time drops by approximately 7.6% (7.680 ms to 7.099 ms). Key generation remains in the same general range, as expected, since this PR does not directly target key generation performance.
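
For reference, the percentages in the table are reductions in mean time per operation. The small sketch below (helper names are made up for this example) recomputes them and also shows the corresponding throughput gain, which is a slightly different number because it is measured relative to the new time.

```rust
/// Percentage reduction in mean time per operation.
pub fn time_reduction_pct(before_ms: f64, after_ms: f64) -> f64 {
    (before_ms - after_ms) / before_ms * 100.0
}

/// Percentage gain in throughput (operations per unit time).
pub fn throughput_gain_pct(before_ms: f64, after_ms: f64) -> f64 {
    (before_ms / after_ms - 1.0) * 100.0
}
```

For signing, `time_reduction_pct(7.680, 7.099)` is about 7.6 and `throughput_gain_pct(7.680, 7.099)` is about 8.2. For key generation, `time_reduction_pct(1128.828, 1079.075)` is about 4.4.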

@tcoratger
Contributor

@dicethedev Could you display in the PR description the benchmark results before/after, including key generation for a small instance (no need to run the big one to see the improvements in general)?

This is just a way to evaluate the benefits of the PR

@dicethedev
Author

> @dicethedev Could you display in the PR description the benchmark results before/after, including key generation for a small instance (no need to run the big one to see the improvements in general)?
>
> This is just a way to evaluate the benefits of the PR

I will do just that. Thanks!

@tcoratger
Contributor

> > @dicethedev Could you display in the PR description the benchmark results before/after, including key generation for a small instance (no need to run the big one to see the improvements in general)?
> >
> > This is just a way to evaluate the benefits of the PR
>
> I will do just that. Thanks!

@dicethedev Don't hesitate to let us know if you need help with anything for this one :)

@dicethedev
Author

dicethedev commented May 4, 2026

> @dicethedev Don't hesitate to let us know if you need help with anything for this one :)

@tcoratger The benchmark results are now included in the PR description.
