Skip to content

stagedsync: fix commitmentCalculator asOfReader.txNum=0 lazy-load on snapshot-loaded chains#21010

Merged
mh0lt merged 2 commits intomainfrom
mh/fix-commit-calc-asofreader-txnum
May 7, 2026
Merged

stagedsync: fix commitmentCalculator asOfReader.txNum=0 lazy-load on snapshot-loaded chains#21010
mh0lt merged 2 commits intomainfrom
mh/fix-commit-calc-asofreader-txnum

Conversation

@mh0lt
Copy link
Copy Markdown
Contributor

@mh0lt mh0lt commented May 6, 2026

Summary

commitmentCalculator lazy-loads from the storage/account domains via a shared asOfStateReader. The reader is constructed with txNum=0 and only updated inside computeAndPublish / computeWithoutCheck — both of which run on a blockResult message. But cc.state.ApplyWrites runs on every txResult and triggers ensureAccount / ensureStorage lazy-loads through the same reader.

Since txResults arrive before the first blockResult, the very first lazy-load fires with asOfReader.txNum = 0. On a synced-from-genesis datadir txNum=0 is in the visible history window so seekInFiles returns gracefully. On a snapshot-loaded datadir the visible window starts well past genesis and seekInFiles errors with:

ensureStorage(...): seekInFiles(invIndex=storage,txNum=0) but data before txNum=2918750000 not available

The calculator publishes the wrapped error and FCU fails on every block.

Fix

Pin asOfReader.txNum to the current tx's txNum before ApplyWrites runs. Semantically GetAsOf(r.txNum) returns the value before tx N modified it = the pre-tx baseline, which is what lazy-load needs (only first-touch slots lazy-load; subsequent same-slot writes hit cs.storageState and skip the read path).

computeAndPublish overwrites this back to lastTxNum + 1 right before ComputeCommitment, so the per-tx setting only affects the lazy-load path and never leaks into the trie fold path.

Reproduction

Run the parallel commitment calculator on a snapshot-loaded chain (e.g. perf-devnet-3 starting at txNum~2.9B). Every block fails with the seekInFiles error above. Synced-from-genesis chains don't trip the bug because txNum=0 is in their window.

Test plan

  • CI green
  • Bench on snapshot-loaded perf-devnet-3 with EXEC3_PARALLEL=true: 17 SETUP blocks + 1 TEST block all VALID with 0 lazy-load errors (verified locally)
  • Bench on synced-from-genesis chain: no behavior change (lazy-load already worked at txNum=0 there)

🤖 Generated with Claude Code

commitmentCalculator constructs asOfReader with txNum=0 and only updates
it inside computeAndPublish/computeWithoutCheck (which run on
blockResult). But cc.state.ApplyWrites runs on every txResult and triggers
ensureAccount/ensureStorage lazy-loads via the same asOfReader. Since the
first message into cc.in is always a txResult (txs complete before their
containing block boundary fires blockResult), the first lazy-load fires
with asOfReader.txNum=0.

On a synced-from-genesis datadir txNum=0 is in the visible history
window so seekInFiles returns gracefully. On a snapshot-loaded datadir
(perf-devnet-3, anything started from a chunked snapshot) the visible
window starts well past genesis (~2.9B in our case) and seekInFiles
errors with "data before txNum=2918750000 not available". The
calculator publishes the wrapped error and FCU fails for every block.

Pin asOfReader.txNum to r.txNum before ApplyWrites. Semantically
GetAsOf(r.txNum) returns the value before tx N modified it = pre-tx
state, which is the baseline ApplyWrites needs (only first-touch slots
lazy-load; subsequent same-slot writes hit cs.storageState and skip
the read path). computeAndPublish overwrites this back to lastTxNum+1
right before ComputeCommitment, so the per-tx setting only affects the
lazy-load path and never leaks into the trie fold.
@mh0lt mh0lt requested a review from yperbasis as a code owner May 6, 2026 11:19
@yperbasis yperbasis requested review from awskii and taratorio May 6, 2026 11:52
@yperbasis yperbasis added this to the 3.5.0 milestone May 7, 2026
@mh0lt
Copy link
Copy Markdown
Contributor Author

mh0lt commented May 7, 2026

Linked as a precursor for #21017 (CI matrix exec_mode={serial,parallel}). Without this fix, the matrix's QA entries that run against snapshot-loaded datadirs (qa-rpc-performance-comparison-tests, qa-rpc-integration-tests-latest) hit seekInFiles(txNum=0) on every block under EXEC3_PARALLEL=true because asOfReader's initial txNum=0 falls outside the visible history window. Companion to #21032 (SD/incarnation differentiator) and #21039 (apply-loop completeness).

… pin

Locks down the post-condition exercised by the fix in this PR:
after handleMessage(*txResult), cc.asOfReader.txNum must equal
the txResult's txNum. This is the property whose absence makes
snapshot-loaded chains crash with seekInFiles(txNum=0) on the
first lazy-load.

Three subassertions:
  1. First txResult bumps txNum from 0 → r.txNum (the basic fix).
  2. Subsequent txResult bumps it again (the per-tx update).
  3. Empty-writes txResult does NOT bump (matches the fix's
     `if len(r.writes) > 0` guard — no lazy-loads to seed, so
     leaving the prior pin alone is correct).

Verified the test fails on origin/main without the fix
(expected 12345, actual 0).
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a commitment-calculator failure on snapshot-loaded datadirs where early per-tx lazy-loads could read with asOfReader.txNum=0 (outside the visible history window), causing seekInFiles(txNum=0) errors and failing FCU.

Changes:

  • Pin cc.asOfReader.txNum to the current txResult.txNum before ApplyWrites to ensure lazy-load reads use the correct pre-tx baseline.
  • Add a regression unit test asserting asOfReader.txNum is updated on txResult processing (and not updated for empty writes).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
execution/stagedsync/committer.go Pins asOfReader.txNum on txResult before applying writes to prevent snapshot-window txNum=0 lazy-load failures.
execution/stagedsync/committer_test.go Adds a regression test validating txResult processing updates asOfReader.txNum as intended.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +110 to +114
// the snapshot-loaded lazy-load crash: cc.asOfReader is constructed with
// txNum=0, and only computeAndPublish / computeAndCheck (which run on
// blockResult) overwrite that field. But cc.state.ApplyWrites runs on
// every txResult and triggers ensureAccount / ensureStorage lazy-loads
// through the same reader. txResults arrive before the first blockResult,
@mh0lt mh0lt added this pull request to the merge queue May 7, 2026
Merged via the queue into main with commit 69d6181 May 7, 2026
42 checks passed
@mh0lt mh0lt deleted the mh/fix-commit-calc-asofreader-txnum branch May 7, 2026 19:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants