execution/stagedsync: don't flag maxBlockNum on partial-batch apply-loop exit#21039
Conversation
…oop exit

When the parallel exec loop hits its batch size limit mid-fixture, it returns nil to the apply loop with `reachedMaxBlock=false`. The apply loop's completeness check then ran `applyLoopMissingBlocks` with a fallback clause that flagged `maxBlockNum` as "missing" whenever `!reachedMaxBlock` — turning every legitimate partial-batch exit into a spurious `InvalidBlock` error.

The size-limit path is normal: it returns `ErrLoopExhausted` and the stage loop resumes the next batch from `lastBlockResult+1`. Each block still executes exactly once. The fallback clause was unnecessary and incorrect.

Reproduces deterministically as the github.com/erigontech/erigon/rpc/gasprice `BenchmarkFeeHistory` parallel-mode failure: 200 blocks × 100 simple transfers exceed the test fixture's 5MB batch budget at block 114; the first batch covers blocks 1..114, the second covers 115..200, both clean.

Drops the fallback clause, simplifies `applyLoopMissingBlocks` to a single map comparison, and tightens the message string.

Adds:

* `TestApplyLoopMissingBlocks` "partial batch — size-limit hit" case
* `TestApplyLoopPartialBatchReturnsErrLoopExhausted` exercising the apply-loop exit decision tree (exhausted vs nil vs InvalidBlock)
* `TestApplyLoopChannelCloseOrder` pinning the documented commitResults-then-applyResults shutdown order

Precursor for #21017 (CI matrix `exec_mode={serial,parallel}`); without this, every workflow that runs a parallel-exec test against a chain that exceeds the batch budget fails on the first such size-limit boundary.
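The "single map comparison" shape of the simplified check can be sketched as follows — a self-contained model, not the production code (`txResultBlocks` and `appliedBlocks` are illustrative names, not the real fields):

```go
package main

import (
	"fmt"
	"sort"
)

// missingBlocks models the simplified completeness check: a block is
// "missing" only if it produced tx-results but never had its blockResult
// applied. Blocks the batch never reached (e.g. past a size-limit exit)
// are deliberately NOT flagged — that is the partial-batch fix.
func missingBlocks(txResultBlocks, appliedBlocks map[uint64]struct{}) []uint64 {
	var missing []uint64
	for num := range txResultBlocks {
		if _, ok := appliedBlocks[num]; !ok {
			missing = append(missing, num)
		}
	}
	sort.Slice(missing, func(i, j int) bool { return missing[i] < missing[j] })
	return missing
}

func main() {
	// Partial batch: blocks 1..3 executed and applied; blocks 4..maxBlockNum
	// were never dispatched. Nothing is missing — no spurious InvalidBlock.
	executed := map[uint64]struct{}{1: {}, 2: {}, 3: {}}
	applied := map[uint64]struct{}{1: {}, 2: {}, 3: {}}
	fmt.Println(missingBlocks(executed, applied)) // []

	// Genuine gap: block 2 produced tx-results but was never applied.
	delete(applied, 2)
	fmt.Println(missingBlocks(executed, applied)) // [2]
}
```

The removed fallback would have additionally flagged `maxBlockNum` whenever the batch stopped early; the map comparison alone captures every real gap without that false positive.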
Same class of partial-batch orchestration bug as the apply-loop fix in this PR. `executeBlocks` marks the final dispatched block with `Exhausted` when the per-cycle block limit fires, then exits its goroutine — without closing `pe.execRequests`, without cancelling ctx. The exec loop had no branch to react to that signal: after the last blockResult was applied, the main select parked waiting for `execRequests` / `rws.ResultCh` / `ctx.Done` — none of which ever fire. The apply loop never got the channel-close signal it needed to return `ErrLoopExhausted`.

Reproduced as a chiado parallel-exec stall at block 150662 with `EXEC3_PARALLEL=true ... --sync.loop.block.limit=10_000` from block 0 (issue #20711). The hang was masking a wrong-trie-root divergence at the same block; with the exhaust signal honored, the underlying `ErrWrongTrieRoot` now surfaces cleanly through the apply loop.

The new branch sits between the maxBlockNum-reached check and the `StopAfterBlock` debug exit, matching the precedence of the existing partial-batch exits.

## Tests

* `TestExecLoopHonorsBlockResultExhausted` — orchestration test with two models of the exec loop's blockResult decision tree (post-fix and pre-fix). The pre-fix model must hang past the timeout — proving the scaffolding genuinely surfaces the bug rather than passing vacuously.
* `TestExecLoopExhaustedOnlySetOnFinalBlock` — locks in the dispatcher convention (Exhausted only on the trailing block of a partial batch) so future `executeBlocks` changes can't silently drop already-queued work by setting it mid-stream.
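A minimal model of the fixed exit path — the type and field names are illustrative (production carries a richer `blockResult`), but the shape of the bug is the same: without the new branch, the loop parks on a channel that nobody will ever write to or close.

```go
package main

import "fmt"

type blockResult struct {
	BlockNum  uint64
	Exhausted bool // sketch: production carries a richer exhaustion signal
}

// execLoopModel models the fixed branch: when a blockResult carries the
// Exhausted mark, the loop returns instead of parking on channels that
// will never fire. The caller's deferred channel closes then unblock the
// apply loop, which can return ErrLoopExhausted.
func execLoopModel(results <-chan blockResult) (last uint64, exhausted bool) {
	for br := range results {
		last = br.BlockNum
		if br.Exhausted {
			// New branch: honor the trailing partial-batch signal and
			// exit cleanly rather than waiting for more work.
			return last, true
		}
	}
	return last, false
}

func main() {
	results := make(chan blockResult, 3)
	results <- blockResult{BlockNum: 1}
	results <- blockResult{BlockNum: 2}
	results <- blockResult{BlockNum: 3, Exhausted: true}
	// Note: the channel is never closed — pre-fix, this loop would hang
	// after draining the three buffered results.
	last, exhausted := execLoopModel(results)
	fmt.Println(last, exhausted) // 3 true
}
```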
Pull request overview
This PR fixes partial-batch orchestration in the parallel execution path so that normal “stop early and resume next cycle” exits are not misclassified as InvalidBlock, and ensures the exec loop honors an explicit “exhausted” signal to avoid stalling.
Changes:
- Removes the `maxBlockNum` fallback from `applyLoopMissingBlocks`, so only blocks that actually produced tx-results are considered "missing" when the apply loop exits.
- Updates `execLoop` to exit cleanly when `blockResult.Exhausted` is observed, allowing deferred channel closes to unblock the apply loop and propagate `ErrLoopExhausted`.
- Extends robustness tests around partial-batch exits and the exhausted signal (with some tests currently modeling behavior rather than exercising production code).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `execution/stagedsync/exec3_parallel.go` | Adjusts missing-block completeness logic and adds `blockResult.Exhausted` handling in `execLoop` to prevent partial-batch stalls. |
| `execution/stagedsync/exec3_parallel_robustness_test.go` | Updates/expands tests for partial-batch behavior and exec-loop exhausted handling. |
```diff
@@ -352,8 +352,8 @@ func (pe *parallelExecutor) exec(ctx context.Context, execStage *StageState, u U
 // (no blockResult for it) but the channel closed cleanly: that
 // means the exec loop signaled "done" without delivering the
 // final block.
```

```go
// TestApplyLoopChannelCloseOrder exercises the documented invariant
// that the exec loop closes commitResults BEFORE applyResults on
// shutdown. The calculator drains commitResults and signals the apply
// loop via rootResults; if applyResults closes first, the apply loop
// can race with the calculator's final commitment write.
//
// The exec loop's deferred close in execLoop() does the right thing
// (commitResults first), but the close ordering is load-bearing —
// changing it silently corrupts the shutdown sequence. This test
// captures the shape of the deferred close to pin the order down.
func TestApplyLoopChannelCloseOrder(t *testing.T) {
	commitResults := make(chan struct{})
	applyResults := make(chan struct{})

	closeOrder := make(chan string, 2)

	// Mimic execLoop's deferred close: commitResults first, then
	// applyResults.
	closeBoth := func() {
		close(commitResults)
		closeOrder <- "commitResults"
		close(applyResults)
		closeOrder <- "applyResults"
	}

	go closeBoth()

	first := <-closeOrder
	second := <-closeOrder

	if first != "commitResults" {
		t.Fatalf("commitResults must close first; got close order [%s, %s]", first, second)
	}
	if second != "applyResults" {
		t.Fatalf("applyResults must close second; got close order [%s, %s]", first, second)
	}

	// Sanity: both channels actually closed.
	if _, ok := <-commitResults; ok {
		t.Fatal("commitResults not actually closed")
	}
	if _, ok := <-applyResults; ok {
		t.Fatal("applyResults not actually closed")
	}
}
```
```go
// Production decision tree, mirroring exec3_parallel.go's
// if blockResult != nil { ... } block:
//  1. sizeEst > batchLimit → trigger + return
//  2. blockResult.BlockNum >= maxBlockNum → reachedMaxBlock + trigger + return
//  3. blockResult.Exhausted != nil → trigger + return
//  4. dbg.StopAfterBlock → trigger + return
//  5. fall through to next-block scheduling.
withFix := func(b *blockResult, trigger func()) (done bool) {
```

```go
// Inner ctx lets the goroutine cooperate with the outer timeout
// without leaking when the model hangs on <-br. We translate
// "exited via ctx" to result.hung — that is the hang signal.
ctx, cancel := context.WithTimeout(context.Background(), 1500*time.Millisecond)
defer cancel()
```
```go
// TestExecLoopExhaustedOnlySetOnFinalBlock is a guard against future
// regressions in the dispatcher: blockResult.Exhausted being set on a
// non-final block would cause the exec loop to drop later blocks that
// have already been queued. Locks down the precedence: among blocks in
// flight, only the trailing dispatched block's Exhausted is honored —
// any earlier Exhausted signal would short-circuit the batch and lose
// already-scheduled work. This test pins the convention to exec3.go's
// `if exhausted != nil { break }` after dispatch: the break ensures
// no later blockResult is produced once Exhausted is set on a
// dispatched request.
func TestExecLoopExhaustedOnlySetOnFinalBlock(t *testing.T) {
	// Simulate executeBlocks' loop: it sets exhausted, dispatches the
	// blockResult containing it, and breaks out — no further blocks.
	// Verify the convention: count(Exhausted != nil) == 1 always, and
	// it's always the last blockResult.
	type dispatched struct {
		num       uint64
		exhausted bool
	}
	dispatch := func(blocks []dispatched) error {
		var prevExhausted bool
		for _, b := range blocks {
			if prevExhausted {
				return errors.New("dispatched a block after Exhausted was set — exec loop would drop it")
			}
			if b.exhausted {
				prevExhausted = true
			}
		}
		return nil
	}

	t.Run("Exhausted only on last block — valid", func(t *testing.T) {
		err := dispatch([]dispatched{
			{num: 1, exhausted: false},
			{num: 2, exhausted: false},
			{num: 3, exhausted: true},
		})
		if err != nil {
			t.Fatalf("valid case rejected: %v", err)
		}
	})

	t.Run("no Exhausted — valid (full-batch path)", func(t *testing.T) {
		err := dispatch([]dispatched{
			{num: 1, exhausted: false},
			{num: 2, exhausted: false},
		})
		if err != nil {
			t.Fatalf("valid case rejected: %v", err)
		}
	})

	t.Run("Exhausted mid-stream — invalid (regression guard)", func(t *testing.T) {
		err := dispatch([]dispatched{
			{num: 1, exhausted: false},
			{num: 2, exhausted: true},
			{num: 3, exhausted: false},
		})
		if err == nil {
			t.Fatal("dispatch convention violated: blockResult emitted after Exhausted")
		}
	})
}
```
yperbasis left a comment
Major
1. Stale comment at the apply-loop close site (real bug, easy fix)
exec3_parallel.go:351-354 still says:

```go
// Also flag the case where maxBlockNum was never reached at all
// (no blockResult for it) but the channel closed cleanly: that
// means the exec loop signaled "done" without delivering the
// final block.
```

This is exactly the behavior commit 550cd1d574 removed. The function-level comment on `applyLoopMissingBlocks` was correctly updated; the call-site comment was missed. It should be deleted or replaced to explain that not reaching `maxBlockNum` is now a normal partial-batch state.
Minor (optional)
2. All new tests are decision-tree models, not the real execLoop
The PR description acknowledges this — full integration would need workers + calculator + state setup. But the consequence is real: `TestExecLoopHonorsBlockResultExhausted`'s `withFix` model is hand-coded to mirror production precedence. If the production order changes, the test passes vacuously while the model drifts. Worth a comment in the test pointing at the exact production lines being mirrored, so a future change is more likely to update both. Same for `TestApplyLoopPartialBatchReturnsErrLoopExhausted`.
3. Comment slightly imprecise
Lines 901-902:

```go
// (initialCycle gates it on crossing a step boundary;
// later cycles enforce it unconditionally)
```

The initial-cycle path also requires `!dbg.DiscardCommitment()` (exec3.go:675). Minor.
4. Unused field in test result type
`TestApplyLoopPartialBatchReturnsErrLoopExhausted`'s `errSubstring` is set but never asserted on. Trivial cleanup.
@yperbasis CI is green now — ready for re-review. The earlier failure was
- Replace the stale comment at the apply-loop close site (exec3_parallel.go:351-354) that still described the removed maxBlockNum fallback.
- Refine the executeBlocks-Exhausted comment to mention the `!dbg.DiscardCommitment()` initial-cycle gate at exec3.go:675.
- Add a cross-reference comment to TestApplyLoopPartialBatchReturnsErrLoopExhausted pinning it to exec3_parallel.go's apply-loop exit branch around line 355, matching the convention already used in TestExecLoopHonorsBlockResultExhausted.
- Drop the unused errSubstring field from the test's result struct.
- Reduce the TestExecLoopHonorsBlockResultExhausted hang-detection timeout from 1500ms inner / 2000ms outer to 300ms / 500ms — in-memory channel + closure dispatch resolves on a microsecond timescale, so the larger budget was pure test runtime overhead.
@yperbasis pushed 686e11b addressing the review:
The Copilot suggestions about extracting production logic into helpers and unit-testing the helpers (so the tests drive real code rather than a model) are good, but a larger refactor than fits this PR — would prefer to leave them as a follow-up. Lint clean, full
…ction

Addresses Copilot's review feedback on #21039: the original tests modeled the exec-loop and apply-loop decision trees in local closures, which would let the model drift from production silently. This follow-up extracts three pure helpers from the production code and rewires the tests to drive them directly.

Helpers extracted:

* `closeApplyChannels` (pe method) — pins commitResults-then-applyResults close ordering. Returns the close sequence as a string slice so tests can verify the order deterministically without racing on observer-goroutine wakeups (the previous test had a real flake here).
* `execLoopShouldExit` (pure func) — encodes the per-blockResult exit precedence: sizeLimit > maxBlockNum > Exhausted > StopAfterBlock. The production exec loop now switches on the returned decision; tests pin each precedence rule with table-driven cases.
* `shouldMarkExhaustedAtBlock` (pure func) — encodes executeBlocks's per-cycle block-limit gate (initial-cycle protections + blockLimit span check + maxBlockNum guard). Replaces the old "dispatch convention" model test that didn't drive any production code.

Tests updated:

* `TestApplyLoopChannelCloseOrder` — calls `pe.closeApplyChannels()` on a real `parallelExecutor` with mock channels and asserts on the returned close-order slice. Also covers double-close recovery and the post-close nil-out.
* `TestExecLoopShouldExitPriority` — table-driven coverage of the exit-priority decisions, including the cross-condition cases (e.g. maxBlockNum AND Exhausted both set on the final block — maxBlockNum must win).
* `TestShouldMarkExhaustedAtBlock` — table-driven coverage of the initial-cycle gates, the blockLimit==0 disabled case, the span < limit case, and the maxBlockNum exact-hit no-mark case.

No production behaviour change — pure refactor + test rewire.
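The exit-precedence helper described above might be sketched like this — the `shouldExit` signature, parameters, and `exitReason` constants are illustrative stand-ins, not the real `execLoopShouldExit` API:

```go
package main

import "fmt"

type exitReason int

const (
	exitNone exitReason = iota
	exitSizeLimit
	exitMaxBlock
	exitExhausted
	exitStopAfter
)

// shouldExit sketches the extracted precedence helper: the checks run in
// a fixed order, so when several conditions hold at once the
// higher-priority reason wins (e.g. maxBlockNum beats Exhausted on the
// final block, which keeps the clean reachedMaxBlock path).
func shouldExit(sizeEst, batchLimit, blockNum, maxBlockNum uint64, exhausted, stopAfter bool) exitReason {
	switch {
	case sizeEst > batchLimit:
		return exitSizeLimit
	case blockNum >= maxBlockNum:
		return exitMaxBlock
	case exhausted:
		return exitExhausted
	case stopAfter:
		return exitStopAfter
	}
	return exitNone
}

func main() {
	// maxBlockNum AND Exhausted both set on the final block: maxBlockNum wins.
	fmt.Println(shouldExit(0, 1000, 200, 200, true, false) == exitMaxBlock) // true
	// Exhausted alone on a mid-chain block: the exhausted branch fires.
	fmt.Println(shouldExit(0, 1000, 150, 200, true, false) == exitExhausted) // true
}
```

A pure function like this is what makes the table-driven precedence tests possible: each case is a single call with explicit inputs, and the production loop switches on the same result.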
Follow-up #21046 opens against this branch — extracts
…ction (#21046)

Follow-up to #21039 addressing Copilot's review feedback that the new robustness tests were modelling the exec-loop and apply-loop decision trees in local closures — which would let the model drift from production silently. This PR extracts three pure helpers from the production code and rewires the tests to drive them directly. **No production behaviour change** — the helpers wrap the exact same logic that was inline before.

## Helpers extracted

| Helper | Production call site | What it pins |
|---|---|---|
| `closeApplyChannels` (pe method) | `execLoop` deferred close | commitResults-then-applyResults order |
| `execLoopShouldExit` (pure) | `execLoop` per-blockResult branch | sizeLimit > maxBlockNum > Exhausted > StopAfterBlock precedence |
| `shouldMarkExhaustedAtBlock` (pure) | `executeBlocks` per-cycle limit gate | initial-cycle gates + blockLimit + maxBlockNum guard |

`closeApplyChannels` returns the actual close order as a string slice — the previous test was racy because observer goroutines could wake up out of order even when the helper closed the channels in the right order; the slice removes that race entirely.

## Tests rewired

* **`TestApplyLoopChannelCloseOrder`** — now calls `pe.closeApplyChannels()` on a real `parallelExecutor` with mock channels and asserts on the returned close-order slice. Also covers double-close recovery and the post-close nil-out.
* **`TestCloseApplyChannelsDoubleCloseRecovers`** — verifies the recover-on-double-close path doesn't count pre-closed channels as freshly closed.
* **`TestExecLoopShouldExitPriority`** — table-driven coverage of each branch, including cross-condition cases (e.g. maxBlockNum AND Exhausted both set on the final block — maxBlockNum must win for the clean reachedMaxBlock path).
* **`TestShouldMarkExhaustedAtBlock`** — table-driven coverage of the initial-cycle gates (`lastExecutedStep`/`lastFrozenStep`/`DiscardCommitment`), the `blockLimit==0` disabled case, the span < limit case, and the `maxBlockNum` exact-hit no-mark case.

The previous `TestExecLoopHonorsBlockResultExhausted` and `TestExecLoopExhaustedOnlySetOnFinalBlock` tests are removed — both are subsumed by the new `TestExecLoopShouldExitPriority` and `TestShouldMarkExhaustedAtBlock` cases that drive production code.

## Test plan

- [x] All four new tests pass: `TestApplyLoopChannelCloseOrder`, `TestCloseApplyChannelsDoubleCloseRecovers`, `TestExecLoopShouldExitPriority`, `TestShouldMarkExhaustedAtBlock`
- [x] Existing tests in the same file (`TestApplyLoopMissingBlocks`, `TestExecLoopExitCheck`, `TestApplyLoopRootResultsCloseDoesNotRace`, etc.) still pass
- [x] `make lint` clean
- [x] No production behaviour change (`go build ./...` clean)

## Stack

Stacked on #21039 (the apply-loop fix). Both should land together — this PR alone doesn't make sense without #21039's helper-using tests as context.
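The close-order-as-slice idea can be illustrated with a standalone sketch (the real `closeApplyChannels` is a `parallelExecutor` method with recovery handling; this free function only shows the deterministic-ordering technique):

```go
package main

import "fmt"

// closeApplyChannels sketches the follow-up's helper: close commitResults
// before applyResults and report the order as a slice, so a test can
// assert on the returned value deterministically instead of racing
// observer goroutines against the closes.
func closeApplyChannels(commitResults, applyResults chan struct{}) []string {
	var order []string
	if commitResults != nil {
		close(commitResults)
		order = append(order, "commitResults")
	}
	if applyResults != nil {
		close(applyResults)
		order = append(order, "applyResults")
	}
	return order
}

func main() {
	order := closeApplyChannels(make(chan struct{}), make(chan struct{}))
	fmt.Println(order) // [commitResults applyResults]
}
```

Because the slice is built by the closer itself, a reordering bug shows up directly in the returned value — no scheduler-dependent observation window.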
…op-partial-batch-no-spurious-flag
Re-pushed with main merged in (
The merge with main + re-run should clear both. If
Filed #21062 to fix the underlying data race surfaced on this PR's race-tests run ( |
Blocked by #21062 — that PR fixes the underlying data race in the aggregator. Once #21062 merges to main and we rebase, this PR's CI clears. I deliberately did not fold #21062's commit into this branch — that would muddy two unrelated PRs (apply-loop fix vs aggregator race fix) and force a single review/revert decision on both. Reviewing them separately preserves cleaner attribution and rollback boundaries.
## Summary

Drops the fallback clause in `applyLoopMissingBlocks` that turned every legitimate partial-batch exit into a spurious `InvalidBlock` error.

When the parallel exec loop hits its size limit mid-fixture, it returns nil with `reachedMaxBlock=false`. The previous completeness check flagged `maxBlockNum` as "missing" whenever `!reachedMaxBlock` — but the size-limit path is normal: the apply loop returns `ErrLoopExhausted`, the stage loop resumes the next batch from `lastBlockResult+1`, and each block still executes exactly once. No re-execution.

### Update — additional fix in commit 310a094514 (execution/stagedsync: honor blockResult.Exhausted in exec loop)

Same class of partial-batch orchestration bug, on the exec-loop side. `executeBlocks` marks the final dispatched block with `Exhausted` when the per-cycle block limit fires, then exits its goroutine — without closing `pe.execRequests`, without cancelling ctx. The exec loop had no branch to react to that signal: after the last blockResult was applied, the main select parked waiting for `execRequests` / `rws.ResultCh` / `ctx.Done` — none of which ever fire. The apply loop never got the channel-close signal it needs to return `ErrLoopExhausted`, so the original (apply-loop-side) fix in this PR couldn't even run.

Reproduced as a chiado parallel-exec silent stall at block 150662 with `EXEC3_PARALLEL=true ... --sync.loop.block.limit=10_000` from block 0 (issue #20711). The hang was masking a wrong-trie-root divergence at the same block — with the exhaust signal honored, the underlying `ErrWrongTrieRoot` now surfaces cleanly through the apply loop and the original-issue symptom appears as expected.

The new branch sits between the `maxBlockNum`-reached check and the `StopAfterBlock` debug exit, matching the precedence of the existing partial-batch exits.

Tests:

* `TestExecLoopHonorsBlockResultExhausted` runs two models of the exec-loop blockResult decision tree (post-fix and pre-fix) against identical channel orchestration. The pre-fix model must hang past the timeout — proving the test scaffolding genuinely surfaces the bug rather than passing vacuously.
* `TestExecLoopExhaustedOnlySetOnFinalBlock` locks in the dispatcher convention so future `executeBlocks` changes can't silently drop already-queued work by setting `Exhausted` mid-stream.

## Why
Reproduces deterministically as the parallel-mode `BenchmarkFeeHistory` failure on PR #21017 (the CI matrix that runs every test under both `serial` and `parallel` exec modes): 200 blocks × 100 simple transfers exceed the test fixture's 5MB batch budget at block 114. Trace confirms:

* the first batch exits with `ErrLoopExhausted` at the size-limit boundary
* the second batch exits with `reachedMaxBlock=true` and a clean nil
* `Sd.mem` is not contaminated by in-flight future-block writes — `txResultBlocks` at exit was exactly `[1..114]`. Block 200 was flagged solely by the fallback clause.

## Shutdown sequence verified
Tracing the deferred close in `execLoop`: `commitResults` closes first, the calculator drains and closes `rootResults`, the apply loop sees both closes, drains, runs the completeness check, and returns `ErrLoopExhausted`. `executorCancel` then cancels `execLoopCtx`, `executeBlocks` and the workers exit, and `pe.wait` joins. Three batches in `BenchmarkFeeHistory` run end-to-end with no goroutine leaks.

## Tests
* `TestApplyLoopMissingBlocks` — adds a "partial batch — size-limit hit, no spurious flag for unreached blocks" case
* `TestApplyLoopPartialBatchReturnsErrLoopExhausted` — exercises the apply-loop exit decision tree (exhausted vs nil vs InvalidBlock) for partial-batch / full-batch / silent-failure scenarios
* `TestApplyLoopChannelCloseOrder` — pins the load-bearing `commitResults`-before-`applyResults` close order
* `TestExecLoopHonorsBlockResultExhausted` — orchestration test for the exec-loop side of the partial-batch exit (added in commit 310a094514)
* `TestExecLoopExhaustedOnlySetOnFinalBlock` — pins the `executeBlocks` dispatcher convention (added in commit 310a094514)

## Test plan
- [x] `BenchmarkFeeHistory/full/blocks=200` passes with `ERIGON_EXEC3_PARALLEL=true`
- [x] `make lint` clean
- [x] `go test -race ./execution/stagedsync/...` for the new tests passes
- [x] `TestApplyLoopRootResultsCloseDoesNotRace`, `TestApplyLoopDoesNotHangAfterRootResultsClose`, `TestExecLoopExitCheck*` still pass
- [x] `EXEC3_PARALLEL=true ... --chain=chiado --sync.loop.block.limit=10_000 --prune.mode=archive` from block 0 reaches the underlying `ErrWrongTrieRoot` at block 150662 instead of silently stalling (verifies the partial-batch exec-loop exit fires)

## CI fix linkage
Precursor for #21017 (CI matrix `exec_mode={serial,parallel}`). Without this, `bench / benchmarks (parallel)` and any other workflow exercising a chain large enough to cross the batch-size boundary fails on the first crossing. Companion to #21032 (incarnation/SD differentiator fix).