fix(nodes): retry daemon status during boot race to kill 10s flicker #100
Open
Nic-dorman wants to merge 2 commits into
Conversation
Cold-launch flow before this fix:
t=0 app boot, nodesStore.init() runs
t=~0.5s ensure_daemon_running spawns daemon, polls for port file
t=~1s port file appears, ensure_daemon_running returns URL
(daemon's HTTP server may not have bound the port yet)
t=~1s first fetchDaemonStatus call → connection refused → throws
t=~1s outer catch: daemonConnected=false, initializing=false,
scheduleReconnect (10s timer)
t=1-11s UI shows "Cannot connect to node daemon" for ~10s
t=~11s reconnect timer fires, retry succeeds, UI flips to nodes
The port file is written before the HTTP server binds, so a single
status fetch immediately after `ensure_daemon_running` reliably races
the bind on cold boot.
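For intuition, a minimal Node-flavoured sketch of the losing ordering. This is hypothetical, not the daemon's real code (the path and port are made up); it only shows why "port file on disk" does not imply "socket accepting":

```typescript
import { createServer } from "node:http";
import { writeFileSync } from "node:fs";

const port = 43117;                               // assumption: the real port is dynamic
writeFileSync("/tmp/daemon.port", String(port));  // 1. port file appears on disk
const server = createServer((_req, res) => res.end("ok"));
server.listen(port);                              // 2. the bind completes asynchronously;
                                                  //    a fetch between 1 and 2 is refused
```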
New behavior: `fetchDaemonStatusWithRetry` retries up to ~5s (20 ×
250ms) inside init() with `initializing=true` throughout. The user
sees "Starting node daemon..." until the daemon actually serves or
the budget is exhausted. The disconnected path and scheduleReconnect
loop are unchanged for genuine outages.
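A sketch of the helper's shape, assuming `fetchDaemonStatus` throws on connection refusal. The option names mirror the PR description (`attempts`, `delayMs`); the store's actual implementation may differ:

```typescript
declare function fetchDaemonStatus(): Promise<void>; // the existing store action

// ~5 s budget by default: 20 attempts with 250 ms between them.
async function fetchDaemonStatusWithRetry(
  { attempts = 20, delayMs = 250 } = {},
): Promise<void> {
  for (let i = 0; i < attempts; i++) {
    try {
      await fetchDaemonStatus(); // succeeds once the HTTP server has bound
      return;
    } catch (err) {
      if (i === attempts - 1) throw err; // budget exhausted: genuine-outage path
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```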
Test added: mocks a first-attempt failure + second-attempt success
under fake timers. Asserts no transient `daemonConnected=false`
escapes init() and the store ends up connected with two status calls.
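Roughly, the shape of that test under Vitest fake timers. The store here is an inline stand-in to keep the sketch self-contained; the real test drives the actual Pinia store:

```typescript
import { expect, it, vi } from "vitest";

it("no transient daemonConnected=false escapes init()", async () => {
  vi.useFakeTimers();
  const status = vi
    .fn()
    .mockRejectedValueOnce(new Error("ECONNREFUSED")) // boot race: bind not ready yet
    .mockResolvedValue({ running: true });            // daemon is serving now

  // Inline stand-in for the store's init() + retry path.
  const store = { daemonConnected: false, initializing: true };
  const init = (async () => {
    for (let i = 0; i < 20; i++) {
      try { await status(); store.daemonConnected = true; break; }
      catch { await new Promise((r) => setTimeout(r, 250)); }
    }
    store.initializing = false;
  })();

  await vi.advanceTimersByTimeAsync(250); // let the single 250 ms retry delay elapse
  await init;

  expect(status).toHaveBeenCalledTimes(2);
  expect(store.daemonConnected).toBe(true);
  expect(store.initializing).toBe(false);
  vi.useRealTimers();
});
```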
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The original test relied on `.mockRejectedValueOnce → .mockResolvedValueOnce` to script the daemon race, plus an implicit cross-test fallback for any extra calls. CI hit 23 status() calls instead of 2 because:

1. `beforeEach` did not clear the vue-i18n / daemon-api module mocks, so the `.mockResolvedValue` default fallback set by the prior test leaked into this one and silently absorbed extra calls.
2. `nodesStore.$reset()` clears state references like `_pollTimer` and `_reconnectTimer` but does not call `clearInterval`/`clearTimeout` on the underlying JS timers, so a previous test's polling or reconnect schedule could fire `fetchNodes`/`fetchDaemonStatus` against this test's mocks mid-run.

Fix:
- `beforeEach` now calls `vi.clearAllMocks`, restores real timers, and explicitly calls `stopPolling` / `cancelReconnect` on the store.
- The retry test uses a counter-based `mockImplementation` (throws on call 1, succeeds on call >=2) instead of the Once queue, so the fixture is self-contained and the call-count assertion is meaningful (sketch below).
- The retry test cleans up its own polling / reconnect timers before flipping back to real timers.

No behaviour change in the store under test; only test hygiene.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
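A sketch of the hardened fixture described above, where `store` and `statusMock` stand in for the real Pinia store and the daemon-api module mock:

```typescript
import { beforeEach, vi } from "vitest";

declare const store: { stopPolling(): void; cancelReconnect(): void }; // real Pinia store
declare const statusMock: ReturnType<typeof vi.fn>;                    // daemon-api mock

beforeEach(() => {
  vi.clearAllMocks();      // drop fallback behaviours leaked from earlier tests
  vi.useRealTimers();      // undo any fake-timer install a prior test left behind
  store.stopPolling();     // $reset() nulls _pollTimer but never clearInterval()s it
  store.cancelReconnect(); // likewise for the 10 s _reconnectTimer
});

// Inside the retry test: counter-based scripting is self-contained, so the
// call-count assertion stays meaningful even if a stray timer calls status().
let calls = 0;
statusMock.mockImplementation(async () => {
  calls += 1;
  if (calls === 1) throw new Error("ECONNREFUSED"); // scripted boot race
  return { running: true };                         // every later call succeeds
});
```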
Summary
The "Cannot connect to node daemon" message flashes for ~10 seconds on every cold launch. Root cause: a boot race between the daemon's port-file write and its HTTP-server bind.
Cold-launch timing today
See the annotated timeline above (t=0 through t=~11s).
Fix
New `fetchDaemonStatusWithRetry({ attempts, delayMs })` helper retries up to ~5 s (20 × 250 ms) inside `init()`. `initializing` stays true through the retries, so the user sees "Starting node daemon..." until the daemon actually serves or the budget is exhausted.
The disconnected path and the existing `scheduleReconnect` 10 s loop are untouched — those still kick in for genuine outages.
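For context, a sketch of how `init()` plausibly wires this together. This is an assumed shape: `ensureDaemonRunning` stands in for the `ensure_daemon_running` call, and the real action also fetches nodes and manages more state:

```typescript
declare function ensureDaemonRunning(): Promise<string>; // spawns daemon, returns URL
declare function fetchDaemonStatusWithRetry(
  opts?: { attempts?: number; delayMs?: number },
): Promise<void>;
declare function scheduleReconnect(): void; // existing 10 s loop, unchanged
declare const state: { initializing: boolean; daemonConnected: boolean };

async function init(): Promise<void> {
  state.initializing = true; // UI shows "Starting node daemon..." throughout
  try {
    await ensureDaemonRunning();                               // port file appears
    await fetchDaemonStatusWithRetry({ attempts: 20, delayMs: 250 });
    state.daemonConnected = true;
  } catch {
    state.daemonConnected = false; // only after the full ~5 s budget is spent
    scheduleReconnect();           // genuine-outage path, same as before
  } finally {
    state.initializing = false;
  }
}
```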
Test plan
Regression test as described above: scripted first-attempt failure + second-attempt success under fake timers, asserting no transient disconnected state escapes `init()`.
🤖 Generated with Claude Code