fix: orchestration monitor checks events.jsonl for lazy placeholder workers (fixes #400) #823
Open
github-actions[bot] wants to merge 1 commit into main from
…orkers

After app relaunch, MonitorAndSynthesizeAsync could see workers as idle before InvokeOnUI had set IsProcessing=true (race between background thread and UI thread dispatch). This caused the monitor to collect empty/stale results and abort orchestration.

Fix:
- Add IsWorkerIdleForMonitor() that checks events.jsonl directly for lazy placeholder workers (Session == null), bypassing the InvokeOnUI race condition
- Wait for IsRestoring to become false before starting the poll loop
- Add 2s settle time for queued InvokeOnUI callbacks

Tests: 10 new tests (8 behavioral + 2 structural guards)

Fixes #400

Co-authored-by: copilot-agentic-workflow[bot] <224017+copilot-agentic-workflow[bot]@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Problem
When a multi-agent worker runs `relaunch.sh` (or the app is rebuilt/relaunched for any reason), `MonitorAndSynthesizeAsync` could immediately see workers as idle and abort the orchestration, even though the workers were still actively processing on the CLI server.

Root Cause
Race condition between two threads during app relaunch recovery:
- Restore thread (`RestoreSessionsInBackgroundAsync`): Creates lazy placeholder `SessionState` objects with `IsProcessing = false`. Sets `IsProcessing = true` via `InvokeOnUI()`, which is async (queued on the UI thread).
- Monitor thread (`MonitorAndSynthesizeAsync`): Starts on `Task.Run` and immediately checks `state.Info.IsProcessing`. Since `InvokeOnUI` hasn't fired yet, it sees `false` for all workers → considers them idle → collects empty/stale results → aborts orchestration.

The orchestrator never receives the completion signal, and the reflection loop hangs forever.
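The ordering problem described above can be reproduced deterministically with a simulated dispatcher. This is a minimal sketch, not the app's code: `WorkerInfo`, `FakeUiDispatcher`, and `RaceDemo` are hypothetical names, and the queue stands in for the real UI-thread dispatch that `InvokeOnUI` performs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a worker's session info; only IsProcessing mirrors the PR text.
public sealed class WorkerInfo
{
    public bool IsProcessing { get; set; }
}

// Simulated UI dispatcher (assumption): InvokeOnUI only queues the action,
// the way a post to a UI thread would, and Drain runs the queue later.
public sealed class FakeUiDispatcher
{
    private readonly Queue<Action> _pending = new Queue<Action>();

    public void InvokeOnUI(Action action) => _pending.Enqueue(action);

    public void Drain()
    {
        while (_pending.Count > 0)
        {
            _pending.Dequeue()();
        }
    }
}

public static class RaceDemo
{
    // Returns what a monitor would read before and after the queued callback runs.
    public static (bool Before, bool After) Run()
    {
        var ui = new FakeUiDispatcher();
        var worker = new WorkerInfo();                    // restore creates the placeholder with IsProcessing = false

        ui.InvokeOnUI(() => worker.IsProcessing = true);  // queued, not applied yet

        bool before = worker.IsProcessing;                // monitor polls immediately
        ui.Drain();                                       // UI thread catches up
        bool after = worker.IsProcessing;
        return (before, after);
    }

    public static void Main()
    {
        var (before, after) = Run();
        // prints "before drain: False, after drain: True"
        Console.WriteLine($"before drain: {before}, after drain: {after}");
    }
}
```

The first read is the one the monitor acted on: the worker looks idle only because the queued callback has not run yet.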
Fix
1. `IsWorkerIdleForMonitor()`: new internal method

Checks worker idle state with a fallback for lazy placeholders:

- `IsProcessing == true` → not idle (normal path)
- `Session == null` (lazy placeholder) AND `IsSessionStillProcessing(sessionId)` returns true (events.jsonl shows active tool execution) → not idle (fixes the race)

This bypasses the `InvokeOnUI` race by checking the source of truth (events.jsonl on disk) directly.

2. Restore completion wait
`MonitorAndSynthesizeAsync` now waits for `IsRestoring` to become false before polling, plus a 2-second settle time for queued `InvokeOnUI` callbacks to execute.

Files Changed
- `CopilotService.Organization.cs`: `IsWorkerIdleForMonitor()`, updated `MonitorAndSynthesizeAsync` polling loop
- `OrchestrationRelaunchTests.cs` (unit)
- `OrchestrationRelaunchTests.cs` (integration)

Testing
3680 tests pass (0 failures, 0 skipped)
10 new tests cover all `IsWorkerIdleForMonitor` scenarios:

- `MonitorAndSynthesizeAsync` uses `IsWorkerIdleForMonitor`
- `MonitorAndSynthesizeAsync` waits for `IsRestoring`

Integration tests build successfully
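The kind of scenario these tests exercise can be sketched as follows, based only on the idle rules listed under Fix. This is an illustration under assumptions, not the code in `CopilotService.Organization.cs`: the `SessionState` shape is a simplified hypothetical model, and the `isSessionStillProcessing` delegate stands in for the real `IsSessionStillProcessing(sessionId)` check against events.jsonl.

```csharp
#nullable enable
using System;

// Hypothetical, simplified model; the real SessionState has more members.
public sealed class SessionState
{
    public string SessionId { get; set; } = "";
    public bool IsProcessing { get; set; }
    public object? Session { get; set; }   // null means lazy placeholder
}

public static class MonitorSketch
{
    // Idle check following the two rules in the PR description.
    // isSessionStillProcessing is an assumed stand-in for the events.jsonl lookup.
    public static bool IsWorkerIdleForMonitor(
        SessionState state,
        Func<string, bool> isSessionStillProcessing)
    {
        // Normal path: the flag already says the worker is busy.
        if (state.IsProcessing) return false;

        // Lazy placeholder: the flag may be stale, so consult the on-disk check.
        if (state.Session == null && isSessionStillProcessing(state.SessionId)) return false;

        return true;
    }

    public static void Main()
    {
        Func<string, bool> active = _ => true;    // events show active tool execution
        Func<string, bool> inactive = _ => false; // events show no activity

        var busy = new SessionState { SessionId = "a", IsProcessing = true };
        var placeholder = new SessionState { SessionId = "b" };
        var connected = new SessionState { SessionId = "c", Session = new object() };

        Console.WriteLine(IsWorkerIdleForMonitor(busy, inactive));        // False
        Console.WriteLine(IsWorkerIdleForMonitor(placeholder, active));   // False
        Console.WriteLine(IsWorkerIdleForMonitor(placeholder, inactive)); // True
        Console.WriteLine(IsWorkerIdleForMonitor(connected, active));     // True
    }
}
```

In this sketch the disk check is consulted only for placeholders, matching the description that the fallback applies when `Session == null`; whether the real method treats a connected session the same way is an assumption here.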
Fixes #400 (Workers stop when relaunch.sh rebuilds the app mid-orchestration)
Warning
The following domain was blocked by the firewall during workflow execution:

- `192.0.2.1`

To allow these domains, add them to the `network.allowed` list in your workflow frontmatter.

See Network Configuration for more information.