Fix CPU spin when defunct runner is still in use#260
Conversation
User reported this problem on windows. Signed-off-by: Eric Curtin <eric.curtin@docker.com>
Reviewer's guide (collapsed on small PRs)Reviewer's GuideThe idleCheckDuration function now defers immediate defunct checks for runners that are still referenced, preventing tight CPU loops when a defunct runner remains in use. Class diagram for updated loader idleCheckDuration logicclassDiagram
class loader {
runners map[int]RunnerInfo
slots map[int]Slot
references map[int]int
timestamps map[int]time.Time
idleCheckDuration() time.Duration
}
class RunnerInfo {
slot int
}
class Slot {
done chan_struct
}
loader --> RunnerInfo
loader --> Slot
loader --> time.Time
Flow diagram for idleCheckDuration defunct runner handlingflowchart TD
A["Start idleCheckDuration()"] --> B["Iterate over runners"]
B --> C["Check if runner is defunct (process exited)"]
C --> D["Is runner unused? (references == 0)"]
D -- Yes --> E["If defunct, return 0 (immediate check)"]
D -- No --> F["If defunct but still in use, do not return 0"]
F --> G["Continue to next runner"]
E --> H["End"]
G --> H
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Pull Request Overview
This PR fixes a CPU spinning issue that occurs when a defunct runner (whose process has exited) is still being used, causing tight loops of immediate scheduling checks. The fix differentiates between defunct runners that are unused (which need immediate checking) versus those still in use (which can wait until they become unused).
- Modified the idle check duration logic to avoid immediate returns when defunct runners are still in use
- Added logic to track defunct status and only trigger immediate checks for unused defunct runners
- Prevents CPU spinning by avoiding unnecessary immediate scheduling for in-use defunct runners
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
Summary of ChangesHello @ericcurtin, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves a significant performance problem, specifically a CPU spin observed on Windows, which arose when the system attempted to repeatedly check and clean up runner processes that had terminated but were still being referenced. The core change refines the logic for determining when to immediately check for idle runners, ensuring that defunct runners are only subject to immediate cleanup if they are truly unused. This prevents the system from entering a tight loop of checks, thereby reducing CPU load and improving overall stability. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
@xenoscopic @doringeman do you think this will fix #259 ? |
There was a problem hiding this comment.
Code Review
This pull request correctly fixes a high CPU usage issue that occurs when a runner becomes defunct while still in use. The change prevents the scheduler from entering a tight loop by only scheduling an immediate check for defunct runners that are also unused. I've suggested a minor refactoring to improve the code's clarity and structure, but the core logic of your fix is sound.
|
Have you been able to reproduce the issue @ericcurtin? |
|
I don't think it will fix it. It'll postpone the eviction of a runner that's unusable if there are more references to it. |
|
I didn't reproduce, closing... |
User reported this problem on windows.
Summary by Sourcery
Bug Fixes: