Skip to content

fix: implement graceful shutdown respecting OS signals#505

Open
Ajpantuso wants to merge 1 commit into
openshift:masterfrom
Ajpantuso:apantuso/graceful_shutdown
Open

fix: implement graceful shutdown respecting OS signals#505
Ajpantuso wants to merge 1 commit into
openshift:masterfrom
Ajpantuso:apantuso/graceful_shutdown

Conversation

@Ajpantuso
Copy link
Copy Markdown
Contributor

@Ajpantuso Ajpantuso commented May 4, 2026

Summary

Implements graceful shutdown when SIGTERM or SIGINT is received to prevent blocking node drains.

Summary by CodeRabbit

  • Improvements
    • Implemented graceful server shutdown with controlled termination and signal handling.
    • Enhanced application lifecycle management through improved context initialization.
    • Optimized server startup and shutdown flow for more reliable operations.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 4, 2026

Walkthrough

The main function now installs a controller-runtime signal handler to manage application shutdown. Metrics configuration uses this signal context instead of context.TODO(). The HTTP server switches from blocking calls to a background goroutine with graceful shutdown triggered by either server error or signal reception, using a 10-second timeout.

Changes

Application Lifecycle & Graceful Shutdown

Layer / File(s) Summary
Dependencies
cmd/main.go (imports)
Adds sigs.k8s.io/controller-runtime/pkg/manager/signals for signal handler setup.
Signal Handler Installation
cmd/main.go (lines 80–93)
ctrl.SetupSignalHandler() creates a context from OS signals; metrics configuration uses this context instead of context.TODO().
Server Lifecycle Refactor
cmd/main.go (lines 114–140)
HTTP server runs in a background goroutine reporting errors to a channel. Main goroutine waits for either server failure or ctx.Done() signal, then performs graceful shutdown with 10-second timeout.

Sequence Diagram

sequenceDiagram
    participant Main as Application
    participant SigHandler as Signal Handler
    participant Metrics as Metrics
    participant Server as HTTP Server
    participant OS as OS Signals

    Main->>SigHandler: SetupSignalHandler()
    SigHandler->>OS: Subscribe to SIGTERM/SIGINT
    SigHandler-->>Main: Return context
    
    Main->>Metrics: ConfigureMetrics(ctx)
    Metrics->>Metrics: Initialize with signal context
    
    Main->>Server: Start server in goroutine
    Server-->>Main: Report errors via channel
    
    alt Signal or Server Error
        OS->>SigHandler: Signal received
        SigHandler->>Main: ctx.Done() fires
    else
        Server->>Main: Error sent to channel
    end
    
    Main->>Server: Shutdown(10s timeout)
    Server->>Server: Graceful drain & close
    Server-->>Main: Shutdown complete
    Main->>Main: Exit
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Ote Binary Stdout Contract ❓ Inconclusive Unable to access and directly examine the actual code changes in cmd/main.go to verify stdout contract compliance of graceful shutdown implementation. Directly inspect modified cmd/main.go to verify no fmt.Print, log.Print, or klog calls write to stdout in main() or top-level code during graceful shutdown.
✅ Passed checks (10 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: implementing graceful shutdown with OS signal handling. The code adds signal handling via ctrl.SetupSignalHandler() and implements graceful server shutdown.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All Ginkgo test names in the e2e test file are stable and deterministic, containing no dynamic information such as generated identifiers, timestamps, variable interpolation, or other values that could change between test runs.
Test Structure And Quality ✅ Passed The PR only modifies cmd/main.go for graceful shutdown and contains no changes to Ginkgo test files.
Microshift Test Compatibility ✅ Passed No Ginkgo e2e tests were found in the repository or PR changes based on search for Ginkgo patterns and imports.
Single Node Openshift (Sno) Test Compatibility ✅ Passed PR does not add new Ginkgo e2e tests; only modifies cmd/main.go for graceful shutdown handling.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies only cmd/main.go for graceful shutdown logic with no scheduling constraints, pod affinity rules, node selectors, or deployment manifest changes.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed PR does not add any new Ginkgo e2e tests; changes are limited to graceful shutdown implementation in main.go. Custom check scope requires new e2e tests to apply.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 60 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 4, 2026

@Ajpantuso: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
cmd/main.go (1)

125-131: ⚡ Quick win

Defensively filter http.ErrServerClosed in the server error case.

Structurally, http.ErrServerClosed cannot land in errCh while the select is still active today — server.Shutdown() is only called after ctx.Done() exits the select. However, without an explicit guard, a future change that calls Shutdown() earlier (e.g., from another goroutine) would silently misclassify a clean shutdown as a fatal error and exit with code 1. Adding the check costs one line and self-documents the intended invariant.

🛡️ Proposed defensive filter
 case err := <-errCh:
-    log.Error(err, "Server failed")
-    os.Exit(1)
+    if !errors.Is(err, http.ErrServerClosed) {
+        log.Error(err, "Server failed")
+        os.Exit(1)
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/main.go` around lines 125 - 131, When receiving an error from errCh
inside the select, defensively ignore the benign http.ErrServerClosed instead of
treating it as a fatal error: replace the current case handling (case err :=
<-errCh:) with a conditional that checks if err == http.ErrServerClosed (or
errors.Is(err, http.ErrServerClosed)) and, if so, log it at Info/debug level and
return without calling os.Exit(1); otherwise call log.Error(err, "Server
failed") and os.Exit(1). This touches the select branch that reads from errCh
and uses ctx.Done(), and references errCh, ctx.Done(), http.ErrServerClosed,
log.Error and os.Exit.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@cmd/main.go`:
- Around line 125-131: When receiving an error from errCh inside the select,
defensively ignore the benign http.ErrServerClosed instead of treating it as a
fatal error: replace the current case handling (case err := <-errCh:) with a
conditional that checks if err == http.ErrServerClosed (or errors.Is(err,
http.ErrServerClosed)) and, if so, log it at Info/debug level and return without
calling os.Exit(1); otherwise call log.Error(err, "Server failed") and
os.Exit(1). This touches the select branch that reads from errCh and uses
ctx.Done(), and references errCh, ctx.Done(), http.ErrServerClosed, log.Error
and os.Exit.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1ca11990-11ec-498d-964a-645d5effc8d3

📥 Commits

Reviewing files that changed from the base of the PR and between 25b62a0 and 66cc288.

📒 Files selected for processing (1)
  • cmd/main.go

@tnierman
Copy link
Copy Markdown
Member

tnierman commented May 4, 2026

/approve

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 4, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Ajpantuso, tnierman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants