fix: avoid infinite retry in blue-green migration test #1502

Open
nodece wants to merge 2 commits into apache:master from nodece:fix-blue-green-ci-timeout

nodece commented May 20, 2026

Motivation

The extensible-load-manager CI job timed out in TestBlueGreenMigrationTestSuite/TestTopicMigration/proxyConnection after 5 minutes.

From the stack trace and logs, the test was stuck waiting on a WaitGroup while the producer/consumer goroutines were still looping in their retry paths. During migration, the producer can enter terminal states (for example, TopicTerminated or ProducerClosed), but the test's retry loops had no logic to exit on terminal errors, so they retried effectively without bound and the suite timed out.

Modifications

  • Add terminal error handling in producer send retry loop:
    • if the error is ErrTopicTerminated or ErrProducerClosed, fail fast instead of retrying forever.
  • Add bounded retry windows for both producer and consumer loops (30 seconds per message stage).
  • Add an error channel and a stage-level wait helper around the WaitGroup waits so that goroutine errors fail the test early.
  • Add stage timeout protection while waiting for:
    • pre-unload send/receive synchronization
    • producer/consumer goroutine completion
  • Keep per-iteration context cancellation immediate (cancel after each send/receive attempt).

These changes make the test deterministic under migration failures and prevent it from hanging until the global test timeout.

nodece and others added 2 commits May 20, 2026 15:54
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
nodece force-pushed the fix-blue-green-ci-timeout branch from 3c7c9e2 to bc05b7a May 20, 2026 07:54