Skip to content

Commit 1f56a4d

Browse files
abueideclaude
andauthored
feat(core): add RetryManager state machine for TAPI backoff/retry (#1158)
* feat(core): add RetryManager state machine for TAPI backoff/retry Add RetryManager with three states (READY, RATE_LIMITED, BACKING_OFF) that handles 429 rate limiting with Retry-After parsing and transient error exponential backoff with jitter. Includes sovran-based state persistence and configurable retry strategies. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: 429 overrides BACKING_OFF state and use additive-only jitter - Remove early return in handle429 when in BACKING_OFF state. 429 is a server-explicit signal that should always take precedence over transient backoff. - Change jitter from ±jitterPercent to additive-only (0 to jitterPercent) per SDD specification. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: remove incorrect claim about global counter equivalence Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add persistence validation, atomic backoff calc, and 429 precedence - Validate persisted state in canRetry() to handle clock changes/corruption per SDD §Metadata Lifecycle - Move backoff calculation inside dispatch to avoid stale retryCount from concurrent batch failures (handleErrorWithBackoff) - Ensure RATE_LIMITED state is never downgraded to BACKING_OFF - Update reset() docstring to clarify when it should be called Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: clean up RetryManager comments for post-merge readability Simplify class docstring to describe current architecture without referencing SDD deviations. Remove redundant inline comments, compact JSDoc to single-line where appropriate, and ensure all comments use present tense. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: 429 Retry-After overrides transient backoff authoritatively When a 429 arrives while in BACKING_OFF state, use the server's Retry-After directly instead of applying the lazy/eager strategy. The server's timing signal is authoritative over calculated backoff. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: consolidate error handlers, fix getState race, add limit signaling - Use getState(true) for queue-safe reads to prevent race conditions between concurrent canRetry/handle429/handleTransientError calls - Consolidate handleError and handleErrorWithBackoff into a single method that accepts a computeWaitUntilTime function - Extract side effects (logging, Math.random) from dispatch reducers - Return RetryResult ('rate_limited'|'backed_off'|'limit_exceeded') from handle429/handleTransientError so callers can drop events on limit exceeded - Clear auto-flush timer in transitionToReady - Validate state string in isPersistedStateValid Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * style: trim verbose inline comments in RetryManager Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: remove auto-flush timer from RetryManager TimerFlushPolicy already drives periodic flushes; the auto-flush timer in RetryManager was redundant. Removes setAutoFlushCallback, scheduleAutoFlush, clearAutoFlushTimer, and the destroy() method. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: remove retryStrategy param, hardcode eager behavior The retryStrategy parameter was previously removed but accidentally re-added. Eager behavior (take shorter wait when consolidating concurrent errors) is now hardcoded as the only strategy. - Remove retryStrategy field and constructor parameter - Replace applyRetryStrategy() with Math.min (eager behavior) - Update class documentation to reflect hardcoded strategy Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * refactor: simplify RetryManager constructor and error handling Extract store creation logic into createStore() helper method and error message extraction into getErrorMessage() utility. Benefits: - Constructor now just assigns fields and delegates to helper - No nested try-catch blocks (linear flow) - Error message formatting is DRY - Easier to understand and maintain Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * refactor: use enums and switch statements for clearer state handling Replace string literals with TypeScript enums: - RetryState enum (READY, RATE_LIMITED, BACKING_OFF) - RetryResult enum (RATE_LIMITED, BACKED_OFF, LIMIT_EXCEEDED) Extract helper methods for clarity: - resolveStatePrecedence(): handles 429 taking priority over backoff - consolidateWaitTime(): uses switch statement for clear wait time logic - getStateDisplayName(): maps state to display names Benefits: - Type-safe state handling (no magic strings) - Switch statements make control flow explicit - Each helper method has a single, named responsibility - Easier to test and maintain Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * refactor: improve RetryManager readability - Extract retry-after clamping into clampRetryAfter method - Extract state computation into computeNewState method - Add stateToResult helper to map state to result enum - Break up isPersistedStateValid into focused validation methods - Eliminate nested conditionals and improve single responsibility * refactor: remove VALID_STATES Set, use Object.values for enum validation Use Object.values(RetryState).includes() instead of maintaining a duplicate Set of valid states. More idiomatic TypeScript and eliminates maintenance burden of keeping Set in sync with enum. --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent c545aa5 commit 1f56a4d

3 files changed

Lines changed: 522 additions & 0 deletions

File tree

0 commit comments

Comments
 (0)