Commit 1f56a4d
feat(core): add RetryManager state machine for TAPI backoff/retry (#1158)
* feat(core): add RetryManager state machine for TAPI backoff/retry
Add RetryManager with three states (READY, RATE_LIMITED, BACKING_OFF)
that handles 429 rate limiting with Retry-After parsing and transient
error exponential backoff with jitter. Includes sovran-based state
persistence and configurable retry strategies.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: 429 overrides BACKING_OFF state and use additive-only jitter
- Remove early return in handle429 when in BACKING_OFF state. 429 is a
server-explicit signal that should always take precedence over transient
backoff.
- Change jitter from ±jitterPercent to additive-only (0 to jitterPercent)
per SDD specification.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: remove incorrect claim about global counter equivalence
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add persistence validation, atomic backoff calc, and 429 precedence
- Validate persisted state in canRetry() to handle clock changes/corruption
per SDD §Metadata Lifecycle
- Move backoff calculation inside dispatch to avoid stale retryCount from
concurrent batch failures (handleErrorWithBackoff)
- Ensure RATE_LIMITED state is never downgraded to BACKING_OFF
- Update reset() docstring to clarify when it should be called
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: clean up RetryManager comments for post-merge readability
Simplify class docstring to describe current architecture without
referencing SDD deviations. Remove redundant inline comments, compact
JSDoc to single-line where appropriate, and ensure all comments use
present tense.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: 429 Retry-After overrides transient backoff authoritatively
When a 429 arrives while in BACKING_OFF state, use the server's
Retry-After directly instead of applying the lazy/eager strategy.
The server's timing signal is authoritative over calculated backoff.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: consolidate error handlers, fix getState race, add limit signaling
- Use getState(true) for queue-safe reads to prevent race conditions
between concurrent canRetry/handle429/handleTransientError calls
- Consolidate handleError and handleErrorWithBackoff into a single
method that accepts a computeWaitUntilTime function
- Extract side effects (logging, Math.random) from dispatch reducers
- Return RetryResult ('rate_limited'|'backed_off'|'limit_exceeded')
from handle429/handleTransientError so callers can drop events on
limit exceeded
- Clear auto-flush timer in transitionToReady
- Validate state string in isPersistedStateValid
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* style: trim verbose inline comments in RetryManager
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: remove auto-flush timer from RetryManager
TimerFlushPolicy already drives periodic flushes; the auto-flush
timer in RetryManager was redundant. Removes setAutoFlushCallback,
scheduleAutoFlush, clearAutoFlushTimer, and the destroy() method.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: remove retryStrategy param, hardcode eager behavior
The retryStrategy parameter was previously removed but accidentally
re-added. Eager behavior (take shorter wait when consolidating
concurrent errors) is now hardcoded as the only strategy.
- Remove retryStrategy field and constructor parameter
- Replace applyRetryStrategy() with Math.min (eager behavior)
- Update class documentation to reflect hardcoded strategy
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* refactor: simplify RetryManager constructor and error handling
Extract store creation logic into createStore() helper method and
error message extraction into getErrorMessage() utility.
Benefits:
- Constructor now just assigns fields and delegates to helper
- No nested try-catch blocks (linear flow)
- Error message formatting is DRY
- Easier to understand and maintain
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* refactor: use enums and switch statements for clearer state handling
Replace string literals with TypeScript enums:
- RetryState enum (READY, RATE_LIMITED, BACKING_OFF)
- RetryResult enum (RATE_LIMITED, BACKED_OFF, LIMIT_EXCEEDED)
Extract helper methods for clarity:
- resolveStatePrecedence(): handles 429 taking priority over backoff
- consolidateWaitTime(): uses switch statement for clear wait time logic
- getStateDisplayName(): maps state to display names
Benefits:
- Type-safe state handling (no magic strings)
- Switch statements make control flow explicit
- Each helper method has a single, named responsibility
- Easier to test and maintain
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* refactor: improve RetryManager readability
- Extract retry-after clamping into clampRetryAfter method
- Extract state computation into computeNewState method
- Add stateToResult helper to map state to result enum
- Break up isPersistedStateValid into focused validation methods
- Eliminate nested conditionals and improve single responsibility
* refactor: remove VALID_STATES Set, use Object.values for enum validation
Use Object.values(RetryState).includes() instead of maintaining a
duplicate Set of valid states. More idiomatic TypeScript and eliminates
maintenance burden of keeping Set in sync with enum.
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>1 parent c545aa5 commit 1f56a4d
3 files changed
Lines changed: 522 additions & 0 deletions
0 commit comments