fix(openai): trigger account failover on passthrough 403 forbidden_error #2213

Open
Nimm0ny wants to merge 1 commit into Wei-Shaw:main from Nimm0ny:fix/passthrough-403-failover

Nimm0ny commented May 6, 2026

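Problem

shouldFailoverOpenAIPassthroughResponse, the failover decision function for OpenAI passthrough mode (Codex / ChatGPT subscription accounts that go through the OAuth bridge), currently treats only 429 and 529 as failover-eligible status codes: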

// backend/internal/service/openai_gateway_service.go:3086
func shouldFailoverOpenAIPassthroughResponse(statusCode int) bool {
    switch statusCode {
    case http.StatusTooManyRequests, 529:
        return true
    default:
        return false
    }
}

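However, the "usage quota exhausted" signal from Codex / ChatGPT subscription accounts is actually a 403 forbidden_error (for example, {"error": {"message": "usage limit reached", "type": "forbidden_error"}}), not a 429. This means:

  • A customer API key is bound to multiple OAuth accounts within a group
  • The current account exhausts its quota and returns 403
  • The vanilla code does not fail over and passes the 403 straight back to the client
  • User experience: once a single account is exhausted the key stops working, even though the same group still has healthy accounts

By comparison, shouldFailoverUpstreamError (used by the standard, non-passthrough mode) already includes case 401, 402, 403, 429, 529; the passthrough path simply missed 403.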
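To make the gap concrete, here is a minimal sketch of how the choice of predicate changes what the client sees. It is not the project's handler.FailoverState loop: serveFromGroup, failoverOld, and failoverWith403 are hypothetical names, and each entry in the upstream slice simply stands in for the status code one account in the group would return.

```go
package main

// failoverOld mirrors the predicate quoted above: only 429 and 529 qualify.
func failoverOld(code int) bool { return code == 429 || code == 529 }

// failoverWith403 is the same predicate with 403 added (hypothetical name).
func failoverWith403(code int) bool { return code == 403 || failoverOld(code) }

// serveFromGroup is a simplified, hypothetical failover loop. It walks the
// accounts in order and returns the index and status of the first response
// that is either not failover-eligible or comes from the last account.
// It returns (-1, 0) when the group is empty.
func serveFromGroup(upstream []int, shouldFailover func(int) bool) (int, int) {
	for i, status := range upstream {
		last := i == len(upstream)-1
		if shouldFailover(status) && !last {
			continue // switch to the next account in the group
		}
		return i, status
	}
	return -1, 0
}
```

With a group whose first account is exhausted and whose second is healthy, serveFromGroup([]int{403, 200}, failoverOld) stops at account 0 with a 403, while the same call with failoverWith403 moves on to account 1 and returns 200.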

Changes

Add http.StatusForbidden to shouldFailoverOpenAIPassthroughResponse so that passthrough mode can also trigger the handler.FailoverState loop on a 403.
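For reference, the patched function would look roughly like the sketch below. This is reconstructed from the description rather than copied from the diff, so the ordering of the case values is an assumption.

```go
package main

import "net/http"

// shouldFailoverOpenAIPassthroughResponse reports whether an upstream status
// code should trigger a switch to another account in passthrough mode.
// Sketch of the patched version: 403 joins the existing 429 and 529 cases.
func shouldFailoverOpenAIPassthroughResponse(statusCode int) bool {
	switch statusCode {
	case http.StatusForbidden, http.StatusTooManyRequests, 529:
		return true
	default:
		return false
	}
}
```

529 has no named constant in net/http, which is why it stays a bare literal, as in the original function.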

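This reuses the existing cooldown state machine (SetTempUnschedulable + IncrementOpenAI403Count) and introduces no new state:

  • A single 403: the account enters a 10-minute temporary cooldown ("OpenAI 403 temporary cooldown"), and the in-group scheduler skips it immediately
  • Three cumulative 403s: the account is escalated to permanently bad (via the SetError path)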

Testing

TestOpenAIGatewayService_OpenAIPassthrough_FailoverStatusesTriggerAccountSwitch gains an oauth_403_temp_unschedulable case that verifies:

  • repo.tempUnschedulableIDs == [123]
  • until is more than 9 minutes in the future (i.e. the 10-minute cooldown)
  • repo.tempUnschedulableWhy[0] contains "OpenAI 403 temporary cooldown"

Running go test ./internal/service -run TestOpenAIGatewayService_OpenAIPassthrough_FailoverStatusesTriggerAccountSwitch -count=1 -v locally passes in full (5 subcases).

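Compatibility

  • The non-passthrough path is unaffected (the vanilla code already handles 403 there)
  • A single 403 does not immediately mark the account as bad: it gets a 10-minute temporary cooldown, consistent with sub2api's existing "cool down temporarily first, then decide whether to mark the account bad based on the consecutive count" logic (commit 11cf23da took the same approach)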

OpenAI passthrough mode (OAuth-bridged Codex / ChatGPT subscription accounts)
previously only failed over on 429/529, returning upstream 403 unchanged to
the client. This means a Codex usage-limit-reached response (403 with
{"error": {"type": "forbidden_error"}}) would terminate the user's request
even though the same group has other healthy accounts available.

Add http.StatusForbidden to shouldFailoverOpenAIPassthroughResponse so the
existing handler.FailoverState loop kicks in. Test coverage extended:
oauth_403_temp_unschedulable verifies the account is temp-unscheduled with
"OpenAI 403 temporary cooldown" reason (10min) + IncrementOpenAI403Count
threshold (3 strikes before hard-disable) — same shape as the 429/529
paths, no new state machine.

Verified locally with go test; ready to send upstream as a separate PR.

github-actions Bot commented May 6, 2026

Thank you for your contribution! Before we can merge this PR, we need you to sign our Contributor License Agreement (CLA).

To sign, please reply with the following comment:

I have read the CLA Document and I hereby sign the CLA

You only need to sign once — it will be valid for all your future contributions to this project.


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.
