fix: preserve replay tool output continuation #2247
Open
anzhen-tech wants to merge 1 commit into Wei-Shaw:main from
Conversation
Contributor: All contributors have signed the CLA. ✅
Author: I have read the CLA Document and I hereby sign the CLA
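Problem description
The existing fix is incomplete.
The current version only checks whether function_call_output is present in the current payload, and uses that result to decide whether previous_response_id can be removed for a recovery retry.
In the WS ingress replay path with store=false, however, function_call_output does not necessarily appear in the current client payload. The current payload may contain only ordinary input, while the server uses buildOpenAIWSReplayInputSequence to merge the historical turns into the complete replay input. In other words, the real request can look like this:
1. The current payload does not contain function_call_output.
2. The request carries previous_response_id.
3. The rebuilt replay input does contain function_call_output.
4. Recovery removes previous_response_id and retries with the replay input.
5. The upstream receives a function_call_output whose tool call is no longer on the response chain.
6. The upstream returns: No tool call found for function call output with call_id ...
Why the existing version does not fully fix it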
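The existing version only covers the case where the current payload directly contains function_call_output.
It does not cover the case where the current payload has no function_call_output but the rebuilt replay input does.
That is why production still sees:
No tool call found for function call output...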
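The gap described above can be illustrated with a minimal Go sketch. It is not the PR's code: InputItem, containsFunctionCallOutput and buildReplayInput are hypothetical names, and buildReplayInput is only a simplified stand-in for buildOpenAIWSReplayInputSequence, assumed here to concatenate historical turns with the current payload.

```go
package main

import "fmt"

// InputItem is a hypothetical, simplified stand-in for one entry of a
// request's input array. Only the fields needed for the illustration exist.
type InputItem struct {
	Type   string
	CallID string
}

// containsFunctionCallOutput reports whether any item is a tool output.
func containsFunctionCallOutput(items []InputItem) bool {
	for _, it := range items {
		if it.Type == "function_call_output" {
			return true
		}
	}
	return false
}

// buildReplayInput is an assumed simplification of the server-side replay
// builder: historical turns first, then the current payload.
func buildReplayInput(history, current []InputItem) []InputItem {
	replay := make([]InputItem, 0, len(history)+len(current))
	replay = append(replay, history...)
	replay = append(replay, current...)
	return replay
}

func main() {
	history := []InputItem{
		{Type: "message"},
		{Type: "function_call_output", CallID: "call_1"},
	}
	current := []InputItem{{Type: "message"}} // ordinary input only

	replay := buildReplayInput(history, current)

	// A check limited to the current payload misses the tool output
	// that only appears after the replay input is rebuilt.
	fmt.Println("payload has function_call_output:", containsFunctionCallOutput(current))
	fmt.Println("replay has function_call_output: ", containsFunctionCallOutput(replay))
}
```

Under these assumptions, the payload-only check reports no tool output while the replay input carries one, which is the case the earlier fix did not cover.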
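Fix in this PR
This PR widens the check from:
whether the current payload contains function_call_output
to:
whether the current payload contains function_call_output, or
whether the replay input contains function_call_output
If function_call_output is present in either place, previous_response_id must not be removed, so that the upstream response chain is not broken.
Additional safety handling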
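Simply keeping the old previous_response_id and retrying on a new upstream WS connection is not safe either.
previous_response_id depends on the original upstream continuation connection. Once that connection is unavailable, sending the old previous_response_id over a new connection can trigger:
previous_response_not_found
So in this case the PR fails closed locally instead of forwarding a request that is bound to be unreliable to a new upstream.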
Verification