fix: preserve redacted_thinking blocks in multi-turn Bedrock conversations#2880
fix: preserve redacted_thinking blocks in multi-turn Bedrock conversations#2880laiman1107 wants to merge 8 commits intoopenai:mainfrom
Conversation
…sations The previous serialisation stored only the thinking text and signature from each block, silently discarding any block whose type is "redacted_thinking". redacted_thinking blocks carry a "data" field (base64-encrypted content) instead of "thinking"/"signature", so they were filtered out in both message_to_output_items and never reconstructed in items_to_messages. Providers like Bedrock enforce that thinking/redacted_thinking blocks in the latest assistant message must be passed back verbatim on the next turn. With redacted_thinking blocks missing from the reconstructed history, Bedrock rejected subsequent turns with: "thinking or redacted_thinking blocks in the latest assistant message cannot be modified. These blocks must remain as they were in the original response." Fix: serialise the complete block list as JSON in encrypted_content so every block type (thinking and redacted_thinking) survives the round-trip unchanged. A legacy fallback decodes the old "\n"-joined signatures format so any in-flight sessions are not broken. The condition that gated thinking-block reconstruction on content_items being non-empty is also removed — redacted_thinking blocks produce no content items, so the old guard prevented reconstruction entirely. Tests: two new cases cover the pure-redacted_thinking path and the mixed thinking + redacted_thinking path.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a9dbad820d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Address review feedback: the previous change replaced with as the reconstruction guard, which silently dropped thinking blocks for older persisted turns where signatures were absent and only content text was stored. Fix by using so both paths are covered: - encrypted_content present: try JSON first, fall back to legacy '\n'-joined signatures format - encrypted_content absent but content_items present: reconstruct signature-less thinking blocks from content text only, preserving the prior behaviour for those older sessions
|
Good catch. The guard did regress the case where older persisted turns have content items but no encrypted_content (e.g. thinking blocks whose signature was absent). Fixed in the latest commit by using as the gate, then branching inside:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4e46151014
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…locks - Import ResponseReasoningItem for proper type narrowing - Replace hasattr guards with assert isinstance(reasoning_item, ResponseReasoningItem) and cast(ResponseReasoningItem, ...) so mypy can verify attribute access - Add explicit 'is not None' assertions before json.loads calls to satisfy the str | None -> str narrowing mypy requires - Run ruff format to satisfy the format check
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 36eebafbda
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c126f17ab6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
The last-resort fallback in LitellmConverter produces dicts with only a 'thinking' key and no 'type'. Including these in the JSON payload caused the replay-side validation (which requires every block to have a 'type') to fall back to legacy signature parsing, misinterpreting the whole JSON string as a newline-joined signature and producing malformed thinking blocks. Filter to only serialise blocks that carry a 'type' field, making the serialisation and validation paths consistent.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a09b7e175d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Rather than filtering out blocks that lack a 'type' field (which silently
drops thinking content and can still cause Bedrock to reject subsequent
turns), inject 'type': 'thinking' for any block that has a 'thinking' key
but no 'type'. This covers the last-resort fallback path in
LitellmConverter that emits {"thinking": str(block)} dicts.
Blocks that are not dicts, or have neither 'type' nor 'thinking', are
still discarded as unrecoverable.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 363a59484e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
reasoning_item.get('content', []) returns None when the key is present
but explicitly set to None (e.g. streamed Anthropic responses where
chatcmpl_stream_handler sets a signature without populating content).
The old 'and content_items' guard would short-circuit before iterating;
the new '(content_items or encrypted_content)' gate no longer does.
Use 'or []' to ensure content_items is always iterable regardless of
whether the key is absent or explicitly None.
Problem
When using Claude with extended thinking in multi-turn agent conversations, thinking blocks are silently dropped during history reconstruction, causing providers to reject subsequent turns.
Two concrete failure modes, both caused by the same bug:
1. AWS Bedrock —
redacted_thinkingblocksBedrock returns
redacted_thinkingblocks with adatafield (base64-encrypted content) instead ofthinking/signature:{"type": "redacted_thinking", "data": "SGVsbG8..."}These blocks were completely invisible to the serialisation code — they have no
"thinking"key and no"signature"key — so they were silently dropped. Bedrock then rejected the next turn with:2. Anthropic API —
display: "omitted"thinking blocksWhen thinking is configured with
display: "omitted", Claude returns blocks with an emptythinkingfield but a validsignature:{"type": "thinking", "thinking": "", "signature": "EosnCkYICxIMMb3..."}Reference: https://platform.claude.com/docs/en/build-with-claude/extended-thinking#redacted-thinking-blocks
These blocks were filtered out of
content_items(becausethinking == ""is falsy), meaningcontent_itemswas empty. The reconstruction guardand content_itemsthen short-circuited and skipped the entire block — even thoughencrypted_contentstill held the signature.Root cause (shared)
In
message_to_output_items, blocks were decomposed into two separate fields:reasoning_item.content— thinking text (filtered: skips empty and missingthinking)reasoning_item.encrypted_content— signatures joined by"\n"(skips missingsignature)In
items_to_messages, reconstruction requiredcontent_itemsto be non-empty as a guard. Any scenario wherecontent_itemswas empty (omitted thinking, redacted thinking) caused the entire block list to be silently dropped.The Anthropic API requirement is clear: thinking and redacted_thinking blocks must be passed back verbatim in multi-turn conversations. Any modification or omission is rejected.
Fix
Serialise the complete block list as JSON in
encrypted_contentso every block type survives the round-trip unchanged:message_to_output_items: replace per-field extraction withjson.dumps(blocks_as_dicts). Still populatescontentwith visible thinking text for display/summary purposes.items_to_messages: tryjson.loads(encrypted_content)first (new format, handles all block types). Fall back to the legacy"\n"-joined signatures format so any in-flight sessions are not broken. Replaceand content_itemsguard withand encrypted_content— omitted/redacted blocks have no content items but do have encrypted_content.Tests
encrypted_content == "sig1\nsig2"to verify the JSON round-trip instead.test_redacted_thinking_blocks_preserved_across_turns: pureredacted_thinkingblock survivesmessage_to_output_items→items_to_messagesverbatim.test_mixed_thinking_and_redacted_thinking_blocks_preserved: mixedthinking+redacted_thinkingblocks are all preserved in original order.All 44 tests in the affected files pass.