test: harden MCP shell approval continuation #18038
Draft: starr-openai wants to merge 1 commit into main.
Summary
What was wrong
The affected MCP test covers shell-command approval continuation.
The intended behavior is that, after the user approves the shell command, the agent runs it and posts a `function_call_output` back to the model, which then issues a follow-up request.

The old test tried to accommodate this multi-request flow by relaxing the mock response server's request-count expectations too broadly. That avoided some timing failures, but it weakened the harness in places where strict request-count checking is valuable.
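A minimal sketch of the approval-continuation flow described above, written as a toy state machine. The state names and the `call_id` are illustrative, not the actual protocol types in the codebase:

```rust
// Illustrative states of the shell-approval continuation flow (hypothetical
// names; the real harness does not use this enum).
#[derive(Debug, PartialEq)]
enum FlowState {
    /// The model emitted a shell function call; the test elicits user approval.
    AwaitingApproval { call_id: String },
    /// Approval granted; the agent is running the command.
    RunningCommand { call_id: String },
    /// The function_call_output was posted back; a second model request is due.
    AwaitingFinalResponse,
    /// The model's second response completed the turn.
    Done,
}

fn advance(state: FlowState) -> FlowState {
    match state {
        // Approval moves the flow into command execution.
        FlowState::AwaitingApproval { call_id } => FlowState::RunningCommand { call_id },
        // The command finishes and its output goes back to the model,
        // triggering the continuation (second) model request.
        FlowState::RunningCommand { .. } => FlowState::AwaitingFinalResponse,
        FlowState::AwaitingFinalResponse | FlowState::Done => FlowState::Done,
    }
}
```

The point of the sketch is that the flow legitimately spans two model requests, so any strict single-request expectation is wrong by construction.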
Why it was flaky
The approval flow legitimately spans more than one model request. A test that expects only one request, or that does not explicitly wait for the continuation request, can race the real protocol.
Depending on timing, it could assert before command output was posted back to the model, miss the second request, or fail because a legitimate extra request arrived. Broadly relaxing request counting made this less noisy but also made it easier to miss real regressions.
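One way to avoid that race is to block until the continuation request has actually arrived rather than asserting on a count immediately. A self-contained sketch (not the harness's actual code) using a `Mutex`/`Condvar` pair:

```rust
// Hypothetical request counter a mock server could expose so tests can wait
// for the Nth request instead of racing it.
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

#[derive(Clone)]
struct RequestCounter {
    inner: Arc<(Mutex<usize>, Condvar)>,
}

impl RequestCounter {
    fn new() -> Self {
        RequestCounter {
            inner: Arc::new((Mutex::new(0), Condvar::new())),
        }
    }

    /// Called by the mock server for each incoming model request.
    fn record_request(&self) {
        let (lock, cvar) = &*self.inner;
        *lock.lock().unwrap() += 1;
        cvar.notify_all();
    }

    /// Block until at least `n` requests have arrived, or the timeout expires.
    /// Returns true if the count was reached in time.
    fn wait_for(&self, n: usize, timeout: Duration) -> bool {
        let (lock, cvar) = &*self.inner;
        let guard = lock.lock().unwrap();
        let (guard, result) = cvar
            .wait_timeout_while(guard, timeout, |count| *count < n)
            .unwrap();
        !result.timed_out() && *guard >= n
    }
}
```

With this shape, the test asserts only after `wait_for(2, …)` succeeds, so the continuation request can never be "missed" by an early assertion.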
How this fixes it
The change models the real multi-request protocol explicitly and keeps the relaxation local to this one test:

- `create_mock_responses_server` is strict by default again.
- The `create_mock_responses_server_without_expected_count` helper is used only where the approval flow needs it.
- The test waits for the `function_call_output` for `call1234` before expecting the final response.

This preserves strictness for the broader suite while making this test deterministic against the real approval-continuation behavior.
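A sketch of the strict-by-default/relaxed-opt-out split the bullets describe. The struct and constructor names mirror the PR's helpers, but the implementation here is purely illustrative:

```rust
// Illustrative mock: strict servers pin an exact request count, the relaxed
// variant (used only for the approval flow) skips the count check entirely.
struct MockResponsesServer {
    expected_requests: Option<usize>, // Some(n) => strict, None => relaxed
    observed_requests: usize,
}

impl MockResponsesServer {
    /// Strict by default: the test declares exactly how many model requests
    /// it expects to see.
    fn with_expected_count(n: usize) -> Self {
        MockResponsesServer {
            expected_requests: Some(n),
            observed_requests: 0,
        }
    }

    /// Relaxed variant for flows whose request count is timing-dependent.
    fn without_expected_count() -> Self {
        MockResponsesServer {
            expected_requests: None,
            observed_requests: 0,
        }
    }

    fn handle_request(&mut self) {
        self.observed_requests += 1;
    }

    /// Verified at the end of the test; a strict server fails on any mismatch.
    fn verify(&self) -> Result<(), String> {
        match self.expected_requests {
            Some(n) if n != self.observed_requests => Err(format!(
                "expected {n} requests, observed {}",
                self.observed_requests
            )),
            _ => Ok(()),
        }
    }
}
```

Keeping the relaxed constructor as a separate, explicitly named helper means a reader can grep for every place the suite opted out of strict counting.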
Validation
`//codex-rs/mcp-server:mcp-server-all-test --test_filter=suite::codex_tool::test_shell_command_approval_triggers_elicitation` passed on `codex-flaky-mcp-0415`.

Co-authored-by: Codex <noreply@openai.com>