fix: clean up benchmark MCP sessions (#94)
Conversation
Greptile Summary

This PR introduces a managed MCP session wrapper so benchmark runs always tear down the child processes they spawn.

Confidence Score: 4/5. Safe to merge for most cases, but the unbounded connectPromise wait in cleanup deserves attention first. All findings are P2 (no confirmed runtime breakage in normal conditions), but two of them describe reliability gaps in the resource-cleanup guarantee that is the primary purpose of this PR — a hang with no timeout and a race that can make tests flaky. That warrants a 4 rather than 5 to flag them for attention before merge.

File needing attention: scripts/lib/managed-mcp-session.mjs — the finally block cleanup sequence and spawnNotification loop both have edge cases that could undermine the zombie-guard guarantee.
| Filename | Overview |
|---|---|
| scripts/lib/managed-mcp-session.mjs | New managed session wrapper; two concerns: connectPromise awaited without a timeout after cleanup (potential hang), and a settling race that can silently drop the onSpawn callback |
| scripts/benchmark-comparators.mjs | Routes MCP execution through the new managed wrapper; dead-code guard startsWith('Failed to connect to') in catch handler is harmless but misleading |
| tests/benchmark-comparators.test.ts | New regression tests for timeout and post-connect failure cleanup; timeout test could be flaky on slow systems if onSpawn race fires before pid is captured |
| tests/fixtures/mcp/echo-server.mjs | Minimal MCP echo server fixture for post-connect failure test; straightforward and correct |
| tests/fixtures/mcp/hanging-server.mjs | Single-line fixture that keeps the process alive indefinitely for the connect-timeout test |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Caller
    participant Session as withManagedStdioClientSession
    participant Client as MCP Client
    participant Transport as StdioClientTransport
    participant Child as ChildProcess
    Caller->>Session: call(options, callback)
    Session->>Transport: new StdioClientTransport(...)
    Session->>Client: new Client(clientInfo)
    Session->>Client: connect(transport) → connectPromise
    Client->>Transport: spawn child
    Transport->>Child: spawn
    Note over Session: spawnNotification polls transport.pid every 10ms
    alt Connect succeeds within timeout
        Session->>Caller: callback({ client, transport })
        Caller-->>Session: result or throw
    else Connect timeout
        Session->>Session: withTimeout rejects
    end
    Note over Session: finally block runs
    Session->>Session: settling = true (stops poll loop)
    Session->>Client: client.close() (if connected)
    Session->>Transport: transport.close()
    Transport->>Child: SIGTERM / kill
    Child-->>Transport: exit
    Session->>Session: await spawnNotification
    Session->>Session: await connectPromise (no timeout ⚠️)
    Session-->>Caller: return result or rethrow
```
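The lifecycle above can be sketched with stub objects standing in for the SDK's Client and StdioClientTransport. All names below are illustrative, not the actual module or SDK API; the sketch only mirrors the connect/poll/cleanup ordering the diagram describes:

```javascript
// Stubs in place of the MCP SDK; they just record the order of operations.
const order = [];

const transport = {
  pid: null,
  async close() { order.push('transport.close'); }   // stands in for SIGTERM/kill
};
const client = {
  async connect(t) {
    t.pid = 12345;                                   // "child spawned"
    order.push('connect');
  },
  async close() { order.push('client.close'); }
};

async function withManagedSession(onSpawn, callback) {
  let connected = false;
  let settling = false;
  const connectPromise = client.connect(transport);
  // Poll for the child pid so the caller can observe (and later kill) it.
  const spawnNotification = (async () => {
    while (!settling) {
      if (transport.pid !== null) { onSpawn(transport.pid); return; }
      await new Promise((resolve) => setTimeout(resolve, 10));
    }
  })();
  try {
    await connectPromise;
    connected = true;
    return await callback(client);
  } finally {
    settling = true;                        // stop the poll loop
    if (connected) await client.close();    // graceful client shutdown first
    await transport.close();                // then terminate the child
    await spawnNotification.catch(() => undefined);
    await connectPromise.catch(() => undefined);
  }
}

const pids = [];
await withManagedSession((pid) => pids.push(pid), async () => order.push('callback'));
console.log(order.join(' → '), '| pid:', pids[0]);
```

The key design point the sketch preserves: the finally block always runs, so the child is terminated whether the callback returns, throws, or the connect times out.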
Reviews (1): Last reviewed commit: "fix: clean up benchmark MCP sessions"
scripts/lib/managed-mcp-session.mjs
Outdated
```js
} finally {
  settling = true;
  await safeClose(client, transport, connected);
  await spawnNotification.catch(() => undefined);
  await connectPromise.catch(() => undefined);
}
```
connectPromise awaited without a timeout after cleanup
After safeClose terminates the child process, await connectPromise.catch(() => undefined) is called with no deadline. If the MCP SDK's connect promise doesn't reject when the transport is force-closed (e.g., due to an SDK buffering bug or a process that ignores SIGTERM), this line will hang indefinitely and withManagedStdioClientSession will never return — defeating the resource-cleanup guarantee. A defensive timeout here would bound the worst-case hang:
```js
} finally {
  settling = true;
  await safeClose(client, transport, connected);
  await spawnNotification.catch(() => undefined);
  await Promise.race([
    connectPromise,
    new Promise((resolve) => setTimeout(resolve, 5_000))
  ]).catch(() => undefined);
}
```
```js
let connected = false;
let settling = false;
const connectPromise = client.connect(transport);
const spawnNotification = (async () => {
  if (typeof onSpawn !== 'function') {
    return;
  }

  while (!settling) {
    if (transport.pid !== null) {
      onSpawn(transport.pid);
      return;
    }
    await new Promise((resolve) => setTimeout(resolve, 10));
  }
})();
```
onSpawn can be silently dropped due to settling race
The polling loop exits the moment settling = true is set in the finally block. If the child process spawns during the 10 ms sleep that happens to span the settling = true assignment, the loop re-evaluates while (!settling), sees true, and exits without ever calling onSpawn — even though transport.pid is now populated. In the connect-timeout test this leaves pid === null, causing expect(pid).toBeTypeOf('number') to fail.
In practice, with a 200 ms timeout and a local process that spawns in < 50 ms, the window is very narrow. But under CI load or on slow machines the test can become flaky. A robust fix is to perform one final pid check after the loop exits:
```js
const spawnNotification = (async () => {
  if (typeof onSpawn !== 'function') return;
  while (!settling) {
    if (transport.pid !== null) {
      onSpawn(transport.pid);
      return;
    }
    await new Promise((resolve) => setTimeout(resolve, 10));
  }
  // One final check: process may have spawned while we were asleep
  if (transport.pid !== null) {
    onSpawn(transport.pid);
  }
})();
```
scripts/benchmark-comparators.mjs
```js
).catch((err) => {
  if (err.message.startsWith('Failed to connect to')) {
    throw err;
  }

  throw new Error(
    err.message.includes('timed out')
      ? `Failed to connect to ${adapter.name} MCP server: ${err.message}`
      : err.message
  );
});

const elapsedMs = Date.now() - startMs;
const payloadBytes = countUtf8Bytes(payload);
const estimatedTokens = estimateTokens(payloadBytes);
const { usefulnessScore, matchedSignals, missingSignals } = matchSignals(
  payload,
  task.expectedSignals,
  task.forbiddenSignals
);

taskResults.push({
  taskId: task.id,
  job: task.job,
  surface: task.surface,
  usefulnessScore,
  matchedSignals,
  missingSignals,
  payloadBytes,
  estimatedTokens,
  toolCallCount,
  elapsedMs
});
}

await client.close();
return taskResults;
});
```
Dead-code branch in catch handler
err.message.startsWith('Failed to connect to') can never be true here. withManagedStdioClientSession either throws "MCP client connect timed out after Xs" (timeout path) or re-throws errors from the callback, none of which start with "Failed to connect to". The guard is harmless but misleading; removing it makes the intent clearer:
```js
}).catch((err) => {
  throw new Error(
    err.message.includes('timed out')
      ? `Failed to connect to ${adapter.name} MCP server: ${err.message}`
      : err.message
  );
});
```
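Why the guard is dead can be seen with plain promises: only this catch adds the "Failed to connect to" prefix, so no error reaching it can already carry that prefix. A standalone sketch (the adapter name and the exact timeout message text are illustrative; the wrapper's actual message has the shape "MCP client connect timed out after Xs"):

```javascript
// Stand-ins for the session promise and adapter descriptor.
const adapter = { name: 'example' };
const session = Promise.reject(new Error('MCP client connect timed out after 30s'));

const msg = await session
  .catch((err) => {
    // This is the only place the prefix is added, so a startsWith guard
    // before it can never fire.
    throw new Error(
      err.message.includes('timed out')
        ? `Failed to connect to ${adapter.name} MCP server: ${err.message}`
        : err.message
    );
  })
  .catch((err) => err.message);

console.log(msg);
```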