Skip to content

fix: clean up benchmark MCP sessions#94

Merged
PatrickSys merged 2 commits intomasterfrom
fix/benchmark-lifecycle-cleanup
Apr 10, 2026
Merged

fix: clean up benchmark MCP sessions#94
PatrickSys merged 2 commits intomasterfrom
fix/benchmark-lifecycle-cleanup

Conversation

@PatrickSys
Copy link
Copy Markdown
Owner

Summary

  • add a managed stdio MCP session helper that guarantees child cleanup on timeout and failure paths
  • route benchmark comparator execution through the managed session wrapper
  • add regression fixtures/tests for connect-timeout and post-connect failure cleanup

Verification

  • pnpm run type-check
  • pnpm test -- tests/benchmark-comparators.test.ts tests/zombie-guard.test.ts
  • full pre-push test suite passed during git push

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 10, 2026

Greptile Summary

This PR introduces withManagedStdioClientSession — a lifecycle wrapper that guarantees child-process cleanup on connect timeout and callback failure — and routes the benchmark comparator through it, replacing ad-hoc client.close() calls. Two issues in the new helper are worth addressing before relying on it heavily: connectPromise is awaited after safeClose with no deadline, so a misbehaving MCP SDK or an unkillable child could cause the finally block to hang indefinitely; and the spawnNotification polling loop exits as soon as settling = true without a final pid check, creating a narrow race that can leave onSpawn uncalled — making the connect-timeout test flaky under load.

Confidence Score: 4/5

Safe to merge for most cases, but the unbounded connectPromise await in the finally block and the onSpawn race are worth hardening before widespread use.

All findings are P2 (no confirmed runtime breakage in normal conditions), but two of them describe reliability gaps in the resource-cleanup guarantee that is the primary purpose of this PR — a hang with no timeout and a race that can make tests flaky. That warrants a 4 rather than 5 to flag them for attention before merge.

scripts/lib/managed-mcp-session.mjs — the finally block cleanup sequence and spawnNotification loop both have edge cases that could undermine the zombie-guard guarantee.

Important Files Changed

Filename Overview
scripts/lib/managed-mcp-session.mjs New managed session wrapper; two concerns: connectPromise awaited without a timeout after cleanup (potential hang), and a settling race that can silently drop the onSpawn callback
scripts/benchmark-comparators.mjs Routes MCP execution through the new managed wrapper; dead-code guard startsWith('Failed to connect to') in catch handler is harmless but misleading
tests/benchmark-comparators.test.ts New regression tests for timeout and post-connect failure cleanup; timeout test could be flaky on slow systems if onSpawn race fires before pid is captured
tests/fixtures/mcp/echo-server.mjs Minimal MCP echo server fixture for post-connect failure test; straightforward and correct
tests/fixtures/mcp/hanging-server.mjs Single-line fixture that keeps the process alive indefinitely for the connect-timeout test

Sequence Diagram

sequenceDiagram
    participant Caller
    participant withManagedStdioClientSession
    participant MCP Client
    participant StdioClientTransport
    participant ChildProcess

    Caller->>withManagedStdioClientSession: call(options, callback)
    withManagedStdioClientSession->>StdioClientTransport: new StdioClientTransport(...)
    withManagedStdioClientSession->>MCP Client: new Client(clientInfo)
    withManagedStdioClientSession->>MCP Client: connect(transport) → connectPromise
    MCP Client->>StdioClientTransport: spawn child
    StdioClientTransport->>ChildProcess: spawn
    Note over withManagedStdioClientSession: spawnNotification polls transport.pid every 10ms

    alt Connect succeeds within timeout
        withManagedStdioClientSession->>Caller: callback({ client, transport })
        Caller-->>withManagedStdioClientSession: result or throw
    else Connect timeout
        withManagedStdioClientSession->>withManagedStdioClientSession: withTimeout rejects
    end

    Note over withManagedStdioClientSession: finally block runs
    withManagedStdioClientSession->>withManagedStdioClientSession: settling = true (stops poll loop)
    withManagedStdioClientSession->>MCP Client: client.close() (if connected)
    withManagedStdioClientSession->>StdioClientTransport: transport.close()
    StdioClientTransport->>ChildProcess: SIGTERM / kill
    ChildProcess-->>StdioClientTransport: exit
    withManagedStdioClientSession->>withManagedStdioClientSession: await spawnNotification
    withManagedStdioClientSession->>withManagedStdioClientSession: await connectPromise (no timeout ⚠️)
    withManagedStdioClientSession-->>Caller: return result or rethrow
Loading

Reviews (1): Last reviewed commit: "fix: clean up benchmark MCP sessions" | Re-trigger Greptile

Comment on lines +87 to +91
} finally {
settling = true;
await safeClose(client, transport, connected);
await spawnNotification.catch(() => undefined);
await connectPromise.catch(() => undefined);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 connectPromise awaited without a timeout after cleanup

After safeClose terminates the child process, await connectPromise.catch(() => undefined) is called with no deadline. If the MCP SDK's connect promise doesn't reject when the transport is force-closed (e.g., due to an SDK buffering bug or a process that ignores SIGTERM), this line will hang indefinitely and withManagedStdioClientSession will never return — defeating the resource-cleanup guarantee. A defensive timeout here would bound the worst-case hang:

  } finally {
    settling = true;
    await safeClose(client, transport, connected);
    await spawnNotification.catch(() => undefined);
    await Promise.race([
      connectPromise,
      new Promise((resolve) => setTimeout(resolve, 5_000))
    ]).catch(() => undefined);
  }

Comment on lines +66 to +81
let connected = false;
let settling = false;
const connectPromise = client.connect(transport);
const spawnNotification = (async () => {
if (typeof onSpawn !== 'function') {
return;
}

while (!settling) {
if (transport.pid !== null) {
onSpawn(transport.pid);
return;
}
await new Promise((resolve) => setTimeout(resolve, 10));
}
})();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 onSpawn can be silently dropped due to settling race

The polling loop exits the moment settling = true is set in the finally block. If the child process spawns during the 10 ms sleep that happens to span the settling = true assignment, the loop re-evaluates while (!settling), sees true, and exits without ever calling onSpawn — even though transport.pid is now populated. In the connect-timeout test this leaves pid === null, causing expect(pid).toBeTypeOf('number') to fail.

In practice, with a 200 ms timeout and a local process that spawns in < 50 ms, the window is very narrow. But under CI load or on slow machines the test can become flaky. A robust fix is to perform one final pid check after the loop exits:

const spawnNotification = (async () => {
  if (typeof onSpawn !== 'function') return;

  while (!settling) {
    if (transport.pid !== null) {
      onSpawn(transport.pid);
      return;
    }
    await new Promise((resolve) => setTimeout(resolve, 10));
  }
  // One final check: process may have spawned while we were asleep
  if (transport.pid !== null) {
    onSpawn(transport.pid);
  }
})();

Comment on lines +392 to +401
).catch((err) => {
if (err.message.startsWith('Failed to connect to')) {
throw err;
}

const elapsedMs = Date.now() - startMs;
const payloadBytes = countUtf8Bytes(payload);
const estimatedTokens = estimateTokens(payloadBytes);
const { usefulnessScore, matchedSignals, missingSignals } = matchSignals(
payload,
task.expectedSignals,
task.forbiddenSignals
throw new Error(
err.message.includes('timed out')
? `Failed to connect to ${adapter.name} MCP server: ${err.message}`
: err.message
);

taskResults.push({
taskId: task.id,
job: task.job,
surface: task.surface,
usefulnessScore,
matchedSignals,
missingSignals,
payloadBytes,
estimatedTokens,
toolCallCount,
elapsedMs
});
}

await client.close();
return taskResults;
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Dead-code branch in catch handler

err.message.startsWith('Failed to connect to') can never be true here. withManagedStdioClientSession either throws "MCP client connect timed out after Xs" (timeout path) or re-throws errors from the callback, none of which start with "Failed to connect to". The guard is harmless but misleading; removing it makes the intent clearer:

Suggested change
).catch((err) => {
if (err.message.startsWith('Failed to connect to')) {
throw err;
}
const elapsedMs = Date.now() - startMs;
const payloadBytes = countUtf8Bytes(payload);
const estimatedTokens = estimateTokens(payloadBytes);
const { usefulnessScore, matchedSignals, missingSignals } = matchSignals(
payload,
task.expectedSignals,
task.forbiddenSignals
throw new Error(
err.message.includes('timed out')
? `Failed to connect to ${adapter.name} MCP server: ${err.message}`
: err.message
);
taskResults.push({
taskId: task.id,
job: task.job,
surface: task.surface,
usefulnessScore,
matchedSignals,
missingSignals,
payloadBytes,
estimatedTokens,
toolCallCount,
elapsedMs
});
}
await client.close();
return taskResults;
});
}).catch((err) => {
throw new Error(
err.message.includes('timed out')
? `Failed to connect to ${adapter.name} MCP server: ${err.message}`
: err.message
);
});

@PatrickSys PatrickSys merged commit ad5db8b into master Apr 10, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant