You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
fix(benchmarks): make all comparator lanes cross-platform on Windows (#97) (#97)
* fix(benchmarks): make all comparator lanes cross-platform on Windows
All five comparator adapters in scripts/benchmark-comparators.mjs were
setup_failed on Windows 11 due to Unix-only shell constructs. This fixes
the root causes per-lane so EVAL-02 (real benchmark data) is achievable.
Changes by lane:
- raw Claude Code: drop `2>/dev/null` from checkInstalled, switch
runRawClaudeCode() from execAsync (shell, brittle quoting) to
execFileAsync (no shell), add `--output-format json`, raise timeout
60s→120s, change pending_evidence fallback to hard setup_failed
- codebase-memory-mcp: replace `which ... 2>/dev/null || npx` with
npx-only check (cross-platform), raise initTimeout 5s→10s
- jCodeMunch: replace hardcoded python3 with pythonCmd (python on
Windows), use `python -m pip install`, raise initTimeout 8s→15s
- CodeGraphContext: same pythonCmd consistency fix as jCodeMunch
- GrepAI: replace `which grepai 2>/dev/null` with `grepai --version`
Adds execFile import + execFileAsync, adds pythonCmd platform constant.
* fix(benchmarks): resolve execFileAsync .cmd issue and drop dead exec import
On Windows, execFile does not use a shell and cannot resolve npm's .cmd
wrappers (e.g. claude.cmd). checkInstalled() succeeded via execSync (which
goes through cmd.exe) but runRawClaudeCode threw ENOENT, returning
setup_failed on the very platform the previous commit targeted.
Add shell: process.platform === 'win32' to the execFileAsync call so
cmd.exe is used on Windows (resolves .cmd) while POSIX keeps shell: false
(no injection risk, args are already an array).
Also removes the dead exec / execAsync imports left over from the
shell-interpolated execAsync refactor.
'raw Claude Code baseline requires the Claude Code CLI (claude) to be installed and authenticated. '+
273
-
'This is the manual-log-capture baseline — record as pending_evidence if claude CLI is unavailable.'
271
+
'raw Claude Code baseline requires the claude CLI. Install: npm install -g @anthropic-ai/claude-code'
274
272
);
275
273
},
276
274
// raw Claude Code is not an MCP server; handled separately via claude -p
@@ -411,11 +409,17 @@ async function runRawClaudeCode(rootPath, tasks) {
411
409
412
410
try{
413
411
constprompt=`You are exploring a codebase at ${path.resolve(rootPath)}. Answer this question using only grep, glob, and read file operations: ${task.prompt}`;
0 commit comments