Skip to content

Commit b0aecda

Browse files
authored
fix(wasm): isolate tree-sitter parsing in worker thread (#965) (#975)
* fix(wasm): isolate tree-sitter parsing in worker thread (#965) The WASM grammar can trigger uncatchable V8 fatal errors that kill whichever thread runs it. Running parses in a worker_threads worker means the crash kills only the worker — the pool detects the exit, marks the in-flight file as skipped, respawns the worker, and continues with the rest of the build. - wasm-worker-entry.ts: worker entry that owns all tree-sitter WASM calls; runs extractor + AST visitor walk before tree.delete() so the main thread never holds a live Tree - wasm-worker-pool.ts: single-worker pool with crash-respawn logic - wasm-worker-protocol.ts: typed message protocol with serializable ExtractorOutput (no _tree field) - parser.ts: route WASM parses through the worker pool - tests/parsers/wasm-hardening.test.ts: coverage for skip-and-continue behavior on extractor errors docs check acknowledged: internal plumbing only — no new CLI surface, no new language support, no architectural tier changes visible to users Impact: 80 functions changed, 30 affected * fix(wasm): resolve worker entry from dist, add timeout, fix tests (#975) The worker pool previously tried to load the worker entry from src/*.ts under Node's strip-types runtime, but Node's worker_threads loader does not rewrite .js import specifiers to .ts — so the worker exited with "module not found" for every file, breaking all WASM engine tests and the engine-parity CI jobs. Changes: - wasm-worker-pool.ts: resolveWorkerEntry now always points at the compiled .js. If the sibling .js exists (dist build) use it; otherwise walk up to dist/domain/wasm-worker-entry.js (tests/dev from src/). Throws a clear error instead of silently crashing the worker. - wasm-worker-pool.ts: add per-job timeout (60s) so a hung WASM grammar (infinite loop instead of crash) can no longer stall the whole build; remove dead `workerReady` field and its no-op reset in onExit. - wasm-worker-entry.ts: add a gated test-crash hook (CODEGRAPH_WASM_WORKER_TEST_CRASH=1 + magic token in code) so we can unit-test pool crash recovery without a real V8 abort. Remove dead `shutdown` message handler — the pool uses Worker.terminate() directly. - wasm-worker-protocol.ts: drop unused WorkerShutdownRequest. - tests/parsers/wasm-hardening.test.ts: rewrite. The old tests mutated jsEntry.extractor on the main-thread registry, but the worker has its own independent registry in a separate V8 isolate, so the mock was never exercised. New tests use the worker-side crash hook to verify single-file skip+warn, batch skip-and-continue, and repeated-crash recovery with worker respawn. Impact: 10 functions changed, 10 affected * fix(wasm): dispose parser pool on CLI shutdown to prevent hang (#975) Impact: 1 functions changed, 1 affected * fix: guard onExit against stale timed-out job in wasm worker pool (#975) Impact: 5 functions changed, 9 affected
1 parent 18dc944 commit b0aecda

6 files changed

Lines changed: 1441 additions & 80 deletions

File tree

src/cli.ts

Lines changed: 31 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1,14 +1,37 @@
11
#!/usr/bin/env node
22

33
import { run } from './cli/index.js';
4+
import { disposeParsers } from './domain/parser.js';
45
import { CodegraphError, toErrorMessage } from './shared/errors.js';
56

6-
run().catch((err: unknown) => {
7-
if (err instanceof CodegraphError) {
8-
console.error(`codegraph [${err.code}]: ${err.message}`);
9-
if (err.file) console.error(` file: ${err.file}`);
10-
} else {
11-
console.error(`codegraph: fatal error — ${toErrorMessage(err)}`);
7+
/**
8+
* After the CLI command finishes, tear down any cached WASM parsers and the
9+
* worker thread pool. The WASM parse worker (see `domain/wasm-worker-pool.ts`)
10+
* keeps the event loop alive until `worker.terminate()` is called, so without
11+
* this teardown short-lived commands like `codegraph build` would hang for
12+
* minutes before Node gives up — surfacing in CI as `spawnSync ETIMEDOUT`
13+
* even though the command's work is already complete.
14+
*
15+
* `disposeParsers` is safe to call when the pool was never instantiated
16+
* (e.g. native engine, or commands that never parse): it no-ops cleanly.
17+
*/
18+
async function shutdown(): Promise<void> {
19+
try {
20+
await disposeParsers();
21+
} catch {
22+
/* don't mask the real exit status over a teardown failure */
1223
}
13-
process.exit(1);
14-
});
24+
}
25+
26+
run()
27+
.then(shutdown)
28+
.catch(async (err: unknown) => {
29+
if (err instanceof CodegraphError) {
30+
console.error(`codegraph [${err.code}]: ${err.message}`);
31+
if (err.file) console.error(` file: ${err.file}`);
32+
} else {
33+
console.error(`codegraph: fatal error — ${toErrorMessage(err)}`);
34+
}
35+
await shutdown();
36+
process.exit(1);
37+
});

src/domain/parser.ts

Lines changed: 115 additions & 72 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,24 @@ import type {
1313
LanguageRegistryEntry,
1414
TypeMapEntry,
1515
} from '../types.js';
16+
import { disposeWasmWorkerPool, getWasmWorkerPool } from './wasm-worker-pool.js';
17+
import type { WorkerAnalysisOpts } from './wasm-worker-protocol.js';
18+
19+
/** Default worker opts: run all analyses so output matches parseFilesFull. */
20+
const FULL_ANALYSIS: WorkerAnalysisOpts = {
21+
ast: true,
22+
complexity: true,
23+
cfg: true,
24+
dataflow: true,
25+
};
26+
27+
/** Extract-only opts: skip visitor walk for typeMap backfill / similar fast paths. */
28+
const EXTRACT_ONLY: WorkerAnalysisOpts = {
29+
ast: false,
30+
complexity: false,
31+
cfg: false,
32+
dataflow: false,
33+
};
1634

1735
// Re-export all extractors for backward compatibility
1836
export {
@@ -262,7 +280,7 @@ function disposeMapEntries(entries: Iterable<[string, any]>, label: string): voi
262280
}
263281
}
264282

265-
export function disposeParsers(): void {
283+
export async function disposeParsers(): Promise<void> {
266284
if (_cachedParsers) {
267285
disposeMapEntries(_cachedParsers, 'parser');
268286
_cachedParsers = null;
@@ -276,6 +294,7 @@ export function disposeParsers(): void {
276294
_initialized = false;
277295
_allParsersLoaded = false;
278296
_loadingPromises.clear();
297+
await disposeWasmWorkerPool();
279298
}
280299

281300
export function getParser(parsers: Map<string, Parser | null>, filePath: string): Parser | null {
@@ -286,45 +305,79 @@ export function getParser(parsers: Map<string, Parser | null>, filePath: string)
286305
}
287306

288307
/**
289-
* Pre-parse files missing `_tree` via WASM so downstream phases (CFG, dataflow)
290-
* don't each need to create parsers and re-parse independently.
291-
* Only parses files whose extension is in SUPPORTED_EXTENSIONS.
308+
* Backfill missing AST-analysis data (astNodes, dataflow, def.complexity,
309+
* def.cfg) via the WASM worker pool for files that were parsed by the native
310+
* engine but are missing one or more analyses.
311+
*
312+
* Historically this function populated `symbols._tree` so the main-thread
313+
* visitor walk in `ast-analysis/engine.ts` could run. After the worker-isolation
314+
* refactor (#965), the worker runs every visitor itself and returns pre-computed
315+
* analysis data — `_tree` is never set on the main thread.
316+
*
317+
* Name is preserved for caller compatibility; the function now ensures
318+
* *analysis data* rather than *trees*.
292319
*/
293320
export async function ensureWasmTrees(
294321
fileSymbols: Map<string, any>,
295322
rootDir: string,
296323
): Promise<void> {
297-
// Single pass: collect absolute paths for files that need parsing
298-
const filePaths: string[] = [];
324+
// Collect files that still need analysis data and are parseable by WASM.
325+
const pending: Array<{ relPath: string; absPath: string; symbols: any }> = [];
299326
for (const [relPath, symbols] of fileSymbols) {
300-
if (!symbols._tree && _extToLang.has(path.extname(relPath).toLowerCase())) {
301-
filePaths.push(path.join(rootDir, relPath));
302-
}
327+
if (symbols._tree) continue; // legacy path — leave existing trees alone
328+
if (!_extToLang.has(path.extname(relPath).toLowerCase())) continue;
329+
pending.push({ relPath, absPath: path.join(rootDir, relPath), symbols });
303330
}
304-
if (filePaths.length === 0) return;
305-
const parsers = await ensureParsersForFiles(filePaths);
331+
if (pending.length === 0) return;
306332

307-
for (const [relPath, symbols] of fileSymbols) {
308-
if (symbols._tree) continue;
309-
const ext = path.extname(relPath).toLowerCase();
310-
const entry = _extToLang.get(ext);
311-
if (!entry) continue;
312-
const parser = parsers.get(entry.id);
313-
if (!parser) continue;
314-
315-
const absPath = path.join(rootDir, relPath);
333+
const pool = getWasmWorkerPool();
334+
for (const { relPath, absPath, symbols } of pending) {
316335
let code: string;
317336
try {
318337
code = fs.readFileSync(absPath, 'utf-8');
319338
} catch (e: unknown) {
320339
debug(`ensureWasmTrees: cannot read ${relPath}: ${(e as Error).message}`);
321340
continue;
322341
}
323-
try {
324-
symbols._tree = parser.parse(code);
325-
symbols._langId = entry.id;
326-
} catch (e: unknown) {
327-
debug(`ensureWasmTrees: parse failed for ${relPath}: ${(e as Error).message}`);
342+
const output = await pool.parse(absPath, code, FULL_ANALYSIS);
343+
if (!output) continue; // worker crashed or returned null — skip silently
344+
mergeAnalysisData(symbols, output);
345+
}
346+
}
347+
348+
/**
349+
* Merge pre-computed analysis data from a worker result onto existing symbols.
350+
* Only fills gaps — never overwrites fields the caller already populated.
351+
* Used to patch native-parsed symbols with worker-produced astNodes / dataflow /
352+
* per-definition complexity and cfg.
353+
*/
354+
function mergeAnalysisData(symbols: any, worker: ExtractorOutput): void {
355+
if (!symbols._langId && worker._langId) symbols._langId = worker._langId;
356+
if (!symbols._lineCount && worker._lineCount) symbols._lineCount = worker._lineCount;
357+
if (!Array.isArray(symbols.astNodes) && Array.isArray(worker.astNodes)) {
358+
symbols.astNodes = worker.astNodes;
359+
}
360+
if (!symbols.dataflow && worker.dataflow) symbols.dataflow = worker.dataflow;
361+
if (worker.typeMap && worker.typeMap.size > 0) {
362+
if (!symbols.typeMap || !(symbols.typeMap instanceof Map)) {
363+
symbols.typeMap = new Map(worker.typeMap);
364+
} else {
365+
for (const [k, v] of worker.typeMap) {
366+
if (!symbols.typeMap.has(k)) symbols.typeMap.set(k, v);
367+
}
368+
}
369+
}
370+
const existingDefs: any[] = Array.isArray(symbols.definitions) ? symbols.definitions : [];
371+
const workerDefs: any[] = Array.isArray(worker.definitions) ? worker.definitions : [];
372+
// Index existing defs by (kind, name, line) — mirrors engine.ts matching key.
373+
const byKey = new Map<string, any>();
374+
for (const d of existingDefs) byKey.set(`${d.kind}|${d.name}|${d.line}`, d);
375+
for (const wd of workerDefs) {
376+
const existing = byKey.get(`${wd.kind}|${wd.name}|${wd.line}`);
377+
if (!existing) continue;
378+
if (!existing.complexity && wd.complexity) existing.complexity = wd.complexity;
379+
if ((!existing.cfg || !Array.isArray(existing.cfg.blocks)) && wd.cfg?.blocks) {
380+
existing.cfg = wd.cfg;
328381
}
329382
}
330383
}
@@ -742,23 +795,13 @@ async function backfillTypeMap(
742795
return { typeMap: new Map(), backfilled: false };
743796
}
744797
}
745-
const parsers = await ensureParsersForFiles([filePath]);
746-
const extracted = wasmExtractSymbols(parsers, filePath, code);
747-
try {
748-
if (!extracted || extracted.symbols.typeMap.size === 0) {
749-
return { typeMap: new Map(), backfilled: false };
750-
}
751-
return { typeMap: extracted.symbols.typeMap, backfilled: true };
752-
} finally {
753-
// Free the WASM tree to prevent memory accumulation across repeated builds
754-
if (extracted?.tree && typeof extracted.tree.delete === 'function') {
755-
try {
756-
extracted.tree.delete();
757-
} catch (e) {
758-
debug(`backfillTypeMap: WASM tree cleanup failed: ${toErrorMessage(e)}`);
759-
}
760-
}
798+
const pool = getWasmWorkerPool();
799+
// Extract-only — no visitor walk, we only need the typeMap from this pass.
800+
const output = await pool.parse(filePath, code, EXTRACT_ONLY);
801+
if (!output || output.typeMap.size === 0) {
802+
return { typeMap: new Map(), backfilled: false };
761803
}
804+
return { typeMap: output.typeMap, backfilled: true };
762805
}
763806

764807
/**
@@ -826,10 +869,9 @@ export async function parseFileAuto(
826869
return patched;
827870
}
828871

829-
// WASM path
830-
const parsers = await ensureParsersForFiles([filePath]);
831-
const extracted = wasmExtractSymbols(parsers, filePath, source);
832-
return extracted ? extracted.symbols : null;
872+
// WASM path — dispatch to isolated worker
873+
const pool = getWasmWorkerPool();
874+
return pool.parse(filePath, source, FULL_ANALYSIS);
833875
}
834876

835877
/** Backfill typeMap via WASM for TS/TSX files parsed by the native engine. */
@@ -842,54 +884,55 @@ async function backfillTypeMapBatch(
842884
);
843885
if (tsFiles.length === 0) return;
844886

845-
const parsers = await ensureParsersForFiles(tsFiles.map((f) => f.filePath));
887+
const pool = getWasmWorkerPool();
846888
for (const { filePath, relPath } of tsFiles) {
847-
let extracted: WasmExtractResult | null | undefined;
889+
let code: string;
848890
try {
849-
const code = fs.readFileSync(filePath, 'utf-8');
850-
extracted = wasmExtractSymbols(parsers, filePath, code);
851-
if (extracted?.symbols && extracted.symbols.typeMap.size > 0) {
852-
const symbols = result.get(relPath);
853-
if (!symbols) continue;
854-
symbols.typeMap = extracted.symbols.typeMap;
855-
symbols._typeMapBackfilled = true;
856-
}
891+
code = fs.readFileSync(filePath, 'utf-8');
857892
} catch (e) {
858-
debug(`batchExtract: typeMap backfill failed: ${toErrorMessage(e)}`);
859-
} finally {
860-
if (extracted?.tree && typeof extracted.tree.delete === 'function') {
861-
try {
862-
extracted.tree.delete();
863-
} catch (e) {
864-
debug(`batchExtract: WASM tree cleanup failed: ${toErrorMessage(e)}`);
865-
}
866-
}
893+
debug(`batchExtract: cannot read ${filePath}: ${toErrorMessage(e)}`);
894+
continue;
867895
}
896+
const output = await pool.parse(filePath, code, EXTRACT_ONLY);
897+
if (!output || output.typeMap.size === 0) continue;
898+
const symbols = result.get(relPath);
899+
if (!symbols) continue;
900+
symbols.typeMap = output.typeMap;
901+
symbols._typeMapBackfilled = true;
868902
}
869903
}
870904

871-
/** Parse files via WASM engine, returning a Map<relPath, symbols>. */
905+
/**
906+
* Parse files via WASM engine, returning a Map<relPath, symbols>.
907+
*
908+
* Each file is dispatched to the WASM worker pool. The worker parses, extracts,
909+
* and runs all AST analyses (complexity, CFG, dataflow, ast-store) in its own
910+
* thread, returning fully pre-computed ExtractorOutput. V8 fatal errors from
911+
* tree-sitter WASM (#965) kill only the worker — the pool skips the file and
912+
* restarts the worker for the next one.
913+
*
914+
* `_tree` is NEVER set by this path. All downstream analyses operate on the
915+
* pre-computed `astNodes` / `dataflow` / `def.complexity` / `def.cfg` fields.
916+
*/
872917
async function parseFilesWasm(
873918
filePaths: string[],
874919
rootDir: string,
875920
): Promise<Map<string, ExtractorOutput>> {
876921
const result = new Map<string, ExtractorOutput>();
877-
const parsers = await ensureParsersForFiles(filePaths);
922+
const pool = getWasmWorkerPool();
878923
for (const filePath of filePaths) {
924+
if (!_extToLang.has(path.extname(filePath).toLowerCase())) continue;
879925
let code: string;
880926
try {
881927
code = fs.readFileSync(filePath, 'utf-8');
882928
} catch (err: unknown) {
883929
warn(`Skipping ${path.relative(rootDir, filePath)}: ${(err as Error).message}`);
884930
continue;
885931
}
886-
const extracted = wasmExtractSymbols(parsers, filePath, code);
887-
if (extracted) {
932+
const output = await pool.parse(filePath, code, FULL_ANALYSIS);
933+
if (output) {
888934
const relPath = path.relative(rootDir, filePath).split(path.sep).join('/');
889-
extracted.symbols._tree = extracted.tree;
890-
extracted.symbols._langId = extracted.langId;
891-
extracted.symbols._lineCount = code.split('\n').length;
892-
result.set(relPath, extracted.symbols);
935+
result.set(relPath, output);
893936
}
894937
}
895938
return result;

0 commit comments

Comments
 (0)