Skip to content

Commit fc5bfe9

Browse files
authored
fix(extractor): gate member_expression callback args on callee allowlist (#974)
* fix(extractor): gate member_expression callback args on callee allowlist PR #947's extractCallbackReferenceCalls emitted a dynamic call edge for every identifier or member_expression argument of every call expression. That was correct for real callbacks (router.use, promise.then, items.map) but produced false positives for plain property reads passed as data, e.g. `store.set(user.id, user)` — the `user.id` arg wrongly resolved to `User.id@types.ts`, dropping TS resolution precision from 100% → 93.8% and tripping the regression guard. Fix: gate member_expression args on a CALLBACK_ACCEPTING_CALLEES allowlist covering router/middleware, promises, array methods, event emitters, and scheduling APIs. Identifier args remain unchanged. Adds positive and negative tests; updates the 3.9.4 resolution benchmark to reflect restored precision. Fixes #971 Impact: 2 functions changed, 7 affected * fix(extractor): require string-literal path for HTTP-verb callback gating (#974) Addresses Greptile review feedback on PR #974: - HTTP-verb callees (get/post/put/delete/patch/options/head/all) double as Map/cache/repository method names. Require a string-literal first argument (Express route path) for member-expr args to be emitted as dynamic calls, so `cache.get(user.id)` and `repo.put(record.key, value)` no longer leak `id`/`key` as false-positive dynamic calls while `router.get('/path', h)` still works. - Document that optional-chaining callees (`obj?.on(handlers.fn)`) are handled transparently: tree-sitter-javascript/typescript represent them as `member_expression` with an `optional_chain` child, so the existing extraction returns the property name correctly. Add a regression test. - Tests: three new cases in `tests/parsers/javascript.test.ts`: - negative: `cache.get(user.id)`, `repo.put(record.key, ...)`, `map.delete(entry.id)` - positive: `router.get('/path', auth.check)`, `app.post(\`/api\`, handlers.create)` - optional-chaining: `emitter?.on('tick', handlers.fn)` still emits All JS parser, regression-guard, and TS/JS resolution benchmarks stay green (TS precision 1.0, JS precision 1.0). Impact: 2 functions changed, 7 affected
1 parent f6f5482 commit fc5bfe9

3 files changed

Lines changed: 207 additions & 4 deletions

File tree

generated/benchmarks/BUILD-BENCHMARKS.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1134,12 +1134,12 @@ pre-parse that previously added ~388ms on native builds.
11341134
}
11351135
},
11361136
"typescript": {
1137-
"precision": 0.938,
1137+
"precision": 1,
11381138
"recall": 0.75,
11391139
"truePositives": 15,
1140-
"falsePositives": 1,
1140+
"falsePositives": 0,
11411141
"falseNegatives": 5,
1142-
"totalResolved": 16,
1142+
"totalResolved": 15,
11431143
"totalExpected": 20,
11441144
"byMode": {
11451145
"same-file": {

src/extractors/javascript.ts

Lines changed: 144 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1277,16 +1277,159 @@ function extractSubscriptCallInfo(fn: TreeSitterNode, callNode: TreeSitterNode):
12771277
return null;
12781278
}
12791279

1280+
/**
1281+
* Callee names that idiomatically accept callback references. Used to gate
1282+
* member_expression args in {@link extractCallbackReferenceCalls}: arguments
1283+
* like `user.id` are only emitted as dynamic callback calls when the callee
1284+
* is a known callback-accepting API (router/middleware, promises, array
1285+
* methods, event emitters, scheduling APIs). This avoids false positives
1286+
* from plain property reads passed as data, e.g. `store.set(user.id, user)`.
1287+
*
1288+
* Identifier args (e.g. `router.use(handleToken)`) are always emitted — the
1289+
* collateral damage of dropping them is larger than the FP risk, since plain
1290+
* identifier data args rarely collide with real function names.
1291+
*/
1292+
const CALLBACK_ACCEPTING_CALLEES: ReadonlySet<string> = new Set([
1293+
// Express / router / middleware
1294+
'use',
1295+
'get',
1296+
'post',
1297+
'put',
1298+
'delete',
1299+
'patch',
1300+
'options',
1301+
'head',
1302+
'all',
1303+
// Promises
1304+
'then',
1305+
'catch',
1306+
'finally',
1307+
// Array iteration / reduction
1308+
'map',
1309+
'filter',
1310+
'forEach',
1311+
'find',
1312+
'findIndex',
1313+
'findLast',
1314+
'findLastIndex',
1315+
'some',
1316+
'every',
1317+
'reduce',
1318+
'reduceRight',
1319+
'flatMap',
1320+
'sort',
1321+
// Event emitters / DOM
1322+
'on',
1323+
'once',
1324+
'off',
1325+
'addListener',
1326+
'removeListener',
1327+
'addEventListener',
1328+
'removeEventListener',
1329+
'subscribe',
1330+
'unsubscribe',
1331+
// Scheduling / plain function callbacks
1332+
'setTimeout',
1333+
'setInterval',
1334+
'setImmediate',
1335+
'queueMicrotask',
1336+
'requestAnimationFrame',
1337+
'requestIdleCallback',
1338+
'nextTick',
1339+
// Commander / yargs / hooks
1340+
'action',
1341+
'command',
1342+
]);
1343+
1344+
/**
1345+
* HTTP-verb callees that double as Map/cache/repository method names (`get`,
1346+
* `post`, `put`, `delete`, `patch`, `options`, `head`, `all`). Express/router
1347+
* invocations always take a string-literal route path as the first argument
1348+
* (`app.get('/path', handler)`), whereas Map-like APIs pass values/keys
1349+
* (`cache.get(user.id)`). Requiring a string-literal first arg keeps real
1350+
* route handlers covered while dropping the Map/cache false-positive surface.
1351+
*
1352+
* `use` and `all` without a path are legitimate middleware registrations, so
1353+
* `use` is intentionally excluded here — it stays in the general allowlist.
1354+
*/
1355+
const HTTP_VERB_CALLEES: ReadonlySet<string> = new Set([
1356+
'get',
1357+
'post',
1358+
'put',
1359+
'delete',
1360+
'patch',
1361+
'options',
1362+
'head',
1363+
'all',
1364+
]);
1365+
1366+
/**
1367+
* Extract the callee's final name (function identifier or member expression
1368+
* property) for callback-eligibility filtering. Returns null if the callee
1369+
* shape is not analyzable (e.g. computed subscripts, IIFEs).
1370+
*
1371+
* Optional-chaining (`obj?.method(...)`) is handled transparently: in both
1372+
* tree-sitter-javascript and tree-sitter-typescript grammars `obj?.method` is
1373+
* still a `member_expression` (the `?.` appears as an `optional_chain` child),
1374+
* so the property extraction below returns `method` as expected.
1375+
*/
1376+
function extractCalleeName(callNode: TreeSitterNode): string | null {
1377+
const fn = callNode.childForFieldName('function');
1378+
if (!fn) return null;
1379+
if (fn.type === 'identifier') return fn.text;
1380+
if (fn.type === 'member_expression') {
1381+
const prop = fn.childForFieldName('property');
1382+
return prop ? prop.text : null;
1383+
}
1384+
return null;
1385+
}
1386+
1387+
/**
1388+
* True iff the first argument of an arguments node is a string literal.
1389+
* Used to distinguish Express/router route handlers (`app.get('/path', h)`)
1390+
* from Map/cache APIs that reuse the same verb names (`cache.get(user.id)`).
1391+
*/
1392+
function firstArgIsStringLiteral(argsNode: TreeSitterNode): boolean {
1393+
for (let i = 0; i < argsNode.childCount; i++) {
1394+
const child = argsNode.child(i);
1395+
if (!child) continue;
1396+
// Skip parens and commas; the first non-punctuation child is the first arg.
1397+
if (child.type === '(' || child.type === ',' || child.type === ')') continue;
1398+
return child.type === 'string' || child.type === 'template_string';
1399+
}
1400+
return false;
1401+
}
1402+
12801403
/**
12811404
* Extract Call entries for named function references passed as arguments.
12821405
* e.g. `router.use(handleToken, checkAuth)` yields calls to handleToken and checkAuth.
12831406
* `app.use(auth.validate)` yields a call to validate with receiver auth.
12841407
* Skips literals, objects, arrays, anonymous functions, and call expressions (already handled).
1408+
*
1409+
* To avoid false positives where plain property reads are passed as data
1410+
* (e.g. `store.set(user.id, user)` — `user.id` is a value, not a callback),
1411+
* member_expression args are only emitted when the callee is in
1412+
* {@link CALLBACK_ACCEPTING_CALLEES}. Identifier args are always emitted.
1413+
*
1414+
* HTTP-verb callees (`get`, `post`, `put`, `delete`, `patch`, `options`,
1415+
* `head`, `all`) double as Map/cache/repository method names, so their
1416+
* member-expr args are only emitted when the first argument is a string
1417+
* literal route path — matching Express/router shape and skipping
1418+
* `cache.get(user.id)`-style calls.
12851419
*/
12861420
function extractCallbackReferenceCalls(callNode: TreeSitterNode): Call[] {
12871421
const args = callNode.childForFieldName('arguments') || findChild(callNode, 'arguments');
12881422
if (!args) return [];
12891423

1424+
const calleeName = extractCalleeName(callNode);
1425+
let memberExprArgsAllowed = calleeName !== null && CALLBACK_ACCEPTING_CALLEES.has(calleeName);
1426+
if (memberExprArgsAllowed && calleeName !== null && HTTP_VERB_CALLEES.has(calleeName)) {
1427+
// HTTP verbs require a string-literal route path to be treated as a
1428+
// callback-accepting API; otherwise `cache.get(user.id)` etc. would
1429+
// still emit `id` as a dynamic call.
1430+
memberExprArgsAllowed = firstArgIsStringLiteral(args);
1431+
}
1432+
12901433
const result: Call[] = [];
12911434
const callLine = callNode.startPosition.row + 1;
12921435

@@ -1296,7 +1439,7 @@ function extractCallbackReferenceCalls(callNode: TreeSitterNode): Call[] {
12961439

12971440
if (child.type === 'identifier') {
12981441
result.push({ name: child.text, line: callLine, dynamic: true });
1299-
} else if (child.type === 'member_expression') {
1442+
} else if (child.type === 'member_expression' && memberExprArgsAllowed) {
13001443
const prop = child.childForFieldName('property');
13011444
const obj = child.childForFieldName('object');
13021445
if (prop) {

tests/parsers/javascript.test.ts

Lines changed: 60 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -321,6 +321,66 @@ describe('JavaScript parser', () => {
321321
expect(dynamicCalls).toHaveLength(0);
322322
});
323323

324+
it('does not treat member_expression args as callbacks for non-allowlisted callees', () => {
325+
// `store.set(user.id, user)` — `user.id` is a property read passed as a
326+
// value (map key), NOT a callback. Only allowlisted callees (use, then,
327+
// map, addEventListener, etc.) get member_expression args emitted as
328+
// dynamic calls. See issue #971.
329+
const symbols = parseJS(`store.set(user.id, user);`);
330+
const dynamicMemberCalls = symbols.calls.filter((c) => c.dynamic && c.name === 'id');
331+
expect(dynamicMemberCalls).toHaveLength(0);
332+
});
333+
334+
it('still emits member_expression args for allowlisted callees (regression guard)', () => {
335+
// Positive companion to the test above: `app.use(auth.validate)` and
336+
// `promise.then(handlers.onSuccess)` must still produce dynamic calls,
337+
// because `use` and `then` are callback-accepting APIs.
338+
const useSymbols = parseJS(`app.use(auth.validate);`);
339+
expect(useSymbols.calls).toContainEqual(
340+
expect.objectContaining({ name: 'validate', receiver: 'auth', dynamic: true }),
341+
);
342+
const thenSymbols = parseJS(`promise.then(handlers.onSuccess);`);
343+
expect(thenSymbols.calls).toContainEqual(
344+
expect.objectContaining({ name: 'onSuccess', receiver: 'handlers', dynamic: true }),
345+
);
346+
});
347+
348+
it('does not treat cache/Map .get/.put as callback-accepting (HTTP-verb guard)', () => {
349+
// `cache.get(user.id)` shares the verb name `get` with Express routes,
350+
// but has no string-literal route path first arg — so member-expr args
351+
// must not be emitted as dynamic calls. Same for `repo.put`, `map.delete`.
352+
const cacheSymbols = parseJS(`cache.get(user.id);`);
353+
expect(cacheSymbols.calls.filter((c) => c.dynamic && c.name === 'id')).toHaveLength(0);
354+
const repoSymbols = parseJS(`repo.put(record.key, value);`);
355+
expect(repoSymbols.calls.filter((c) => c.dynamic && c.name === 'key')).toHaveLength(0);
356+
const mapSymbols = parseJS(`map.delete(entry.id);`);
357+
expect(mapSymbols.calls.filter((c) => c.dynamic && c.name === 'id')).toHaveLength(0);
358+
});
359+
360+
it('still emits member-expr args for Express HTTP routes with string path', () => {
361+
// Positive regression guard: HTTP-verb calls with a string-literal
362+
// first arg (Express route signature) must still emit member-expr args.
363+
const routerSymbols = parseJS(`router.get('/users/:id', auth.check);`);
364+
expect(routerSymbols.calls).toContainEqual(
365+
expect.objectContaining({ name: 'check', receiver: 'auth', dynamic: true }),
366+
);
367+
const templateSymbols = parseJS('app.post(`/api`, handlers.create);');
368+
expect(templateSymbols.calls).toContainEqual(
369+
expect.objectContaining({ name: 'create', receiver: 'handlers', dynamic: true }),
370+
);
371+
});
372+
373+
it('handles optional-chaining callees in allowlist (obj?.on)', () => {
374+
// `obj?.on(event, handler.fn)` — tree-sitter-javascript/typescript
375+
// represent `obj?.on` as a `member_expression` with an `optional_chain`
376+
// child, so `extractCalleeName` still returns `on` and the allowlist
377+
// gate works. Guards against a previously-flagged false-negative class.
378+
const symbols = parseJS(`emitter?.on('tick', handlers.fn);`);
379+
expect(symbols.calls).toContainEqual(
380+
expect.objectContaining({ name: 'fn', receiver: 'handlers', dynamic: true }),
381+
);
382+
});
383+
324384
it('extracts callback in plain function calls like setTimeout', () => {
325385
const symbols = parseJS(`setTimeout(tick, 1000);`);
326386
expect(symbols.calls).toContainEqual(

0 commit comments

Comments
 (0)