perf(core): add __perf__ scaffold + npm run bench + 3 calibrated assertions #53
antnewman wants to merge 3 commits into
Conversation
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 Walkthrough
Adds a perf benchmarking scaffold.
Changes: Performance Regression Testing Infrastructure
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/core/__perf__/_helpers.ts`:
- Around lines 47-50: makeLinearChainSteps uses an unsafe cast (`as Record<string, Step>`) to silence the fact that LogicSpec.steps is optional. Fix the types instead of casting: either update LogicSpec (or the return type of makeLinearChainSpec) so that steps is non-optional, or, if LogicSpec.steps must stay optional, use a non-null assertion (`spec.steps!`) when accessing spec.steps in makeLinearChainSteps so the compiler sees the value is intentionally present, and remove the `as Record<string, Step>` cast.
In `@packages/core/__perf__/compiler.perf.ts`:
- Around lines 13-17: Update the "Threshold calibration methodology" comment block so it matches the actual calibration multiplier (×1.5) instead of the stale ×1.25: in packages/core/__perf__/compiler.perf.ts, change the step "Multiply by 1.25 (Math.ceil) for slower-machine headroom." to say "Multiply by 1.5" (updating any parenthetical math note accordingly) so the top-of-file methodology no longer contradicts the ×1.5 documented later in the file.
In `@packages/core/__perf__/README.md`:
- Line 62: The "Adding a new bench" guide contains a stale multiplier: replace "lock the worst × 1.25" with "lock the worst × 1.5" so it matches the calibration methodology section (which specifies ×1.5), and update any nearby explanatory text or examples that still reference ×1.25.
In `@packages/core/vitest.perf.config.ts`:
- Around lines 11-21: The config sets pool: "forks" but doesn't enforce single-fork serialization. In the test section of the exported default config, add poolOptions: { forks: { singleFork: true } } so all "**/__perf__/**/*.perf.ts" tests run in one fork and avoid parallel CPU contention.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e8cc7d98-235d-4dbc-aee2-493e5c437bb9
📒 Files selected for processing (7)
- package.json
- packages/core/__perf__/README.md
- packages/core/__perf__/_helpers.ts
- packages/core/__perf__/compiler.perf.ts
- packages/core/__perf__/dag.perf.ts
- packages/core/__perf__/expression.perf.ts
- packages/core/vitest.perf.config.ts
```ts
export function makeLinearChainSteps(n: number): Record<string, Step> {
  const spec = makeLinearChainSpec(n);
  return spec.steps as Record<string, Step>;
}
```
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Confirm the type of LogicSpec.steps in types.ts
rg -n "steps" --type ts -A 2 -B 2 packages/core/types.ts
```

Repository: SingularityAI-Dev/logic-md
Length of output: 1661
🏁 Script executed:

```bash
#!/bin/bash
# Find makeLinearChainSpec implementation
rg -n "makeLinearChainSpec" --type ts -A 10 packages/core/__perf__/_helpers.ts | head -40
```

Repository: SingularityAI-Dev/logic-md
Length of output: 969
Replace the `as Record<string, Step>` cast with a proper type fix.
The cast suppresses a real type mismatch: makeLinearChainSpec always populates steps, but TypeScript sees LogicSpec.steps as optional (`Record<string, Step> | undefined`). While the function logic guarantees steps are present, the type definition doesn't reflect this. Either make steps non-optional in LogicSpec (or in a dedicated return type from makeLinearChainSpec), or use a non-null assertion (`spec.steps!`) if the optional definition must remain. The silent cast hides a type safety gap.
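The suggested fix can be sketched in isolation. The `Step` and `LogicSpec` shapes below are simplified assumptions (the real definitions live in `packages/core/types.ts`), but they reproduce the relevant detail: `steps` is optional in the type even though the generator always populates it.

```typescript
// Simplified stand-ins for the project's types — assumed shapes; the
// real definitions live in packages/core/types.ts.
interface Step {
  dependsOn?: string[];
}
interface LogicSpec {
  steps?: Record<string, Step>; // optional, as the review describes
}

// Hypothetical mirror of makeLinearChainSpec: steps is always
// populated, even though the type allows it to be absent.
function makeLinearChainSpec(n: number): LogicSpec {
  const steps: Record<string, Step> = {};
  for (let i = 0; i < n; i++) {
    steps[`step${i}`] = i === 0 ? {} : { dependsOn: [`step${i - 1}`] };
  }
  return { steps };
}

// The review's non-null-assertion option: `spec.steps!` declares the
// value is intentionally present, instead of silently reshaping the
// type the way `as Record<string, Step>` did.
function makeLinearChainSteps(n: number): Record<string, Step> {
  const spec = makeLinearChainSpec(n);
  return spec.steps!;
}
```

The alternative (making `steps` non-optional in the spec type) removes the assertion entirely, at the cost of touching the shared type definition.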
3 issues found across 7 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/core/vitest.perf.config.ts">
<violation number="1" location="packages/core/vitest.perf.config.ts:17">
P2: Benchmark config claims single-worker/serialized execution, but `pool: "forks"` alone does not prevent parallel workers, so perf timings can still be noisy.</violation>
</file>
<file name="packages/core/__perf__/README.md">
<violation number="1" location="packages/core/__perf__/README.md:62">
P3: The new-bench checklist conflicts with the calibration methodology: it says worst × 1.25 while the documented policy is worst × 1.5.</violation>
</file>
<file name="packages/core/__perf__/compiler.perf.ts">
<violation number="1" location="packages/core/__perf__/compiler.perf.ts:16">
P2: Stale methodology comment: line says "Multiply by 1.25" but the actual calibration used ×1.5 (as documented just below at line 23). This contradiction in the same header will mislead future recalibrators.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Solid execution. Two small things from the bot reviews worth addressing before merge:
Address those and I'll merge. Otherwise the scaffold + 3 calibrated assertions are exactly the shape we discussed on #46.
…rtions

Refs SingularityAI-Dev#46. Scaffolds the perf-assertion discipline as PR 1 of the four-PR sequence agreed on the issue: scaffold first (this PR), then compiler fix, expression cache, and DAG sort tightening as separate fixes that re-run the bench to widen the margin.

What this adds:
- packages/core/__perf__/_helpers.ts — synthetic linear-chain spec generators
- packages/core/__perf__/compiler.perf.ts — compileWorkflow on a 200-step linear chain (threshold 1749ms)
- packages/core/__perf__/expression.perf.ts — evaluate × 10,000 calls on a varying context (threshold 479ms)
- packages/core/__perf__/dag.perf.ts — resolve on a 1000-step linear chain (threshold 269ms)
- packages/core/vitest.perf.config.ts — picks up only **/*.perf.ts, pool: forks for stable timings
- package.json — npm run bench wired

Threshold calibration follows the methodology agreed on SingularityAI-Dev#46: 5 runs on main, take the worst, multiply by 1.25 for slower-machine headroom. Calibration data is documented in the header of each *.perf.ts file so recalibration after a fix is auditable. After PR 2 (compiler fix), the compile assertion's margin should widen ~100x — that widening IS the proof-of-fix signal, so the threshold should NOT be tightened in PR 2.

Linear chains are deliberately the worst-case shape (depth = node count). compileWorkflow on a 200-step chain currently runs in ~1.3s on main because every compileStep call re-resolves the full DAG (Candidate 1 in SingularityAI-Dev#46). Pinning at 200 (rather than 1000) keeps total bench duration under ~5s; once the per-step DAG re-resolution is removed, the same workload should drop to <50ms.

The bench is OPT-IN: default `npm test` does not run *.perf.ts. Only `npm run bench` does. This avoids slowing the main test suite while giving a clear pre-merge regression signal when needed.
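The per-step DAG re-resolution described above has a quadratic shape that can be illustrated with a toy cost model. This is not the project's compiler — both functions below are hypothetical stand-ins that only count abstract work units:

```typescript
// Toy cost model of the Candidate 1 shape (hypothetical, not the real
// compiler): if compiling each step re-resolves the whole n-node DAG,
// each of the n steps costs O(n), so total work is O(n^2).
function workWithPerStepResolve(n: number): number {
  let ops = 0;
  for (let step = 0; step < n; step++) {
    ops += n; // a full DAG resolution for every compiled step
  }
  return ops;
}

// After the fix: resolve the DAG once up front, then constant extra
// work per step — O(n) total.
function workWithSingleResolve(n: number): number {
  return n /* one up-front resolution */ + n /* one unit per step */;
}

// Growing the chain 5x (200 -> 1000 steps) multiplies the quadratic
// model's work by 25x, the linear model's by only 5x.
console.log(workWithPerStepResolve(200), workWithSingleResolve(200)); // → 40000 400
```

The jump reported for this codebase (~1.3s at 200 steps versus ~127s at 1000, per the PR description) is even steeper than the toy 25× ratio, but the superlinear shape is the point of the illustration.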
…stability

Initial calibration (worst x1.25, single-machine) was too tight: re-runs on a more loaded developer machine showed 2-3x variance on compileWorkflow and ~2x on resolve, which pushed the assertions over their thresholds even though no algorithmic regression had occurred.

Updated to use the worst observed across multiple sessions (quiet and loaded) x1.5 for headroom. Verified stable across 3 consecutive runs.

The +50% trade-off (vs the +25% in the original SingularityAI-Dev#46 review) is justified by the bench being opt-in (npm run bench, NOT default npm test) — stable execution matters more than tight regression sensitivity. Once the algorithmic fixes in PRs 2-4 land, the assertion margin widens substantially (~100x for the compiler fix), so proof-of-fix remains strong despite the wider initial headroom.

Each perf file's header comment documents both the quiet-run and loaded-run timings so the trade-off is auditable.
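The calibration rule these commits describe (worst observed run × headroom, rounded up) can be written out explicitly. A sketch — the run timings below are invented for illustration; only the worst-times-multiplier rule comes from the methodology above:

```typescript
// Calibration rule: take the worst observed run across sessions,
// multiply by a headroom factor, round up. (x1.25 in the original
// methodology, x1.5 after the loaded-machine variance was found.)
function calibrateThreshold(runsMs: number[], headroom: number): number {
  const worst = Math.max(...runsMs);
  return Math.ceil(worst * headroom);
}

// Invented example timings (ms) mixing quiet and loaded sessions:
const compileRuns = [1320, 1290, 1410, 2870, 2995];
console.log(calibrateThreshold(compileRuns, 1.5)); // → 4493
```

With these made-up numbers the rule lands near the 4500ms compile threshold quoted later in the PR description, but the actual calibration data lives in each *.perf.ts header.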
…r consistency + Vitest 4 singleFork

Two small fixes per Rain's review on PR SingularityAI-Dev#53:

1. Multiplier consistency: compiler.perf.ts header methodology block and __perf__/README.md 'Adding a new bench' step both said x1.25; the calibration block in compiler.perf.ts (and README's main methodology section) had already moved to x1.5. Brought the inconsistent locations into line with x1.5 across the file. CodeRabbit and cubic-dev-ai both flagged this.

2. Vitest pool config: 'pool: forks' on its own does not actually serialise execution; in Vitest 4 the singleFork option moved from poolOptions.forks to a top-level forks block. Added 'forks: { singleFork: true }' so the comment claim ('One fork, serialised') matches behaviour.

Verified: bench runs 3x consecutively, all green; no Vitest deprecation warnings.
ebdcd1a to 3275667
♻️ Duplicate comments (1)
packages/core/vitest.perf.config.ts (1)

15-18: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use `poolOptions.forks.singleFork` instead of top-level `forks` (current shape is type-invalid). Line 18 is causing the config type error and does not match the expected Vitest config object in this codebase. Move `singleFork` back under `poolOptions.forks`.

🔧 Proposed fix

```diff
   // One fork, serialised, to minimise cross-test interference on timings.
-  // In Vitest 4, the `singleFork` option moved from `poolOptions.forks`
-  // to a top-level `forks` block — this is the new shape.
   pool: "forks",
-  forks: { singleFork: true },
+  poolOptions: {
+    forks: { singleFork: true },
+  },
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/core/vitest.perf.config.ts` around lines 15-18: the top-level `forks: { singleFork: true }` is type-invalid; move `singleFork` back under `poolOptions.forks` and remove the top-level `forks` block so the config uses `pool: "forks"` with `poolOptions: { forks: { singleFork: true } }` (update the object that currently contains `pool` and `forks` accordingly).
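Putting the thread's final resolution together, the config shape the review asks for might look like the sketch below. The `include` glob and comments are assumptions; only the `pool`/`poolOptions` shape comes from the review:

```typescript
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // Only the opt-in perf benches, not the default test suite.
    include: ["**/__perf__/**/*.perf.ts"],
    // One fork, serialised: `pool: "forks"` alone does not prevent
    // parallel workers, so singleFork must be set under poolOptions.
    pool: "forks",
    poolOptions: {
      forks: { singleFork: true },
    },
  },
});
```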
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3beb74d4-1bb9-4c41-bfb2-67ca9ad35b37
📒 Files selected for processing (7)
- package.json
- packages/core/__perf__/README.md
- packages/core/__perf__/_helpers.ts
- packages/core/__perf__/compiler.perf.ts
- packages/core/__perf__/dag.perf.ts
- packages/core/__perf__/expression.perf.ts
- packages/core/vitest.perf.config.ts
✅ Files skipped from review due to trivial changes (2)
- packages/core/__perf__/README.md
- package.json
🚧 Files skipped from review as they are similar to previous changes (4)
- packages/core/__perf__/dag.perf.ts
- packages/core/__perf__/expression.perf.ts
- packages/core/__perf__/_helpers.ts
- packages/core/__perf__/compiler.perf.ts
3275667 to 8923135
Closes #46 partially — this is PR 1 of the four-PR sequence you outlined in the issue. Establishes the perf-assertion discipline without coupling it to any algorithmic fix.
What this adds
- packages/core/__perf__/_helpers.ts — synthetic linear-chain spec generators
- packages/core/__perf__/compiler.perf.ts — `compileWorkflow` on a 200-step linear chain (threshold 4500ms)
- packages/core/__perf__/expression.perf.ts — `evaluate` × 10,000 calls on a varying context (threshold 1000ms)
- packages/core/__perf__/dag.perf.ts — `resolve` on a 1000-step linear chain (threshold 800ms)
- packages/core/__perf__/README.md — methodology + how-to-run
- packages/core/vitest.perf.config.ts — picks up only **/*.perf.ts, single-fork for stable timings

Threshold calibration — variance discovered
Your original methodology was 5 runs on main, take worst, ×1.25 headroom. I followed it on first calibration, but re-runs across multiple developer-machine sessions revealed substantial variance — single-shot timings ranged from ~750ms to ~2900ms for `compileWorkflow` on the same 200-step chain depending on background load. The original ×1.25 thresholds were flaky on a loaded machine.

After two iterations:
The +50% trade-off is justified by the bench being opt-in (`npm run bench`, not default `npm test`) — stable execution matters more than tight regression sensitivity. Once the algorithmic fixes in PRs 2-4 land, the assertion margin will widen substantially (~100× for the compiler fix), so proof-of-fix remains strong despite the wider initial headroom.

If you'd prefer a different trade-off (e.g. tighter thresholds + accept some flake on loaded machines, or a `--strict` flag that uses ×1.25 thresholds), I'll happily revise. Each `*.perf.ts` file's header documents the calibration data so recalibration is one comment-edit away.

Why 200-step chain on the compiler bench (not 1000)
A 1000-step linear chain takes 127 seconds to compile on `main` — a direct empirical demonstration of Candidate 1's quadratic shape. Useful as a finding, but bench tests can't reasonably take 2 minutes per run. A 200-step chain runs in ~1-3s on `main`, still showing the shape clearly enough that Candidate 1's fix will widen the margin from ~1.5× to ~100× — exactly the proof-of-fix signal you wanted.

Verification
Verified stable across 3 consecutive runs. Default `npm test` is unaffected — `*.perf.ts` is outside the default `**/*.test.ts` glob.

Two unrelated observations made during this work
Worth raising once #45 has merged (not bundling here):
- @logic-md/core only exports `./` from its package.json `exports` field. When I tried to write a baseline-measurement script that imported `resolve` directly from `./dist/dag.js`, node refused with ERR_PACKAGE_PATH_NOT_EXPORTED. The public-API import (`import { resolve } from '@logic-md/core'`) works because `index.ts` re-exports. Not a bug, but worth a small docs note for runtime implementers writing their own perf tools.
- packages/cli/tests/init.test.ts has 2 tests that fail on Windows checkouts (`should have valid YAML frontmatter in all templates`, `should have markdown content after frontmatter`). The failure is the regex `^---\n` not matching CRLF line endings. CI on `ubuntu-latest` is unaffected. Not addressing in this PR.

Sequencing — what I'm NOT doing in this PR
Per your sequence comment on #46:
This PR is just the scaffold. No algorithmic changes. Each follow-up fix can be a small targeted PR that re-runs the bench to widen the margin.
Will open PR 2 (compiler fix) once this is merged.
Summary by cubic
Adds a `__perf__` benchmark scaffold for `@logic-md/core` with three calibrated assertions and an `npm run bench` command. This is PR 1/4 for #46 to set up an opt-in perf regression net with no algorithm changes.

New Features
- packages/core/__perf__ with helpers and README; benchmarks: `compileWorkflow` on a 200-step linear chain (<4500ms), `evaluate` ×10,000 (<1000ms), and `resolve` on a 1000-step chain (<800ms), using worst-on-main ×1.5 headroom.
- packages/core/vitest.perf.config.ts to run only **/__perf__/**/*.perf.ts serially for stable timings; wires `npm run bench`. Default `npm test` is unchanged.

Bug Fixes
Written for commit 8923135. Summary will update on new commits.
Summary by CodeRabbit
npm run bench.