
chore(deps): bump @tangle-network/agent-eval 0.20.12 → ^0.23.0 #100

Merged: drewstone merged 1 commit into main from chore/bump-agent-eval-0.23 on May 10, 2026
@drewstone
Contributor

Summary

Bumps @tangle-network/agent-eval from 0.20.12 to ^0.23.0.

Note: the request framed this as a six-version leap from 0.17.3, but the
current main pin is already 0.20.12. The actual leap is three minor
releases (0.21, 0.22, 0.23):

  • 0.21 — capture-integrity directives for launch-grade benchmark runs.
  • 0.22 — EvalCampaign + replay + always-valid + outcome calibration.
  • 0.23 — RL primitives bridging eval to policy training.
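
The pin also changes from an exact version (0.20.12) to a caret range (^0.23.0). For 0.x versions, npm-style caret ranges only float within the same minor, so ^0.23.0 admits 0.23.x but not 0.24.0. The sketch below illustrates that rule with a hypothetical helper, `satisfiesCaret`; it is a simplification that ignores prerelease tags and build metadata, not the resolver pnpm actually uses.

```javascript
// Parse a plain "major.minor.patch" string into three numbers.
// Simplification: prerelease and build suffixes are rejected.
function parseVersion(v) {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (!m) throw new Error(`unsupported version string: ${v}`);
  return m.slice(1).map(Number);
}

// Hypothetical helper: does `version` satisfy the caret range `^base`?
// Caret allows changes that do not modify the left-most non-zero component.
function satisfiesCaret(version, base) {
  const [M, m, p] = parseVersion(version);
  const [bM, bm, bp] = parseVersion(base);

  // Lower bound: version must be >= base.
  const atLeastBase = M !== bM ? M > bM : m !== bm ? m > bm : p >= bp;
  if (!atLeastBase) return false;

  if (bM > 0) return M === bM;            // ^1.2.3 -> <2.0.0
  if (bm > 0) return M === 0 && m === bm; // ^0.23.0 -> <0.24.0
  return M === 0 && m === 0 && p === bp;  // ^0.0.3 -> <0.0.4
}
```

Under this rule, `satisfiesCaret('0.23.4', '0.23.0')` is true while `satisfiesCaret('0.24.0', '0.23.0')` is false, so a future 0.24 release would still need an explicit bump.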

API drift

None observed. The only consumer in this repo is
bench/research/webvoyager-agent-eval-loop.mjs, which imports six symbols:

Import                     0.23 status
aggregateRunScore          exported, signature compatible (src/run-score.ts)
AxGepaSteeringOptimizer    exported (src/steering-optimizer.ts)
JsonlTrialCache            exported (src/jsonl-trial-cache.ts)
PairwiseSteeringOptimizer  exported (src/steering-optimizer.ts)
runPromptEvolution         exported (src/prompt-evolution.ts)
validateRunRecord          exported (src/run-record.ts)

A runtime resolution check confirms all six load as functions/classes; no
source changes required.
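
A runtime resolution check of this kind could look roughly like the sketch below. The helper names (`findMissingExports`, `probeInstalledPackage`) are hypothetical and not taken from the repo; the symbol list is the one given above. Classes and functions both report `typeof === 'function'` in JavaScript, so a single test covers both. Note that this only confirms the exports resolve; it says nothing about signature compatibility.

```javascript
// The six names imported by bench/research/webvoyager-agent-eval-loop.mjs,
// as listed in this PR description.
const CONSUMED_SYMBOLS = [
  'aggregateRunScore',
  'AxGepaSteeringOptimizer',
  'JsonlTrialCache',
  'PairwiseSteeringOptimizer',
  'runPromptEvolution',
  'validateRunRecord',
];

// Hypothetical helper: return the names that are absent from the module
// namespace or are not callable (neither a function nor a class).
function findMissingExports(ns, names = CONSUMED_SYMBOLS) {
  return names.filter((name) => typeof ns[name] !== 'function');
}

// Hypothetical entry point: load the installed package and fail loudly
// if any consumed symbol does not resolve. Not invoked automatically.
async function probeInstalledPackage() {
  const ns = await import('@tangle-network/agent-eval');
  const missing = findMissingExports(ns);
  if (missing.length > 0) {
    throw new Error(`Missing or non-callable exports: ${missing.join(', ')}`);
  }
  return CONSUMED_SYMBOLS.length;
}
```

Calling `probeInstalledPackage()` from a scratch script after `pnpm install` would either resolve with the number of symbols checked or reject with the names that failed to load.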

The lockfile shrank by ~23 lines because 0.23 has fewer transitive deps.

Test plan

  • pnpm install (clean under pnpm 10.23 / Node lockfile).
  • pnpm lint (tsc --noEmit) — passes with no errors.
  • pnpm build (tsc + asset copy) — passes.
  • pnpm test — 127 files / 1543 tests passing.
  • Runtime import probe of all six consumed symbols against the installed
    0.23.0 package.

E2E / external-bench scripts (WebVoyager loop, design-bench, etc.) were not
exercised here; they require external API keys and live infra. The static
verifications above cover the surface area touched by this bump.

Pulls in 0.21 capture-integrity, 0.22 EvalCampaign + replay + always-valid +
outcome calibration, and 0.23 RL primitives. All consumer imports
(aggregateRunScore, AxGepaSteeringOptimizer, JsonlTrialCache,
PairwiseSteeringOptimizer, runPromptEvolution, validateRunRecord) remain
exported with compatible signatures, so no source changes were required.

Verified: pnpm install, pnpm lint (tsc --noEmit), pnpm build, pnpm test
(127 files / 1543 tests passing).

@drewstone merged commit 08376a7 into main on May 10, 2026
5 checks passed