
feat(ci): Mythos delta-pass auto-runner (single-actor, OAuth-token)#162

Merged: avrabe merged 1 commit into main from feat/mythos-auto-gate on May 17, 2026

@avrabe avrabe commented May 17, 2026

Summary

Automates the Mythos discover protocol that mythos-gate.yml
currently enforces by label only. On every PR that touches a Tier-5
file, anthropics/claude-code-action (SHA-pinned) runs against each
touched file with scripts/mythos/discover.md as the prompt, emits
a structured JSON verdict (NO_FINDINGS or FINDING), and the
aggregate job posts a sticky <!-- mythos-auto-gate --> PR comment
and applies mythos-pass-done on all-pass.
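
The aggregate step described above can be sketched as follows. This is a minimal illustration rather than the workflow's actual logic (which lives in YAML and shell): the type and function names are hypothetical, and treating an empty result list as a pass is an assumption, since the workflow skips the aggregate job entirely when no Tier-5 file is touched.

```rust
/// Per-file verdict, mirroring the two values named in the summary.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Verdict {
    NoFindings,
    Finding,
}

/// What the aggregate job decides once every per-file scan has reported.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq, Eq)]
enum GateOutcome {
    /// Every touched file came back clean: apply `mythos-pass-done`.
    ApplyLabel,
    /// At least one file produced a finding: withhold the label and fail.
    Fail { files_with_findings: Vec<String> },
}

/// Collapses per-file verdicts into a single gate decision.
/// Assumption: an empty slice counts as a pass here.
fn aggregate(results: &[(String, Verdict)]) -> GateOutcome {
    let files_with_findings: Vec<String> = results
        .iter()
        .filter(|(_, verdict)| *verdict == Verdict::Finding)
        .map(|(file, _)| file.clone())
        .collect();

    if files_with_findings.is_empty() {
        GateOutcome::ApplyLabel
    } else {
        GateOutcome::Fail { files_with_findings }
    }
}
```

A single FINDING is enough to withhold the label, so the gate can never pass on a partial success.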

📝 Opened as draft. Workflow needs the
CLAUDE_CODE_OAUTH_TOKEN repo secret before its first run will
succeed. See "Phase A" below.

Authorization stack — "only avrabe can trigger this"

| Layer | What it does | What it blocks |
|---|---|---|
| 1 | if: github.actor == 'avrabe' && github.actor_id == '10056645' | All other actors; immune to username reassignment because actor_id is permanent |
| 2 | Trigger = pull_request (not pull_request_target) | Fork PRs don't get secrets per GitHub default policy |
| 3 | claude-code-action pinned by commit SHA 51ea8ea7... | Tag-hijack of v1 doesn't change what we run |
| 4 | Explicit minimal permissions: (PR write, contents read) | Token-scope minimization |
| 5 | concurrency: cancel-in-progress per PR head | Rapid push cycles don't burn budget |
| 6 | Detect job path-shape-validates Tier-5 files | ${{ matrix.file }} interpolation injection blocked even if a hostile filename slips through |
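
Layer 6 can be illustrated with a small sketch. The commit message for this PR gives the accepted shape as ^[a-zA-Z0-9/_.-]+$; the code below checks the same character class without a regex crate. The function names are hypothetical, and the real check runs inside the workflow's detect job, not in Rust.

```rust
/// Returns true when `path` is non-empty and every character is in
/// the class [a-zA-Z0-9/_.-], i.e. the shape ^[a-zA-Z0-9/_.-]+$.
fn is_safe_path(path: &str) -> bool {
    !path.is_empty()
        && path
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '/' | '_' | '.' | '-'))
}

/// Splits candidate paths into (accepted, rejected). In the workflow,
/// accepted paths would feed the scan matrix and rejected ones would
/// only be logged as warnings. (Hypothetical helper.)
fn partition_paths<'a>(candidates: &[&'a str]) -> (Vec<&'a str>, Vec<&'a str>) {
    candidates.iter().copied().partition(|p| is_safe_path(p))
}
```

Shell metacharacters such as `;`, `$`, spaces, and quotes all fall outside the class, so a name like meld-core/src/parser.rs;evil is rejected before it can reach ${{ matrix.file }}. Note that the class still admits `..` segments; this sketch guards against interpolation injection only, not path traversal.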

Phase A — your one-time setup

# On your machine:
claude update            # ensure v1.0.44+
claude setup-token       # prints CLAUDE_CODE_OAUTH_TOKEN

Then in browser: Repo Settings → Secrets and variables → Actions → New repository secret

  • Name: CLAUDE_CODE_OAUTH_TOKEN
  • Value: token from above

Once added, mark this PR ready for review and the workflow will fire on the next push.

Files

  • .github/workflows/mythos-auto.yml — workflow (detect → scan matrix → aggregate)
  • AGENTS.md — new "Auto-runner" subsection under Mythos pipeline
  • CHANGELOG.md: [Unreleased] / Added entry

How this fits with mythos-gate.yml

mythos-gate.yml (label-only check) remains the source of truth.
The auto-runner is one way the mythos-pass-done label gets
applied — not the only way. Contributors without OAuth access (or
non-avrabe actors) continue to use the documented honor-system flow:
run discover.md in a fresh Claude Code session, post findings/NO
FINDINGS comment, apply label manually.

Test plan

  • Workflow YAML validates (actionlint if available)
  • On a Tier-5-touching PR by avrabe: workflow runs, posts comment, applies/withholds label per verdict
  • On a PR by anyone else: workflow's first job is skipped (job-level if: evaluates to false); no token leaked, no comment posted
  • On a PR with no Tier-5 changes: detect job sets any=false, downstream jobs skip cleanly
  • Hostile filename test: a path like meld-core/src/parser.rs;evil doesn't pass the path-shape filter and is logged as a warning

Cost / quota note

Token usage draws from the Max-plan subscription quota, shared with
interactive Claude Code use. A burst of Tier-5 PRs could starve
interactive sessions during the same window. Refresh-token gap
tracked at anthropics/claude-code-action#727.

🤖 Generated with Claude Code

Automates the human-driven discover protocol that mythos-gate.yml
currently enforces by label. On every PR that touches a Tier-5
file, runs anthropics/claude-code-action (SHA-pinned) per touched
file with scripts/mythos/discover.md as the prompt and captures a
structured `{verdict: NO_FINDINGS | FINDING}` JSON via the action's
--json-schema input. Posts a sticky <!-- mythos-auto-gate --> PR
comment with per-file results; applies mythos-pass-done on all-pass,
fails the job (without the label) on any FINDING.

Authorization stack (defense-in-depth, "only avrabe can trigger"):

1. Job-level if: requires both `github.actor == 'avrabe'` AND the
   immutable `github.actor_id == '10056645'`. Usernames can be
   reassigned after account deletion; numeric IDs cannot.
2. Trigger is pull_request (not pull_request_target). GitHub's
   default policy keeps secrets away from fork-repo PRs.
3. claude-code-action pinned by full commit SHA, not the floating
   v1 tag. Hijacking the tag does not change what we run.
4. Explicit minimal permissions: pull-requests write (sticky comment
   + label), contents read.
5. concurrency: cancel-in-progress per PR head — no budget burn on
   rapid push cycles.
6. Detect job path-shape-validates every Tier-5 file
   (^[a-zA-Z0-9/_.-]+$) before piping into the matrix so a hostile
   filename cannot inject through ${{ matrix.file }} downstream;
   matrix.file is read via env: in run blocks, not direct
   interpolation.

Auth flow uses CLAUDE_CODE_OAUTH_TOKEN from avrabe's Max plan; no
separate API billing. Token usage draws from the subscription rate
limit shared with interactive Claude Code use.

Label-only mythos-gate.yml remains source-of-truth — the auto-runner
is one way the label gets applied, not the only way. Contributors
without OAuth access continue using the honor-system flow per
AGENTS.md.

Setup (one-time, on maintainer machine):
  claude update           # ensure v1.0.44+
  claude setup-token      # prints CLAUDE_CODE_OAUTH_TOKEN
Then add the token as repo secret CLAUDE_CODE_OAUTH_TOKEN.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@avrabe avrabe marked this pull request as ready for review May 17, 2026 05:16
@github-actions

LS-N verification gate

⚠️ 15/19 verified — 4 missing regression tests

| Status | Count |
|---|---|
| Passed (≥1 test, all green) | 15 |
| Failed (≥1 test failure) | 0 |
| Missing (no ls_*_NN_* test found) | 4 |

Approved loss-scenarios.yaml entries are expected to have a
regression test named ls_<letter>_<num>_* (e.g. LS-A-11 -> ls_a_11_*).
The gate runs each prefix via cargo test --lib --no-fail-fast and aggregates pass/fail/missing.
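
The naming rule and the three buckets can be sketched as below. This is an illustration under assumptions, not the code in tools/post_verification_comment.py: the function names are hypothetical, and the ID grammar (LS, then uppercase letters, then digits) is inferred from the IDs that appear in this comment.

```rust
/// Derives the expected test-name prefix for a loss-scenario ID,
/// e.g. "LS-A-11" -> "ls_a_11_". Returns None when the ID is not of
/// the assumed form LS-<uppercase letters>-<digits>.
fn ls_test_prefix(id: &str) -> Option<String> {
    let mut parts = id.split('-');
    let (tag, letters, num) = (parts.next()?, parts.next()?, parts.next()?);
    if parts.next().is_some() || tag != "LS" {
        return None;
    }
    if letters.is_empty() || !letters.chars().all(|c| c.is_ascii_uppercase()) {
        return None;
    }
    if num.is_empty() || !num.chars().all(|c| c.is_ascii_digit()) {
        return None;
    }
    Some(format!("ls_{}_{}_", letters.to_ascii_lowercase(), num))
}

/// The three buckets reported by the gate.
#[derive(Debug, PartialEq, Eq)]
enum LsStatus {
    Passed,
    Failed,
    Missing,
}

/// Classifies one entry from how many matching tests were found and
/// how many of those failed.
fn classify(found: usize, failed: usize) -> LsStatus {
    if found == 0 {
        LsStatus::Missing
    } else if failed > 0 {
        LsStatus::Failed
    } else {
        LsStatus::Passed
    }
}
```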

Failed LS entries

(none)

Missing regression tests
  • LS-CP-4
  • LS-A-8
  • LS-A-9
  • LS-A-19

Updated automatically by tools/post_verification_comment.py.
Source of truth: safety/stpa/loss-scenarios.yaml.

avrabe commented May 17, 2026

Admin-merge per #139 (smithy capacity)

9 checks green + 2 expected skips (Mythos pass, Aggregate findings + label — correctly skipped because this PR touches no Tier-5 source). The remaining 3 — Clippy, Coverage, fuzz_resolver_terminates — have been queued ~2h40m against the rust-cpu pool, which is 7/7 busy on org-wide work (the documented #139 §4 cross-org contention pattern).

This is the same admin-merge case as PR #161 yesterday. The workflow added here is single-actor-scoped (only avrabe can trigger it), and the new Detect Tier-5 changes job ran green proving the actor gate is wired correctly.

Admin-merge counter for #139 since last reset:

Will track the reset back into #139 after merge.

@avrabe avrabe merged commit aaeb90c into main May 17, 2026
14 checks passed
@avrabe avrabe deleted the feat/mythos-auto-gate branch May 17, 2026 06:03
avrabe added a commit that referenced this pull request May 18, 2026
…#163)

PR #156 fixed the `imp.name.ends_with(rn)` suffix-collision bug in
`Merger::add_unresolved_imports` (the dedup-skip path propagating
`resource_rep_by_component` / `resource_new_by_component` entries)
but landed without a regression test. The LS-N verification gate
(#161) surfaced this gap as missing coverage on the next-to-last
sweep.

Extracts the exact-match lookup into a private helper
`find_exact_resource_import_idx` and adds three regression tests:

- `ls_a_19_exact_match_picks_float_not_bigfloat` — both `float` and
  `bigfloat` in tracking; asking for `[resource-rep]float` must
  return float's index, not bigfloat's. The buggy `ends_with` form
  would match bigfloat under some iteration orders.
- `ls_a_19_no_match_returns_none_even_with_suffix_collision` — only
  bigfloat in tracking, caller asks for plain float. Exact match
  must return None; the buggy `ends_with` form would return
  bigfloat's index.
- `ls_a_19_resource_new_lookup_is_also_exact` — same suffix-
  collision case for the `[resource-new]` table.
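
The suffix collision these tests pin down can be reproduced in isolation. The sketch below is not the code in `Merger::add_unresolved_imports` or the signature of `find_exact_resource_import_idx`; the function names, the slice-of-names data shape, and the assumption that the lookup key is the bare resource name are all hypothetical.

```rust
/// Buggy form: suffix match. "[resource-rep]bigfloat" ends with
/// "float", so a lookup for `float` can land on `bigfloat`.
fn find_by_suffix(import_names: &[&str], resource: &str) -> Option<usize> {
    import_names.iter().position(|name| name.ends_with(resource))
}

/// Fixed form: compare the whole import name against
/// "<prefix><resource>", e.g. "[resource-rep]float".
fn find_exact(import_names: &[&str], prefix: &str, resource: &str) -> Option<usize> {
    let wanted = format!("{}{}", prefix, resource);
    import_names.iter().position(|name| *name == wanted.as_str())
}
```

With `bigfloat` listed first, the suffix form returns its index for a `float` lookup, while the exact form returns `float`'s index, or None when only `bigfloat` is present.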

LS-N gate result moves from 15/19 verified (4 missing) to 16/19
verified (3 missing). Remaining missing-bucket entries are LS-CP-4
(likely subsumed by #130 Phase 2), LS-A-8, LS-A-9 — tracked
separately as research items.

This is also the first real PR exercising the `mythos-auto.yml`
workflow added in #162: it touches a Tier-5 file (`merger.rs`) so
the auto-runner will fire end-to-end on PR open.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
avrabe added a commit that referenced this pull request May 18, 2026
PR #163 was the first end-to-end test of mythos-auto.yml (added in
#162). It surfaced three plumbing issues:

1. The action's `oven-sh/setup-bun` step requires `unzip`, which is
   not installed by default on the rust-cpu runners. Without it the
   action's bun-based post-step entrypoints exit 127, and the whole
   scan-step exits failure before emitting structured output.

2. The `Slugify file path for artifact name` step sat AFTER the
   discover step with no `if: always()`. When discover failed, the
   slug step was skipped, leaving `steps.slug.outputs.slug` empty.
   Downstream `if: always()` steps then wrote
   `mythos-out/.json` (no slug) and `upload-artifact` complained
   "No files were found with the provided path: mythos-out/.json".

3. The `Save structured output as artifact` step embedded
   `${{ steps.slug.outputs.slug }}` in the run-block via direct
   interpolation. Silently substituting an empty slug into a file
   path is a footgun even if the slug step had run — better to read
   slug from an env var and fail loudly on empty.

Fixes:

- Slugify step moves BEFORE the discover step, so it always runs
  (no `if: always()` needed because both detect+slug are the
  precondition for everything below).
- New `Install unzip (required by setup-bun)` step, best-effort
  apt install mirroring the action's own subprocess-isolation
  install pattern. `continue-on-error: true` so non-Debian runners
  don't break the workflow.
- `Save structured output as artifact` reads slug from env (`SLUG`)
  rather than `${{ }}` interpolation; explicitly errors out if SLUG
  is empty rather than silently writing to a malformed path.
- `upload-artifact` step gains an extra `steps.slug.outputs.slug
  != ''` guard so it never tries to upload with an empty name.
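
The "fail loudly on empty slug" fix can be sketched as follows. The workflow does this in a shell run block; the Rust below only illustrates the idea. The slug rule (every non-alphanumeric character becomes `-`) and both function names are assumptions, since the commit message does not spell out how the slug is computed.

```rust
/// Turns a file path into an artifact-safe slug.
/// Assumed rule: ASCII alphanumerics are kept, everything else becomes '-'.
fn slugify(path: &str) -> String {
    path.chars()
        .map(|c| if c.is_ascii_alphanumeric() { c } else { '-' })
        .collect()
}

/// Builds the structured-output path, refusing an empty slug rather
/// than silently producing "mythos-out/.json".
fn output_path(slug: &str) -> Result<String, String> {
    if slug.is_empty() {
        return Err("SLUG is empty; refusing to write mythos-out/.json".to_string());
    }
    Ok(format!("mythos-out/{}.json", slug))
}
```

Returning an error for the empty case means a skipped or failed slug step surfaces at the point of use instead of as a confusing upload-artifact message later.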

The placeholder-FINDING fallback (the part that surfaced these
issues by writing "discover step failed before emitting structured
output" into the aggregate comment) is intentional and stays — it
guarantees the gate blocks on workflow failure rather than silently
passing.
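
A minimal sketch of that fail-closed fallback, with hypothetical names: a scan that produced no structured output is mapped to a FINDING carrying the placeholder message, so the gate cannot pass by accident.

```rust
/// Per-file verdict; a finding carries a human-readable reason.
#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    NoFindings,
    Finding { reason: String },
}

/// Maps a missing structured output to a placeholder FINDING.
/// `None` stands for "the discover step failed before emitting output".
fn verdict_or_placeholder(structured_output: Option<Verdict>) -> Verdict {
    structured_output.unwrap_or_else(|| Verdict::Finding {
        reason: "discover step failed before emitting structured output".to_string(),
    })
}

/// The gate passes only on an explicit NO_FINDINGS verdict.
fn gate_passes(verdict: &Verdict) -> bool {
    matches!(verdict, Verdict::NoFindings)
}
```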

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
avrabe added a commit that referenced this pull request May 21, 2026
P3 cross-component stream-pair detection foundation + a fully
operational Mythos delta-pass auto-runner. 12 commits since v0.8.1.

Headline changes:

- Cross-component stream<T> pairing detection (#141, ADR-3). The
  StreamPairGraph foundation for the in-module stream adapter: meld
  now inventories at resolve time which fused components form
  producer -> consumer stream pairings. The ring-buffer / copy-chain
  emitter is a runtime-verified follow-up (ADR-3 Path N).

- Mythos delta-pass auto-runner (#162, #164, #170, #173, #175). The
  AI-driven discover protocol now runs automatically on every
  Tier-5 PR by the maintainer, via claude-code-action on a Max-plan
  OAuth token. Five plumbing fixes brought it to a working
  end-to-end state: scan -> NO_FINDINGS verdict -> sticky comment ->
  mythos-pass-done label.

- LS-N verification gate (#161, #165). Every approved loss-scenario
  in safety/stpa/loss-scenarios.yaml is now enforced to have a
  matching ls_<letter>_<num>_* regression test; 19/19 verified.

- DWARF / witness-mapping discovery (#131) — Phase 1 of the #130
  epic; pins today's lossy passthrough as the green-to-red oracle
  for the Phase 2 remap work.

- Regression coverage for LS-A-8/9/19 and LS-CP-4 (#163/165/166/169)
  — closed every missing-test entry the LS-N gate surfaced.

- CI footprint reduction (#171) — bench/fuzz/ci skip on docs- and
  safety-only PRs; meld is a leaner consumer of the shared fleet.

- fuzz.yml musl-target drop (#170, closes #168) — fixes the
  recurring "sanitizer incompatible with statically linked libc"
  fuzz failures.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>