feat(ci): LS-N verification gate (spar-pattern port) #161
PR-time gate that enforces meld's STPA test-naming contract: every `status: approved` entry in `safety/stpa/loss-scenarios.yaml` must have at least one `#[test] fn ls_<letter>_<num>_*` regression test in `meld-core` (e.g. LS-A-11 -> `ls_a_11_*`).

Adapted from spar's rivet-driven verification gate (pulseengine/spar@ba329f3d). meld has no rivet-style executable artifact, but loss-scenarios pair with regression tests by the established naming convention; this gate makes that pairing a verifiable contract.

Three files:

- tools/run_ls_verification.py: Python (stdlib + PyYAML). Iterates approved LS IDs, runs `cargo test --lib --no-fail-fast <prefix>` per ID, buckets results as passed / failed / missing, and writes verification-results.json.
- tools/post_verification_comment.py: marker-tagged sticky PR comment upsert via the GitHub REST API. Pure stdlib (urllib). The first run creates the comment; subsequent runs PATCH the body. Marker: `<!-- meld-ls-verification-gate -->`.
- .github/workflows/verification-gate.yml: PR + workflow_dispatch trigger. Fail-on-failure but advisory-on-missing, so the 10 older approved entries with ad-hoc test names (e.g. PR #114's `test_canonical_abi_size_fixed_size_list_saturates_on_overflow` for LS-P-4) can be migrated incrementally rather than blocking every PR.

Smoke-tested locally against current main: 19 approved LS, 10 passed (LS-A-7/11/15/17/18/20/12/13/14/16), 9 missing (the older v0.7.0-era and PR-#114-era entries). No failures.

The same script runs locally: `python3 tools/run_ls_verification.py`

Inputs are integer/metadata only (PR number via env, head_ref in concurrency); no untrusted free-form text from PR titles/bodies/comments is read in run: blocks.

AGENTS.md gains a "LS-N verification gate" section under "Mythos Bug-Hunt Pipeline".

Refs: pulseengine/spar@ba329f3d

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
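The ID-to-prefix mapping the contract relies on can be sketched as below. This is an illustrative reconstruction, not the code in `tools/run_ls_verification.py`: the helper name `ls_test_prefix` is hypothetical, and the trailing underscore is an assumption of this sketch. It guards against the fact that `cargo test <filter>` selects every test whose name contains the filter as a substring, so a bare `ls_a_1` would also pick up `ls_a_10_*` through `ls_a_19_*`.

```python
import re

# Hypothetical pattern: LS IDs look like LS-A-11, LS-CP-3, LS-R-10.
LS_ID = re.compile(r"^LS-([A-Z]+)-(\d+)$")


def ls_test_prefix(ls_id: str) -> str:
    """Map an approved LS ID to the test-name prefix it must be paired with."""
    m = LS_ID.match(ls_id)
    if m is None:
        raise ValueError(f"not an LS ID: {ls_id!r}")
    letters, num = m.groups()
    # Trailing "_" (an assumption of this sketch): libtest filters are
    # substring matches, so "ls_a_1" alone would also select ls_a_10_*.
    return f"ls_{letters.lower()}_{num}_"


for ls_id in ["LS-A-11", "LS-CP-3", "LS-R-10"]:
    print(ls_id, "->", ls_test_prefix(ls_id))
```

Passing the full prefix rather than the bare ID keeps each `cargo test` invocation scoped to exactly one loss-scenario.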
Self-hosted runners (Debian/Ubuntu Python 3.12) enforce PEP 668 and reject `pip install --user pyyaml` with "externally-managed-environment". `--break-system-packages` is the documented PEP 668 opt-out for CI environments where the runner's Python install is disposable per workflow run. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
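For a local run outside CI, the same decision can be made explicit. The sketch below is an assumption and not part of this PR. It checks for the PEP 668 `EXTERNALLY-MANAGED` marker file in the interpreter's stdlib directory, which is where PEP 668 places it. It skips that check inside a virtual environment, where the marker does not apply, and only then appends pip's `--break-system-packages` flag. The helper names are hypothetical.

```python
import sys
import sysconfig
from pathlib import Path


def in_virtualenv() -> bool:
    # PEP 405: inside a venv, sys.prefix differs from the base interpreter's prefix.
    return sys.prefix != sys.base_prefix


def is_externally_managed() -> bool:
    """True if this interpreter carries the PEP 668 marker and we are not in a venv."""
    if in_virtualenv():
        return False
    marker = Path(sysconfig.get_path("stdlib")) / "EXTERNALLY-MANAGED"
    return marker.exists()


def pip_install_cmd(package: str, *, in_venv: bool, externally_managed: bool) -> list[str]:
    """Build a pip command line (hypothetical helper mirroring the CI step's intent)."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if not in_venv:
        cmd.append("--user")  # --user is normally rejected inside a venv
        if externally_managed:
            cmd.append("--break-system-packages")  # pip's PEP 668 opt-out flag
    cmd.append(package)
    return cmd


print(pip_install_cmd("pyyaml", in_venv=in_virtualenv(),
                      externally_managed=is_externally_managed()))
```

Executing the command (for example with `subprocess.run(cmd, check=True)`) is left to the caller.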
LS-N verification gate
Approved
Failed LS entries: (none)
Missing regression tests
Updated automatically by
The LS-N verification gate (this PR) discovered 9 approved loss-scenarios without a matching `ls_<letter>_<num>_*` regression test. Five of those already had regression tests pinning the fix under historical names; this commit adds thin convention aliases so the gate's discovery query finds them. The original tests stay in place (single source of truth, preserves git blame / grep continuity); each alias is a `#[test] fn` that delegates to the original test body.

| LS | Original test | Alias |
|-----|---------------|-------|
| LS-P-4 | test_canonical_abi_size_fixed_size_list_saturates_on_overflow | ls_p_4_canonical_abi_size_saturates_on_overflow |
| LS-P-5 | test_parser_rejects_truncated_module_section_issue_118 | ls_p_5_parser_rejects_truncated_module_section |
| LS-R-10 | test_issue112_item5_intra_adapter_preserves_from_import_module | ls_r_10_intra_adapter_preserves_from_import_module |
| LS-CP-3 | test_issue112_item4_sort_adapter_sites_is_canonical | ls_cp_3_sort_adapter_sites_is_canonical |
| LS-A-10 | cabi_alignment_stackful_retptr_writes_i64_at_offset_8 | ls_a_10_cabi_align_retptr_writeback |

Gate result moves from 10 passed / 9 missing to 15 passed / 4 missing. The remaining four (LS-CP-4, LS-A-8, LS-A-9, LS-A-19) genuinely lack regression tests and land in follow-up PRs:

- LS-CP-4: DWARF passthrough emits address-incorrect debug info
- LS-A-8: Inner-list `rep_func` selected by HashMap iteration order
- LS-A-9: Async callback POLL falls through to YIELD path
- LS-A-19: Resource import dedup uses `ends_with()` suffix match

The LS-CP-3 alias only covers the adapter_sites-order half of the scenario; the `caller_encoding_fallback` half still needs a dedicated regression test too (tracked alongside LS-A-8/9/19/CP-4).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
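A quick way to confirm locally that an alias like the ones above is discoverable is to scan Rust sources for `#[test] fn ls_*` names and group them by LS ID. This is a hypothetical static pre-check, not the gate's mechanism: the gate asks `cargo test` about each prefix. The regex assumes `#[test]` sits directly above the `fn` line with no other attributes in between.

```python
import re
from collections import defaultdict

# Hypothetical pre-check: `#[test]`, whitespace, then `fn ls_<letter>_<num>_...(`.
TEST_FN = re.compile(r"#\[test\]\s*fn\s+(ls_([a-z]+)_(\d+)_\w*)\s*\(")


def index_ls_tests(rust_source: str) -> dict[str, list[str]]:
    """Group convention-named test functions by the LS ID they cover."""
    found: dict[str, list[str]] = defaultdict(list)
    for m in TEST_FN.finditer(rust_source):
        name, letters, num = m.groups()
        found[f"LS-{letters.upper()}-{num}"].append(name)
    return dict(found)


SOURCE = """
#[test]
fn test_canonical_abi_size_fixed_size_list_saturates_on_overflow() { /* original body */ }

#[test]
fn ls_p_4_canonical_abi_size_saturates_on_overflow() {
    test_canonical_abi_size_fixed_size_list_saturates_on_overflow();
}
"""

print(index_ls_tests(SOURCE))
```

Only the alias is indexed; the historically named original is invisible to the convention, which is exactly why the aliases were added.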
Mythos delta-pass required
This PR modifies one or more Tier-5 source files (per Before merge, run the Mythos discover protocol on the
Why this gate exists: LS-A-10
The gate check on this PR will pass once the label is
Mythos delta-pass: NO FINDINGS
The latest commit (

```rust
#[test]
fn ls_p_4_canonical_abi_size_saturates_on_overflow() {
    test_canonical_abi_size_fixed_size_list_saturates_on_overflow();
}
```

No production code path is modified. No new logic to scan. The Adding
Admin-merge per #139 (smithy capacity)
8 of 11 checks green; the 3 remaining This is the documented #139
Same handling as PR #159 earlier today (cap-starved fuzz queue, 50+ Admin-merge counter for #139:
Tracking the reset back into the issue separately.
…#163)

PR #156 fixed the `imp.name.ends_with(rn)` suffix-collision bug in `Merger::add_unresolved_imports` (the dedup-skip path propagating `resource_rep_by_component` / `resource_new_by_component` entries) but landed without a regression test. The LS-N verification gate (#161) surfaced this gap as missing coverage on the next-to-last sweep.

Extracts the exact-match lookup into a private helper `find_exact_resource_import_idx` and adds three regression tests:

- `ls_a_19_exact_match_picks_float_not_bigfloat`: both `float` and `bigfloat` in tracking; asking for `[resource-rep]float` must return float's index, not bigfloat's. The buggy `ends_with` form would match bigfloat under some iteration orders.
- `ls_a_19_no_match_returns_none_even_with_suffix_collision`: only bigfloat in tracking, caller asks for plain float. Exact match must return None; the buggy `ends_with` form would return bigfloat's index.
- `ls_a_19_resource_new_lookup_is_also_exact`: same suffix-collision case for the `[resource-new]` table.

LS-N gate result moves from 15/19 verified (4 missing) to 16/19 verified (3 missing). The remaining missing-bucket entries are LS-CP-4 (likely subsumed by #130 Phase 2), LS-A-8 and LS-A-9, tracked separately as research items.

This is also the first real PR exercising the `mythos-auto.yml` workflow added in #162: it touches a Tier-5 file (`merger.rs`), so the auto-runner will fire end-to-end on PR open.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
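The difference between the buggy suffix match and the exact-match lookup can be shown with a small model. This is a Python sketch under assumptions, not meld's Rust code: tracked imports are modeled as `(index, name)` pairs, and `find_exact_resource_import_idx` here only borrows the Rust helper's name. The real signature and data structures are not shown in this commit message.

```python
from typing import Optional

Imports = list[tuple[int, str]]  # (import index, import name): a simplified model


def find_suffix_match_idx(imports: Imports, resource_name: str) -> Optional[int]:
    """Pre-#156 behaviour (modelled): any name ending in the resource name matches."""
    for idx, name in imports:
        if name.endswith(resource_name):
            return idx
    return None


def find_exact_resource_import_idx(imports: Imports, prefix: str, resource_name: str) -> Optional[int]:
    """Fixed behaviour (modelled): the full "[resource-rep]float"-style name must match."""
    wanted = f"{prefix}{resource_name}"
    for idx, name in imports:
        if name == wanted:
            return idx
    return None


both = [(0, "[resource-rep]bigfloat"), (1, "[resource-rep]float")]
only_big = [(0, "[resource-rep]bigfloat")]

print(find_suffix_match_idx(both, "float"))                                 # 0: picks bigfloat (the bug)
print(find_exact_resource_import_idx(both, "[resource-rep]", "float"))      # 1
print(find_suffix_match_idx(only_big, "float"))                             # 0: false positive
print(find_exact_resource_import_idx(only_big, "[resource-rep]", "float"))  # None
```

In the list model the collision is deterministic; in the Rust code the order depends on map iteration, which is why the bug only showed up under some orders.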
Two related fixes for the LS-N verification gate's coverage of
LS-CP-4 ("DWARF passthrough emits address-incorrect debug info").
1. `meld-core/tests/dwarf_strip.rs` already had three tests pinning
Phase 1.5's Strip-default policy (the mitigation LS-CP-4 calls
for): `default_strips_dwarf`, `passthrough_preserves_dwarf`,
`default_is_strip`. The gate didn't see them because the names
don't match the `ls_cp_4_*` convention. Adds three convention
aliases that delegate to the existing test bodies — same pattern
as the five aliases in PR #161.
2. `tools/run_ls_verification.py` was invoking `cargo test --lib`,
which excludes integration tests under `<package>/tests/`. Drops
the `--lib` filter so both lib and integration-test binaries
participate. Each cargo target prints its own `test result:`
line; the parser already sums across multiple matches, so this
is a one-line change with no other plumbing impact.
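The summing behaviour described in item 2 can be sketched as follows. The regex is an assumption about the runner's parser, written against libtest's summary line (`test result: ok. N passed; M failed; ...`); it is not copied from `tools/run_ls_verification.py`, and the sample output is made up for illustration.

```python
import re

# libtest prints one summary line per test binary (lib target, each integration test).
RESULT_LINE = re.compile(r"test result: \w+\. (\d+) passed; (\d+) failed;")


def sum_test_results(cargo_output: str) -> tuple[int, int]:
    """Sum passed/failed counts across every `test result:` line in cargo's output."""
    passed = failed = 0
    for m in RESULT_LINE.finditer(cargo_output):
        passed += int(m.group(1))
        failed += int(m.group(2))
    return passed, failed


# Made-up sample: the lib target matches nothing, the integration test matches 3.
OUTPUT = """\
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
"""

print(sum_test_results(OUTPUT))  # (3, 0)
```

Because every binary contributes its own line, dropping `--lib` needs no parser change: the extra integration-test lines are simply added to the totals.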
Gate result moves from 16/19 verified (3 missing: LS-CP-4, LS-A-8,
LS-A-9) to 17/19 verified (2 missing: LS-A-8, LS-A-9). The
remaining two need net-new tests, not aliases — surveyed in task
#52, scoped at ~1-2h (LS-A-9) and ~2-4h (LS-A-8).
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
P3 cross-component stream-pair detection foundation + a fully operational Mythos delta-pass auto-runner. 12 commits since v0.8.1.

Headline changes:

- Cross-component `stream<T>` pairing detection (#141, ADR-3). The StreamPairGraph foundation for the in-module stream adapter: meld now inventories at resolve time which fused components form producer -> consumer stream pairings. The ring-buffer / copy-chain emitter is a runtime-verified follow-up (ADR-3 Path N).
- Mythos delta-pass auto-runner (#162, #164, #170, #173, #175). The AI-driven discover protocol now runs automatically on every Tier-5 PR by the maintainer, via claude-code-action on a Max-plan OAuth token. Five plumbing fixes brought it to a working end-to-end state: scan -> NO_FINDINGS verdict -> sticky comment -> mythos-pass-done label.
- LS-N verification gate (#161, #165). Every approved loss-scenario in `safety/stpa/loss-scenarios.yaml` is now enforced to have a matching `ls_<letter>_<num>_*` regression test; 19/19 verified.
- DWARF / witness-mapping discovery (#131): Phase 1 of the #130 epic; pins today's lossy passthrough as the green-to-red oracle for the Phase 2 remap work.
- Regression coverage for LS-A-8/9/19 and LS-CP-4 (#163/165/166/169): closed every missing-test entry the LS-N gate surfaced.
- CI footprint reduction (#171): bench/fuzz/ci skip on docs- and safety-only PRs; meld is a leaner consumer of the shared fleet.
- fuzz.yml musl-target drop (#170, closes #168): fixes the recurring "sanitizer incompatible with statically linked libc" fuzz failures.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Adapts spar's rivet-driven verification gate
(pulseengine/spar@ba329f3d)
to meld's STPA loss-scenario artifacts.
PR-time gate that enforces meld's test-naming contract: every
`status: approved` entry in `safety/stpa/loss-scenarios.yaml` must have at least one
`#[test] fn ls_<letter>_<num>_*` in `meld-core` (e.g.
`LS-A-11` → `ls_a_11_*`). Posts a single sticky PR comment with passed / failed / missing counts.
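The create-or-update logic behind the sticky comment can be sketched with `urllib` alone. The endpoints are GitHub's issue-comment routes (PR conversation comments live under the issues API). The helper names and the `owner/repo` value are hypothetical, and this sketch ignores pagination, which listing comments on a busy PR would require.

```python
import json
import urllib.request

MARKER = "<!-- meld-ls-verification-gate -->"
API = "https://api.github.com"


def plan_upsert(comments: list[dict], repo: str, pr_number: int) -> tuple[str, str]:
    """Return (HTTP method, URL): PATCH the marker comment if present, else POST a new one."""
    for c in comments:
        if MARKER in (c.get("body") or ""):
            return "PATCH", f"{API}/repos/{repo}/issues/comments/{c['id']}"
    return "POST", f"{API}/repos/{repo}/issues/{pr_number}/comments"


def build_request(method: str, url: str, body: str, token: str) -> urllib.request.Request:
    """Prefix the body with the marker so the next run can find this comment again."""
    payload = json.dumps({"body": f"{MARKER}\n{body}"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


# `comments` would come from GET /repos/{repo}/issues/{pr}/comments.
method, url = plan_upsert([{"id": 42, "body": MARKER + "\nold"}], "owner/repo", 161)
req = build_request(method, url, "15 passed / 0 failed / 4 missing", token="TOKEN")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually send it.
```

Keeping the marker inside the body, rather than relying on the comment author, lets the lookup stay stateless across workflow runs.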
Bucket semantics
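| Bucket | Condition | Gate effect |
|--------|-----------|-------------|
| Passed | at least one `ls_<letter>_<num>_*` test exists and all of them pass | none |
| Failed | matching `ls_<letter>_<num>_*` tests exist and at least one fails | fails the check |
| Missing | no `ls_<letter>_<num>_*` tests found | warning only |

Missing is advisory (warning, not block) so older approved entries
with ad-hoc test names can be migrated incrementally rather than
blocking every PR.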
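The bucket rules and the fail-on-failure, advisory-on-missing policy can be sketched as below. Assumptions: per-ID counts come from summing cargo's `test result:` lines, the `results` mapping and its example counts are hypothetical, and missing entries are surfaced with GitHub Actions' `::warning::` workflow command instead of a non-zero exit.

```python
from collections import Counter


def bucket(passed: int, failed: int) -> str:
    """Classify one LS ID from the summed cargo test counts for its prefix."""
    if failed:
        return "failed"
    if passed:
        return "passed"
    return "missing"  # the filter matched no test at all


def gate(results: dict[str, tuple[int, int]]) -> int:
    """Print a summary and return the process exit code (1 only on failures)."""
    buckets = {ls_id: bucket(p, f) for ls_id, (p, f) in results.items()}
    counts = Counter(buckets.values())
    for ls_id, b in sorted(buckets.items()):
        if b == "missing":
            # Advisory only: shows up as an annotation but does not fail the job.
            print(f"::warning::{ls_id} has no matching regression test")
    print(f"{counts['passed']} passed / {counts['failed']} failed / {counts['missing']} missing")
    return 1 if counts["failed"] else 0


exit_code = gate({"LS-A-11": (2, 0), "LS-A-8": (0, 0), "LS-P-4": (1, 0)})
print("exit", exit_code)
```

A failing test anywhere under a prefix puts the whole LS entry in the failed bucket, so a partially passing scenario still blocks.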
Gate state after this PR
19 approved LS entries, 15 passed / 0 failed / 4 missing.
The 5 newly passing entries got thin convention aliases (last
commit) so their pre-existing regression tests are discoverable:

| Original test | Alias |
|---------------|-------|
| `test_canonical_abi_size_fixed_size_list_saturates_on_overflow` | `ls_p_4_canonical_abi_size_saturates_on_overflow` |
| `test_parser_rejects_truncated_module_section_issue_118` | `ls_p_5_parser_rejects_truncated_module_section` |
| `test_issue112_item5_intra_adapter_preserves_from_import_module` | `ls_r_10_intra_adapter_preserves_from_import_module` |
| `test_issue112_item4_sort_adapter_sites_is_canonical` | `ls_cp_3_sort_adapter_sites_is_canonical` (adapter-sites half only) |
| `cabi_alignment_stackful_retptr_writes_i64_at_offset_8` | `ls_a_10_cabi_align_retptr_writeback` |

The 4 still-missing entries genuinely lack regression tests and will
be addressed in follow-up PRs (one per subsystem):

- LS-CP-4: DWARF passthrough emits address-incorrect debug info
- LS-A-8: Inner-list `rep_func` selected by HashMap iteration order
- LS-A-9: Async callback POLL falls through to YIELD path
- LS-A-19: Resource import dedup uses `ends_with()` suffix match
- LS-CP-3 (`caller_encoding_fallback` half, same family)

Files
- `tools/run_ls_verification.py`: runner (stdlib + PyYAML); local-runnable
- `tools/post_verification_comment.py`: sticky comment upsert (pure stdlib urllib)
- `.github/workflows/verification-gate.yml`: workflow (PR + workflow_dispatch)
- `meld-core/src/{parser,resolver,adapter/fact}.rs`: 5 convention aliases
- `AGENTS.md`: new "LS-N verification gate" section under Mythos pipeline
- `CHANGELOG.md`: Unreleased / Added entry
- `.gitignore`: ignore local `verification-results.json`

Local run
Test plan
🤖 Generated with Claude Code