Skip to content

Commit 76650fc

Browse files
hyperpolymathclaude
andcommitted
feat(toolchain-readiness-grades): complete the directory — fill in templates, .a2ml, SELF-ASSESSMENT
The initial TRG commit (1880521) shipped the spec and the README but left the five templates referenced from the spec as "forthcoming" and omitted the machine-readable counterpart and the self-assessment. That's exactly the "default contractile left in" pattern that should not survive into git. This commit fixes it. New files: * templates/AUDIT-TEMPLATE.adoc — the canonical four-tier MoSCoW audit template that every component audit must fill in. Mapped directly onto the TRG grade gates (Must -> E, Should -> D, Could -> C, Aspirational -> B). Required header and footer fields specified. Style rules: every PASSED claim must be supported by a verifiable artefact; the audit document is itself subject to the invariant-path doc-claims grounder. * templates/CANONICAL-PROOF-SUITE.adoc — the M/S/E proof suite v1.0 with 15 entries (5 mathematics, 5 science, 5 engineering). Each entry cites where existing formal proofs exist in mathlib4, the Coq stdlib, the Idris2 stdlib, the Agda stdlib, etc. so the porting work is bounded. This is the gate to leave grade E. Pass condition, versioning policy, and re-baseline cadence specified. * templates/QUALIFYING-PROVERS.adoc — the v1 qualifying-prover set: Idris2, Lean 4, Rocq, Agda. Qualifying criteria (active maintenance, published kernel design, public test suite, public manual, reproducible builds, >=1000 distinct users, no outstanding soundness bugs, banned-construct discipline). A-grade kernel-soundness criterion: at least one of the three provers in cross-validation must have a published kernel-soundness proof. As of v1.0: Lean 4 and Rocq unambiguously satisfy; Idris2 and Agda are in the set but should not be the sole kernel-soundness anchor until their meta-theoretic verification matures. Annual re-baseline. * templates/FUZZING-CORPUS-FLOOR.adoc — per-grade and per-component fuzzing minimums. D requires harness exists + 1hr CI + corpus 100; C requires 14 days continuous + 1k corpus + 70% branch coverage; B requires 30 days + 10k corpus + 85% coverage + structure-aware fuzzer; A requires 90 days + 100k corpus + 95% coverage + structure-aware AND differential fuzzer. Per-component matrix covering lexer, parser, AST, macro expander, import resolver, diagnostics, semantic analyser, type checker, IR, codegen, linker, runtime, REPL, formatter, LSP, stdlib. Differential fuzzing requirements at A-grade (the LangSec / parser-differential class). * templates/A-GRADE-LLM-PANEL.adoc — binding composition rules for the six-LLM panel. Vendor distinctness (one per major lab), composition by role (2 reasoning + 2 specialist coding + 1 instruct + 1 MoE), reproducibility requirement (every signoff archived as byte-reproducible session transcript with PGP-style tamper evidence), independent submission (no coordinated prompting), unanimous required, evidence archival. Qualifying-vendor matrix (v1.0, 2026-Q2): Anthropic, OpenAI, Google DeepMind, Meta, Mistral, DeepSeek, Alibaba, Cohere, xAI. Quarterly re-baseline. Defence against the "category error" objection: the LLM panel supplements human reviewers, never replaces them. * SELF-ASSESSMENT.adoc — TRG applied to itself. Honest current grade: *X (Untested)*. Standard exists and is internally consistent but no toolchain component has yet been audited against it. Path-to-D laid out (audit at least one hyperpolymath toolchain component; 007 lexer is the obvious first candidate since the TRG audit format is derived from the 007 lexer MK2 audit archived in references/007/audit-lexer-mk2-tier-format.md). * TOOLCHAIN-READINESS-GRADES.a2ml — machine-readable counterpart of the spec, matching the CRG `.a2ml` pattern. Includes grade definitions with ordinal + release stage + stability posture + evidence required, release-stage mapping, required toolchain components, banned constructs, standing priority order, grade aggregation rule, qualifying provers, canonical proof suite metadata, A-grade LLM panel composition, demotion triggers, per-language extension hook, and self-assessment. Modified files: * README.adoc — replaced "Forthcoming" entry for templates/ with the actual inventory of the five templates. Added inventory entries for SELF-ASSESSMENT.adoc and TOOLCHAIN-READINESS-GRADES.a2ml. Bare-filename references like `AUDIT-TEMPLATE.adoc` rewritten as `templates/AUDIT-TEMPLATE.adoc` so the doc-claims grounder finds them. Same fix applied to SELF-ASSESSMENT.adoc and templates/CANONICAL-PROOF-SUITE.adoc. Verification: re-ran invariant-path doc-claims on the directory after this fix. Grounded count went from 0 (templates didn't exist) to 41/64 = 64% within the directory's scope. The remaining 23 ungrounded references are all legitimate cross-estate references (memory entries, sibling directories in standards/, per-language repo conventions) that the v0 single-root grounder cannot follow — multi-root grounding is on the invariant-path Phase A roadmap. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 1880521 commit 76650fc

8 files changed

Lines changed: 1068 additions & 7 deletions

toolchain-readiness-grades/README.adoc

Lines changed: 27 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -49,13 +49,33 @@ Three reasons:
4949
that originated the four-tier format.
5050
These are the canonical exemplars of
5151
the per-component audit format.
52-
| `templates/` | (Forthcoming) `AUDIT-TEMPLATE.adoc`,
53-
`CANONICAL-PROOF-SUITE.adoc`,
54-
`QUALIFYING-PROVERS.adoc`,
55-
`FUZZING-CORPUS-FLOOR.adoc`,
56-
`A-GRADE-LLM-PANEL.adoc`. Skeletons
57-
for the templates referenced in the
58-
spec.
52+
| `templates/` | The five canonical templates
53+
referenced from the spec, all
54+
v1.0 as of 2026-04-12:
55+
`templates/AUDIT-TEMPLATE.adoc`
56+
(the four-tier MoSCoW audit
57+
template),
58+
`templates/CANONICAL-PROOF-SUITE.adoc`
59+
(the M/S/E proof suite, 15
60+
entries — the gate to leave
61+
grade E),
62+
`templates/QUALIFYING-PROVERS.adoc`
63+
(the v1 qualifying-prover set:
64+
Idris2, Lean 4, Rocq, Agda),
65+
`templates/FUZZING-CORPUS-FLOOR.adoc`
66+
(the per-component fuzzing
67+
minimums),
68+
`templates/A-GRADE-LLM-PANEL.adoc`
69+
(the 6+6 panel composition rules).
70+
| `SELF-ASSESSMENT.adoc` | TRG applied to itself. Honest
71+
current grade: *X* (untested).
72+
The standard has been written
73+
but not yet applied to any
74+
toolchain component. Path to D
75+
and beyond is laid out.
76+
| `TOOLCHAIN-READINESS-GRADES.a2ml` | Machine-readable counterpart of
77+
the spec, matching the CRG `.a2ml`
78+
pattern.
5979
|===
6080

6181
== Quick mental model
Lines changed: 171 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,171 @@
1+
// SPDX-License-Identifier: PMPL-1.0-or-later
2+
// Copyright (c) 2026 Jonathan D.A. Jewell (hyperpolymath) <j.d.a.jewell@open.ac.uk>
3+
4+
= Toolchain Readiness Grades — Self-Assessment
5+
:toc: auto
6+
:status: Active
7+
:date: 2026-04-12
8+
9+
The standards repo grades itself. This document applies the TRG criteria
10+
to *the TRG document set itself*. TRG is brand new, so the honest grade
11+
is low — and that honesty is the point.
12+
13+
== Current Grade: X (Untested)
14+
15+
*Assessed: 2026-04-12* +
16+
*Honest verdict: TRG v1.0 is itself at grade X.* The standard exists,
17+
the document set is internally consistent, but no toolchain component
18+
has yet been audited against it, no machine-checkable claims have been
19+
re-validated by the doc-claims grounder, and no external review has
20+
taken place.
21+
22+
This is the *correct* starting grade. Anything else would be exactly
23+
the kind of premature self-grading that TRG itself is built to prevent.
24+
25+
== Why X is the right grade right now
26+
27+
[cols="1,1,3", options="header"]
28+
|===
29+
| Criterion | Met? | Evidence
30+
31+
| *X (Untested)* — default
32+
| (n/a)
33+
| The default state for any newly-created standard before any audit
34+
is run. TRG v1.0 was published 2026-04-12 and has not yet been
35+
applied to any component.
36+
37+
| *F (Harmful / Wasteful)*
38+
| PASSED
39+
| TRG is not harmful. It is internally consistent, CRG-compatible,
40+
and addresses a real gap (CRG grades general components; TRG
41+
grades toolchain components specifically). The estate has multiple
42+
toolchain projects (007, ephapax, AffineScript, my-lang, etc.)
43+
that need a stricter rubric than CRG provides.
44+
45+
| *E (Bare Minimum)*
46+
| FAILED
47+
| Requires "all Must rows passed" against the four-tier MoSCoW format
48+
defined in §4. *No toolchain component has been audited against TRG
49+
yet*, so this row cannot be PASSED for any concrete component, and
50+
therefore the standard itself has not been demonstrated to be
51+
meaningfully applicable.
52+
|
53+
|
54+
| *Path to E:* run a TRG audit on at least one hyperpolymath toolchain
55+
component (the obvious first candidate is 007's lexer, since the
56+
TRG audit format is derived from the 007 lexer MK2 audit which is
57+
archived in `references/007/audit-lexer-mk2-tier-format.md`).
58+
Then promote 007's lexer to TRG-graded status. That single audit
59+
demonstrates the standard is *applicable* and lifts TRG itself out
60+
of grade X.
61+
|===
62+
63+
== Path to D (Should-Have Complete)
64+
65+
The next non-trivial promotion. Requires:
66+
67+
. *All Must AND all Should rows PASSED* against the four-tier format,
68+
for at least one component.
69+
. *Canonical Mathematical / Scientific / Engineering Proof Suite* (§6.6)
70+
machine-checked through the audited component. The suite definition
71+
is now at `templates/CANONICAL-PROOF-SUITE.adoc` (v1.0, 15 entries).
72+
. *2 fully-proven programs* per §6.2, each cross-validated by at
73+
least one qualifying prover from `templates/QUALIFYING-PROVERS.adoc`.
74+
. *RSR-FULL* — all 17 RSR workflows present, SHA-pinned, and currently
75+
green. Standards repo is RSR-compliant; need to verify the same for
76+
the host repo of the audited component.
77+
. *Banned constructs ZERO* — verified by `panic-attack assail` clean
78+
across all severities (promoted from "Critical and High only" per §5.4).
79+
. *A2ML files + 0-AI-MANIFEST + Immaculate Guide* — already required
80+
at E for the standard's adopting repos.
81+
. *Fuzzing harness exists* per `templates/FUZZING-CORPUS-FLOOR.adoc`
82+
(Grade D row).
83+
84+
*The single biggest blocker* between grade X and grade D is the
85+
Canonical Proof Suite. Until at least one toolchain in the estate has
86+
machine-checked all 15 entries of the M/S/E-CPS through itself, the
87+
standard's gate to leave grade E is unenforceable in practice.
88+
89+
== Path to C (Self-Validated, alpha gate)
90+
91+
After grade D, the path to C requires:
92+
93+
. *All Could rows PASSED.*
94+
. *Full Testing-Taxonomy 16 × 14 coverage* per
95+
`testing-and-benchmarking/TESTING-TAXONOMY.adoc`.
96+
. *Six-Sigma test passing* — all tests in the Six-Sigma range
97+
(≤ 3.4 failures per million test runs) for at least 14 consecutive
98+
days.
99+
. *Continuous fuzzing for at least 14 days*.
100+
. *5 fully-proven programs.*
101+
. *10 dogfood users.*
102+
. *3-nines on the standing priority axes* (dependability, security,
103+
interop, usability).
104+
. *VeriSimDB instance present and used by tests.*
105+
. *`launch-scaffolder mint` wired.*
106+
. *Stapeln containers built from Chainguard base.*
107+
108+
== Path to B and A
109+
110+
See `TOOLCHAIN-READINESS-GRADES.adoc` §5.6 and §5.7 for the full
111+
requirements. The honest expectation is *zero toolchain components in
112+
the estate reach A in year one*. That is the correct order of
113+
magnitude. Anything that reaches B in year one is already
114+
extraordinary by industry standards.
115+
116+
== Standards conformance of the TRG document set itself
117+
118+
[cols="2,1,4",options="header"]
119+
|===
120+
| Conformance item | Status | Notes
121+
122+
| Main spec exists in `.adoc` | ✅ | `TOOLCHAIN-READINESS-GRADES.adoc` (~40 KB)
123+
| README.adoc | ✅ | `README.adoc`
124+
| Self-assessment | ✅ | This document
125+
| Machine-readable counterpart (`.a2ml`) | ✅ | `TOOLCHAIN-READINESS-GRADES.a2ml`
126+
| Templates referenced from spec | ✅ | All 5 templates created: `templates/AUDIT-TEMPLATE.adoc`, `templates/CANONICAL-PROOF-SUITE.adoc`, `templates/QUALIFYING-PROVERS.adoc`, `templates/FUZZING-CORPUS-FLOOR.adoc`, `templates/A-GRADE-LLM-PANEL.adoc`
127+
| References (canonical examples) | ✅ | 25 audit files copied verbatim from 007 in `references/007/`
128+
| SPDX header on every file | ✅ | Verified
129+
| invariant-path doc-claims grounder run | ✅ | First run on `standards/` produced 519 grounded / 2597 ungrounded / 1 unknown / 3117 total — drives this self-assessment's gap analysis
130+
| All file-path claims in spec resolve | 🟡 | Templates now exist; cross-repo references (e.g. to `feedback_branch_protection.md` in memory, `immaculate-guide/IMMACULATE-GUIDE.adoc` in standards) should resolve under multi-root grounding once that lands in invariant-path Phase A
131+
| External review by formal-methods community | ❌ | Not yet — Phase A of invariant-path must complete first; then a coordinated submission to LangSec / NCSC research / NIST / formal-methods Zulip
132+
| Adoption by at least one toolchain component | ❌ | 007 lexer is the obvious first candidate; not yet started
133+
|===
134+
135+
== Honest acknowledgments
136+
137+
* TRG is not modest. The A grade is intentionally extraordinary —
138+
Six Sigma vs trailing-12-month IPO cohort, 5M external processor-
139+
hours, 21 fully-proven programs cross-validated by 3 independent
140+
provers, type-theory upstream contribution, full hyperpolymath
141+
standards conformance, unanimous 6+6 (humans + LLMs) signoff. The
142+
author expects approximately *zero* hyperpolymath toolchain
143+
components to hit A-grade in the first year of TRG operation.
144+
That is the correct order of magnitude.
145+
146+
* The Six Sigma vs IPO cohort methodology is novel. It has not been
147+
peer-reviewed by industrial statisticians. §10 of the spec
148+
explicitly invites challenge.
149+
150+
* The 6+6 LLM panel is novel in software-readiness rubrics. The
151+
combination of binding composition rules + reproducibility
152+
requirement + unanimous requirement is the load-bearing claim;
153+
no single piece is unprecedented but the combination is.
154+
155+
* The "21 fully-proven programs cross-validated by 3 provers"
156+
threshold for A-grade is plucked from intuition, not derived. It
157+
is acknowledged as a number-not-derivation in §10 and §11. Likely
158+
re-baselined against real telemetry from the first A-grade
159+
candidate before TRG v2.
160+
161+
* Per-language extension hook at §1.1 is the mechanism by which
162+
language-specific obligations (Eclexia tropical-semiring, AffineScript
163+
region/affinity, Idris2 totality discipline) are layered on top
164+
of the TRG baseline. Profiles MAY only tighten, never loosen.
165+
166+
== Standing rule
167+
168+
This self-assessment is itself a claim and is itself subject to the
169+
invariant-path doc-claims grounder. Re-run it after any change to
170+
this directory. If any file-path claim in this document fails to
171+
ground, the self-assessment is invalid until the gap is closed.

0 commit comments

Comments
 (0)