Commit 7c34289
* docs: adopt Option 2 provenance columns for concept index (issue #58 finding 1)
Resolves the blocker from the issue #58 doc review: the 12-hex
``compile_id`` is not wide enough to serve as the strict-verification
integrity token. Under a 48-bit collision (or an out-of-band swap that
keeps both tables' short IDs aligned to a stale cache), the documented
checks terminate at ``__meta`` and never touch a ``main`` row — main
data from a different compile would pass as "verified."
Option 2 splits the provenance surface into two columns with distinct
roles, decided in issue #58#issuecomment-4311684137 and refined in
#issuecomment-4311716470:
- ``compile_id`` — 12 hex chars, display/debug token only. Used in
reports, queue rows, error messages, log lines. **Never** the sole
freshness check.
- ``compile_fingerprint`` — 64 hex chars, canonical integrity key.
Full SHA-256 over
``ontology_fingerprint || binding_fingerprint || compiler_version``.
Used for main↔meta pair consistency and runtime verification
against cached local fingerprints.
Structural invariant: ``compile_id == compile_fingerprint[:12]``,
enforced at the ``_fingerprint.py`` module boundary (short form
derived from full form, not the reverse). This makes it impossible to
ship a row where the display token does not correspond to the
integrity key.
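The invariant can be illustrated with a minimal Python sketch. This is not the ``_fingerprint.py`` code: the helper names ``derive_compile_id`` and ``row_is_consistent`` are hypothetical, and the fingerprint below is a stand-in SHA-256 digest rather than the real payload.

```python
import hashlib

COMPILE_ID_LEN = 12  # 12 hex chars = 48 bits; display/debug token only
_HEX = set("0123456789abcdef")


def derive_compile_id(compile_fingerprint: str) -> str:
    """Derive the short display token from the full 64-hex fingerprint."""
    if len(compile_fingerprint) != 64 or not set(compile_fingerprint) <= _HEX:
        raise ValueError("compile_fingerprint must be 64 lowercase hex chars")
    return compile_fingerprint[:COMPILE_ID_LEN]


def row_is_consistent(row: dict) -> bool:
    """Check compile_id == compile_fingerprint[:12] on one provenance row."""
    return row["compile_id"] == derive_compile_id(row["compile_fingerprint"])


# Stand-in digest for illustration; the real value comes from compile_fingerprint().
fp = hashlib.sha256(b"illustrative payload").hexdigest()
row = {"compile_fingerprint": fp, "compile_id": derive_compile_id(fp)}
assert row_is_consistent(row)
```

Because the short form is only ever computed from the full form, there is no code path in this sketch that accepts an independently supplied ``compile_id``.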
Changes:
- ``entity_resolution_primitives.md`` §4.2: concept-index schema
gains ``compile_fingerprint STRING NOT NULL`` column; add a
two-role table pinning the column contract; state the invariant.
- §5 verification queries: rewritten to use ``compile_fingerprint``
on all three checks. No short-ID arithmetic on the verification
path.
- §10 W2 watchpoint: rewritten under the new contract. Old "48-bit
collision" caveat retired because the collision vector no longer
exists — strict queries never read ``compile_id``.
- §11 Decisions pinned: Option 2 recorded as a closed decision.
- ``implementation_plan_concept_index_runtime.md`` A1: updated to
describe the three exports (``fingerprint_model``,
``compile_fingerprint``, ``compile_id``) and the structural
derivation.
- A2, A4, A5: both provenance columns now explicitly listed.
- C4: TTL re-check query rewritten against ``compile_fingerprint``.
- D11, D14: test scope updated to match.
- W2 watchpoint (plan): rewritten to match the RFC.
Downstream sequencing unchanged:
1. ``_fingerprint.py`` extension (PR #71 additive update) — adds
``compile_fingerprint()`` as the primitive; ``compile_id()`` is
derived. Non-breaking.
2. A2 row builder consumes the two-column contract.
3. A3 emission SQL writes both columns.
4. B1/C4 verifier queries ``compile_fingerprint``.
No ``_fingerprint.py`` changes in this PR — that lands on #71.
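The failure mode behind this change, and the shape of the check that replaces it, can be sketched in Python. This is an illustration only: ``verify_strict``, the ``Row`` type, ``VerificationError``, and the in-memory rows are hypothetical stand-ins for the §5 queries, which are not reproduced here.

```python
from dataclasses import dataclass


class VerificationError(Exception):
    """Raised when stored provenance does not match the local compile."""


@dataclass(frozen=True)
class Row:
    compile_id: str           # display token; never compared on this path
    compile_fingerprint: str  # 64-hex integrity key


def verify_strict(local_fingerprint: str, meta_row: Row, main_rows: list[Row]) -> None:
    """Compare full fingerprints on the meta row and on every main row."""
    if meta_row.compile_fingerprint != local_fingerprint:
        raise VerificationError("__meta is from a different compile")
    for row in main_rows:
        # Reading main rows is the point: a check that stops at __meta
        # would accept main data swapped in from another compile.
        if row.compile_fingerprint != local_fingerprint:
            raise VerificationError("main row is from a different compile")


good = "a" * 64
stale = "a" * 12 + "b" * 52  # same 12-char prefix, different compile

verify_strict(good, Row(good[:12], good), [Row(good[:12], good)])  # passes
```

In this sketch ``good`` and ``stale`` share a ``compile_id``, so any comparison on the short token would treat them as the same compile; comparing the full fingerprint on main rows does not.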
* docs: address PR #80 review findings
Three fixes:
1. **Compile vs execute permissions (finding 1).** The plan's BQ-permissions
rollout note said ``--emit-concept-index requires bigquery.tables.create``
and called that an "existing compile_graph() requirement." Both parts
misstate the contract: ``gm compile`` emits SQL to stdout/``--output``
and never calls BigQuery. Rewritten to separate compile-time (local
file access only) from execute-time (the emitted SQL, run via ``bq
query`` / console / Airflow, needs ``bigquery.tables.create``).
Matches the actual existing contract for ``compile_graph()``.
2. **Exact payload contract (finding 2).** Prior wording in the RFC
table and plan A1 description described the payload as
``ontology_fingerprint || binding_fingerprint || compiler_version``,
where ``||`` is ambiguous — a literal reading is plain concatenation,
which is not what the implementation does. Now pinned as "SHA-256
over the NUL-delimited UTF-8 of the three inputs" in both places,
with the explicit instruction to call ``compile_fingerprint()``
rather than reimplement the payload. Paired with the golden-vector
test landing on PR #71.
3. **§7 RFC status row (finding 4).** The row for ``_fingerprint.py``
listed only ``fingerprint_model`` and ``compile_id``. Option 2
ships three exports; the row now names all three and states the
derivation relationship.
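Why the ``||`` wording in finding 2 mattered can be shown with a short Python sketch. It is not the ``_fingerprint.py`` implementation: both function names are hypothetical, and the delimiter placement (a single NUL between fields, no trailing NUL) is an assumption made for illustration. Real callers should use ``compile_fingerprint()``, as stated above.

```python
import hashlib


def digest_plain_concat(*parts: str) -> str:
    """SHA-256 over plain concatenation (the literal reading of ``||``)."""
    return hashlib.sha256("".join(parts).encode("utf-8")).hexdigest()


def digest_nul_delimited(*parts: str) -> str:
    """SHA-256 over NUL-delimited UTF-8 (delimiter placement is assumed)."""
    if any("\x00" in p for p in parts):
        raise ValueError("inputs must not contain NUL")
    return hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()


# Two different input triples whose plain concatenation is identical.
a = ("abc", "def", "1.0")
b = ("abcd", "ef", "1.0")

assert digest_plain_concat(*a) == digest_plain_concat(*b)    # field boundary lost
assert digest_nul_delimited(*a) != digest_nul_delimited(*b)  # boundary preserved
```

Plain concatenation discards the boundaries between the three inputs, so distinct triples can hash to the same value; a delimiter that cannot appear inside any input keeps them distinct.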
* docs(rfc): remove last compile/execute ambiguity in §1 directions table
Final nit from PR #80 review. The RFC's 'How the three directions
compose' table still had 'gm compile → DDL + concept-index SQL
published to BQ' on the Direction 1 row, which reads as if the
compile step itself publishes to BigQuery. The row now matches the
permissions note in the plan doc: compile emits SQL, the operator executes it.
1 parent b1f1d29 commit 7c34289
2 files changed
Lines changed: 60 additions & 26 deletions