Doc: V5 migration notebook storyboard for the four-guarantee decision-lineage narrative #107

@caohy1988

Description


Goal

Land a per-cell storyboard for examples/migration_v5_demo_notebook.ipynb that ties each narrative beat in the post-shipping decision-lineage demo to a concrete notebook cell. The storyboard ships alongside the last issue in the working plan, so the storytelling artifact is ready when the technical work is.

This issue does not rewrite the notebook. It produces a cell-by-cell plan (markdown + code-cell sketches) that a notebook author can apply mechanically.

What the user actually has to provide (minimum hand-authored input)

Before describing the storyboard, lock down what the demo asks the user to author. The demo will create or consume several other artifacts (auto-scaffolded binding.yaml, generated table_ddl.sql, plugin-emitted agent_events) — those don't count toward the user's authored-input floor because they're produced by tooling. The introduction needs to say "you write one file; the SDK and plugin produce the rest."

Mandatory today

  1. An ontology file. Either ontology.yaml (hand-authored), or a .ttl (OWL/SKOS) file run through gm import-owl to produce ontology.yaml. Mandatory because it defines what entities/relationships the SDK extracts from agent_events. There is no path that skips it.
  2. A binding file. Either a hand-authored binding.yaml (mandatory when pointing at user-pre-defined tables; this is the #104 "user owns the graph" beat, where ontology-build --skip-property-graph populates base tables only) or one auto-generated by gm scaffold --ontology X.yaml --dataset Y --project Z --out outdir/ (works today, see bigquery_ontology/cli.py:453) for the SDK-creates-tables case. The scaffolder writes binding.yaml + table_ddl.sql, and the binding is immediately valid as input to gm compile.
  3. Populated agent_events. Generated by the BQ AA plugin running an agent. Demo includes a sandbox-runner cell that produces this.

So the minimum hand-authored input is one file (an ontology in YAML or a .ttl). Generated/runtime artifacts that the demo also relies on, but that the user does not author: the auto-scaffolded binding.yaml, the generated table_ddl.sql (or SDK-created tables via CREATE TABLE IF NOT EXISTS), and the plugin-emitted agent_events. Two hand-authored YAML files are common but not required.

Optional today

  • Pre-existing BQ tables (the user owns DDL — used by Beat 1 of the post-shipping narrative).
  • Pre-existing CREATE PROPERTY GRAPH (also Beat 1).

After the five issues ship

The minimum input doesn't change. None of #58 / #75 / #76 / #104 / #105 reduces input requirements; they add safety, cost, and provenance guarantees over the same input set.

Future input-reduction option (not in any current issue, flagged for discussion)

Ship a default ADK-event ontology packaged with the SDK that covers the standard plugin event types (USER_MESSAGE_RECEIVED, LLM_RESPONSE, AGENT_COMPLETED, TOOL_STARTING, TOOL_COMPLETED, plus natural relationships like CAUSED, PRECEDED). A user with no domain-specific extraction needs could then run bq-agent-sdk ontology-build --ontology @builtin:adk-events --binding @scaffold:my-dataset and get a baseline graph with zero authored YAML. Domain authors would still write their own ontology when they want Decision, Customer, etc. extracted. This is a separate issue if anyone wants it; not on the current roadmap, but worth filing as an input-ergonomics follow-up if the demo audience asks "do I really need to write any YAML?"

The notebook's intro cell should say plainly: "Provide one ontology file (hand-authored YAML or OWL/SKOS TTL). Everything else — binding, table DDL, property graph — can be generated, owned by you, or both."

Four-guarantee narrative (recap)

Per the working plan, the post-shipping demo arc is own → validate → extract cheaply → resolve:

  1. Own (#104, Feat: bq-agent-sdk ontology-build --skip-property-graph to populate base tables only): user owns CREATE PROPERTY GRAPH; SDK populates base tables only.
  2. Validate (#105, pre-flight binding validation against existing BigQuery table schemas, plus #76, ontology-aware validate_extracted_graph with fallback-scope classification, post-extract): structured failure classification at both gates.
  3. Extract cheaply (#75 Phase 1, compile-time code generation for structured trace extractors): deterministic compiled extractors for structured events; AI fallback only for semantic gaps and per-field validator failures.
  4. Resolve (#58, runtime entity resolution primitives: OntologyRuntime, concept index, EntityResolver protocol; reader follow-on, with emission already shipped in PR #92): SKOS concept-index lookups translate user-typed inputs to canonical concept names before the GQL query runs.

The storyboard maps each guarantee to specific cells.

Per-cell storyboard

Cell numbering is suggestive; the notebook author can renumber. Cell types are [md] (markdown) and [py] (code).

Section 0 — Setup and inputs

| Cell | Type | Purpose | Implementation note |
| --- | --- | --- | --- |
| 0.1 | md | Title + four-guarantee narrative diagram (own / validate / extract / resolve). | Reuse the table from the working plan summary. |
| 0.2 | md | "What you need to bring." State the minimum-input answer above: one ontology file + plugin-populated `agent_events`. Show three input shapes side by side: (a) hand-authored `ontology.yaml`, (b) `.ttl` source, (c) future `@builtin:adk-events`. | Anchors the audience expectation early. |
| 0.3 | py | Generate (or load) sandbox `agent_events` with multiple sessions emitting `BkaDecisionEvent`-like events across linked sessions. | Reuse the existing sandbox runner; pin session IDs for reproducibility. |
| 0.4 | py | Show the input ontology file. If using TTL, show the `gm import-owl` invocation producing `ontology.yaml`. | Concrete file paths from the demo's ontology, e.g. `examples/ymgo.ontology.yaml`. |
| 0.5 | py | Run `gm scaffold --ontology X.yaml --dataset Y --project Z --out outdir/` to auto-generate `binding.yaml` + `table_ddl.sql`. Demonstrate the "one file in, two files out" minimum path. All four flags are required by the current CLI (`bigquery_ontology/cli.py:453`) — `--dataset` and `--project` are not optional. (For the user-owns-tables narrative in Beat 1, replace this cell with a hand-authored binding pointing at pre-existing tables.) | Calls out the auto-scaffold option explicitly so audiences who don't have pre-existing tables understand they don't need to hand-author both files. |
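Cell 0.3's implementation note asks for pinned session IDs. One way to do that, sketched below, is to derive the IDs deterministically from a fixed seed instead of calling `uuid.uuid4()`. The helper name and session count are illustrative, not part of the sandbox runner's actual API:

```python
import random
import uuid

def pinned_session_ids(n: int, seed: int = 42) -> list[str]:
    """Derive n reproducible session IDs from a fixed seed.

    Re-running the notebook yields the same IDs, so downstream cells
    (token tables, GQL traversals) stay comparable run to run.
    """
    rng = random.Random(seed)
    return [str(uuid.UUID(int=rng.getrandbits(128))) for _ in range(n)]

ids = pinned_session_ids(3)
assert ids == pinned_session_ids(3)  # same seed, same IDs on every run
```

The sandbox runner would then be invoked with these IDs rather than letting it mint fresh ones per execution.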

Section 1 — Beat 1: "You own the graph definition" (#104)

| Cell | Type | Purpose | Implementation note |
| --- | --- | --- | --- |
| 1.1 | md | The pre-shipping demo had the SDK run `CREATE OR REPLACE PROPERTY GRAPH` on every build. Show the user's authored DDL instead. | Frame as "platform team owns DDL; SDK owns population." |
| 1.2 | py | Print the user's `CREATE PROPERTY GRAPH` DDL (loaded from a file in the notebook directory). Apply it to the demo dataset. | Use a static SQL file like `examples/migration_v5/property_graph.sql`. |
| 1.3 | py | Capture the "before" baseline after cell 1.2's authored DDL has finished and immediately before cell 1.4 runs the build. Bind a single timestamp variable: `before_skip_build_ts = CURRENT_TIMESTAMP()` (or read the most recent `INFORMATION_SCHEMA.JOBS_BY_PROJECT.creation_time` from this dataset). Then run a small GQL traversal against the user's existing property graph and save the row count. | Critical that this timestamp is captured after cell 1.2's `CREATE PROPERTY GRAPH` job has completed; otherwise the `JOBS_BY_PROJECT` filter in cell 1.5 will catch the user's own authored DDL job and produce a false positive. `JOBS_BY_PROJECT`-based evidence is robust and column-stable. BigQuery does have an `INFORMATION_SCHEMA.PROPERTY_GRAPHS` view, but its column set isn't part of this demo's evidence, so we don't depend on it. |
| 1.4 | py | Run the build with the new flag: `bq-agent-sdk ontology-build --skip-property-graph --ontology X --binding Y --session-ids ...` | Note: this flag is gated on #104 landing. |
| 1.5 | py | Capture the "after" state with two reads: (a) query `INFORMATION_SCHEMA.JOBS_BY_PROJECT WHERE creation_time > @before_skip_build_ts AND query LIKE '%CREATE OR REPLACE PROPERTY GRAPH%'` (using the variable bound in 1.3) and assert zero rows; (b) re-run the GQL traversal and assert it still returns the same row count. | Bound timestamp guarantees we only count DDL jobs created during cell 1.4, not the authored DDL from cell 1.2. Two-pronged evidence: no SDK-issued DDL job ran, and the graph still works. Doesn't depend on `PROPERTY_GRAPHS` column shape. |
| 1.6 | md | One-paragraph callout: "the SDK populated the base tables; the graph object you defined is intact." | Closes Beat 1. |
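The timestamp-bounding logic in cells 1.3 and 1.5 can be sketched as plain filtering over job rows. This is an illustration of the filter semantics only; in the notebook the rows would come from an actual `INFORMATION_SCHEMA.JOBS_BY_PROJECT` query, and the dict shape here is an assumption:

```python
from datetime import datetime, timezone

DDL_MARKER = "CREATE OR REPLACE PROPERTY GRAPH"

def sdk_issued_ddl_jobs(jobs: list[dict], before_ts: datetime) -> list[dict]:
    """Mirror of the cell-1.5 filter: jobs created after the bound
    timestamp whose query text contains the property-graph DDL marker."""
    return [
        j for j in jobs
        if j["creation_time"] > before_ts and DDL_MARKER in j["query"]
    ]

# Authored DDL ran at 10:00; timestamp bound at 10:05; build ran at 10:10.
before_ts = datetime(2025, 1, 1, 10, 5, tzinfo=timezone.utc)
jobs = [
    {"creation_time": datetime(2025, 1, 1, 10, 0, tzinfo=timezone.utc),
     "query": "CREATE OR REPLACE PROPERTY GRAPH demo_graph ..."},  # user's authored DDL (cell 1.2)
    {"creation_time": datetime(2025, 1, 1, 10, 10, tzinfo=timezone.utc),
     "query": "INSERT INTO nodes_decision ..."},                   # SDK population job (cell 1.4)
]
assert sdk_issued_ddl_jobs(jobs, before_ts) == []  # no SDK-issued DDL after the bound
```

Binding the timestamp between the authored DDL and the build is exactly what keeps the user's own 10:00 DDL job out of the result set.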

Section 2 — Beat 2: "Pre-flight catches binding drift before you spend a dollar" (#105)

| Cell | Type | Purpose | Implementation note |
| --- | --- | --- | --- |
| 2.1 | md | Set up the failure case: a column was renamed in the BQ table, but the binding YAML still references the old name. | Frame the cost claim: pre-flight runs in under a second. |
| 2.2 | py | Apply a one-line `ALTER TABLE ... RENAME COLUMN` to a sandbox node table. Show the diff. | Reproducible failure injection. |
| 2.3 | py | Run `bq-agent-sdk binding-validate --ontology X --binding Y --project ...`. Show the structured failure list with `binding_path` and `bq_ref` fields. | Gated on #105 PR 2b landing. |
| 2.4 | py | Restore the column or fix the binding. Re-run `binding-validate`; assert `report.ok == True`. | The "fix → re-run" loop. |
| 2.5 | py | Optional: show a cross-project binding (entity source fully qualified to a different project than `binding.target.project`) validating clean. | Demonstrates `_qualify_source` behavior; small but a common real-world setup. |
| 2.6 | py | Run `ontology-build --skip-property-graph --validate-binding --ontology X --binding Y --session-ids ...` end-to-end. | Combines Beats 1 and 2 in a single invocation. |
| 2.7 | md | Callout: "extraction never started until the physical and logical worlds lined up." | Closes Beat 2. |
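The report shape cells 2.3 and 2.4 rely on can be sketched as dataclasses. These names (`BindingReport`, `BindingFailure`) are stand-ins for whatever #105 actually ships; only the `binding_path` / `bq_ref` fields and the `report.ok` check come from the storyboard:

```python
from dataclasses import dataclass, field

@dataclass
class BindingFailure:
    binding_path: str  # where in binding.yaml the mismatch lives
    bq_ref: str        # the BigQuery table/column it failed against
    message: str

@dataclass
class BindingReport:
    failures: list[BindingFailure] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.failures

# Injected drift (cell 2.2): the binding still references the renamed column.
drifted = BindingReport(failures=[
    BindingFailure(
        binding_path="entities.decision.fields.outcome",
        bq_ref="my-project.demo.decisions.outcome",
        message="column not found (renamed?)",
    ),
])
assert not drifted.ok

# After the cell-2.4 fix, re-running validation yields an empty failure list.
fixed = BindingReport()
assert fixed.ok
```

The point of the structured list, as opposed to a free-text error, is that each failure pairs the logical location (`binding_path`) with the physical one (`bq_ref`), which is what makes the fix → re-run loop mechanical.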

Section 3 — Beat 3: "Structured events extract deterministically; LLM only fills the gaps" (#75 Phase 1, gated on #76)

This is the largest section because it's the biggest narrative change.

| Cell | Type | Purpose | Implementation note |
| --- | --- | --- | --- |
| 3.1 | md | Today's cost story: ontology extraction is session-aggregated `AI.GENERATE` (one call per session, not one per event — see `ontology_graph.py:100, 631`). The cost driver is sessions × tokens-per-session, where tokens-per-session grows with the structured-event payload size. | Set up the contrast. |
| 3.2 | py | Show the per-session token table from a build before compiled extractors ran. Frame the cost claim as "sessions whose transcripts contain structured `BkaDecisionEvent` payloads contributed to the session-level AI extraction cost," not "each event cost N tokens." Per-event token attribution is not available from the current architecture; the session is the cost unit. | Use stored output from a prior build; load from a file. |
| 3.3 | py | Run `gm compile-extractors --ontology X --binding Y --event-schemas Z` to compile a deterministic extractor for `BkaDecisionEvent`. Show the measurement report comparing F1 / fallback rate / latency vs. the hand-written and `AI.GENERATE` baselines on a real trace corpus. | Gated on #75 PR 4b (compile harness) + PR 4c (first compiled extractor + measurement report) landing. This is C1 per the working plan. The cell stops at "compile + measure" — it does not yet show the build using the compiled bundle. |
| 3.4 | py | Show the compiled bundle's fingerprint (`compile_fingerprint` 64 hex; `compile_id == compile_fingerprint[:12]`). Show the bundle's path on disk. Re-run compile; assert byte-identical output. | Reproducibility proof. Still C1 — no runtime path needed. |
| 3.5 | py | Re-run the build with compiled extractors actually loaded by the runtime. Show the new per-session token table. Per the C2 decision (Option A — prune compiled-extractable payloads, validation-gated): spans whose compiled output passes the #76 validator are excluded from the `AI.GENERATE` transcript (`fully_handled_span_ids`); spans with partial coverage stay in the prompt with a focused hint (`partially_handled_span_ids`); validation-failed spans fall back to the AI path. Token cost drops in proportion to the fraction of the transcript whose compiled output validated cleanly. For sessions composed entirely of compiled-extractable + validating events, cost approaches zero. | Gated on #75 C2 runtime bundle-loading path — the compiled-bundle loader emits a `StructuredExtractionResult` with fully_handled / partially_handled partitioning; `ontology_graph.py:540–571` already does the prune + hint. If C2 hasn't landed, this cell becomes a "skipped — requires #75 C2" placeholder. |
| 3.6 | py | Show the post-extract ValidationReport from #76: walk through one example each of `FailureCode.FIELD`, `NODE`, `EDGE`, `EVENT`, and the per-field fallback that re-extracted the FIELD case via AI. | Gated on #76 PR 3a landing. Use synthetic fixtures that exercise each scope. |
| 3.7 | py | Print the savings table. Per-session rows with columns (`session_id`, `transcript_chars_before`, `transcript_chars_after`, `estimated_prompt_tokens_before`, `estimated_prompt_tokens_after`, `compiled_spans`, `partial_fallback_count`, `full_fallback_count`), plus a job-level row from `INFORMATION_SCHEMA.JOBS_BY_PROJECT` (`total_bytes_processed`, `total_slot_ms`, duration) for the build's BQ jobs. Note explicitly: per-session token columns are prompt-size estimates computed from the transcript that was actually sent to `AI.GENERATE` (after `excluded_span_ids` pruning), not exact billing usage. Real AI billing is job-level — the BQ AI query at `ontology_graph.py:90–136` returns only (`session_id`, `graph_json`) and exposes no per-row usage metadata. Aggregate roll-up grouped by dominant structured-event-type coverage is a coverage proxy, not a token claim. | Honest column set given current instrumentation. If C2 (or a follow-up) adds explicit per-session token usage capture (e.g., a `usage_metadata` column in the AI query output), this cell upgrades to exact billing — flag as a future enhancement, not blocking the demo. |
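Cell 3.7's "prompt-size estimate" columns can be computed with a few lines of pure Python. This is a sketch of the estimation logic only; the 4-chars-per-token ratio, the helper names, and the span-dict shape are all assumptions, not the SDK's instrumentation:

```python
def estimate_prompt_tokens(transcript: str, chars_per_token: float = 4.0) -> int:
    """Rough prompt-size estimate. Cell 3.7's per-session token columns are
    estimates like this, not exact billing usage."""
    return int(len(transcript) / chars_per_token)

def savings_row(session_id: str, spans: dict[str, str],
                fully_handled_span_ids: set[str]) -> dict:
    """spans maps span_id -> span text. Fully handled spans (compiled output
    validated cleanly) are pruned from the transcript before AI.GENERATE,
    per the C2 Option A decision."""
    before = "".join(spans.values())
    after = "".join(t for sid, t in spans.items()
                    if sid not in fully_handled_span_ids)
    return {
        "session_id": session_id,
        "transcript_chars_before": len(before),
        "transcript_chars_after": len(after),
        "estimated_prompt_tokens_before": estimate_prompt_tokens(before),
        "estimated_prompt_tokens_after": estimate_prompt_tokens(after),
        "compiled_spans": len(fully_handled_span_ids),
    }

spans = {"s1": "x" * 800, "s2": "y" * 200}     # s1: structured payload, s2: free text
row = savings_row("session-1", spans, {"s1"})  # compiled output for s1 validated cleanly
assert row["estimated_prompt_tokens_before"] == 250
assert row["estimated_prompt_tokens_after"] == 50  # drop tracks validated coverage
```

This also makes the honesty caveat concrete: the "after" column measures what was actually sent to the model, so the savings claim is about prompt size, not a per-event billing figure.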

Section 4 — Beat 4: "User-typed inputs resolve to canonical concepts" (#58 reader follow-on; emission already in #92)

| Cell | Type | Purpose | Implementation note |
| --- | --- | --- | --- |
| 4.1 | md | The graph is populated; now the user wants to ask questions in their own words. | Frame the resolution problem. |
| 4.2 | py | Show the concept-index tables emitted by `gm compile --emit-concept-index ...` — main + `__meta` provenance sibling, with `compile_id` (12 hex) + `compile_fingerprint` (64 hex). | Already shipped in #92. Reuse the storage proof from #92. |
| 4.3 | py | Use the `OntologyRuntime` reader: resolve "Consumer Banking" → canonical `skos_RetailBanking` via `skos:altLabel` matching. Print the match score and the path that fired. | Gated on #58 reader follow-on landing. |
| 4.4 | py | Build a GQL traversal using the resolved canonical name. Show prior → current decision edges across sessions, scoped to retail-banking decisions. | This is the climax query of the original demo, now with concept-aware filtering. |
| 4.5 | md | Callout: "the user typed natural language; the SDK resolved it via the SKOS taxonomy you authored once and emitted alongside the property graph." | Closes Beat 4. |
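The exact-match tier of the `skos:altLabel` lookup in cell 4.3 can be sketched as an inverted index. This is not the #58 `OntologyRuntime` API (which also reports a match score and match path); it only illustrates the alt-label → canonical mapping that the concept-index tables make possible:

```python
def build_alt_label_index(concepts: dict[str, list[str]]) -> dict[str, str]:
    """Invert canonical -> altLabels into a case-insensitive lookup,
    mimicking the skos:altLabel matching path described in cell 4.3."""
    index: dict[str, str] = {}
    for canonical, alt_labels in concepts.items():
        index[canonical.lower()] = canonical  # canonical names resolve to themselves
        for alt in alt_labels:
            index[alt.lower()] = canonical
    return index

# Toy taxonomy; the real data comes from the #92-emitted concept-index table.
concepts = {"skos_RetailBanking": ["Consumer Banking", "Retail Bank"]}
index = build_alt_label_index(concepts)

assert index["consumer banking"] == "skos_RetailBanking"
assert index["skos_retailbanking"] == "skos_RetailBanking"
```

The resolved canonical name is then interpolated into the cell 4.4 GQL traversal as a filter value, which is why resolution has to happen before the query runs.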

Section 5 — Close

| Cell | Type | Purpose |
| --- | --- | --- |
| 5.1 | md | Recap: own / validate / extract cheaply / resolve, with the four "before vs after" claims from the working plan summary. |
| 5.2 | md | Pointers to each issue (#58, #75, #76, #104, #105) so audience members can dig into the technical work. |
| 5.3 | md | What's next: future minimum-input reduction (default ADK-event ontology), Phase 2 of #75 (session-aggregated compilation), additional resolver layers in #58. |

Conditional cells (gating)

Some cells depend on issues that haven't shipped yet. The notebook author should put each beat behind a feature-flag cell at the top:

```python
FEATURES = {
    "skip_property_graph": True,        # gated on #104
    "binding_validate": True,           # gated on #105
    "validate_extracted_graph": True,   # gated on #76
    "compiled_extractors_c1": True,     # gated on #75 PR 4b/4c — compile harness + measurement (cells 3.3, 3.4)
    "compiled_extractors_c2": True,     # gated on #75 C2 — runtime bundle-loading (cell 3.5)
    "concept_index_reader": True,       # gated on #58 reader follow-on
}
```

The split between _c1 and _c2 lets the storyboard land "compile + measure" before "build uses compiled bundles" — the measurement report from C1 is itself a publishable artifact and doesn't require the runtime path.

When a feature is `False`, the notebook can `display(Markdown("Skipped: requires #X."))` instead of failing. This lets the storyboard land before all five issues ship.
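One way to wire the gating, sketched below, is a small guard that either runs the cell body or returns the skip notice. The `gated` helper is an illustration, not a prescribed API; in the real notebook the notice would be rendered with `display(Markdown(...))` rather than returned as a string:

```python
def gated(feature: str, issue: str):
    """Guard a cell body: run it when the feature flag is on,
    otherwise produce a skip notice instead of failing."""
    def run(cell_body):
        if FEATURES.get(feature):
            return cell_body()
        return f"Skipped: requires {issue}."
    return run

# Minimal FEATURES stand-in; the notebook's real dict has all six flags.
FEATURES = {"compiled_extractors_c2": False}

result = gated("compiled_extractors_c2", "#75 C2")(lambda: "ran cell 3.5")
assert result == "Skipped: requires #75 C2."
```

Keeping the skip path a normal return (rather than an exception) is what lets "Run all" execute end-to-end with any subset of flags enabled.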

Acceptance criteria

  • examples/migration_v5_demo_notebook.ipynb has a clear cell-by-cell structure matching Sections 0–5 above (some renumbering allowed).
  • Section 0 includes the "what you need to bring" explanation: one ontology file + plugin-populated agent_events, with auto-scaffolded binding shown as the default path.
  • Each of the four guarantee sections has both a markdown framing cell and at least one code cell that produces visible evidence (output table, fingerprint, validation report, or query result).
  • Conditional gating cells let the notebook execute end-to-end with any subset of FEATURES enabled, so the storyboard can be merged before the last issue ships.
  • Every code cell uses real (or representative) BQ data, not stubs, when its gating feature is True.
  • Each beat has a one-paragraph closing markdown cell tying back to the four-guarantee narrative.
  • The notebook's introduction explicitly states that two hand-authored YAML files are not required — link to gm scaffold and gm import-owl as input-reduction tools.

Out of scope

  • Implementing the future default ADK-event ontology. Flagged in Section 5.3 as a separate follow-up.
  • Rewriting the original V5 demo (examples/ontology_graph_v5_demo.ipynb). The migration notebook is separate.
  • Animating the four guarantees as a slide deck. The notebook is the authoritative artifact; downstream slide decks can lift content from it.
  • Authoring the property-graph DDL fixture file (examples/migration_v5/property_graph.sql). It will need to be created while executing this issue but doesn't warrant a separate issue.

Effort

~1.5 eng-days for a single notebook author. The cell structure is mechanical once the storyboard is approved; the time is mostly in producing real outputs (token tables, validation reports, fingerprints, GQL results) once the underlying issues land.
