Doc: V5 migration notebook storyboard for the four-guarantee decision-lineage narrative #107

@caohy1988

Description


Goal

Land a per-cell storyboard for examples/migration_v5_demo_notebook.ipynb that ties each narrative beat in the post-shipping decision-lineage demo to a concrete notebook cell. The storyboard ships alongside the last issue in the working plan, so the storytelling artifact is ready when the technical work is.

This issue does not rewrite the notebook. It produces a cell-by-cell plan (markdown + code-cell sketches) that a notebook author can apply mechanically.

What the user actually has to provide (minimum hand-authored input)

Before describing the storyboard, lock down what the demo asks the user to author. The demo will create or consume several other artifacts (auto-scaffolded binding.yaml, generated table_ddl.sql, plugin-emitted agent_events) — those don't count toward the user's authored-input floor because they're produced by tooling. The introduction needs to say "you write one file; the SDK and plugin produce the rest."

Mandatory today

  1. An ontology file. Either ontology.yaml (hand-authored), or a .ttl (OWL/SKOS) file run through gm import-owl to produce ontology.yaml. Mandatory because it defines what entities/relationships the SDK extracts from agent_events. There is no path that skips it.
  2. A binding file. Either a hand-authored binding.yaml (mandatory when pointing at user-pre-defined tables; this is the #104 "user owns the graph" beat, where ontology-build --skip-property-graph populates base tables only) or one auto-generated by gm scaffold --ontology X.yaml --dataset Y --project Z --out outdir/ (works today, see bigquery_ontology/cli.py:453) for the SDK-creates-tables case. The scaffolder writes binding.yaml + table_ddl.sql, and the binding is immediately valid as input to gm compile.
  3. Populated agent_events. Generated by the BQ AA plugin running an agent. Demo includes a sandbox-runner cell that produces this.

So the minimum hand-authored input is one file (an ontology in YAML or a .ttl). Generated/runtime artifacts that the demo also relies on, but that the user does not author: the auto-scaffolded binding.yaml, the generated table_ddl.sql (or SDK-created tables via CREATE TABLE IF NOT EXISTS), and the plugin-emitted agent_events. Two hand-authored YAML files are common but not required.

Optional today

  • Pre-existing BQ tables (the user owns DDL — used by Beat 1 of the post-shipping narrative).
  • Pre-existing CREATE PROPERTY GRAPH (also Beat 1).

After the five issues ship

The minimum input doesn't change. None of #58 / #75 / #76 / #104 / #105 reduces input requirements; they add safety, cost, and provenance guarantees over the same input set.

Future input-reduction option (not in any current issue, flagged for discussion)

Ship a default ADK-event ontology packaged with the SDK that covers the standard plugin event types (USER_MESSAGE_RECEIVED, LLM_RESPONSE, AGENT_COMPLETED, TOOL_STARTING, TOOL_COMPLETED, plus natural relationships like CAUSED, PRECEDED). A user with no domain-specific extraction needs could then run bq-agent-sdk ontology-build --ontology @builtin:adk-events --binding @scaffold:my-dataset and get a baseline graph with zero authored YAML. Domain authors would still write their own ontology when they want Decision, Customer, etc. extracted. This is a separate issue if anyone wants it; not on the current roadmap, but worth filing as an input-ergonomics follow-up if the demo audience asks "do I really need to write any YAML?"

The notebook's intro cell should say plainly: "Provide one ontology file (hand-authored YAML or OWL/SKOS TTL). Everything else — binding, table DDL, property graph — can be generated, owned by you, or both."

Four-guarantee narrative (recap)

Per the working plan, the post-shipping demo arc is own → validate → extract cheaply → resolve:

  1. Own (#104, Feat: bq-agent-sdk ontology-build --skip-property-graph to populate base tables only): user owns CREATE PROPERTY GRAPH; SDK populates base tables only.
  2. Validate (#105, pre-flight binding validation against existing BigQuery table schemas, plus #76, ontology-aware validate_extracted_graph with fallback-scope classification, post-extract): structured failure classification at both gates.
  3. Extract cheaply (#75 Phase 1, compile-time code generation for structured trace extractors): deterministic compiled extractors for structured events; AI fallback only for semantic gaps and per-field validator failures.
  4. Resolve (#58, runtime entity resolution primitives: OntologyRuntime, concept index, EntityResolver protocol; reader follow-on, with emission already shipped in PR #92): SKOS concept-index lookups translate user-typed inputs to canonical concept names before the GQL query runs.

The storyboard maps each guarantee to specific cells.

Per-cell storyboard

Cell numbering is suggestive; the notebook author can renumber. Cell types are [md] (markdown) and [py] (code).

Section 0 — Setup and inputs

| Cell | Type | Purpose | Implementation note |
| --- | --- | --- | --- |
| 0.1 | md | Title + four-guarantee narrative diagram (own / validate / extract / resolve). | Reuse the table from the working plan summary. |
| 0.2 | md | "What you need to bring." State the minimum-input answer above: one ontology file + plugin-populated `agent_events`. Show three input shapes side by side: (a) hand-authored `ontology.yaml`, (b) `.ttl` source, (c) future `@builtin:adk-events`. | Anchors the audience expectation early. |
| 0.3 | py | Generate (or load) sandbox `agent_events` with multiple sessions emitting `BkaDecisionEvent`-like events across linked sessions. | Reuse the existing sandbox runner; pin session IDs for reproducibility. |
| 0.4 | py | Show the input ontology file. If using TTL, show the `gm import-owl` invocation producing `ontology.yaml`. | Concrete file paths from the demo's ontology, e.g. `examples/ymgo.ontology.yaml`. |
| 0.5 | py | Run `gm scaffold --ontology X.yaml --dataset Y --project Z --out outdir/` to auto-generate `binding.yaml` + `table_ddl.sql`. Demonstrate the "one file in, two files out" minimum path. All four flags are required by the current CLI (`bigquery_ontology/cli.py:453`) — `--dataset` and `--project` are not optional. (For the user-owns-tables narrative in Beat 1, replace this cell with a hand-authored binding pointing at pre-existing tables.) | Calls out the auto-scaffold option explicitly so audiences who don't have pre-existing tables understand they don't need to hand-author both files. |
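Cell 0.3's implementation note asks for pinned session IDs. One way to do that, sketched below, is to derive the IDs deterministically from a fixed seed instead of calling `uuid.uuid4()`. The helper name and session count are illustrative, not part of the sandbox runner's actual API:

```python
import random
import uuid

def pinned_session_ids(n: int, seed: int = 42) -> list[str]:
    """Derive n reproducible session IDs from a fixed seed.

    Re-running the notebook yields the same IDs, so downstream cells
    (token tables, GQL traversals) stay comparable run to run.
    """
    rng = random.Random(seed)
    return [str(uuid.UUID(int=rng.getrandbits(128))) for _ in range(n)]

ids = pinned_session_ids(3)
assert ids == pinned_session_ids(3)  # same seed, same IDs on every run
```

The sandbox runner would then be invoked with these IDs rather than letting it mint fresh ones per execution.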

Section 1 — Beat 1: "You own the graph definition" (#104)

| Cell | Type | Purpose | Implementation note |
| --- | --- | --- | --- |
| 1.1 | md | The pre-shipping demo had the SDK run `CREATE OR REPLACE PROPERTY GRAPH` on every build. Show the user's authored DDL instead. | Frame as "platform team owns DDL; SDK owns population." |
| 1.2 | py | Print the user's `CREATE PROPERTY GRAPH` DDL (loaded from a file in the notebook directory). Apply it to the demo dataset. | Use a static SQL file like `examples/migration_v5/property_graph.sql`. |
| 1.3 | py | Capture the "before" baseline after cell 1.2's authored DDL has finished and immediately before cell 1.4 runs the build. Bind a single timestamp variable: `before_skip_build_ts = CURRENT_TIMESTAMP()` (or read the most recent `INFORMATION_SCHEMA.JOBS_BY_PROJECT.creation_time` from this dataset). Then run a small GQL traversal against the user's existing property graph and save the row count. | Critical that this timestamp is captured after cell 1.2's `CREATE PROPERTY GRAPH` job has completed; otherwise the `JOBS_BY_PROJECT` filter in cell 1.5 will catch the user's own authored DDL job and produce a false positive. `JOBS_BY_PROJECT`-based evidence is robust and column-stable. BigQuery does have an `INFORMATION_SCHEMA.PROPERTY_GRAPHS` view, but its column set isn't part of this demo's evidence, so we don't depend on it. |
| 1.4 | py | Run the build with the new flag: `bq-agent-sdk ontology-build --skip-property-graph --ontology X --binding Y --session-ids ...` | Note: this flag is gated on #104 landing. |
| 1.5 | py | Capture the "after" state with two reads: (a) query `INFORMATION_SCHEMA.JOBS_BY_PROJECT WHERE creation_time > @before_skip_build_ts AND query LIKE '%CREATE OR REPLACE PROPERTY GRAPH%'` (using the variable bound in 1.3) and assert zero rows; (b) re-run the GQL traversal and assert it still returns the same row count. | Bound timestamp guarantees we only count DDL jobs created during cell 1.4, not the authored DDL from cell 1.2. Two-pronged evidence: no SDK-issued DDL job ran, and the graph still works. Doesn't depend on `PROPERTY_GRAPHS` column shape. |
| 1.6 | md | One-paragraph callout: "the SDK populated the base tables; the graph object you defined is intact." | Closes Beat 1. |
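The timestamp-bounding logic in cells 1.3 and 1.5 can be sketched as plain filtering over job rows. This is an illustration of the filter semantics only; in the notebook the rows would come from an actual `INFORMATION_SCHEMA.JOBS_BY_PROJECT` query, and the dict shape here is an assumption:

```python
from datetime import datetime, timezone

DDL_MARKER = "CREATE OR REPLACE PROPERTY GRAPH"

def sdk_issued_ddl_jobs(jobs: list[dict], before_ts: datetime) -> list[dict]:
    """Mirror of the cell-1.5 filter: jobs created after the bound
    timestamp whose query text contains the property-graph DDL marker."""
    return [
        j for j in jobs
        if j["creation_time"] > before_ts and DDL_MARKER in j["query"]
    ]

# Authored DDL ran at 10:00; timestamp bound at 10:05; build ran at 10:10.
before_ts = datetime(2025, 1, 1, 10, 5, tzinfo=timezone.utc)
jobs = [
    {"creation_time": datetime(2025, 1, 1, 10, 0, tzinfo=timezone.utc),
     "query": "CREATE OR REPLACE PROPERTY GRAPH demo_graph ..."},  # user's authored DDL (cell 1.2)
    {"creation_time": datetime(2025, 1, 1, 10, 10, tzinfo=timezone.utc),
     "query": "INSERT INTO nodes_decision ..."},                   # SDK population job (cell 1.4)
]
assert sdk_issued_ddl_jobs(jobs, before_ts) == []  # no SDK-issued DDL after the bound
```

Binding the timestamp between the authored DDL and the build is exactly what keeps the user's own 10:00 DDL job out of the result set.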

Section 2 — Beat 2: "Pre-flight catches binding drift before you spend a dollar" (#105)

| Cell | Type | Purpose | Implementation note |
| --- | --- | --- | --- |
| 2.1 | md | Set up the failure case: a column was renamed in the BQ table, but the binding YAML still references the old name. | Frame the cost claim: pre-flight runs in under a second. |
| 2.2 | py | Apply a one-line `ALTER TABLE ... RENAME COLUMN` to a sandbox node table. Show the diff. | Reproducible failure injection. |
| 2.3 | py | Run `bq-agent-sdk binding-validate --ontology X --binding Y --project ...`. Show the structured failure list with `binding_path` and `bq_ref` fields. | Gated on #105 PR 2b landing. |
| 2.4 | py | Restore the column or fix the binding. Re-run `binding-validate`; assert `report.ok == True`. | The "fix → re-run" loop. |
| 2.5 | py | Optional: show a cross-project binding (entity source fully qualified to a different project than `binding.target.project`) validating clean. | Demonstrates `_qualify_source` behavior; small but a common real-world setup. |
| 2.6 | py | Run `ontology-build --skip-property-graph --validate-binding --ontology X --binding Y --session-ids ...` end-to-end. | Combines Beats 1 and 2 in a single invocation. |
| 2.7 | md | Callout: "extraction never started until the physical and logical worlds lined up." | Closes Beat 2. |
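The report shape cells 2.3 and 2.4 rely on can be sketched as dataclasses. These names (`BindingReport`, `BindingFailure`) are stand-ins for whatever #105 actually ships; only the `binding_path` / `bq_ref` fields and the `report.ok` check come from the storyboard:

```python
from dataclasses import dataclass, field

@dataclass
class BindingFailure:
    binding_path: str  # where in binding.yaml the mismatch lives
    bq_ref: str        # the BigQuery table/column it failed against
    message: str

@dataclass
class BindingReport:
    failures: list[BindingFailure] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.failures

# Injected drift (cell 2.2): the binding still references the renamed column.
drifted = BindingReport(failures=[
    BindingFailure(
        binding_path="entities.decision.fields.outcome",
        bq_ref="my-project.demo.decisions.outcome",
        message="column not found (renamed?)",
    ),
])
assert not drifted.ok

# After the cell-2.4 fix, re-running validation yields an empty failure list.
fixed = BindingReport()
assert fixed.ok
```

The point of the structured list, as opposed to a free-text error, is that each failure pairs the logical location (`binding_path`) with the physical one (`bq_ref`), which is what makes the fix → re-run loop mechanical.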

Section 3 — Beat 3: "Structured events extract deterministically; LLM only fills the gaps" (#75 Phase 1, gated on #76)

This is the largest section because it's the biggest narrative change.

| Cell | Type | Purpose | Implementation note |
| --- | --- | --- | --- |
| 3.1 | md | Today's cost story: ontology extraction is session-aggregated `AI.GENERATE` (one call per session, not one per event — see `ontology_graph.py:100, 631`). The cost driver is sessions × tokens-per-session, where tokens-per-session grows with the structured-event payload size. | Set up the contrast. |
| 3.2 | py | Show the per-session token table from a build before compiled extractors ran. Frame the cost claim as "sessions whose transcripts contain structured `BkaDecisionEvent` payloads contributed to the session-level AI extraction cost," not "each event cost N tokens." Per-event token attribution is not available from the current architecture; the session is the cost unit. | Use stored output from a prior build; load from a file. |
| 3.3 | py | Run `gm compile-extractors --ontology X --binding Y --event-schemas Z` to compile a deterministic extractor for `BkaDecisionEvent`. Show the measurement report comparing F1 / fallback rate / latency vs. the hand-written and `AI.GENERATE` baselines on a real trace corpus. | Gated on #75 PR 4b (compile harness) + PR 4c (first compiled extractor + measurement report) landing. This is C1 per the working plan. The cell stops at "compile + measure" — it does not yet show the build using the compiled bundle. |
| 3.4 | py | Show the compiled bundle's fingerprint (`compile_fingerprint` 64 hex; `compile_id == compile_fingerprint[:12]`). Show the bundle's path on disk. Re-run compile; assert byte-identical output. | Reproducibility proof. Still C1 — no runtime path needed. |
| 3.5 | py | Re-run the build with compiled extractors actually loaded by the runtime. Show the new per-session token table. Per the C2 decision (Option A — prune compiled-extractable payloads, validation-gated): spans whose compiled output passes the #76 validator are excluded from the `AI.GENERATE` transcript (`fully_handled_span_ids`); spans with partial coverage stay in the prompt with a focused hint (`partially_handled_span_ids`); validation-failed spans fall back to the AI path. Token cost drops in proportion to the fraction of the transcript whose compiled output validated cleanly. For sessions composed entirely of compiled-extractable + validating events, cost approaches zero. | Gated on #75 C2 runtime bundle-loading path — the compiled-bundle loader emits a `StructuredExtractionResult` with fully_handled / partially_handled partitioning; `ontology_graph.py:540–571` already does the prune + hint. If C2 hasn't landed, this cell becomes a "skipped — requires #75 C2" placeholder. |
| 3.6 | py | Show the post-extract ValidationReport from #76: walk through one example each of `FailureCode.FIELD`, `NODE`, `EDGE`, `EVENT`, and the per-field fallback that re-extracted the FIELD case via AI. | Gated on #76 PR 3a landing. Use synthetic fixtures that exercise each scope. |
| 3.7 | py | Print the savings table. Per-session rows with columns (`session_id`, `transcript_chars_before`, `transcript_chars_after`, `estimated_prompt_tokens_before`, `estimated_prompt_tokens_after`, `compiled_spans`, `partial_fallback_count`, `full_fallback_count`), plus a job-level row from `INFORMATION_SCHEMA.JOBS_BY_PROJECT` (`total_bytes_processed`, `total_slot_ms`, duration) for the build's BQ jobs. Note explicitly: per-session token columns are prompt-size estimates computed from the transcript that was actually sent to `AI.GENERATE` (after `excluded_span_ids` pruning), not exact billing usage. Real AI billing is job-level — the BQ AI query at `ontology_graph.py:90–136` returns only (`session_id`, `graph_json`) and exposes no per-row usage metadata. Aggregate roll-up grouped by dominant structured-event-type coverage is a coverage proxy, not a token claim. | Honest column set given current instrumentation. If C2 (or a follow-up) adds explicit per-session token usage capture (e.g., a `usage_metadata` column in the AI query output), this cell upgrades to exact billing — flag as a future enhancement, not blocking the demo. |
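Cell 3.7's "prompt-size estimate" columns can be computed with a few lines of pure Python. This is a sketch of the estimation logic only; the 4-chars-per-token ratio, the helper names, and the span-dict shape are all assumptions, not the SDK's instrumentation:

```python
def estimate_prompt_tokens(transcript: str, chars_per_token: float = 4.0) -> int:
    """Rough prompt-size estimate. Cell 3.7's per-session token columns are
    estimates like this, not exact billing usage."""
    return int(len(transcript) / chars_per_token)

def savings_row(session_id: str, spans: dict[str, str],
                fully_handled_span_ids: set[str]) -> dict:
    """spans maps span_id -> span text. Fully handled spans (compiled output
    validated cleanly) are pruned from the transcript before AI.GENERATE,
    per the C2 Option A decision."""
    before = "".join(spans.values())
    after = "".join(t for sid, t in spans.items()
                    if sid not in fully_handled_span_ids)
    return {
        "session_id": session_id,
        "transcript_chars_before": len(before),
        "transcript_chars_after": len(after),
        "estimated_prompt_tokens_before": estimate_prompt_tokens(before),
        "estimated_prompt_tokens_after": estimate_prompt_tokens(after),
        "compiled_spans": len(fully_handled_span_ids),
    }

spans = {"s1": "x" * 800, "s2": "y" * 200}     # s1: structured payload, s2: free text
row = savings_row("session-1", spans, {"s1"})  # compiled output for s1 validated cleanly
assert row["estimated_prompt_tokens_before"] == 250
assert row["estimated_prompt_tokens_after"] == 50  # drop tracks validated coverage
```

This also makes the honesty caveat concrete: the "after" column measures what was actually sent to the model, so the savings claim is about prompt size, not a per-event billing figure.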

Section 4 — Beat 4: "User-typed inputs resolve to canonical concepts" (#58 reader follow-on; emission already in #92)

| Cell | Type | Purpose | Implementation note |
| --- | --- | --- | --- |
| 4.1 | md | The graph is populated; now the user wants to ask questions in their own words. | Frame the resolution problem. |
| 4.2 | py | Show the concept-index tables emitted by `gm compile --emit-concept-index ...` — main + `__meta` provenance sibling, with `compile_id` (12 hex) + `compile_fingerprint` (64 hex). | Already shipped in #92. Reuse the storage proof from #92. |
| 4.3 | py | Use the `OntologyRuntime` reader: resolve "Consumer Banking" → canonical `skos_RetailBanking` via `skos:altLabel` matching. Print the match score and the path that fired. | Gated on #58 reader follow-on landing. |
| 4.4 | py | Build a GQL traversal using the resolved canonical name. Show prior → current decision edges across sessions, scoped to retail-banking decisions. | This is the climax query of the original demo, now with concept-aware filtering. |
| 4.5 | md | Callout: "the user typed natural language; the SDK resolved it via the SKOS taxonomy you authored once and emitted alongside the property graph." | Closes Beat 4. |
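The exact-match tier of the `skos:altLabel` lookup in cell 4.3 can be sketched as an inverted index. This is not the #58 `OntologyRuntime` API (which also reports a match score and match path); it only illustrates the alt-label → canonical mapping that the concept-index tables make possible:

```python
def build_alt_label_index(concepts: dict[str, list[str]]) -> dict[str, str]:
    """Invert canonical -> altLabels into a case-insensitive lookup,
    mimicking the skos:altLabel matching path described in cell 4.3."""
    index: dict[str, str] = {}
    for canonical, alt_labels in concepts.items():
        index[canonical.lower()] = canonical  # canonical names resolve to themselves
        for alt in alt_labels:
            index[alt.lower()] = canonical
    return index

# Toy taxonomy; the real data comes from the #92-emitted concept-index table.
concepts = {"skos_RetailBanking": ["Consumer Banking", "Retail Bank"]}
index = build_alt_label_index(concepts)

assert index["consumer banking"] == "skos_RetailBanking"
assert index["skos_retailbanking"] == "skos_RetailBanking"
```

The resolved canonical name is then interpolated into the cell 4.4 GQL traversal as a filter value, which is why resolution has to happen before the query runs.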

Section 5 — Close

| Cell | Type | Purpose |
| --- | --- | --- |
| 5.1 | md | Recap: own / validate / extract cheaply / resolve, with the four "before vs after" claims from the working plan summary. |
| 5.2 | md | Pointers to each issue (#58, #75, #76, #104, #105) so audience members can dig into the technical work. |
| 5.3 | md | What's next: future minimum-input reduction (default ADK-event ontology), Phase 2 of #75 (session-aggregated compilation), additional resolver layers in #58. |

Conditional cells (gating)

Some cells depend on issues that haven't shipped yet. The notebook author should put each beat behind a feature-flag cell at the top:

```python
FEATURES = {
    "skip_property_graph": True,        # gated on #104
    "binding_validate": True,           # gated on #105
    "validate_extracted_graph": True,   # gated on #76
    "compiled_extractors_c1": True,     # gated on #75 PR 4b/4c — compile harness + measurement (cells 3.3, 3.4)
    "compiled_extractors_c2": True,     # gated on #75 C2 — runtime bundle-loading (cell 3.5)
    "concept_index_reader": True,       # gated on #58 reader follow-on
}
```

The split between _c1 and _c2 lets the storyboard land "compile + measure" before "build uses compiled bundles" — the measurement report from C1 is itself a publishable artifact and doesn't require the runtime path.

When a feature is `False`, the notebook can `display(Markdown("Skipped: requires #X."))` instead of failing. This lets the storyboard land before all five issues ship.
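One way to wire the gating, sketched below, is a small guard that either runs the cell body or returns the skip notice. The `gated` helper is an illustration, not a prescribed API; in the real notebook the notice would be rendered with `display(Markdown(...))` rather than returned as a string:

```python
def gated(feature: str, issue: str):
    """Guard a cell body: run it when the feature flag is on,
    otherwise produce a skip notice instead of failing."""
    def run(cell_body):
        if FEATURES.get(feature):
            return cell_body()
        return f"Skipped: requires {issue}."
    return run

# Minimal FEATURES stand-in; the notebook's real dict has all six flags.
FEATURES = {"compiled_extractors_c2": False}

result = gated("compiled_extractors_c2", "#75 C2")(lambda: "ran cell 3.5")
assert result == "Skipped: requires #75 C2."
```

Keeping the skip path a normal return (rather than an exception) is what lets "Run all" execute end-to-end with any subset of flags enabled.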

Acceptance criteria

  • examples/migration_v5_demo_notebook.ipynb has a clear cell-by-cell structure matching Sections 0–5 above (some renumbering allowed).
  • Section 0 includes the "what you need to bring" explanation: one ontology file + plugin-populated agent_events, with auto-scaffolded binding shown as the default path.
  • Each of the four guarantee sections has both a markdown framing cell and at least one code cell that produces visible evidence (output table, fingerprint, validation report, or query result).
  • Conditional gating cells let the notebook execute end-to-end with any subset of FEATURES enabled, so the storyboard can be merged before the last issue ships.
  • Every code cell uses real (or representative) BQ data, not stubs, when its gating feature is True.
  • Each beat has a one-paragraph closing markdown cell tying back to the four-guarantee narrative.
  • The notebook's introduction explicitly states that two hand-authored YAML files are not required — link to gm scaffold and gm import-owl as input-reduction tools.

Out of scope

  • Implementing the future default ADK-event ontology. Flagged in Section 5.3 as a separate follow-up.
  • Rewriting the original V5 demo (examples/ontology_graph_v5_demo.ipynb). The migration notebook is separate.
  • Animating the four guarantees as a slide deck. The notebook is the authoritative artifact; downstream slide decks can lift content from it.
  • Authoring the property-graph DDL fixture file (examples/migration_v5/property_graph.sql). It will need to be created while executing this issue but doesn't warrant a separate issue.

Effort

~1.5 eng-days for a single notebook author. The cell structure is mechanical once the storyboard is approved; the time is mostly in producing real outputs (token tables, validation reports, fingerprints, GQL results) once the underlying issues land.
