
Commit 292320b

feat(ontology): add --skip-property-graph for user-owned graph DDL (#104) (#108)
* feat(ontology): add --skip-property-graph for user-owned graph DDL (#104)

  Lets users with their own CREATE PROPERTY GRAPH DDL (managed by Terraform, dbt, or hand-authored) populate base tables from BQ AA traces without overwriting the graph object on every run.

  Changes
  - ontology_orchestrator.build_ontology_graph gains skip_property_graph: bool = False. When True, phase 5 is not invoked: no OntologyPropertyGraphCompiler is constructed, no CREATE OR REPLACE PROPERTY GRAPH runs.
  - Result dict gains property_graph_status with values "created" / "failed" / "skipped:user_requested", plus skipped_reason ("user_requested") when phase 5 was skipped.
  - ontology-build CLI gains --skip-property-graph and threads property_graph_status through to the curated output dict so JSON consumers can distinguish "skipped" from "failed" without parsing stderr.
  - Exit handling: skipped_reason == "user_requested" exits 0 silently; the existing exit-1-with-error behavior is preserved for actual graph-creation failures.

  Tests
  - test_skip_property_graph_does_not_construct_compiler asserts the compiler class is never called (mock.assert_not_called) when the flag is set.
  - test_property_graph_status_created_on_success and test_property_graph_status_failed_on_compiler_false cover the two default-mode status values.
  - CLI tests cover exit 0 with status="skipped:user_requested", default skip_property_graph=False threading, and exit 1 with status="failed" on actual creation failure.

  135/135 tests in test_ontology_orchestrator.py + test_cli.py pass.

* docs+test: ontology-build doc + live skip-property-graph test (#104)

  Closes the two #104 acceptance gaps flagged on PR #108 review:

  (1) Docs missing
  - New docs/ontology/ontology-build.md documents the bq-agent-sdk ontology-build orchestrator end-to-end and the new --skip-property-graph flag.
  - Includes a status-field reference table mapping property_graph_status (created / failed / skipped:user_requested) to property_graph_created and CLI exit code.
  - Includes Python API example showing skip_property_graph=True with expected result-dict shape.

  (2) No gated live integration test
  - New TestSkipPropertyGraph class in tests/test_integration_ontology_binding.py.
  - Gated on RUN_LIVE_BIGQUERY_TESTS=1 like the existing live tests.
  - Sequence: create authored CREATE PROPERTY GRAPH directly via SQL (simulating Terraform/dbt-managed DDL), capture the post-DDL CURRENT_TIMESTAMP(), run build_ontology_graph(..., skip_property_graph=True), then query JOBS_BY_PROJECT for any 'CREATE OR REPLACE PROPERTY GRAPH' jobs in the post-timestamp window and assert zero. Also re-runs the showcase GQL query to confirm the user's graph object still works after the SDK run.
  - The timestamp is captured AFTER the authored DDL specifically to avoid the false-positive trap called out in #107 cell 1.3.

* test+docs: harden live test, add text-format check, link doc (#104)

  Addresses three review findings on PR #108:

  (1) Live test now exercises real extraction/materialization
  - Pass dataset_id=_DATASET, table_id=_TABLE so extraction reads the production agent_events table where YMGO ADCP session data lives. Materializer still writes to scratch_dataset because spec entity sources arrive 3-part-qualified to binding.target.dataset via _qualify_source (resolved_spec.py:141).
  - Assert sum(rows_materialized.values()) > 0 to catch the silent-empty-graph trap where ontology_graph.py:683 returns an empty ExtractedGraph if extraction fails (e.g. wrong source dataset).

  (2) JOBS_BY_PROJECT assertion narrowed to the test's own graph
  - Filter by both 'CREATE OR REPLACE PROPERTY GRAPH' keyword AND the fully-qualified graph reference ({_PROJECT}.{scratch_dataset}.{spec.name}). Prevents false-fail on unrelated CREATE OR REPLACE PROPERTY GRAPH jobs running concurrently in the same project from other tests/developers.

  (3) docs/README.md gains a row for the new ontology-build doc.

  (4) New CLI test test_skip_property_graph_status_visible_in_text_format asserts property_graph_status appears in --format=text output, pinning the contract that the status field is not JSON-only.

  7/7 ontology-build CLI tests pass.

* test+docs: harden DDL-detection filter, soften DDL claims (#104)

  Addresses three review findings on PR #108:

  (1) Live test DDL-detection blind spot

  The previous filter required the regressed CREATE OR REPLACE PROPERTY GRAPH to target _PROJECT.<scratch_dataset>.<spec.name>. But if skip_property_graph regressed, the compiler would actually target _PROJECT._DATASET.<spec.name> (the orchestrator's dataset_id argument is _DATASET in this test, used for extraction of agent_events). The blind spot: a regression could fire DDL that the test would not catch. Fixed by replacing the fully-qualified-graph-ref filter with two narrower constraints that catch the regression in either dataset:
  - graph name (spec.name): present in the DDL string regardless of which dataset the compiler targets
  - sdk_feature='ontology-gql' label: only SDK-issued property-graph jobs carry this label per ontology_property_graph.py:465; the test's setup CREATE PROPERTY GRAPH (issued via direct SQL) does not, so it does not trip the assertion

  (2) docs/ontology/ontology-build.md: document graph_ref limitation

  Added a "Known limitation" section noting that result["graph_ref"] reports the extraction dataset, not the binding's target dataset, in split source/target setups. The materialized base tables themselves still go to the binding's target dataset per the resolved spec; only the reported string is affected.

  (3) docs/ontology/ontology-build.md: soften DDL-options wording

  "additional indexes, dialect-specific options" was overreaching for BigQuery property graphs; tightened to "custom labels or other DDL details the SDK's compiler doesn't generate."

  136/136 tests pass.

* test: correct comment on label-filter rationale (#104)

  The previous comment claimed the test's setup CREATE PROPERTY GRAPH job did not carry the sdk_feature='ontology-gql' label. That was factually wrong: setup goes through OntologyPropertyGraphCompiler.create_property_graph() (line 387), which does carry the label. The test logic was already correct: the setup job is excluded by the post-setup timestamp captured in step 2, not by the label filter. The label filter excludes user-authored raw SQL DDL jobs (without SDK labels), which is its actual purpose. Only the comment needed to change. No code change.

* style: apply autoformat to test files

  Run bash autoformat.sh (isort + pyink). Fixes the Format check CI job that was failing on PR #108. No behavior change.
1 parent 8c42683 commit 292320b
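
The DDL-detection filter described in the commit message (post-setup timestamp, graph name, and the `sdk_feature='ontology-gql'` label) could look roughly like the sketch below. The live test's actual query is not part of this diff, so the helper name `build_ddl_detection_query`, the default region qualifier, and the named parameters `@since` and `@graph_name` are all assumptions made for illustration.

```python
import re

# Hypothetical helper, not part of the SDK. It only builds SQL text; the
# caller is assumed to bind @since (TIMESTAMP) and @graph_name (STRING)
# as named query parameters when running it.
_REGION_RE = re.compile(r"^region-[a-z0-9-]+$")


def build_ddl_detection_query(region: str = "region-us") -> str:
    """Return SQL listing SDK-labelled CREATE OR REPLACE PROPERTY GRAPH jobs."""
    # The region qualifier is interpolated into an identifier, so validate it.
    if not _REGION_RE.match(region):
        raise ValueError(f"unexpected region qualifier: {region!r}")
    return f"""
SELECT job_id, creation_time
FROM `{region}`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > @since
  AND UPPER(query) LIKE '%CREATE OR REPLACE PROPERTY GRAPH%'
  AND STRPOS(query, @graph_name) > 0
  AND EXISTS (
    SELECT 1
    FROM UNNEST(labels) AS label
    WHERE label.key = 'sdk_feature' AND label.value = 'ontology-gql'
  )
""".strip()


sql = build_ddl_detection_query()
```

A test in this style would assert that the query returns zero rows after a run with `skip_property_graph=True`. Matching on the graph name rather than a fully qualified reference is what lets the filter catch a regression regardless of which dataset the compiler targets.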

7 files changed

Lines changed: 574 additions & 13 deletions


docs/README.md

Lines changed: 1 addition & 0 deletions
@@ -36,6 +36,7 @@ architecture, rationale, and implementation plans behind key SDK features.
 | [ontology/compilation.md](ontology/compilation.md) | Compilation — resolving ontology + binding into backend DDL |
 | [ontology/cli.md](ontology/cli.md) | CLI design for the `gm` tool (validate, compile, import-owl) |
 | [ontology/owl-import.md](ontology/owl-import.md) | OWL import — converting OWL ontologies to YAML format |
+| [ontology/ontology-build.md](ontology/ontology-build.md) | `bq-agent-sdk ontology-build` orchestrator + `--skip-property-graph` reference |
 
 ## Deployment Surfaces
 

docs/ontology/ontology-build.md

Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
+# `bq-agent-sdk ontology-build` — End-to-End Orchestrator
+
+`bq-agent-sdk ontology-build` runs the SDK's full ontology pipeline end-to-end against a populated `agent_events` table:
+
+1. Load the spec (`--ontology X.yaml --binding Y.yaml`).
+2. Extract an `ExtractedGraph` from agent telemetry via `AI.GENERATE`.
+3. Create physical entity/relationship tables (`CREATE TABLE IF NOT EXISTS`).
+4. Materialize extracted nodes/edges into those tables.
+5. Run `CREATE OR REPLACE PROPERTY GRAPH` to wire the BigQuery property graph object.
+
+The Python entry point is `bigquery_agent_analytics.ontology_orchestrator.build_ontology_graph(...)`. The CLI is a thin wrapper.
+
+## Skipping property-graph DDL
+
+Use `--skip-property-graph` when **the caller owns their own `CREATE PROPERTY GRAPH` DDL** — e.g., the property graph is provisioned via Terraform, dbt, or hand-authored SQL — and only wants the SDK to populate base tables.
+
+```
+bq-agent-sdk ontology-build \
+  --project-id my-project \
+  --dataset-id my-dataset \
+  --ontology my.ontology.yaml \
+  --binding my-bq-prod.binding.yaml \
+  --session-ids sess-1,sess-2 \
+  --skip-property-graph
+```
+
+Behavior with the flag set:
+
+- Phase 5 short-circuits. No `OntologyPropertyGraphCompiler` is constructed, no `CREATE OR REPLACE PROPERTY GRAPH` job runs. The user's existing graph object is unchanged.
+- Phases 1–4 run normally. Tables are created (`CREATE TABLE IF NOT EXISTS` is a no-op against pre-existing tables) and rows are materialized.
+- The CLI exits 0.
+- The output dict reports:
+
+```json
+{
+  "property_graph_created": false,
+  "property_graph_status": "skipped:user_requested",
+  ...
+}
+```
+
+JSON consumers should read `property_graph_status` (not just `property_graph_created`) to distinguish a deliberate skip from a creation failure.
+
+## Status field reference
+
+The CLI's `property_graph_status` field has three values:
+
+| `property_graph_status` | `property_graph_created` | Exit code | Meaning |
+|---|---|---|---|
+| `"created"` | `true` | 0 | Phase 5 ran and BigQuery confirmed the graph object. |
+| `"failed"` | `false` | 1 | Phase 5 ran but the graph object was not created. The CLI prints "Property Graph creation failed" to stderr. Tables and rows were still materialized. |
+| `"skipped:user_requested"` | `false` | 0 | `--skip-property-graph` was set. Phase 5 did not run. No error message. |
+
+Without `--skip-property-graph`, the existing exit-1 behavior on graph-create failure is preserved exactly.
+
+## When to use this
+
+- **You already manage `CREATE PROPERTY GRAPH` in Terraform / dbt / a SQL file.** The SDK's `CREATE OR REPLACE PROPERTY GRAPH` would clobber your DDL on every run.
+- **Your property graph definition uses DDL details the SDK compiler doesn't emit.** You hand-authored the graph DDL to express custom labels or other DDL details the SDK's compiler doesn't generate.
+- **You want to populate your tables on a different cadence than you redefine the graph.** The graph definition rarely changes; the data is refreshed continuously.
+
+For all other cases, leave the flag off and let the SDK manage the property graph end-to-end.
+
+## Python API
+
+The flag is also available on `build_ontology_graph(...)`:
+
+```python
+from bigquery_agent_analytics.ontology_orchestrator import build_ontology_graph
+
+result = build_ontology_graph(
+    spec=resolved_spec,
+    session_ids=["sess-1"],
+    project_id="my-project",
+    dataset_id="my-dataset",
+    skip_property_graph=True,  # phase 5 skipped
+)
+
+assert result["property_graph_status"] == "skipped:user_requested"
+assert result["skipped_reason"] == "user_requested"
+assert result["property_graph_created"] is False
+```
+
+`skipped_reason` is only present when the phase was skipped; it is omitted when phase 5 ran (whether or not it succeeded).
+
+## Known limitation: `result["graph_ref"]` in split source/target setups
+
+`build_ontology_graph(...)` accepts a single `dataset_id` and uses it both for extraction (where `agent_events` lives) and for the `graph_ref` reported in the result dict (`{project_id}.{dataset_id}.{name}`). When `--skip-property-graph` is set and the caller's actual property graph lives in `binding.target.dataset` (different from the `dataset_id` used for extraction), `result["graph_ref"]` reports the **extraction dataset**, not the user-owned graph's dataset. The materialized base tables themselves still go to `binding.target.dataset` per the resolved spec — this only affects the reported `graph_ref` string. Tracked as a follow-up; not blocking for `--skip-property-graph` itself since the user already knows where their authored graph lives.
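
Until the `graph_ref` limitation is addressed, a caller in a split source/target setup could derive the reference to their own graph instead of trusting the reported string. The sketch below is one possible workaround, not SDK behavior: `resolve_graph_ref` and its `target_dataset` argument are hypothetical, and it assumes the caller already knows which dataset holds the authored graph.

```python
from typing import Any


def resolve_graph_ref(
    result: dict[str, Any], project_id: str, target_dataset: str
) -> str:
    """Return the graph reference a caller should query after a build.

    Hypothetical helper: when phase 5 was skipped at the caller's request,
    the authored graph is assumed to live in ``target_dataset``, so the
    reference is rebuilt from that dataset. Otherwise the SDK-reported
    ``graph_ref`` is returned unchanged.
    """
    if result.get("skipped_reason") == "user_requested":
        return f"{project_id}.{target_dataset}.{result['graph_name']}"
    return result["graph_ref"]


# Result shapes modelled on the documented keys; values are made up.
skipped_result = {
    "graph_name": "g",
    "graph_ref": "my-project.extraction_ds.g",
    "skipped_reason": "user_requested",
}
created_result = {"graph_name": "g", "graph_ref": "my-project.extraction_ds.g"}

skipped_ref = resolve_graph_ref(skipped_result, "my-project", "target_ds")
created_ref = resolve_graph_ref(created_result, "my-project", "target_ds")
```

In the skipped case this yields a reference under `target_ds`; in the default case the SDK created the graph itself, so the reported `graph_ref` is kept.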

src/bigquery_agent_analytics/cli.py

Lines changed: 21 additions & 0 deletions
@@ -1238,6 +1238,16 @@ def ontology_build(
     no_ai_generate: bool = typer.Option(
         False, help="Skip AI.GENERATE; fetch raw payloads instead."
     ),
+    skip_property_graph: bool = typer.Option(
+        False,
+        "--skip-property-graph",
+        help=(
+            "Skip CREATE OR REPLACE PROPERTY GRAPH. Use when the caller "
+            "owns their own property-graph DDL and only wants the SDK to "
+            "populate base tables. CLI exits 0 with "
+            "property_graph_status='skipped:user_requested'."
+        ),
+    ),
     fmt: str = typer.Option(
         "json",
         "--format",
@@ -1261,6 +1271,7 @@ def ontology_build(
       table_id=table_id,
       endpoint=endpoint,
       use_ai_generate=not no_ai_generate,
+      skip_property_graph=skip_property_graph,
   )
 
   output = {
@@ -1271,9 +1282,19 @@ def ontology_build(
       "tables_created": result["tables_created"],
       "rows_materialized": result["rows_materialized"],
       "property_graph_created": result["property_graph_created"],
+      "property_graph_status": result.get(
+          "property_graph_status",
+          "created" if result["property_graph_created"] else "failed",
+      ),
   }
   typer.echo(format_output(output, fmt))
 
+  # Distinguish "user-requested skip" (exit 0) from "creation failed"
+  # (exit 1). Same property_graph_created=False, different operator
+  # intent — JSON consumers read property_graph_status to tell them
+  # apart without parsing stderr.
+  if result.get("skipped_reason") == "user_requested":
+    return
   if not result["property_graph_created"]:
     typer.echo(
         "Error: Property Graph creation failed. "
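
The exit handling in this hunk can be read as a small decision function. The sketch below restates it outside Typer so the ordering of the two checks is easy to see; `status_and_exit_code` is a hypothetical name used only for illustration, and the real CLI returns or raises rather than returning a tuple.

```python
from typing import Any


def status_and_exit_code(result: dict[str, Any]) -> tuple[str, int]:
    """Mirror the CLI's status fallback and exit-code ordering (sketch)."""
    # Same fallback as the diff: derive a status when the orchestrator
    # result predates the property_graph_status key.
    status = result.get(
        "property_graph_status",
        "created" if result["property_graph_created"] else "failed",
    )
    # The skip check comes first, so a deliberate skip never reaches the
    # failure branch even though property_graph_created is False.
    if result.get("skipped_reason") == "user_requested":
        return status, 0
    if not result["property_graph_created"]:
        return status, 1
    return status, 0


skipped = status_and_exit_code({
    "property_graph_created": False,
    "skipped_reason": "user_requested",
    "property_graph_status": "skipped:user_requested",
})
failed = status_and_exit_code({
    "property_graph_created": False,
    "property_graph_status": "failed",
})
legacy = status_and_exit_code({"property_graph_created": True})
```

The three calls cover the rows of the status table in the new doc, plus the fallback path for a result dict without `property_graph_status`.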

src/bigquery_agent_analytics/ontology_orchestrator.py

Lines changed: 39 additions & 13 deletions
@@ -300,14 +300,16 @@ def build_ontology_graph(
     endpoint: str = "gemini-2.5-flash",
     use_ai_generate: bool = True,
     location: Optional[str] = None,
+    skip_property_graph: bool = False,
 ) -> dict[str, Any]:
   """Run the full ontology graph pipeline end-to-end.
 
   1. Load the YAML spec (or use pre-loaded ``spec``).
   2. Extract an ``ExtractedGraph`` from agent telemetry.
   3. Create physical tables (if not exists).
   4. Materialize extracted nodes/edges into tables.
-  5. Create the BigQuery Property Graph.
+  5. Create the BigQuery Property Graph (skipped when
+     ``skip_property_graph=True``).
 
   Args:
     session_ids: Sessions to extract from.
@@ -323,10 +325,22 @@ def build_ontology_graph(
     endpoint: AI.GENERATE model endpoint.
     use_ai_generate: If True, uses server-side AI extraction.
     location: BigQuery location.
+    skip_property_graph: When True, skip phase 5 (do not run
+      ``CREATE OR REPLACE PROPERTY GRAPH``). Use this when the
+      caller owns their own property-graph DDL and only wants
+      the SDK to populate base tables. The result dict reports
+      ``property_graph_created=False`` with
+      ``skipped_reason="user_requested"`` and
+      ``property_graph_status="skipped:user_requested"``, which
+      callers (and the CLI) use to distinguish a deliberate
+      skip from a creation failure.
 
   Returns:
     A dict with keys: ``spec``, ``graph``, ``tables_created``,
     ``rows_materialized``, ``property_graph_created``,
+    ``property_graph_status`` (one of ``"created"``, ``"failed"``,
+    ``"skipped:user_requested"``), ``skipped_reason`` (only set
+    when phase 5 was skipped, e.g. ``"user_requested"``),
     ``graph_name``, ``graph_ref``.
   """
   from .ontology_graph import OntologyGraphManager
@@ -391,24 +405,36 @@ def build_ontology_graph(
   rows_materialized = materializer.materialize(graph, session_ids)
   logger.info("Rows materialized: %s", rows_materialized)
 
-  # 5. Create property graph.
-  compiler = OntologyPropertyGraphCompiler(
-      project_id=project_id,
-      dataset_id=dataset_id,
-      spec=spec,
-      location=location,
-  )
-  pg_created = compiler.create_property_graph(graph_name=name)
-
   graph_ref = f"{project_id}.{dataset_id}.{name}"
-  logger.info("Property Graph %r created=%s.", graph_ref, pg_created)
 
-  return {
+  # 5. Create property graph (or skip when caller owns the DDL).
+  result: dict[str, Any] = {
       "spec": spec,
       "graph": graph,
       "tables_created": tables_created,
       "rows_materialized": rows_materialized,
-      "property_graph_created": pg_created,
       "graph_name": name,
       "graph_ref": graph_ref,
   }
+  if skip_property_graph:
+    logger.info(
+        "Property Graph creation skipped (skip_property_graph=True); "
+        "caller owns the DDL for graph %r.",
+        graph_ref,
+    )
+    result["property_graph_created"] = False
+    result["skipped_reason"] = "user_requested"
+    result["property_graph_status"] = "skipped:user_requested"
+  else:
+    compiler = OntologyPropertyGraphCompiler(
+        project_id=project_id,
+        dataset_id=dataset_id,
+        spec=spec,
+        location=location,
+    )
+    pg_created = compiler.create_property_graph(graph_name=name)
+    logger.info("Property Graph %r created=%s.", graph_ref, pg_created)
+    result["property_graph_created"] = pg_created
+    result["property_graph_status"] = "created" if pg_created else "failed"
+
+  return result
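
The commit message mentions a unit test asserting that the compiler class is never called when the flag is set; that test file is not shown in this diff. The sketch below illustrates the same idea against a stripped-down stand-in for phase 5. `run_phase_5` and its `compiler_cls` parameter are hypothetical: the real orchestrator constructs `OntologyPropertyGraphCompiler` directly rather than taking it as an argument.

```python
from typing import Any
from unittest import mock


def run_phase_5(
    compiler_cls: Any, graph_name: str, skip_property_graph: bool = False
) -> dict[str, Any]:
    """Stand-in for the phase-5 branch shown in the hunk (sketch only)."""
    result: dict[str, Any] = {"graph_name": graph_name}
    if skip_property_graph:
        # Skip path: the compiler class is never touched.
        result["property_graph_created"] = False
        result["skipped_reason"] = "user_requested"
        result["property_graph_status"] = "skipped:user_requested"
    else:
        compiler = compiler_cls()
        created = compiler.create_property_graph(graph_name=graph_name)
        result["property_graph_created"] = created
        result["property_graph_status"] = "created" if created else "failed"
    return result


compiler_cls = mock.MagicMock()

# With the flag set, constructing the compiler would be a regression.
skipped = run_phase_5(compiler_cls, "g", skip_property_graph=True)
compiler_cls.assert_not_called()

# Default mode: the compiler runs, and a False return maps to "failed".
compiler_cls.return_value.create_property_graph.return_value = False
failed = run_phase_5(compiler_cls, "g")
compiler_cls.assert_called_once_with()
```

Asserting on construction, rather than only on `create_property_graph`, matches the stated guarantee that no compiler object exists at all on the skip path.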

tests/test_cli.py

Lines changed: 148 additions & 0 deletions
@@ -2472,3 +2472,151 @@ def test_bad_spec_path_exit_2(self):
         ],
     )
     assert result.exit_code == 2
+
+  @patch("bigquery_agent_analytics.ontology_orchestrator.build_ontology_graph")
+  def test_skip_property_graph_exits_zero_with_status(self, mock_build):
+    """--skip-property-graph: exit 0, status='skipped:user_requested'."""
+    from bigquery_agent_analytics.ontology_models import ExtractedGraph
+
+    mock_build.return_value = {
+        "graph_name": "g",
+        "graph_ref": "proj.ds.g",
+        "graph": ExtractedGraph(name="test"),
+        "tables_created": {"mako_DecisionPoint": "p.d.decision_points"},
+        "rows_materialized": {"mako_DecisionPoint": 2},
+        "property_graph_created": False,
+        "skipped_reason": "user_requested",
+        "property_graph_status": "skipped:user_requested",
+        "spec": MagicMock(),
+    }
+
+    result = runner.invoke(
+        app,
+        [
+            "ontology-build",
+            "--project-id=proj",
+            "--dataset-id=ds",
+            f"--spec-path={self._SPEC_PATH}",
+            "--session-ids=sess1",
+            "--env=p.d",
+            "--skip-property-graph",
+        ],
+    )
+    assert result.exit_code == 0
+    # Skip path must NOT print the "Property Graph creation failed" stderr.
+    assert "Property Graph creation failed" not in result.output
+    parsed = json.loads(result.output)
+    assert parsed["property_graph_created"] is False
+    assert parsed["property_graph_status"] == "skipped:user_requested"
+
+    # Flag is threaded through to the orchestrator.
+    _, kwargs = mock_build.call_args
+    assert kwargs["skip_property_graph"] is True
+
+  @patch("bigquery_agent_analytics.ontology_orchestrator.build_ontology_graph")
+  def test_default_invocation_omits_skip_flag(self, mock_build):
+    """Default invocation passes skip_property_graph=False."""
+    from bigquery_agent_analytics.ontology_models import ExtractedGraph
+
+    mock_build.return_value = {
+        "graph_name": "g",
+        "graph_ref": "proj.ds.g",
+        "graph": ExtractedGraph(name="test"),
+        "tables_created": {},
+        "rows_materialized": {},
+        "property_graph_created": True,
+        "property_graph_status": "created",
+        "spec": MagicMock(),
+    }
+
+    result = runner.invoke(
+        app,
+        [
+            "ontology-build",
+            "--project-id=proj",
+            "--dataset-id=ds",
+            f"--spec-path={self._SPEC_PATH}",
+            "--session-ids=sess1",
+            "--env=p.d",
+        ],
+    )
+    assert result.exit_code == 0
+    parsed = json.loads(result.output)
+    assert parsed["property_graph_status"] == "created"
+
+    _, kwargs = mock_build.call_args
+    assert kwargs["skip_property_graph"] is False
+
+  @patch("bigquery_agent_analytics.ontology_orchestrator.build_ontology_graph")
+  def test_skip_property_graph_status_visible_in_text_format(self, mock_build):
+    """--format=text exposes property_graph_status to non-JSON consumers.
+
+    Pins the contract that property_graph_status is not JSON-only:
+    --format=table renders dict keys; --format=text falls back to a
+    readable representation. The status string must appear in either.
+    """
+    from bigquery_agent_analytics.ontology_models import ExtractedGraph
+
+    mock_build.return_value = {
+        "graph_name": "g",
+        "graph_ref": "proj.ds.g",
+        "graph": ExtractedGraph(name="test"),
+        "tables_created": {},
+        "rows_materialized": {},
+        "property_graph_created": False,
+        "skipped_reason": "user_requested",
+        "property_graph_status": "skipped:user_requested",
+        "spec": MagicMock(),
+    }
+
+    result = runner.invoke(
+        app,
+        [
+            "ontology-build",
+            "--project-id=proj",
+            "--dataset-id=ds",
+            f"--spec-path={self._SPEC_PATH}",
+            "--session-ids=sess1",
+            "--env=p.d",
+            "--skip-property-graph",
+            "--format=text",
+        ],
+    )
+    assert result.exit_code == 0
+    # The status string must appear in the text-format output so non-
+    # JSON consumers can see why the graph was not created.
+    assert "skipped:user_requested" in result.output
+
+  @patch("bigquery_agent_analytics.ontology_orchestrator.build_ontology_graph")
+  def test_property_graph_failure_status_failed(self, mock_build):
+    """When the orchestrator reports failure, exit 1 with status='failed'.
+
+    Distinguishes the failure path from the user-requested-skip path by
+    asserting the status field, not just the exit code.
+    """
+    from bigquery_agent_analytics.ontology_models import ExtractedGraph
+
+    mock_build.return_value = {
+        "graph_name": "g",
+        "graph_ref": "proj.ds.g",
+        "graph": ExtractedGraph(name="test"),
+        "tables_created": {},
+        "rows_materialized": {},
+        "property_graph_created": False,
+        "property_graph_status": "failed",
+        "spec": MagicMock(),
+    }
+
+    result = runner.invoke(
+        app,
+        [
+            "ontology-build",
+            "--project-id=proj",
+            "--dataset-id=ds",
+            f"--spec-path={self._SPEC_PATH}",
+            "--session-ids=sess1",
+            "--env=p.d",
+        ],
+    )
+    assert result.exit_code == 1
+    assert "Property Graph creation failed" in result.output
