feat(ontology): add --skip-property-graph for user-owned graph DDL (#104) (#108)
* feat(ontology): add --skip-property-graph for user-owned graph DDL (#104)
Lets users with their own CREATE PROPERTY GRAPH DDL — managed by
Terraform, dbt, or hand-authored — populate base tables from BQ AA
traces without overwriting the graph object on every run.
Changes
- ontology_orchestrator.build_ontology_graph gains
  skip_property_graph: bool = False. When True, phase 5 is not
  invoked: no OntologyPropertyGraphCompiler is constructed, no
  CREATE OR REPLACE PROPERTY GRAPH runs.
- Result dict gains property_graph_status with values "created" /
  "failed" / "skipped:user_requested", plus skipped_reason
  ("user_requested") when phase 5 was skipped.
- ontology-build CLI gains --skip-property-graph and threads
  property_graph_status through to the curated output dict so JSON
  consumers can distinguish "skipped" from "failed" without parsing
  stderr.
- Exit handling: skipped_reason == "user_requested" exits 0 silently;
  the existing exit-1-with-error behavior is preserved for actual
  graph-creation failures.

Tests
- test_skip_property_graph_does_not_construct_compiler asserts the
  compiler class is never called (mock.assert_not_called) when the
  flag is set.
- test_property_graph_status_created_on_success and
  test_property_graph_status_failed_on_compiler_false cover the two
  default-mode status values.
- CLI tests cover exit 0 with status="skipped:user_requested",
  default skip_property_graph=False threading, and exit 1 with
  status="failed" on actual creation failure.

135/135 tests in test_ontology_orchestrator.py + test_cli.py pass.
* docs+test: ontology-build doc + live skip-property-graph test (#104)
Closes the two #104 acceptance gaps flagged on PR #108 review:

(1) Docs missing
- New docs/ontology/ontology-build.md documents the bq-agent-sdk
  ontology-build orchestrator end-to-end and the new
  --skip-property-graph flag.
- Includes a status-field reference table mapping
  property_graph_status (created / failed / skipped:user_requested)
  to property_graph_created and CLI exit code.
- Includes Python API example showing skip_property_graph=True with
  expected result-dict shape.

(2) No gated live integration test
- New TestSkipPropertyGraph class in
  tests/test_integration_ontology_binding.py.
- Gated on RUN_LIVE_BIGQUERY_TESTS=1 like the existing live tests.
- Sequence: create authored CREATE PROPERTY GRAPH directly via SQL
  (simulating Terraform/dbt-managed DDL), capture the post-DDL
  CURRENT_TIMESTAMP(), run build_ontology_graph(...,
  skip_property_graph=True), then query JOBS_BY_PROJECT for any
  'CREATE OR REPLACE PROPERTY GRAPH' jobs in the post-timestamp
  window; assert zero. Also re-runs the showcase GQL query to
  confirm the user's graph object still works after the SDK run.
- The timestamp is captured AFTER the authored DDL specifically to
  avoid the false-positive trap called out in #107 cell 1.3.
* test+docs: harden live test, add text-format check, link doc (#104)
Addresses three review findings on PR #108:

(1) Live test now exercises real extraction/materialization
- Pass dataset_id=_DATASET, table_id=_TABLE so extraction reads
  the production agent_events table where YMGO ADCP session data
  lives. Materializer still writes to scratch_dataset because spec
  entity sources arrive 3-part-qualified to binding.target.dataset
  via _qualify_source (resolved_spec.py:141).
- Assert sum(rows_materialized.values()) > 0 to catch the
  silent-empty-graph trap where ontology_graph.py:683 returns an
  empty ExtractedGraph if extraction fails (e.g. wrong source
  dataset).

(2) JOBS_BY_PROJECT assertion narrowed to the test's own graph
- Filter by both 'CREATE OR REPLACE PROPERTY GRAPH' keyword AND
  the fully-qualified graph reference
  ({_PROJECT}.{scratch_dataset}.{spec.name}). Prevents false-fail
  on unrelated CREATE OR REPLACE PROPERTY GRAPH jobs running
  concurrently in the same project from other tests/developers.

(3) docs/README.md gains a row for the new ontology-build doc.

(4) New CLI test test_skip_property_graph_status_visible_in_text_format
asserts property_graph_status appears in --format=text output, pinning
the contract that the status field is not JSON-only.

7/7 ontology-build CLI tests pass.
* test+docs: harden DDL-detection filter, soften DDL claims (#104)
Addresses three review findings on PR #108:

(1) Live test DDL-detection blind spot
    The previous filter required the regressed CREATE OR REPLACE
    PROPERTY GRAPH to target _PROJECT.<scratch_dataset>.<spec.name>.
    But if skip_property_graph regressed, the compiler would actually
    target _PROJECT._DATASET.<spec.name> (the orchestrator's
    dataset_id argument is _DATASET in this test, used for extraction
    of agent_events). The blind spot: a regression could fire DDL
    that the test would not catch.

    Fixed by replacing the fully-qualified-graph-ref filter with two
    narrower constraints that catch the regression in either dataset:
    - graph name (spec.name): present in the DDL string regardless
      of which dataset the compiler targets
    - sdk_feature='ontology-gql' label: only SDK-issued
      property-graph jobs carry this label per
      ontology_property_graph.py:465; the test's setup CREATE
      PROPERTY GRAPH (issued via direct SQL) does not, so it does
      not trip the assertion

(2) docs/ontology/ontology-build.md: document graph_ref limitation
    Added a "Known limitation" section noting that
    result["graph_ref"] reports the extraction dataset, not the
    binding's target dataset, in split source/target setups. The
    materialized base tables themselves still go to the binding's
    target dataset per the resolved spec; only the reported string is
    affected.

(3) docs/ontology/ontology-build.md: soften DDL-options wording
    "additional indexes, dialect-specific options" was overreaching for
    BigQuery property graphs; tightened to "custom labels or other
    DDL details the SDK's compiler doesn't generate."

136/136 tests pass.
* test: correct comment on label-filter rationale (#104)
The previous comment claimed the test's setup CREATE PROPERTY GRAPH
job did not carry the sdk_feature='ontology-gql' label. That was
factually wrong: setup goes through
OntologyPropertyGraphCompiler.create_property_graph() (line 387),
which does carry the label.

The test logic was already correct: the setup job is excluded by
the post-setup timestamp captured in step 2, not by the label
filter. The label filter excludes user-authored raw SQL DDL jobs
(without SDK labels), which is its actual purpose. Only the comment
needed to change.

No code change.
* style: apply autoformat to test files
Run bash autoformat.sh (isort + pyink). Fixes the Format check
CI job that was failing on PR #108.
No behavior change.

`bq-agent-sdk ontology-build` runs the SDK's full ontology pipeline end-to-end against a populated `agent_events` table:

1. Load the spec (`--ontology X.yaml --binding Y.yaml`).
2. Extract an `ExtractedGraph` from agent telemetry via `AI.GENERATE`.
3. Create physical entity/relationship tables (`CREATE TABLE IF NOT EXISTS`).
4. Materialize extracted nodes/edges into those tables.
5. Run `CREATE OR REPLACE PROPERTY GRAPH` to wire the BigQuery property graph object.

The Python entry point is `bigquery_agent_analytics.ontology_orchestrator.build_ontology_graph(...)`. The CLI is a thin wrapper.

## Skipping property-graph DDL

Use `--skip-property-graph` when **the caller owns their own `CREATE PROPERTY GRAPH` DDL** (e.g., the property graph is provisioned via Terraform, dbt, or hand-authored SQL) and only wants the SDK to populate base tables.

```
bq-agent-sdk ontology-build \
  --project-id my-project \
  --dataset-id my-dataset \
  --ontology my.ontology.yaml \
  --binding my-bq-prod.binding.yaml \
  --session-ids sess-1,sess-2 \
  --skip-property-graph
```
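
For callers that script the CLI from Python, the same command can be assembled as an argument list. The following is a minimal sketch: the helper name `build_ontology_build_argv` and its parameters are hypothetical, only the flags shown in the command above are used, and the sketch builds the list without executing anything.

```python
from typing import List, Sequence


def build_ontology_build_argv(
    project_id: str,
    dataset_id: str,
    ontology_path: str,
    binding_path: str,
    session_ids: Sequence[str],
    skip_property_graph: bool = False,
) -> List[str]:
    """Assemble an argv list for `bq-agent-sdk ontology-build` (hypothetical helper)."""
    argv = [
        "bq-agent-sdk",
        "ontology-build",
        "--project-id", project_id,
        "--dataset-id", dataset_id,
        "--ontology", ontology_path,
        "--binding", binding_path,
        # The documented example passes session IDs as one comma-separated value.
        "--session-ids", ",".join(session_ids),
    ]
    if skip_property_graph:
        # Ask the SDK to leave the user-owned graph object untouched.
        argv.append("--skip-property-graph")
    return argv


# Example values copied from the command shown above.
argv = build_ontology_build_argv(
    "my-project",
    "my-dataset",
    "my.ontology.yaml",
    "my-bq-prod.binding.yaml",
    ["sess-1", "sess-2"],
    skip_property_graph=True,
)
print(" ".join(argv))
```

The resulting list can be handed to `subprocess.run(argv)`. Building a list rather than a shell string avoids quoting problems in paths and session IDs.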

Behavior with the flag set:

- Phase 5 short-circuits. No `OntologyPropertyGraphCompiler` is constructed, and no `CREATE OR REPLACE PROPERTY GRAPH` job runs. The user's existing graph object is unchanged.
- Phases 1–4 run normally. Tables are created (`CREATE TABLE IF NOT EXISTS` is a no-op against pre-existing tables) and rows are materialized.

JSON consumers should read `property_graph_status` (not just `property_graph_created`) to distinguish a deliberate skip from a creation failure.

## Status field reference

The CLI's `property_graph_status` field has three values:

| `property_graph_status` | `property_graph_created` | Exit code | Meaning |
|---|---|---|---|
| `"created"` | `true` | 0 | Phase 5 ran and BigQuery confirmed the graph object. |
| `"failed"` | `false` | 1 | Phase 5 ran but the graph object was not created. The CLI prints "Property Graph creation failed" to stderr. Tables and rows were still materialized. |
| `"skipped:user_requested"` | `false` | 0 | `--skip-property-graph` was set. Phase 5 did not run. No error message. |

Without `--skip-property-graph`, the existing exit-1 behavior on graph-create failure is preserved exactly.
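
A downstream consumer of the JSON output might encode the table above as follows. This is a minimal sketch rather than SDK code: `STATUS_TABLE`, `classify`, and `expected_exit_code` are hypothetical names, and the sample payload is assumed to contain only the two documented fields.

```python
import json

# Mirrors the status table: status -> (property_graph_created, exit code).
STATUS_TABLE = {
    "created": (True, 0),
    "failed": (False, 1),
    "skipped:user_requested": (False, 0),
}


def classify(cli_json: str) -> str:
    """Return 'ok', 'skipped', or 'failed' for one JSON result (hypothetical helper)."""
    result = json.loads(cli_json)
    status = result["property_graph_status"]
    if status not in STATUS_TABLE:
        raise ValueError(f"unexpected property_graph_status: {status!r}")
    if status == "created":
        return "ok"
    # "failed" and "skipped:user_requested" both report
    # property_graph_created == false, so only the status string
    # separates a deliberate skip from a creation failure.
    return "skipped" if status.startswith("skipped:") else "failed"


def expected_exit_code(status: str) -> int:
    """Look up the CLI exit code documented for a status value."""
    return STATUS_TABLE[status][1]


# Hypothetical payload for illustration only.
sample = '{"property_graph_status": "skipped:user_requested", "property_graph_created": false}'
print(classify(sample), expected_exit_code("skipped:user_requested"))
```

Rejecting unknown status strings is a deliberate choice in this sketch: if further `skipped:` reasons are added later, the consumer fails loudly instead of misreporting them.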

## When to use this

- **You already manage `CREATE PROPERTY GRAPH` in Terraform / dbt / a SQL file.** The SDK's `CREATE OR REPLACE PROPERTY GRAPH` would clobber your DDL on every run.
- **Your property graph definition uses DDL details the SDK compiler doesn't emit.** You hand-authored the graph DDL to express custom labels or other details that the compiler does not generate.
- **You want to populate your tables on a different cadence than you redefine the graph.** The graph definition rarely changes; the data is refreshed continuously.

For all other cases, leave the flag off and let the SDK manage the property graph end-to-end.
63
+
64
+
## Python API
65
+
66
+
The flag is also available on `build_ontology_graph(...)`:
67
+
68
+
```python
69
+
from bigquery_agent_analytics.ontology_orchestrator import build_ontology_graph
`skipped_reason` is only present when the phase was skipped; it is omitted when phase 5 ran (whether or not it succeeded).
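
The presence rule for `skipped_reason` can be illustrated with a small stand-alone sketch. It is not the SDK's implementation: `phase5_result_fields` and the `create_graph` callable are hypothetical stand-ins for the orchestrator's phase 5 and the compiler call, and only the three documented result fields are modelled.

```python
from typing import Callable, Dict, Optional


def phase5_result_fields(
    skip_property_graph: bool,
    create_graph: Optional[Callable[[], bool]] = None,
) -> Dict[str, object]:
    """Sketch of the phase-5 result fields; not the SDK's implementation."""
    if skip_property_graph:
        # The compiler stand-in is never invoked on this path.
        return {
            "property_graph_created": False,
            "property_graph_status": "skipped:user_requested",
            "skipped_reason": "user_requested",
        }
    if create_graph is None:
        raise ValueError("create_graph is required when phase 5 runs")
    created = bool(create_graph())
    # skipped_reason is omitted whenever phase 5 ran, whether or not it succeeded.
    return {
        "property_graph_created": created,
        "property_graph_status": "created" if created else "failed",
    }


print(phase5_result_fields(skip_property_graph=True))
print(phase5_result_fields(skip_property_graph=False, create_graph=lambda: False))
```

Callers can therefore test `"skipped_reason" in result` to detect a skip, but reading `property_graph_status` covers all three outcomes with a single field.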

## Known limitation: `result["graph_ref"]` in split source/target setups

`build_ontology_graph(...)` accepts a single `dataset_id` and uses it both for extraction (where `agent_events` lives) and for the `graph_ref` reported in the result dict (`{project_id}.{dataset_id}.{name}`). When `--skip-property-graph` is set and the caller's actual property graph lives in `binding.target.dataset` (different from the `dataset_id` used for extraction), `result["graph_ref"]` reports the **extraction dataset**, not the user-owned graph's dataset. The materialized base tables themselves still go to `binding.target.dataset` per the resolved spec; only the reported `graph_ref` string is affected. This is tracked as a follow-up and does not block `--skip-property-graph` itself, since the user already knows where their authored graph lives.
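
Until the follow-up lands, a caller in a split setup can compute the reference to their own graph instead of relying on `result["graph_ref"]`. The sketch below only illustrates the string difference: both helper names and all example values are hypothetical, and it assumes the authored graph lives in the same project under the binding's target dataset.

```python
def reported_graph_ref(project_id: str, dataset_id: str, name: str) -> str:
    """Format described above for result["graph_ref"]: built from the extraction dataset."""
    return f"{project_id}.{dataset_id}.{name}"


def user_owned_graph_ref(project_id: str, target_dataset: str, name: str) -> str:
    """Hypothetical helper: where the caller's authored graph is assumed to live."""
    return f"{project_id}.{target_dataset}.{name}"


# Made-up values: extraction reads from "telemetry", the authored graph
# and base tables live in "graphs".
reported = reported_graph_ref("my-project", "telemetry", "my_graph")
actual = user_owned_graph_ref("my-project", "graphs", "my_graph")

print("reported graph_ref:", reported)
print("user-owned graph:  ", actual)
```

When the extraction dataset and the target dataset are the same, the two strings coincide and the limitation has no visible effect.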