feat(ontology): add --skip-property-graph for user-owned graph DDL (#104)#108
Conversation
…oogleCloudPlatform#104) Lets users with their own CREATE PROPERTY GRAPH DDL — managed by Terraform, dbt, or hand-authored — populate base tables from BQ AA traces without overwriting the graph object on every run. Changes - ontology_orchestrator.build_ontology_graph gains skip_property_graph: bool = False. When True, phase 5 is not invoked: no OntologyPropertyGraphCompiler is constructed, no CREATE OR REPLACE PROPERTY GRAPH runs. - Result dict gains property_graph_status with values "created" / "failed" / "skipped:user_requested", plus skipped_reason ("user_requested") when phase 5 was skipped. - ontology-build CLI gains --skip-property-graph and threads property_graph_status through to the curated output dict so JSON consumers can distinguish "skipped" from "failed" without parsing stderr. - Exit handling: skipped_reason == "user_requested" exits 0 silently; the existing exit-1-with-error behavior is preserved for actual graph-creation failures. Tests - test_skip_property_graph_does_not_construct_compiler asserts the compiler class is never called (mock.assert_not_called) when the flag is set. - test_property_graph_status_created_on_success and test_property_graph_status_failed_on_compiler_false cover the two default-mode status values. - CLI tests cover exit 0 with status="skipped:user_requested", default skip_property_graph=False threading, and exit 1 with status="failed" on actual creation failure. 135/135 tests in test_ontology_orchestrator.py + test_cli.py pass.
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
…CloudPlatform#104) Closes the two GoogleCloudPlatform#104 acceptance gaps flagged on PR GoogleCloudPlatform#108 review: (1) Docs missing - New docs/ontology/ontology-build.md documents the bq-agent-sdk ontology-build orchestrator end-to-end and the new --skip-property-graph flag. - Includes a status-field reference table mapping property_graph_status (created / failed / skipped:user_requested) to property_graph_created and CLI exit code. - Includes Python API example showing skip_property_graph=True with expected result-dict shape. (2) No gated live integration test - New TestSkipPropertyGraph class in tests/test_integration_ontology_binding.py. - Gated on RUN_LIVE_BIGQUERY_TESTS=1 like the existing live tests. - Sequence: create authored CREATE PROPERTY GRAPH directly via SQL (simulating Terraform/dbt-managed DDL), capture the post-DDL CURRENT_TIMESTAMP(), run build_ontology_graph(..., skip_property_graph=True), then query JOBS_BY_PROJECT for any 'CREATE OR REPLACE PROPERTY GRAPH' jobs in the post-timestamp window — assert zero. Also re-runs the showcase GQL query to confirm the user's graph object still works after the SDK run. - The timestamp is captured AFTER the authored DDL specifically to avoid the false-positive trap called out in GoogleCloudPlatform#107 cell 1.3.
…CloudPlatform#104) Closes the two GoogleCloudPlatform#104 acceptance gaps flagged on PR GoogleCloudPlatform#108 review: (1) Docs missing - New docs/ontology/ontology-build.md documents the bq-agent-sdk ontology-build orchestrator end-to-end and the new --skip-property-graph flag. - Includes a status-field reference table mapping property_graph_status (created / failed / skipped:user_requested) to property_graph_created and CLI exit code. - Includes Python API example showing skip_property_graph=True with expected result-dict shape. (2) No gated live integration test - New TestSkipPropertyGraph class in tests/test_integration_ontology_binding.py. - Gated on RUN_LIVE_BIGQUERY_TESTS=1 like the existing live tests. - Sequence: create authored CREATE PROPERTY GRAPH directly via SQL (simulating Terraform/dbt-managed DDL), capture the post-DDL CURRENT_TIMESTAMP(), run build_ontology_graph(..., skip_property_graph=True), then query JOBS_BY_PROJECT for any 'CREATE OR REPLACE PROPERTY GRAPH' jobs in the post-timestamp window — assert zero. Also re-runs the showcase GQL query to confirm the user's graph object still works after the SDK run. - The timestamp is captured AFTER the authored DDL specifically to avoid the false-positive trap called out in GoogleCloudPlatform#107 cell 1.3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up: closed the two #104 AC gaps from reviewDocs added —
Live integration test added — new
This is the exact contract #107 cells 1.3–1.5 will demonstrate in the V5 migration notebook once the storyboard moves to execution. The unit-test coverage from the original commit is unchanged and still passes (135/135). |
…loudPlatform#104) Addresses three review findings on PR GoogleCloudPlatform#108: (1) Live test now exercises real extraction/materialization - Pass dataset_id=_DATASET, table_id=_TABLE so extraction reads the production agent_events table where YMGO ADCP session data lives. Materializer still writes to scratch_dataset because spec entity sources arrive 3-part-qualified to binding.target.dataset via _qualify_source (resolved_spec.py:141). - Assert sum(rows_materialized.values()) > 0 to catch the silent- empty-graph trap where ontology_graph.py:683 returns an empty ExtractedGraph if extraction fails (e.g. wrong source dataset). (2) JOBS_BY_PROJECT assertion narrowed to the test's own graph - Filter by both 'CREATE OR REPLACE PROPERTY GRAPH' keyword AND the fully-qualified graph reference ({_PROJECT}.{scratch_dataset}.{spec.name}). Prevents false-fail on unrelated CREATE OR REPLACE PROPERTY GRAPH jobs running concurrently in the same project from other tests/developers. (3) docs/README.md gains a row for the new ontology-build doc. (4) New CLI test test_skip_property_graph_status_visible_in_text_format asserts property_graph_status appears in --format=text output, pinning the contract that the status field is not JSON-only. 7/7 ontology-build CLI tests pass.
…loudPlatform#104) Addresses three review findings on PR GoogleCloudPlatform#108: (1) Live test now exercises real extraction/materialization - Pass dataset_id=_DATASET, table_id=_TABLE so extraction reads the production agent_events table where YMGO ADCP session data lives. Materializer still writes to scratch_dataset because spec entity sources arrive 3-part-qualified to binding.target.dataset via _qualify_source (resolved_spec.py:141). - Assert sum(rows_materialized.values()) > 0 to catch the silent- empty-graph trap where ontology_graph.py:683 returns an empty ExtractedGraph if extraction fails (e.g. wrong source dataset). (2) JOBS_BY_PROJECT assertion narrowed to the test's own graph - Filter by both 'CREATE OR REPLACE PROPERTY GRAPH' keyword AND the fully-qualified graph reference ({_PROJECT}.{scratch_dataset}.{spec.name}). Prevents false-fail on unrelated CREATE OR REPLACE PROPERTY GRAPH jobs running concurrently in the same project from other tests/developers. (3) docs/README.md gains a row for the new ontology-build doc. (4) New CLI test test_skip_property_graph_status_visible_in_text_format asserts property_graph_status appears in --format=text output, pinning the contract that the status field is not JSON-only. 7/7 ontology-build CLI tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review fixes folded in (commit 8d45671)(1) Live test now exercises real extraction/materialization. Was passing (2) JOBS_BY_PROJECT assertion narrowed. Previously filtered every project-level job by (3) (4) New CLI test (5) PR body updated to reflect the docs + live test from commit 054e141 and the hardening from 8d45671. Tests section now lists unit / CLI / live coverage explicitly. (6) CLA check — that's an admin/account-side gate that needs your action; I can't resolve it via PR commits. Verified locally:
|
…dPlatform#104) Addresses three review findings on PR GoogleCloudPlatform#108: (1) Live test DDL-detection blind spot The previous filter required the regressed CREATE OR REPLACE PROPERTY GRAPH to target _PROJECT.<scratch_dataset>.<spec.name>. But if skip_property_graph regressed, the compiler would actually target _PROJECT._DATASET.<spec.name> (the orchestrator's dataset_id argument is _DATASET in this test, used for extraction of agent_events). The blind spot: a regression could fire DDL that the test would not catch. Fixed by replacing the fully-qualified-graph-ref filter with two narrower constraints that catch the regression in either dataset: - graph name (spec.name) — present in the DDL string regardless of which dataset the compiler targets - sdk_feature='ontology-gql' label — only SDK-issued property-graph jobs carry this label per ontology_property_graph.py:465; the test's setup CREATE PROPERTY GRAPH (issued via direct SQL) does not, so it does not trip the assertion (2) docs/ontology/ontology-build.md: document graph_ref limitation Added a "Known limitation" section noting that result["graph_ref"] reports the extraction dataset, not the binding's target dataset, in split source/target setups. The materialized base tables themselves still go to the binding's target dataset per the resolved spec; only the reported string is affected. (3) docs/ontology/ontology-build.md: soften DDL-options wording "additional indexes, dialect-specific options" was overreaching for BigQuery property graphs; tightened to "custom labels or other DDL details the SDK's compiler doesn't generate." 136/136 tests pass.
…dPlatform#104) Addresses three review findings on PR GoogleCloudPlatform#108: (1) Live test DDL-detection blind spot The previous filter required the regressed CREATE OR REPLACE PROPERTY GRAPH to target _PROJECT.<scratch_dataset>.<spec.name>. But if skip_property_graph regressed, the compiler would actually target _PROJECT._DATASET.<spec.name> (the orchestrator's dataset_id argument is _DATASET in this test, used for extraction of agent_events). The blind spot: a regression could fire DDL that the test would not catch. Fixed by replacing the fully-qualified-graph-ref filter with two narrower constraints that catch the regression in either dataset: - graph name (spec.name) — present in the DDL string regardless of which dataset the compiler targets - sdk_feature='ontology-gql' label — only SDK-issued property-graph jobs carry this label per ontology_property_graph.py:465; the test's setup CREATE PROPERTY GRAPH (issued via direct SQL) does not, so it does not trip the assertion (2) docs/ontology/ontology-build.md: document graph_ref limitation Added a "Known limitation" section noting that result["graph_ref"] reports the extraction dataset, not the binding's target dataset, in split source/target setups. The materialized base tables themselves still go to the binding's target dataset per the resolved spec; only the reported string is affected. (3) docs/ontology/ontology-build.md: soften DDL-options wording "additional indexes, dialect-specific options" was overreaching for BigQuery property graphs; tightened to "custom labels or other DDL details the SDK's compiler doesn't generate." 136/136 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review fixes folded in (commit e98f1c2)(1) DDL-detection blind spot fixed. The previous filter required a regressed Replaced with two narrower constraints that catch a regression in either dataset:
This catches the regression case the test was actually meant to detect. (2) (3) Docs wording softened. (4) CLI live smoke test (finding #3 — non-blocker). Not added in this commit. The unit tests cover CLI wiring ( Verified locally:
|
) The previous comment claimed the test's setup CREATE PROPERTY GRAPH job did not carry the sdk_feature='ontology-gql' label. That was factually wrong: setup goes through OntologyPropertyGraphCompiler.create_property_graph() (line 387), which does carry the label. The test logic was already correct — the setup job is excluded by the post-setup timestamp captured in step 2, not by the label filter. The label filter excludes user-authored raw SQL DDL jobs (without SDK labels), which is its actual purpose. Only the comment needed to change. No code change.
9fd8e85 to
e44967b
Compare
Run bash autoformat.sh (isort + pyink). Fixes the Format check CI job that was failing on PR GoogleCloudPlatform#108. No behavior change.
…oogleCloudPlatform#105 PR 2b) Closes the user-facing surface of issue GoogleCloudPlatform#105. Builds on PR 2a's validate_binding_against_bigquery core (shipped in PR GoogleCloudPlatform#109). CLI changes (src/bigquery_agent_analytics/cli.py) Standalone command: bq-agent-sdk binding-validate --project-id ... --ontology X.yaml --binding Y.yaml [--location US] [--strict] [--format json|text|table] Loads the ontology + binding via upstream loaders, builds a google.cloud.bigquery.Client, calls validate_binding_against_bigquery(...), prints a structured JSON report on stdout. Warnings always print to stderr (one line each) so CI logs surface advisory drift even when stdout is consumed by a JSON-aware tool. Exit codes: 0 ok, 1 failure(s), 2 unexpected error (load failure, missing flags, etc). ontology-build flags: --validate-binding (default-mode pre-flight) --validate-binding-strict (strict-mode pre-flight) When set, run the validator before phase 2 (extraction). On any failure, the build short-circuits BEFORE any AI.GENERATE call fires, so authoring drift never costs tokens. Default-mode warnings print to stderr but do not block. The two flags are mutually exclusive; both are incompatible with the deprecated --spec-path form because the validator needs the unresolved Ontology + Binding pair, not a combined GraphSpec. Helper _run_binding_preflight() centralizes the load + validate + print + exit logic so it can be invoked from ontology-build without duplicating the flow. Also: --location flag added to ontology-build, threaded through to build_ontology_graph() (which has supported it on the Python side since PR GoogleCloudPlatform#108 / commit 292320b). The CLI was previously building without forwarding it. Tests (tests/test_cli.py) TestBindingValidate (6 tests): - test_clean_validation_exits_zero - test_failures_exit_one_with_payload - test_warnings_print_to_stderr_but_do_not_flip_exit - test_strict_flag_threaded_through - test_missing_required_flags_exit_2 - test_load_failure_exits_two TestOntologyBuildValidateBindingFlag (5 tests): - test_validate_binding_short_circuits_on_failure_before_build (asserts build_ontology_graph is NEVER called when validation fails — extraction does not start) - test_validate_binding_strict_short_circuits_on_nullable_keys - test_validate_binding_clean_proceeds_to_build - test_validate_binding_with_spec_path_rejected - test_both_flags_rejected (mutual exclusion) All tests use patched validators (no live BQ); the live integration test for the full flow lives in PR 2a (GoogleCloudPlatform#109). 253/253 tests pass across the touched test files. Docs New: docs/ontology/binding-validation.md - When to run (binding authoring, CI gating, ontology-build). - Standalone CLI examples (default + strict). - ontology-build integration examples. - Failure-code reference table covering all 7 default codes plus the strict-only KEY_COLUMN_NULLABLE. - CI usage pattern (GitHub Actions). - Python API example. - Cross-links to ontology-build.md, binding.md, and GoogleCloudPlatform#76's planned post-extraction validator. Updated: docs/README.md - Added binding-validation.md to the Ontology Reference table. What's NOT in this PR - Live integration test for the CLI surface (the validator itself has live coverage from PR 2a's TestBindingValidationLive). A literal subprocess-style CLI smoke test could land as a follow-up if someone wants it, but the unit tests cover wiring + exit codes + flag threading comprehensively.
…ate-binding flags (#105 PR 2b) (#112) * feat(cli): add bq-agent-sdk binding-validate + ontology-build flags (#105 PR 2b) Closes the user-facing surface of issue #105. Builds on PR 2a's validate_binding_against_bigquery core (shipped in PR #109). CLI changes (src/bigquery_agent_analytics/cli.py) Standalone command: bq-agent-sdk binding-validate --project-id ... --ontology X.yaml --binding Y.yaml [--location US] [--strict] [--format json|text|table] Loads the ontology + binding via upstream loaders, builds a google.cloud.bigquery.Client, calls validate_binding_against_bigquery(...), prints a structured JSON report on stdout. Warnings always print to stderr (one line each) so CI logs surface advisory drift even when stdout is consumed by a JSON-aware tool. Exit codes: 0 ok, 1 failure(s), 2 unexpected error (load failure, missing flags, etc). ontology-build flags: --validate-binding (default-mode pre-flight) --validate-binding-strict (strict-mode pre-flight) When set, run the validator before phase 2 (extraction). On any failure, the build short-circuits BEFORE any AI.GENERATE call fires, so authoring drift never costs tokens. Default-mode warnings print to stderr but do not block. The two flags are mutually exclusive; both are incompatible with the deprecated --spec-path form because the validator needs the unresolved Ontology + Binding pair, not a combined GraphSpec. Helper _run_binding_preflight() centralizes the load + validate + print + exit logic so it can be invoked from ontology-build without duplicating the flow. Also: --location flag added to ontology-build, threaded through to build_ontology_graph() (which has supported it on the Python side since PR #108 / commit 292320b). The CLI was previously building without forwarding it. Tests (tests/test_cli.py) TestBindingValidate (6 tests): - test_clean_validation_exits_zero - test_failures_exit_one_with_payload - test_warnings_print_to_stderr_but_do_not_flip_exit - test_strict_flag_threaded_through - test_missing_required_flags_exit_2 - test_load_failure_exits_two TestOntologyBuildValidateBindingFlag (5 tests): - test_validate_binding_short_circuits_on_failure_before_build (asserts build_ontology_graph is NEVER called when validation fails — extraction does not start) - test_validate_binding_strict_short_circuits_on_nullable_keys - test_validate_binding_clean_proceeds_to_build - test_validate_binding_with_spec_path_rejected - test_both_flags_rejected (mutual exclusion) All tests use patched validators (no live BQ); the live integration test for the full flow lives in PR 2a (#109). 253/253 tests pass across the touched test files. Docs New: docs/ontology/binding-validation.md - When to run (binding authoring, CI gating, ontology-build). - Standalone CLI examples (default + strict). - ontology-build integration examples. - Failure-code reference table covering all 7 default codes plus the strict-only KEY_COLUMN_NULLABLE. - CI usage pattern (GitHub Actions). - Python API example. - Cross-links to ontology-build.md, binding.md, and #76's planned post-extraction validator. Updated: docs/README.md - Added binding-validation.md to the Ontology Reference table. What's NOT in this PR - Live integration test for the CLI surface (the validator itself has live coverage from PR 2a's TestBindingValidationLive). A literal subprocess-style CLI smoke test could land as a follow-up if someone wants it, but the unit tests cover wiring + exit codes + flag threading comprehensively. * fix+test+docs: lowercase JSON codes, changelog, location + warning-only paths (#105 PR 2b) Four review findings folded in. (1) Failure-code reference now shows lowercase JSON values The CLI emits f.code.value (e.g. 'missing_table'), not the enum attribute name ('MISSING_TABLE'). Docs previously listed the uppercase form, so users copying the table into jq filters would silently match nothing. The reference table now has both columns (Python FailureCode + JSON value), with a sample jq filter using the lowercase form. (2) CHANGELOG entry under [Unreleased] This PR adds a public CLI command (binding-validate), two public ontology-build flags (--validate-binding[-strict]), --location on ontology-build, and the validate_binding_against_bigquery Python API. All four are now called out under [Unreleased] / Added. (3) Test the warning-only path through ontology-build The new test test_validate_binding_warnings_only_proceeds_to_build patches the validator to return a BindingValidationReport with only warnings (no failures), runs ontology-build --validate-binding, and asserts: - exit code 0 - build_ontology_graph IS called (warnings don't short-circuit) - WARN: key_column_nullable appears in the CLI output This covers _run_binding_preflight()'s default-mode advisory branch, complementing the existing failure-short-circuit and clean-success tests. (4) --location now has tests and a doc note Two new tests: - TestBindingValidate::test_location_threaded_to_bigquery_client confirms binding-validate --location=EU constructs a BQ client with location='EU'. - TestOntologyBuildValidateBindingFlag:: test_location_threaded_through_orchestrator confirms ontology-build --location=EU forwards to build_ontology_graph so the orchestrator's BQ client targets the EU multi-region. docs/ontology/ontology-build.md: new 'BigQuery location' section covering --location plus a 'Pre-flight binding validation' section referencing binding-validation.md for the full --validate-binding flag reference. 256/256 tests pass (test_binding_validation.py + test_cli.py + test_ontology_orchestrator.py + test_ontology_materializer.py + test_resolved_spec.py). Autoformat clean. Optional nit (deferred, not blocking): _run_binding_preflight loads ontology + binding once for validation, _load_spec_from_args reloads them for resolve(). Worth a small helper that returns the loaded objects once, but acceptable as-is for PR 2b scope. * docs: avoid naming INFORMATION_SCHEMA as preflight mechanism (#105 PR 2b) Two wording fixes; no behavior change. The validator uses bq_client.get_table(...) metadata calls (binding_validation.py:309), not direct INFORMATION_SCHEMA queries. Two places said the latter: (1) docs/ontology/ontology-build.md: 'pre-flight validator queries INFORMATION_SCHEMA in the same region' -> 'uses the BigQuery client with the requested location to fetch each bound table's metadata'. (2) tests/test_cli.py docstring on test_location_threaded_to_bigquery_client: same correction. The test asserts the location threads through; the wording for *why* location matters now matches the actual mechanism. 256/256 tests still pass. Autoformat clean.
Implements #104 —
--skip-property-graphfor users with their ownCREATE PROPERTY GRAPHDDL (Terraform / dbt / hand-authored). Lets the SDK populate base tables from BQ AA traces without overwriting the user's graph object on every run.Changes
ontology_orchestrator.pybuild_ontology_graph(...)gainsskip_property_graph: bool = False.True, phase 5 short-circuits: noOntologyPropertyGraphCompileris constructed, noCREATE OR REPLACE PROPERTY GRAPHruns.property_graph_status("created"/"failed"/"skipped:user_requested") andskipped_reason("user_requested"only when phase 5 was skipped).cli.py(ontology-build)--skip-property-graphflag, threaded through tobuild_ontology_graph.property_graph_statusso JSON consumers can distinguish skipped from failed without parsing stderr.skipped_reason == "user_requested"exits 0 silently; existing exit-1-with-error behavior preserved for actual graph-creation failures (property_graph_created=Falsefor any other reason).Docs (
docs/ontology/ontology-build.md, new)--skip-property-graphsection.property_graph_status→property_graph_created+ exit code, so JSON consumers have a stable contract.docs/README.mdunder Ontology Reference.Tests
Unit (
tests/test_ontology_orchestrator.py):test_skip_property_graph_does_not_construct_compiler— assertsOntologyPropertyGraphCompileris never called when the flag is set.test_property_graph_status_created_on_success— default flow with successful creation reports"created", noskipped_reason.test_property_graph_status_failed_on_compiler_false— default flow withcreate_property_graph()=Falsereports"failed", distinct from"skipped:user_requested".CLI (
tests/test_cli.py::TestOntologyBuild):test_skip_property_graph_exits_zero_with_status— exit 0, status"skipped:user_requested", no "Property Graph creation failed" message, flag threaded through.test_default_invocation_omits_skip_flag— default invocation passesskip_property_graph=False.test_skip_property_graph_status_visible_in_text_format— pins the contract thatproperty_graph_statusappears in--format=textoutput, not just JSON.test_property_graph_failure_status_failed— exit 1 with status"failed"and the existing error message preserved.Live integration (
tests/test_integration_ontology_binding.py::TestSkipPropertyGraph, gated onRUN_LIVE_BIGQUERY_TESTS=1):test_skip_property_graph_issues_no_create_graph_job— full end-to-end test:CREATE OR REPLACE PROPERTY GRAPHdirectly via SQL (simulating Terraform/dbt-managed DDL).CURRENT_TIMESTAMP()after the authored DDL completes (closes the false-positive trap from Doc: V5 migration notebook storyboard for the four-guarantee decision-lineage narrative #107 cell 1.3 where the JOBS_BY_PROJECT filter would otherwise catch the test's own setup job).build_ontology_graph(..., dataset_id=_DATASET, table_id=_TABLE, skip_property_graph=True)— extraction reads from realagent_events, materializer writes toscratch_datasetper spec, no DDL job for phase 5.property_graph_status == "skipped:user_requested"ANDsum(rows_materialized.values()) > 0(catches the silent-empty-graph trap).INFORMATION_SCHEMA.JOBS_BY_PROJECT WHERE creation_time > @before AND UPPER(query) LIKE '%CREATE OR REPLACE PROPERTY GRAPH%' AND query LIKE '%{_PROJECT}.{scratch_dataset}.{spec.name}%'— assert zero rows. The graph-ref filter prevents false-fail from unrelated DDL jobs running concurrently in the same project.This is exactly the contract #107 cells 1.3–1.5 will demonstrate in the V5 migration notebook when the storyboard moves to execution.
7/7 ontology-build CLI tests + 4/4 orchestrator tests pass; all 135 mocked tests in
test_ontology_orchestrator.pyandtest_cli.pypass.Next
After this lands, #105 (binding pre-flight) can start independently per the working plan. #76 follows; #75 C1 is gated on #76.