feat(ontology): add --skip-property-graph for user-owned graph DDL (#104) by caohy1988 · Pull Request #108 · GoogleCloudPlatform/BigQuery-Agent-Analytics-SDK

caohy1988 · 2026-05-03T01:17:17Z

Implements #104 — --skip-property-graph for users with their own CREATE PROPERTY GRAPH DDL (Terraform / dbt / hand-authored). Lets the SDK populate base tables from BQ AA traces without overwriting the user's graph object on every run.

Changes

ontology_orchestrator.py

build_ontology_graph(...) gains skip_property_graph: bool = False.
When True, phase 5 short-circuits: no OntologyPropertyGraphCompiler is constructed, no CREATE OR REPLACE PROPERTY GRAPH runs.
Result dict gains property_graph_status ("created" / "failed" / "skipped:user_requested") and skipped_reason ("user_requested" only when phase 5 was skipped).

cli.py (ontology-build)

New --skip-property-graph flag, threaded through to build_ontology_graph.
Curated CLI output dict gains property_graph_status so JSON consumers can distinguish skipped from failed without parsing stderr.
Exit handling: skipped_reason == "user_requested" exits 0 silently; existing exit-1-with-error behavior preserved for actual graph-creation failures (property_graph_created=False for any other reason).

Docs (docs/ontology/ontology-build.md, new)

Documents the SDK orchestrator end-to-end with a dedicated --skip-property-graph section.
Status-field reference table mapping property_graph_status → property_graph_created + exit code, so JSON consumers have a stable contract.
"When to use this" guidance covering Terraform/dbt-managed DDL, custom graph options, and decoupled refresh cadence.
Python API example with the expected result-dict shape.
Linked from docs/README.md under Ontology Reference.

Tests

Unit (tests/test_ontology_orchestrator.py):

test_skip_property_graph_does_not_construct_compiler — asserts OntologyPropertyGraphCompiler is never called when the flag is set.
test_property_graph_status_created_on_success — default flow with successful creation reports "created", no skipped_reason.
test_property_graph_status_failed_on_compiler_false — default flow with create_property_graph()=False reports "failed", distinct from "skipped:user_requested".

CLI (tests/test_cli.py::TestOntologyBuild):

test_skip_property_graph_exits_zero_with_status — exit 0, status "skipped:user_requested", no "Property Graph creation failed" message, flag threaded through.
test_default_invocation_omits_skip_flag — default invocation passes skip_property_graph=False.
test_skip_property_graph_status_visible_in_text_format — pins the contract that property_graph_status appears in --format=text output, not just JSON.
test_property_graph_failure_status_failed — exit 1 with status "failed" and the existing error message preserved.

Live integration (tests/test_integration_ontology_binding.py::TestSkipPropertyGraph, gated on RUN_LIVE_BIGQUERY_TESTS=1):

test_skip_property_graph_issues_no_create_graph_job — full end-to-end test:
1. Create authored CREATE OR REPLACE PROPERTY GRAPH directly via SQL (simulating Terraform/dbt-managed DDL).
2. Capture CURRENT_TIMESTAMP() after the authored DDL completes (closes the false-positive trap from Doc: V5 migration notebook storyboard for the four-guarantee decision-lineage narrative #107 cell 1.3 where the JOBS_BY_PROJECT filter would otherwise catch the test's own setup job).
3. Run build_ontology_graph(..., dataset_id=_DATASET, table_id=_TABLE, skip_property_graph=True) — extraction reads from real agent_events, materializer writes to scratch_dataset per spec, no DDL job for phase 5.
4. Assert property_graph_status == "skipped:user_requested" AND sum(rows_materialized.values()) > 0 (catches the silent-empty-graph trap).
5. Query INFORMATION_SCHEMA.JOBS_BY_PROJECT WHERE creation_time > @before AND UPPER(query) LIKE '%CREATE OR REPLACE PROPERTY GRAPH%' AND query LIKE '%{_PROJECT}.{scratch_dataset}.{spec.name}%' — assert zero rows. The graph-ref filter prevents false-fail from unrelated DDL jobs running concurrently in the same project.
6. Re-run the showcase GQL query — confirms the user's graph object survived intact.

This is exactly the contract #107 cells 1.3–1.5 will demonstrate in the V5 migration notebook when the storyboard moves to execution.

7/7 ontology-build CLI tests + 4/4 orchestrator tests pass; all 135 mocked tests in test_ontology_orchestrator.py and test_cli.py pass.

…oogleCloudPlatform#104) Lets users with their own CREATE PROPERTY GRAPH DDL — managed by Terraform, dbt, or hand-authored — populate base tables from BQ AA traces without overwriting the graph object on every run. Changes - ontology_orchestrator.build_ontology_graph gains skip_property_graph: bool = False. When True, phase 5 is not invoked: no OntologyPropertyGraphCompiler is constructed, no CREATE OR REPLACE PROPERTY GRAPH runs. - Result dict gains property_graph_status with values "created" / "failed" / "skipped:user_requested", plus skipped_reason ("user_requested") when phase 5 was skipped. - ontology-build CLI gains --skip-property-graph and threads property_graph_status through to the curated output dict so JSON consumers can distinguish "skipped" from "failed" without parsing stderr. - Exit handling: skipped_reason == "user_requested" exits 0 silently; the existing exit-1-with-error behavior is preserved for actual graph-creation failures. Tests - test_skip_property_graph_does_not_construct_compiler asserts the compiler class is never called (mock.assert_not_called) when the flag is set. - test_property_graph_status_created_on_success and test_property_graph_status_failed_on_compiler_false cover the two default-mode status values. - CLI tests cover exit 0 with status="skipped:user_requested", default skip_property_graph=False threading, and exit 1 with status="failed" on actual creation failure. 135/135 tests in test_ontology_orchestrator.py + test_cli.py pass.

google-cla · 2026-05-03T01:17:35Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

…CloudPlatform#104) Closes the two GoogleCloudPlatform#104 acceptance gaps flagged on PR GoogleCloudPlatform#108 review: (1) Docs missing - New docs/ontology/ontology-build.md documents the bq-agent-sdk ontology-build orchestrator end-to-end and the new --skip-property-graph flag. - Includes a status-field reference table mapping property_graph_status (created / failed / skipped:user_requested) to property_graph_created and CLI exit code. - Includes Python API example showing skip_property_graph=True with expected result-dict shape. (2) No gated live integration test - New TestSkipPropertyGraph class in tests/test_integration_ontology_binding.py. - Gated on RUN_LIVE_BIGQUERY_TESTS=1 like the existing live tests. - Sequence: create authored CREATE PROPERTY GRAPH directly via SQL (simulating Terraform/dbt-managed DDL), capture the post-DDL CURRENT_TIMESTAMP(), run build_ontology_graph(..., skip_property_graph=True), then query JOBS_BY_PROJECT for any 'CREATE OR REPLACE PROPERTY GRAPH' jobs in the post-timestamp window — assert zero. Also re-runs the showcase GQL query to confirm the user's graph object still works after the SDK run. - The timestamp is captured AFTER the authored DDL specifically to avoid the false-positive trap called out in GoogleCloudPlatform#107 cell 1.3.

…CloudPlatform#104) Closes the two GoogleCloudPlatform#104 acceptance gaps flagged on PR GoogleCloudPlatform#108 review: (1) Docs missing - New docs/ontology/ontology-build.md documents the bq-agent-sdk ontology-build orchestrator end-to-end and the new --skip-property-graph flag. - Includes a status-field reference table mapping property_graph_status (created / failed / skipped:user_requested) to property_graph_created and CLI exit code. - Includes Python API example showing skip_property_graph=True with expected result-dict shape. (2) No gated live integration test - New TestSkipPropertyGraph class in tests/test_integration_ontology_binding.py. - Gated on RUN_LIVE_BIGQUERY_TESTS=1 like the existing live tests. - Sequence: create authored CREATE PROPERTY GRAPH directly via SQL (simulating Terraform/dbt-managed DDL), capture the post-DDL CURRENT_TIMESTAMP(), run build_ontology_graph(..., skip_property_graph=True), then query JOBS_BY_PROJECT for any 'CREATE OR REPLACE PROPERTY GRAPH' jobs in the post-timestamp window — assert zero. Also re-runs the showcase GQL query to confirm the user's graph object still works after the SDK run. - The timestamp is captured AFTER the authored DDL specifically to avoid the false-positive trap called out in GoogleCloudPlatform#107 cell 1.3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

caohy1988 · 2026-05-03T01:31:39Z

Follow-up: closed the two #104 AC gaps from review

Docs added — docs/ontology/ontology-build.md documents the SDK orchestrator end-to-end with a dedicated --skip-property-graph section. Includes:

Per-flag behavior table mapping property_graph_status (created / failed / skipped:user_requested) to property_graph_created and CLI exit code, so JSON consumers have a stable reference for status disambiguation.
"When to use this" section covering Terraform/dbt-managed DDL, custom graph options, and decoupled refresh cadence.
Python API example showing skip_property_graph=True with the expected result-dict keys.

Live integration test added — new TestSkipPropertyGraph::test_skip_property_graph_issues_no_create_graph_job in tests/test_integration_ontology_binding.py. Gated on RUN_LIVE_BIGQUERY_TESTS=1 matching the existing live-test pattern. Sequence:

Create authored CREATE OR REPLACE PROPERTY GRAPH directly (simulating user-managed DDL).
Capture CURRENT_TIMESTAMP() after the authored DDL completes (avoids the false-positive trap from Doc: V5 migration notebook storyboard for the four-guarantee decision-lineage narrative #107 cell 1.3 where the JOBS_BY_PROJECT filter would catch the user's own setup job).
Run build_ontology_graph(..., skip_property_graph=True) — assert property_graph_status == "skipped:user_requested".
Query INFORMATION_SCHEMA.JOBS_BY_PROJECT WHERE creation_time > @before AND UPPER(query) LIKE '%CREATE OR REPLACE PROPERTY GRAPH%' — assert zero rows.
Re-run the showcase GQL query against the user's graph — confirms the graph object survived the SDK run intact.

This is the exact contract #107 cells 1.3–1.5 will demonstrate in the V5 migration notebook once the storyboard moves to execution.

The unit-test coverage from the original commit is unchanged and still passes (135/135).

…loudPlatform#104) Addresses three review findings on PR GoogleCloudPlatform#108: (1) Live test now exercises real extraction/materialization - Pass dataset_id=_DATASET, table_id=_TABLE so extraction reads the production agent_events table where YMGO ADCP session data lives. Materializer still writes to scratch_dataset because spec entity sources arrive 3-part-qualified to binding.target.dataset via _qualify_source (resolved_spec.py:141). - Assert sum(rows_materialized.values()) > 0 to catch the silent- empty-graph trap where ontology_graph.py:683 returns an empty ExtractedGraph if extraction fails (e.g. wrong source dataset). (2) JOBS_BY_PROJECT assertion narrowed to the test's own graph - Filter by both 'CREATE OR REPLACE PROPERTY GRAPH' keyword AND the fully-qualified graph reference ({_PROJECT}.{scratch_dataset}.{spec.name}). Prevents false-fail on unrelated CREATE OR REPLACE PROPERTY GRAPH jobs running concurrently in the same project from other tests/developers. (3) docs/README.md gains a row for the new ontology-build doc. (4) New CLI test test_skip_property_graph_status_visible_in_text_format asserts property_graph_status appears in --format=text output, pinning the contract that the status field is not JSON-only. 7/7 ontology-build CLI tests pass.

…loudPlatform#104) Addresses three review findings on PR GoogleCloudPlatform#108: (1) Live test now exercises real extraction/materialization - Pass dataset_id=_DATASET, table_id=_TABLE so extraction reads the production agent_events table where YMGO ADCP session data lives. Materializer still writes to scratch_dataset because spec entity sources arrive 3-part-qualified to binding.target.dataset via _qualify_source (resolved_spec.py:141). - Assert sum(rows_materialized.values()) > 0 to catch the silent- empty-graph trap where ontology_graph.py:683 returns an empty ExtractedGraph if extraction fails (e.g. wrong source dataset). (2) JOBS_BY_PROJECT assertion narrowed to the test's own graph - Filter by both 'CREATE OR REPLACE PROPERTY GRAPH' keyword AND the fully-qualified graph reference ({_PROJECT}.{scratch_dataset}.{spec.name}). Prevents false-fail on unrelated CREATE OR REPLACE PROPERTY GRAPH jobs running concurrently in the same project from other tests/developers. (3) docs/README.md gains a row for the new ontology-build doc. (4) New CLI test test_skip_property_graph_status_visible_in_text_format asserts property_graph_status appears in --format=text output, pinning the contract that the status field is not JSON-only. 7/7 ontology-build CLI tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

caohy1988 · 2026-05-03T01:44:03Z

Review fixes folded in (commit `8d45671`)

(1) Live test now exercises real extraction/materialization. Was passing dataset_id=scratch_dataset so extraction queried scratch_dataset.agent_events (which doesn't exist). The AI extraction path silently returns an empty graph when the source table is missing (ontology_graph.py:683), so the test could pass while phases 1-4 did nothing. Fixed: now passes dataset_id=_DATASET, table_id=_TABLE so extraction reads the real agent_events table; materializer still writes to scratch_dataset because spec entity sources arrive 3-part-qualified to binding.target.dataset via _qualify_source (resolved_spec.py:141), so dataset_id doesn't influence output table location. Added assert sum(rows_materialized.values()) > 0 to catch the silent-empty trap.

(2) JOBS_BY_PROJECT assertion narrowed. Previously filtered every project-level job by LIKE '%CREATE OR REPLACE PROPERTY GRAPH%', which could false-fail on an unrelated graph-DDL job from another developer or test in the same project. Now also requires the query to contain the test's fully-qualified graph reference ({_PROJECT}.{scratch_dataset}.{spec.name}), scoping the assertion to the test's own scratch dataset.

(3) docs/README.md gains a row for ontology/ontology-build.md under Ontology Reference, so the new doc is discoverable.

(4) New CLI test test_skip_property_graph_status_visible_in_text_format asserts property_graph_status appears in --format=text output. Pins the contract that the status field is not JSON-only.

(5) PR body updated to reflect the docs + live test from commit 054e141 and the hardening from 8d45671. Tests section now lists unit / CLI / live coverage explicitly.

(6) CLA check — that's an admin/account-side gate that needs your action; I can't resolve it via PR commits.

Verified locally:

pytest tests/test_ontology_orchestrator.py tests/test_cli.py → 135 passed.
pytest tests/test_cli.py::TestOntologyBuild → 7/7 pass (was 6 before adding the text-format check).
Live test (TestSkipPropertyGraph::test_skip_property_graph_issues_no_create_graph_job) collects clean; runs only with RUN_LIVE_BIGQUERY_TESTS=1.

…dPlatform#104) Addresses three review findings on PR GoogleCloudPlatform#108: (1) Live test DDL-detection blind spot The previous filter required the regressed CREATE OR REPLACE PROPERTY GRAPH to target _PROJECT.<scratch_dataset>.<spec.name>. But if skip_property_graph regressed, the compiler would actually target _PROJECT._DATASET.<spec.name> (the orchestrator's dataset_id argument is _DATASET in this test, used for extraction of agent_events). The blind spot: a regression could fire DDL that the test would not catch. Fixed by replacing the fully-qualified-graph-ref filter with two narrower constraints that catch the regression in either dataset: - graph name (spec.name) — present in the DDL string regardless of which dataset the compiler targets - sdk_feature='ontology-gql' label — only SDK-issued property-graph jobs carry this label per ontology_property_graph.py:465; the test's setup CREATE PROPERTY GRAPH (issued via direct SQL) does not, so it does not trip the assertion (2) docs/ontology/ontology-build.md: document graph_ref limitation Added a "Known limitation" section noting that result["graph_ref"] reports the extraction dataset, not the binding's target dataset, in split source/target setups. The materialized base tables themselves still go to the binding's target dataset per the resolved spec; only the reported string is affected. (3) docs/ontology/ontology-build.md: soften DDL-options wording "additional indexes, dialect-specific options" was overreaching for BigQuery property graphs; tightened to "custom labels or other DDL details the SDK's compiler doesn't generate." 136/136 tests pass.

…dPlatform#104) Addresses three review findings on PR GoogleCloudPlatform#108: (1) Live test DDL-detection blind spot The previous filter required the regressed CREATE OR REPLACE PROPERTY GRAPH to target _PROJECT.<scratch_dataset>.<spec.name>. But if skip_property_graph regressed, the compiler would actually target _PROJECT._DATASET.<spec.name> (the orchestrator's dataset_id argument is _DATASET in this test, used for extraction of agent_events). The blind spot: a regression could fire DDL that the test would not catch. Fixed by replacing the fully-qualified-graph-ref filter with two narrower constraints that catch the regression in either dataset: - graph name (spec.name) — present in the DDL string regardless of which dataset the compiler targets - sdk_feature='ontology-gql' label — only SDK-issued property-graph jobs carry this label per ontology_property_graph.py:465; the test's setup CREATE PROPERTY GRAPH (issued via direct SQL) does not, so it does not trip the assertion (2) docs/ontology/ontology-build.md: document graph_ref limitation Added a "Known limitation" section noting that result["graph_ref"] reports the extraction dataset, not the binding's target dataset, in split source/target setups. The materialized base tables themselves still go to the binding's target dataset per the resolved spec; only the reported string is affected. (3) docs/ontology/ontology-build.md: soften DDL-options wording "additional indexes, dialect-specific options" was overreaching for BigQuery property graphs; tightened to "custom labels or other DDL details the SDK's compiler doesn't generate." 136/136 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

caohy1988 · 2026-05-03T03:10:09Z

Review fixes folded in (commit `e98f1c2`)

(1) DDL-detection blind spot fixed. The previous filter required a regressed CREATE OR REPLACE PROPERTY GRAPH to target _PROJECT.<scratch_dataset>.<spec.name>. But the orchestrator's compiler builds the DDL with dataset_id=_DATASET in this test (since the test passes _DATASET for extraction), so a regression would fire DDL targeting _PROJECT._DATASET.<spec.name> — and the old filter would miss it.

Replaced with two narrower constraints that catch a regression in either dataset:

Graph name (spec.name) — appears in the DDL string regardless of which dataset the compiler targets.
sdk_feature='ontology-gql' label — only SDK-issued property-graph jobs carry this label per ontology_property_graph.py:465. The test's setup CREATE PROPERTY GRAPH (issued via plain SQL, not via the SDK) does not get this label, so it doesn't trip the assertion. Cleaner than dataset-scoped filtering.

This catches the regression case the test was actually meant to detect.

(2) graph_ref source/target limitation documented. Added a "Known limitation" section to docs/ontology/ontology-build.md noting that result["graph_ref"] reports the extraction dataset, not the binding's target dataset, in split source/target setups. The materialized base tables themselves still go to the binding's target dataset per the resolved spec; only the reported string is affected. Worth a follow-up issue to either rename the field or compute it from binding.target — flagged here as out-of-scope for #108.

(3) Docs wording softened. "additional indexes, dialect-specific options" was overreaching for BigQuery property graphs; tightened to "custom labels or other DDL details the SDK's compiler doesn't generate."

(4) CLI live smoke test (finding #3 — non-blocker). Not added in this commit. The unit tests cover CLI wiring (property_graph_status in JSON + text output, exit codes for skip / fail / success, flag threading); the live test exercises the orchestrator path end-to-end. A literal CLI live smoke test (running subprocess.run(['bq-agent-sdk', 'ontology-build', '--skip-property-graph', ...])) is a reasonable follow-up but doesn't add coverage beyond what's already there.

Verified locally:

pytest tests/test_ontology_orchestrator.py tests/test_cli.py → 136 passed.
Live test (TestSkipPropertyGraph::test_skip_property_graph_issues_no_create_graph_job) collects clean; runs only with RUN_LIVE_BIGQUERY_TESTS=1.

) The previous comment claimed the test's setup CREATE PROPERTY GRAPH job did not carry the sdk_feature='ontology-gql' label. That was factually wrong: setup goes through OntologyPropertyGraphCompiler.create_property_graph() (line 387), which does carry the label. The test logic was already correct — the setup job is excluded by the post-setup timestamp captured in step 2, not by the label filter. The label filter excludes user-authored raw SQL DDL jobs (without SDK labels), which is its actual purpose. Only the comment needed to change. No code change.

Run bash autoformat.sh (isort + pyink). Fixes the Format check CI job that was failing on PR GoogleCloudPlatform#108. No behavior change.

…oogleCloudPlatform#105 PR 2b) Closes the user-facing surface of issue GoogleCloudPlatform#105. Builds on PR 2a's validate_binding_against_bigquery core (shipped in PR GoogleCloudPlatform#109). CLI changes (src/bigquery_agent_analytics/cli.py) Standalone command: bq-agent-sdk binding-validate --project-id ... --ontology X.yaml --binding Y.yaml [--location US] [--strict] [--format json|text|table] Loads the ontology + binding via upstream loaders, builds a google.cloud.bigquery.Client, calls validate_binding_against_bigquery(...), prints a structured JSON report on stdout. Warnings always print to stderr (one line each) so CI logs surface advisory drift even when stdout is consumed by a JSON-aware tool. Exit codes: 0 ok, 1 failure(s), 2 unexpected error (load failure, missing flags, etc). ontology-build flags: --validate-binding (default-mode pre-flight) --validate-binding-strict (strict-mode pre-flight) When set, run the validator before phase 2 (extraction). On any failure, the build short-circuits BEFORE any AI.GENERATE call fires, so authoring drift never costs tokens. Default-mode warnings print to stderr but do not block. The two flags are mutually exclusive; both are incompatible with the deprecated --spec-path form because the validator needs the unresolved Ontology + Binding pair, not a combined GraphSpec. Helper _run_binding_preflight() centralizes the load + validate + print + exit logic so it can be invoked from ontology-build without duplicating the flow. Also: --location flag added to ontology-build, threaded through to build_ontology_graph() (which has supported it on the Python side since PR GoogleCloudPlatform#108 / commit 292320b). The CLI was previously building without forwarding it. Tests (tests/test_cli.py) TestBindingValidate (6 tests): - test_clean_validation_exits_zero - test_failures_exit_one_with_payload - test_warnings_print_to_stderr_but_do_not_flip_exit - test_strict_flag_threaded_through - test_missing_required_flags_exit_2 - test_load_failure_exits_two TestOntologyBuildValidateBindingFlag (5 tests): - test_validate_binding_short_circuits_on_failure_before_build (asserts build_ontology_graph is NEVER called when validation fails — extraction does not start) - test_validate_binding_strict_short_circuits_on_nullable_keys - test_validate_binding_clean_proceeds_to_build - test_validate_binding_with_spec_path_rejected - test_both_flags_rejected (mutual exclusion) All tests use patched validators (no live BQ); the live integration test for the full flow lives in PR 2a (GoogleCloudPlatform#109). 253/253 tests pass across the touched test files. Docs New: docs/ontology/binding-validation.md - When to run (binding authoring, CI gating, ontology-build). - Standalone CLI examples (default + strict). - ontology-build integration examples. - Failure-code reference table covering all 7 default codes plus the strict-only KEY_COLUMN_NULLABLE. - CI usage pattern (GitHub Actions). - Python API example. - Cross-links to ontology-build.md, binding.md, and GoogleCloudPlatform#76's planned post-extraction validator. Updated: docs/README.md - Added binding-validation.md to the Ontology Reference table. What's NOT in this PR - Live integration test for the CLI surface (the validator itself has live coverage from PR 2a's TestBindingValidationLive). A literal subprocess-style CLI smoke test could land as a follow-up if someone wants it, but the unit tests cover wiring + exit codes + flag threading comprehensively.

…ate-binding flags (#105 PR 2b) (#112) * feat(cli): add bq-agent-sdk binding-validate + ontology-build flags (#105 PR 2b) Closes the user-facing surface of issue #105. Builds on PR 2a's validate_binding_against_bigquery core (shipped in PR #109). CLI changes (src/bigquery_agent_analytics/cli.py) Standalone command: bq-agent-sdk binding-validate --project-id ... --ontology X.yaml --binding Y.yaml [--location US] [--strict] [--format json|text|table] Loads the ontology + binding via upstream loaders, builds a google.cloud.bigquery.Client, calls validate_binding_against_bigquery(...), prints a structured JSON report on stdout. Warnings always print to stderr (one line each) so CI logs surface advisory drift even when stdout is consumed by a JSON-aware tool. Exit codes: 0 ok, 1 failure(s), 2 unexpected error (load failure, missing flags, etc). ontology-build flags: --validate-binding (default-mode pre-flight) --validate-binding-strict (strict-mode pre-flight) When set, run the validator before phase 2 (extraction). On any failure, the build short-circuits BEFORE any AI.GENERATE call fires, so authoring drift never costs tokens. Default-mode warnings print to stderr but do not block. The two flags are mutually exclusive; both are incompatible with the deprecated --spec-path form because the validator needs the unresolved Ontology + Binding pair, not a combined GraphSpec. Helper _run_binding_preflight() centralizes the load + validate + print + exit logic so it can be invoked from ontology-build without duplicating the flow. Also: --location flag added to ontology-build, threaded through to build_ontology_graph() (which has supported it on the Python side since PR #108 / commit 292320b). The CLI was previously building without forwarding it. Tests (tests/test_cli.py) TestBindingValidate (6 tests): - test_clean_validation_exits_zero - test_failures_exit_one_with_payload - test_warnings_print_to_stderr_but_do_not_flip_exit - test_strict_flag_threaded_through - test_missing_required_flags_exit_2 - test_load_failure_exits_two TestOntologyBuildValidateBindingFlag (5 tests): - test_validate_binding_short_circuits_on_failure_before_build (asserts build_ontology_graph is NEVER called when validation fails — extraction does not start) - test_validate_binding_strict_short_circuits_on_nullable_keys - test_validate_binding_clean_proceeds_to_build - test_validate_binding_with_spec_path_rejected - test_both_flags_rejected (mutual exclusion) All tests use patched validators (no live BQ); the live integration test for the full flow lives in PR 2a (#109). 253/253 tests pass across the touched test files. Docs New: docs/ontology/binding-validation.md - When to run (binding authoring, CI gating, ontology-build). - Standalone CLI examples (default + strict). - ontology-build integration examples. - Failure-code reference table covering all 7 default codes plus the strict-only KEY_COLUMN_NULLABLE. - CI usage pattern (GitHub Actions). - Python API example. - Cross-links to ontology-build.md, binding.md, and #76's planned post-extraction validator. Updated: docs/README.md - Added binding-validation.md to the Ontology Reference table. What's NOT in this PR - Live integration test for the CLI surface (the validator itself has live coverage from PR 2a's TestBindingValidationLive). A literal subprocess-style CLI smoke test could land as a follow-up if someone wants it, but the unit tests cover wiring + exit codes + flag threading comprehensively. * fix+test+docs: lowercase JSON codes, changelog, location + warning-only paths (#105 PR 2b) Four review findings folded in. (1) Failure-code reference now shows lowercase JSON values The CLI emits f.code.value (e.g. 'missing_table'), not the enum attribute name ('MISSING_TABLE'). Docs previously listed the uppercase form, so users copying the table into jq filters would silently match nothing. The reference table now has both columns (Python FailureCode + JSON value), with a sample jq filter using the lowercase form. (2) CHANGELOG entry under [Unreleased] This PR adds a public CLI command (binding-validate), two public ontology-build flags (--validate-binding[-strict]), --location on ontology-build, and the validate_binding_against_bigquery Python API. All four are now called out under [Unreleased] / Added. (3) Test the warning-only path through ontology-build The new test test_validate_binding_warnings_only_proceeds_to_build patches the validator to return a BindingValidationReport with only warnings (no failures), runs ontology-build --validate-binding, and asserts: - exit code 0 - build_ontology_graph IS called (warnings don't short-circuit) - WARN: key_column_nullable appears in the CLI output This covers _run_binding_preflight()'s default-mode advisory branch, complementing the existing failure-short-circuit and clean-success tests. (4) --location now has tests and a doc note Two new tests: - TestBindingValidate::test_location_threaded_to_bigquery_client confirms binding-validate --location=EU constructs a BQ client with location='EU'. - TestOntologyBuildValidateBindingFlag:: test_location_threaded_through_orchestrator confirms ontology-build --location=EU forwards to build_ontology_graph so the orchestrator's BQ client targets the EU multi-region. docs/ontology/ontology-build.md: new 'BigQuery location' section covering --location plus a 'Pre-flight binding validation' section referencing binding-validation.md for the full --validate-binding flag reference. 256/256 tests pass (test_binding_validation.py + test_cli.py + test_ontology_orchestrator.py + test_ontology_materializer.py + test_resolved_spec.py). Autoformat clean. Optional nit (deferred, not blocking): _run_binding_preflight loads ontology + binding once for validation, _load_spec_from_args reloads them for resolve(). Worth a small helper that returns the loaded objects once, but acceptable as-is for PR 2b scope. * docs: avoid naming INFORMATION_SCHEMA as preflight mechanism (#105 PR 2b) Two wording fixes; no behavior change. The validator uses bq_client.get_table(...) metadata calls (binding_validation.py:309), not direct INFORMATION_SCHEMA queries. Two places said the latter: (1) docs/ontology/ontology-build.md: 'pre-flight validator queries INFORMATION_SCHEMA in the same region' -> 'uses the BigQuery client with the requested location to fetch each bound table's metadata'. (2) tests/test_cli.py docstring on test_location_threaded_to_bigquery_client: same correction. The test asserts the location threads through; the wording for *why* location matters now matches the actual mechanism. 256/256 tests still pass. Autoformat clean.

caohy1988 force-pushed the feat/skip-property-graph branch from 9fd8e85 to e44967b Compare May 3, 2026 06:19

style: apply autoformat to test files

fcd7d9c

Run bash autoformat.sh (isort + pyink). Fixes the Format check CI job that was failing on PR GoogleCloudPlatform#108. No behavior change.

haiyuan-eng-google merged commit 292320b into GoogleCloudPlatform:main May 3, 2026
9 checks passed

caohy1988 mentioned this pull request May 4, 2026

feat(cli): add bq-agent-sdk binding-validate + ontology-build --validate-binding flags (#105 PR 2b) #112

Merged

caohy1988 mentioned this pull request May 5, 2026

feat(ontology): add validate_extracted_graph (#76) #113

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(ontology): add --skip-property-graph for user-owned graph DDL (#104)#108

feat(ontology): add --skip-property-graph for user-owned graph DDL (#104)#108
haiyuan-eng-google merged 6 commits intoGoogleCloudPlatform:mainfrom
caohy1988:feat/skip-property-graph

caohy1988 commented May 3, 2026 •

edited

Loading

Uh oh!

google-cla Bot commented May 3, 2026

Uh oh!

caohy1988 commented May 3, 2026

Uh oh!

caohy1988 commented May 3, 2026

Uh oh!

caohy1988 commented May 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

caohy1988 commented May 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Tests

Next

Uh oh!

google-cla Bot commented May 3, 2026

Uh oh!

caohy1988 commented May 3, 2026

Follow-up: closed the two #104 AC gaps from review

Uh oh!

caohy1988 commented May 3, 2026

Review fixes folded in (commit 8d45671)

Uh oh!

caohy1988 commented May 3, 2026

Review fixes folded in (commit e98f1c2)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

caohy1988 commented May 3, 2026 •

edited

Loading

Review fixes folded in (commit `8d45671`)

Review fixes folded in (commit `e98f1c2`)