feat(cli): add bq-agent-sdk binding-validate + ontology-build --validate-binding flags (#105 PR 2b) (#112)

caohy1988 · web-flow · commit 4c3a293dbbdf · 2026-05-04T23:48:24.000-07:00
* feat(cli): add bq-agent-sdk binding-validate + ontology-build flags (#105 PR 2b) Closes the user-facing surface of issue #105. Builds on PR 2a's validate_binding_against_bigquery core (shipped in PR #109). CLI changes (src/bigquery_agent_analytics/cli.py) Standalone command: bq-agent-sdk binding-validate --project-id ... --ontology X.yaml --binding Y.yaml [--location US] [--strict] [--format json|text|table] Loads the ontology + binding via upstream loaders, builds a google.cloud.bigquery.Client, calls validate_binding_against_bigquery(...), prints a structured JSON report on stdout. Warnings always print to stderr (one line each) so CI logs surface advisory drift even when stdout is consumed by a JSON-aware tool. Exit codes: 0 ok, 1 failure(s), 2 unexpected error (load failure, missing flags, etc). ontology-build flags: --validate-binding (default-mode pre-flight) --validate-binding-strict (strict-mode pre-flight) When set, run the validator before phase 2 (extraction). On any failure, the build short-circuits BEFORE any AI.GENERATE call fires, so authoring drift never costs tokens. Default-mode warnings print to stderr but do not block. The two flags are mutually exclusive; both are incompatible with the deprecated --spec-path form because the validator needs the unresolved Ontology + Binding pair, not a combined GraphSpec. Helper _run_binding_preflight() centralizes the load + validate + print + exit logic so it can be invoked from ontology-build without duplicating the flow. Also: --location flag added to ontology-build, threaded through to build_ontology_graph() (which has supported it on the Python side since PR #108 / commit 292320b). The CLI was previously building without forwarding it. Tests (tests/test_cli.py) TestBindingValidate (6 tests): - test_clean_validation_exits_zero - test_failures_exit_one_with_payload - test_warnings_print_to_stderr_but_do_not_flip_exit - test_strict_flag_threaded_through - test_missing_required_flags_exit_2 - test_load_failure_exits_two TestOntologyBuildValidateBindingFlag (5 tests): - test_validate_binding_short_circuits_on_failure_before_build (asserts build_ontology_graph is NEVER called when validation fails — extraction does not start) - test_validate_binding_strict_short_circuits_on_nullable_keys - test_validate_binding_clean_proceeds_to_build - test_validate_binding_with_spec_path_rejected - test_both_flags_rejected (mutual exclusion) All tests use patched validators (no live BQ); the live integration test for the full flow lives in PR 2a (#109). 253/253 tests pass across the touched test files. Docs New: docs/ontology/binding-validation.md - When to run (binding authoring, CI gating, ontology-build). - Standalone CLI examples (default + strict). - ontology-build integration examples. - Failure-code reference table covering all 7 default codes plus the strict-only KEY_COLUMN_NULLABLE. - CI usage pattern (GitHub Actions). - Python API example. - Cross-links to ontology-build.md, binding.md, and #76's planned post-extraction validator. Updated: docs/README.md - Added binding-validation.md to the Ontology Reference table. What's NOT in this PR - Live integration test for the CLI surface (the validator itself has live coverage from PR 2a's TestBindingValidationLive). A literal subprocess-style CLI smoke test could land as a follow-up if someone wants it, but the unit tests cover wiring + exit codes + flag threading comprehensively. * fix+test+docs: lowercase JSON codes, changelog, location + warning-only paths (#105 PR 2b) Four review findings folded in. (1) Failure-code reference now shows lowercase JSON values The CLI emits f.code.value (e.g. 'missing_table'), not the enum attribute name ('MISSING_TABLE'). Docs previously listed the uppercase form, so users copying the table into jq filters would silently match nothing. The reference table now has both columns (Python FailureCode + JSON value), with a sample jq filter using the lowercase form. (2) CHANGELOG entry under [Unreleased] This PR adds a public CLI command (binding-validate), two public ontology-build flags (--validate-binding[-strict]), --location on ontology-build, and the validate_binding_against_bigquery Python API. All four are now called out under [Unreleased] / Added. (3) Test the warning-only path through ontology-build The new test test_validate_binding_warnings_only_proceeds_to_build patches the validator to return a BindingValidationReport with only warnings (no failures), runs ontology-build --validate-binding, and asserts: - exit code 0 - build_ontology_graph IS called (warnings don't short-circuit) - WARN: key_column_nullable appears in the CLI output This covers _run_binding_preflight()'s default-mode advisory branch, complementing the existing failure-short-circuit and clean-success tests. (4) --location now has tests and a doc note Two new tests: - TestBindingValidate::test_location_threaded_to_bigquery_client confirms binding-validate --location=EU constructs a BQ client with location='EU'. - TestOntologyBuildValidateBindingFlag:: test_location_threaded_through_orchestrator confirms ontology-build --location=EU forwards to build_ontology_graph so the orchestrator's BQ client targets the EU multi-region. docs/ontology/ontology-build.md: new 'BigQuery location' section covering --location plus a 'Pre-flight binding validation' section referencing binding-validation.md for the full --validate-binding flag reference. 256/256 tests pass (test_binding_validation.py + test_cli.py + test_ontology_orchestrator.py + test_ontology_materializer.py + test_resolved_spec.py). Autoformat clean. Optional nit (deferred, not blocking): _run_binding_preflight loads ontology + binding once for validation, _load_spec_from_args reloads them for resolve(). Worth a small helper that returns the loaded objects once, but acceptable as-is for PR 2b scope. * docs: avoid naming INFORMATION_SCHEMA as preflight mechanism (#105 PR 2b) Two wording fixes; no behavior change. The validator uses bq_client.get_table(...) metadata calls (binding_validation.py:309), not direct INFORMATION_SCHEMA queries. Two places said the latter: (1) docs/ontology/ontology-build.md: 'pre-flight validator queries INFORMATION_SCHEMA in the same region' -> 'uses the BigQuery client with the requested location to fetch each bound table's metadata'. (2) tests/test_cli.py docstring on test_location_threaded_to_bigquery_client: same correction. The test asserts the location threads through; the wording for *why* location matters now matches the actual mechanism. 256/256 tests still pass. Autoformat clean.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,34 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Added
+
+- **`bq-agent-sdk binding-validate` CLI** — pre-flight validator that
+  checks whether a binding YAML's referenced BigQuery tables
+  physically exist with the columns and types the binding requires,
+  before extraction wastes ``AI.GENERATE`` tokens. Emits a structured
+  JSON report (failures + warnings) and exits 0 / 1 / 2. Supports
+  `--strict` to escalate `KEY_COLUMN_NULLABLE` warnings to hard
+  failures. See [issue #105](https://github.com/GoogleCloudPlatform/BigQuery-Agent-Analytics-SDK/issues/105)
+  and `docs/ontology/binding-validation.md`.
+- **`bq-agent-sdk ontology-build --validate-binding` and
+  `--validate-binding-strict`** opt-in flags. Run the binding
+  pre-flight before phase 2 (extraction). On any failure, the build
+  short-circuits before any `AI.GENERATE` call fires; default-mode
+  warnings print to stderr but don't block. The two flags are
+  mutually exclusive; both incompatible with the deprecated
+  `--spec-path` form because the validator needs the unresolved
+  `Ontology` + `Binding` pair.
+- **`bq-agent-sdk ontology-build --location`** — BigQuery location
+  (e.g. `US`, `EU`) threaded through to `build_ontology_graph()`.
+  The Python API has supported `location` since 0.2.3; this adds
+  the matching CLI flag.
+- **`validate_binding_against_bigquery(...)` Python API** in
+  `bigquery_agent_analytics.binding_validation`. Same surface the
+  CLI calls: takes `Ontology` + `Binding` + `bq_client`, returns a
+  `BindingValidationReport` with `failures` + `warnings` lists and
+  an `ok` property. Issue [#105](https://github.com/GoogleCloudPlatform/BigQuery-Agent-Analytics-SDK/issues/105).
+
 ## [0.2.3] - 2026-04-27
 
 ### Fixed
diff --git a/docs/README.md b/docs/README.md
@@ -37,6 +37,7 @@ architecture, rationale, and implementation plans behind key SDK features.
 | [ontology/cli.md](ontology/cli.md) | CLI design for the `gm` tool (validate, compile, import-owl) |
 | [ontology/owl-import.md](ontology/owl-import.md) | OWL import — converting OWL ontologies to YAML format |
 | [ontology/ontology-build.md](ontology/ontology-build.md) | `bq-agent-sdk ontology-build` orchestrator + `--skip-property-graph` reference |
+| [ontology/binding-validation.md](ontology/binding-validation.md) | `bq-agent-sdk binding-validate` pre-flight + `ontology-build --validate-binding[-strict]` reference |
 
 ## Deployment Surfaces
 
diff --git a/docs/ontology/binding-validation.md b/docs/ontology/binding-validation.md
@@ -0,0 +1,154 @@
+# Binding Validation — Pre-flight Reference
+
+`bq-agent-sdk binding-validate` checks whether the BigQuery tables a binding YAML points at physically exist with the columns and types the binding requires, **before** the SDK starts extraction. Catches the most common authoring error (binding YAML drifted out of sync with physical tables) before extraction wastes `AI.GENERATE` tokens.
+
+## When to run it
+
+- **Before any `ontology-build`** that points at user-pre-defined tables (Terraform / dbt / hand-authored DDL). Use either:
+  - `bq-agent-sdk binding-validate ...` standalone, or
+  - `bq-agent-sdk ontology-build --validate-binding ...` to gate the build.
+- **In CI** as a pre-flight check that runs on every binding YAML change. Strict mode (`--strict`) is appropriate here — it forces every primary-key column to be REQUIRED.
+- **Locally during binding authoring** to catch typos and column-name drift before the first build.
+
+## Standalone CLI
+
+```bash
+bq-agent-sdk binding-validate \
+  --project-id my-project \
+  --ontology my.ontology.yaml \
+  --binding my-bq-prod.binding.yaml \
+  --location US
+```
+
+Output is a JSON report on stdout, plus advisory warnings printed to stderr. Exit codes:
+
+| Exit | Meaning |
+|---|---|
+| `0` | `report.ok` is True. No failures. Warnings (if any) are advisory. |
+| `1` | `report.ok` is False. At least one failure. |
+| `2` | Unexpected error: missing flags, ontology/binding load failure, etc. |
+
+### Strict mode
+
+```bash
+bq-agent-sdk binding-validate \
+  --project-id my-project \
+  --ontology my.ontology.yaml \
+  --binding my-bq-prod.binding.yaml \
+  --strict
+```
+
+Strict mode escalates `KEY_COLUMN_NULLABLE` warnings into hard failures. Use this in CI when you want every primary-key and endpoint-key column to be REQUIRED. Default mode does **not** require `NOT NULL` on key columns because the SDK's own `OntologyMaterializer.create_tables()` emits NULLABLE keys (`ontology_materializer.py:206`) — a default-mode hard failure here would reject SDK-created tables.
+
+## `ontology-build` integration
+
+Two opt-in flags gate the build on a passing pre-flight:
+
+```bash
+bq-agent-sdk ontology-build \
+  --project-id my-project \
+  --dataset-id my-dataset \
+  --ontology my.ontology.yaml \
+  --binding my-bq-prod.binding.yaml \
+  --session-ids sess-1,sess-2 \
+  --validate-binding         # default mode: warnings advisory; failures short-circuit
+```
+
+Or in strict mode:
+
+```bash
+bq-agent-sdk ontology-build \
+  ... \
+  --validate-binding-strict  # NULLABLE keys escalate to failures
+```
+
+When the validator reports any failure, **the build short-circuits before any `AI.GENERATE` call fires** — no extraction tokens are spent. Default-mode warnings print to stderr but do not block.
+
+The two flags are mutually exclusive. Both are incompatible with the deprecated `--spec-path` form because the validator needs the unresolved `Ontology` + `Binding` pair, not a combined `GraphSpec`.
+
+## Failure-code reference
+
+The validator returns a `BindingValidationReport` with two collections: `failures` (always blocking) and `warnings` (advisory in default mode, escalated under `--strict`). Each entry carries `code`, `binding_element`, `binding_path`, `bq_ref`, `expected`, `observed`, and `detail`.
+
+The CLI emits failure codes as **lowercase strings** in the JSON report (e.g., `"missing_table"`, not `"MISSING_TABLE"`). Use the lowercase form when filtering with `jq` or other JSON tools. The Python `FailureCode` enum exposes both forms — `FailureCode.MISSING_TABLE.value == "missing_table"`.
+
+### Default-mode failure codes (always blocking)
+
+| Python `FailureCode` | JSON `code` value | What it means | Typical fix |
+|---|---|---|---|
+| `MISSING_TABLE` | `"missing_table"` | The bound `source` table doesn't exist in BigQuery. | Create the table, or fix the binding's `source`. |
+| `MISSING_COLUMN` | `"missing_column"` | A bound column (property, key, or SDK metadata column like `session_id` / `extracted_at`) doesn't exist on the table. | Add the column, or fix the binding's `column` mapping. |
+| `TYPE_MISMATCH` | `"type_mismatch"` | A bound column exists but its BigQuery type doesn't match the ontology-derived expected type. | Change the BQ column's type, or change the ontology property's type. |
+| `ENDPOINT_TYPE_MISMATCH` | `"endpoint_type_mismatch"` | An edge's `from_columns` or `to_columns` entry disagrees with the referenced node's primary-key column type. Two flavors: spec-level (edge vs ontology) and physical (edge vs node's actual storage). | Align the edge endpoint's type with the node's primary-key type. |
+| `UNEXPECTED_REPEATED_MODE` | `"unexpected_repeated_mode"` | A scalar property/key column is in REPEATED (ARRAY) mode in BigQuery. | Restructure the table — the SDK can't bind scalar properties to ARRAY columns. |
+| `MISSING_DATASET` | `"missing_dataset"` | The bound table's dataset doesn't exist. | Create the dataset, or fix the binding's `target.dataset` / fully-qualified `source`. |
+| `INSUFFICIENT_PERMISSIONS` | `"insufficient_permissions"` | The calling identity can't read the table. | Grant `bigquery.tables.get` on the dataset, or run as an identity that already has it. |
+
+### Strict-only code (warning by default, failure under `--strict`)
+
+| Python `FailureCode` | JSON `code` value | What it means | Why it's strict-only |
+|---|---|---|---|
+| `KEY_COLUMN_NULLABLE` | `"key_column_nullable"` | A primary-key or endpoint-key column is in NULLABLE mode. | The SDK's own `CREATE TABLE IF NOT EXISTS` DDL emits NULLABLE key columns. A default-mode hard failure here would reject SDK-created tables. Use `--strict` in CI to enforce REQUIRED keys when you control the DDL. |
+
+Example `jq` filter for failed builds:
+
+```bash
+bq-agent-sdk binding-validate ... --format=json \
+  | jq '.failures[] | select(.code == "missing_column") | .bq_ref'
+```
+
+## CI usage pattern
+
+```yaml
+# .github/workflows/binding-validation.yml
+- name: Pre-flight binding validation
+  run: |
+    bq-agent-sdk binding-validate \
+      --project-id ${{ secrets.GCP_PROJECT }} \
+      --ontology my.ontology.yaml \
+      --binding my-bq-prod.binding.yaml \
+      --location US \
+      --strict
+```
+
+A failed validation exits 1 and the CI step fails. Combine with the matching `ontology-build --validate-binding-strict` in your deploy step so the same gate runs at build time.
+
+## Python API
+
+The CLI is a thin wrapper around `validate_binding_against_bigquery`. For programmatic use:
+
+```python
+from google.cloud import bigquery
+from bigquery_ontology import load_ontology, load_binding
+from bigquery_agent_analytics.binding_validation import (
+    validate_binding_against_bigquery,
+    FailureCode,
+)
+
+ontology = load_ontology("my.ontology.yaml")
+binding = load_binding("my-bq-prod.binding.yaml", ontology=ontology)
+client = bigquery.Client(project="my-project", location="US")
+
+report = validate_binding_against_bigquery(
+    ontology=ontology,
+    binding=binding,
+    bq_client=client,
+    strict=False,
+)
+
+if not report.ok:
+    for f in report.failures:
+        print(f"{f.code.value} at {f.binding_path} ({f.bq_ref}): {f.detail}")
+    raise SystemExit(1)
+
+for w in report.warnings:
+    print(f"WARN: {w.code.value} at {w.binding_path}")
+```
+
+`report.ok` returns `True` iff `report.failures` is empty — warnings do not flip it. Under `strict=True`, `KEY_COLUMN_NULLABLE` warnings become failures with the same code and `report.warnings` is empty (escalated, not duplicated).
+
+## Related
+
+- `docs/ontology/ontology-build.md` — the orchestrator the validator gates.
+- `docs/ontology/binding.md` — the binding model the validator validates.
+- Issue [#76](https://github.com/GoogleCloudPlatform/BigQuery-Agent-Analytics-SDK/issues/76) — the planned post-extraction `validate_extracted_graph` validator (different phase, different inputs).
diff --git a/docs/ontology/ontology-build.md b/docs/ontology/ontology-build.md
@@ -53,7 +53,33 @@ The CLI's `property_graph_status` field has three values:
 
 Without `--skip-property-graph`, the existing exit-1 behavior on graph-create failure is preserved exactly.
 
-## When to use this
+## BigQuery location
+
+`--location` (e.g. `--location=US`, `--location=EU`) is forwarded to the orchestrator's BigQuery client. Required when the bound tables live outside the default location, especially in combination with `--validate-binding[-strict]` (the pre-flight validator uses the BigQuery client with the requested location to fetch each bound table's metadata).
+
+```
+bq-agent-sdk ontology-build \
+  --project-id my-project \
+  --dataset-id my-dataset \
+  --ontology my.ontology.yaml \
+  --binding my-bq-prod.binding.yaml \
+  --session-ids sess-1 \
+  --location EU \
+  --validate-binding
+```
+
+## Pre-flight binding validation
+
+`--validate-binding` and `--validate-binding-strict` run the binding validator before extraction, short-circuiting the build if the binding YAML drifted from the physical tables. See [binding-validation.md](binding-validation.md) for the full reference.
+
+```
+bq-agent-sdk ontology-build ... --validate-binding         # default mode
+bq-agent-sdk ontology-build ... --validate-binding-strict  # strict mode
+```
+
+The two flags are mutually exclusive and incompatible with the deprecated `--spec-path` form.
+
+## When to use `--skip-property-graph`
 
 - **You already manage `CREATE PROPERTY GRAPH` in Terraform / dbt / a SQL file.** The SDK's `CREATE OR REPLACE PROPERTY GRAPH` would clobber your DDL on every run.
 - **Your property graph definition uses DDL details the SDK compiler doesn't emit.** You hand-authored the graph DDL to express custom labels or other DDL details the SDK's compiler doesn't generate.
diff --git a/src/bigquery_agent_analytics/cli.py b/src/bigquery_agent_analytics/cli.py
diff --git a/tests/test_cli.py b/tests/test_cli.py