feat(ontology): add validate_binding_against_bigquery (#105 PR 2a) by caohy1988 · Pull Request #109 · GoogleCloudPlatform/BigQuery-Agent-Analytics-SDK

caohy1988 · 2026-05-03T06:39:23Z

Implements #105 PR 2a per the working plan on #96: the validator core. PR 2b will follow with the user-facing CLI surface and docs.

What this is

Pre-flight validator that checks whether the BigQuery tables referenced by a binding YAML physically exist with the columns and types the binding requires. It runs before the SDK starts extraction, so the most common authoring error (a binding YAML that has drifted out of sync with the physical tables) is caught before any AI.GENERATE tokens are spent.

Different from #76's validate_extracted_graph:

| | #105 (this PR) | #76 |
|---|---|---|
| Inputs | `Ontology` + `Binding` + live `bq_client` | `ResolvedGraph` + `ExtractedGraph` |
| Phase | Pre-extraction | Post-extraction, pre-materialization |
| Surfaces | binding ↔ BigQuery schema drift | extractor output ↔ ontology spec drift |

Both expose the same public report ergonomics (ok / failures / typed codes) but keep separate Failure/Warning types because context fields differ — per the working plan's cross-PR consistency note on #96.

Surface

from google.cloud import bigquery

from bigquery_agent_analytics.binding_validation import (
    validate_binding_against_bigquery,
    BindingValidationReport,
    BindingValidationFailure,
    BindingValidationWarning,
    FailureCode,
)

report = validate_binding_against_bigquery(
    ontology=loaded_ontology,
    binding=loaded_binding,
    bq_client=bigquery.Client(project="my-project", location="US"),
    strict=False,
)

if not report.ok:
    for f in report.failures:
        print(f)
for w in report.warnings:
    print(f"WARN: {w}")

FailureCode (8 codes):

- 7 default-mode failures: MISSING_TABLE, MISSING_COLUMN, TYPE_MISMATCH, ENDPOINT_TYPE_MISMATCH, UNEXPECTED_REPEATED_MODE, MISSING_DATASET, INSUFFICIENT_PERMISSIONS
- 1 strict-only: KEY_COLUMN_NULLABLE. Emits BindingValidationWarning by default; BindingValidationFailure under strict=True. The SDK's own CREATE TABLE IF NOT EXISTS DDL emits NULLABLE keys (ontology_materializer.py:206), so a default-mode hard failure on this code would reject SDK-created tables.
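The strict-mode contract can be illustrated with a small standalone sketch. The `Finding` and `Report` classes and the `report_nullable_keys` helper below are hypothetical stand-ins written for this illustration, not the SDK's actual types; only the behavior (warnings never flip `ok`, strict mode escalates rather than duplicates) is taken from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class FailureCode(Enum):
    # Only two of the eight codes are reproduced in this sketch.
    MISSING_COLUMN = "MISSING_COLUMN"
    KEY_COLUMN_NULLABLE = "KEY_COLUMN_NULLABLE"


@dataclass(frozen=True)
class Finding:
    # Hypothetical stand-in for BindingValidationFailure / BindingValidationWarning.
    code: FailureCode
    binding_path: str


@dataclass(frozen=True)
class Report:
    # Hypothetical stand-in for BindingValidationReport.
    failures: tuple = ()
    warnings: tuple = ()

    @property
    def ok(self) -> bool:
        # Only failures flip ok; warnings are advisory.
        return not self.failures


def report_nullable_keys(paths, strict=False):
    """Route NULLABLE-key findings to warnings (default) or failures (strict)."""
    findings = tuple(Finding(FailureCode.KEY_COLUMN_NULLABLE, p) for p in paths)
    if strict:
        # Escalated, not duplicated: the warnings tuple stays empty.
        return Report(failures=findings)
    return Report(warnings=findings)
```

With the same input, the default call yields `ok=True` with one warning per key, and the strict call yields `ok=False` with the same findings moved to `failures`.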

Internal flow

1. resolve(ontology, binding) → ResolvedGraph, so the validator honors fully-qualified entity.source overrides via _qualify_source (resolved_spec.py:141); cross-project bindings work.
2. Per entity: get_table → column-name index → per-property type/mode checks → per-key-column REPEATED + strict-only NULLABLE check → SDK metadata-column checks (session_id STRING + extracted_at TIMESTAMP, both written unconditionally by the materializer).
3. Per relationship: same property + metadata checks plus dual ENDPOINT_TYPE_MISMATCH checks for from_columns/to_columns:
   - Spec-level: edge BQ type vs ontology-derived expected SDK type.
   - Physical cross-table: edge BQ type vs the referenced node table's actual key BQ type. Fires only when the node has drifted from spec, so the common edge-only-drift case produces a single mismatch (no double-reporting).
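The two endpoint checks can be sketched as a pure function over type names. This is an illustrative reduction, not the validator's code: `canonical` and `endpoint_mismatches` are hypothetical names, and the "spec-level mismatch" wording is assumed (only the "physical cross-table mismatch" phrase is quoted in this PR).

```python
# Legacy BigQuery type aliases mapped to their canonical names.
_CANONICAL = {"INTEGER": "INT64", "FLOAT": "FLOAT64", "BOOLEAN": "BOOL"}


def canonical(bq_type):
    """Normalize case and legacy aliases so INTEGER and INT64 compare equal."""
    upper = bq_type.upper()
    return _CANONICAL.get(upper, upper)


def endpoint_mismatches(edge_type, expected_type, node_actual_type=None):
    """Return detail strings for one edge endpoint column.

    edge_type: type observed on the edge table's endpoint column.
    expected_type: type derived from the ontology spec.
    node_actual_type: type observed on the node table's key column,
        or None when the node table was not fetched.
    """
    details = []
    edge = canonical(edge_type)
    expected = canonical(expected_type)

    # Spec-level check: the edge disagrees with the ontology declaration.
    if edge != expected:
        details.append("spec-level mismatch")

    # Physical check: only when the node itself drifted from the spec AND
    # the edge disagrees with what the node actually stores.
    if node_actual_type is not None:
        node = canonical(node_actual_type)
        if node != expected and edge != node:
            details.append("physical cross-table mismatch")

    return details
```

In the edge-only-drift case (`"STRING"`, `"INT64"`, `"INT64"`) this yields one entry; when the node has drifted and the edge has not (`"INT64"`, `"INT64"`, `"STRING"`) only the physical entry appears.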

binding_path indices reflect the binding YAML's own ordering (built from binding.entities[i].properties), not the ResolvedEntity's ontology / effective-property order. Tooling can navigate to the exact YAML line the user wrote.
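A rough sketch of how such paths could be derived, assuming properties are identified by logical name. The `property_paths` helper and the shape of the fallback path are hypothetical; the PR only states that a name-keyed fallback exists, not its format.

```python
def property_paths(entity_index, binding_property_names, resolved_property_names):
    """Map each resolved property to a path indexed by binding YAML order.

    binding_property_names: logical names in the order the YAML lists them.
    resolved_property_names: logical names in resolved (ontology) order.
    """
    # Index built from the binding's own ordering, not the resolved ordering.
    yaml_index = {name: i for i, name in enumerate(binding_property_names)}

    paths = {}
    for name in resolved_property_names:
        if name in yaml_index:
            paths[name] = (
                f"binding.entities[{entity_index}]"
                f".properties[{yaml_index[name]}].column"
            )
        else:
            # Hypothetical name-keyed fallback for a resolved property with
            # no matching YAML entry; the real format is not shown in the PR.
            paths[name] = (
                f"binding.entities[{entity_index}].properties[{name!r}].column"
            )
    return paths
```

For a binding that lists `decision_id` then `confidence` while the resolved order is reversed, `confidence` still maps to `binding.entities[0].properties[1].column`.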

Type compatibility uses the materializer's _DDL_TYPE_MAP (ontology_materializer.py:125) so consistency with SDK-generated DDL is automatic. Legacy BQ aliases (INTEGER/FLOAT/BOOLEAN) accepted.
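As a sketch of the idea, the compatibility check reduces to a lookup in a table keyed by the canonical DDL type. The dictionary contents below are assumptions made for illustration (the real `_DDL_TYPE_MAP` and `_COMPATIBLE_BQ_TYPES` are not reproduced in this PR description), and `is_compatible` is a hypothetical helper name.

```python
# Assumed contents, for illustration only: ontology type -> DDL type.
_DDL_TYPE_MAP = {
    "string": "STRING",
    "int64": "INT64",
    "double": "FLOAT64",
    "boolean": "BOOL",
    "timestamp": "TIMESTAMP",
}

# Assumed contents: canonical DDL type -> BigQuery type names accepted for it,
# including the legacy aliases INTEGER / FLOAT / BOOLEAN.
_COMPATIBLE_BQ_TYPES = {
    "STRING": {"STRING"},
    "INT64": {"INT64", "INTEGER"},
    "FLOAT64": {"FLOAT64", "FLOAT"},
    "BOOL": {"BOOL", "BOOLEAN"},
    "TIMESTAMP": {"TIMESTAMP"},
}


def is_compatible(ontology_type, observed_bq_type):
    """True when the observed column type satisfies the ontology type."""
    expected = _DDL_TYPE_MAP[ontology_type]
    return observed_bq_type.upper() in _COMPATIBLE_BQ_TYPES[expected]
```

Deriving the expected type from the same map the DDL generator uses is what keeps the validator from rejecting tables the SDK itself created.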

Tests

25 unit tests (all pass against a fake BQ client):

- TestSdkCreatedTablesRegression: SDK-create_tables() output validates clean by default; strict mode surfaces NULLABLE keys as failures; type expectations come directly from _DDL_TYPE_MAP (forces validator updates whenever the materializer's map changes).
- One positive case per failure code: TestMissingTable, TestMissingColumn, TestTypeMismatch (+ legacy-alias acceptance), TestEndpointTypeMismatch, TestEndpointPhysicalCrossTableCheck, TestUnexpectedRepeatedMode, TestMissingDataset, TestInsufficientPermissions, TestKeyColumnNullable (default warning + strict failure + required-keys-clean).
- TestMetadataColumns: missing session_id, missing extracted_at, wrong-type metadata.
- TestTypeMapExhaustiveCoverage::test_every_ddl_type_is_in_compatible_bq_types: exhaustive _DDL_TYPE_MAP ↔ _COMPATIBLE_BQ_TYPES coverage.
- TestCompositeKey: two-column primary keys, including positional second-column type mismatch.
- TestBindingPathYamlOrder: paths reflect binding YAML order (not resolved order).
- TestCrossProjectSource: fully-qualified entity.source validates against its own project.
- TestReportShape: failure carries binding_element / binding_path / bq_ref / detail; ok is failures-empty.

1 live integration test (tests/test_integration_ontology_binding.py::TestBindingValidationLive::test_validator_end_to_end_against_real_bigquery, gated on RUN_LIVE_BIGQUERY_TESTS=1):

- Self-contained (function-scope scratch dataset; destructive ALTER doesn't bleed into other live tests).
- Phase 1: real OntologyMaterializer.create_tables() against BQ.
- Phase 2: default-mode validation → report.ok=True with 4 advisory KEY_COLUMN_NULLABLE warnings.
- Phase 3: strict mode → 4 KEY_COLUMN_NULLABLE failures, warnings=() (escalated, not duplicated).
- Phase 4: real ALTER TABLE ... DROP COLUMN confidence → 1 MISSING_COLUMN failure at binding.entities[0].properties[1].column.

Verified locally: pytest tests/test_binding_validation.py tests/test_ontology_materializer.py tests/test_resolved_spec.py tests/test_integration_ontology_binding.py → 106 passed, 6 skipped (live tests skip without env var). Live test passed in 13.24s when run with RUN_LIVE_BIGQUERY_TESTS=1 GOOGLE_CLOUD_PROJECT=test-project-0728-467323.

Autoformat (bash autoformat.sh) clean.

What's NOT in this PR (PR 2b)

- bq-agent-sdk binding-validate [--strict] standalone CLI command.
- bq-agent-sdk ontology-build --validate-binding and --validate-binding-strict opt-in flags.
- docs/ontology/binding-validation.md user-facing documentation.

(Originally PR 2b also owned the live integration test; that scope moved into this PR after on-demand verification against real GCP, so 2b's remaining footprint is just the CLI surface and docs.)

Per the working plan, PR 2b can start as soon as this lands.

…tform#105 PR 2a)

Pre-flight validator that checks whether a binding YAML's referenced BigQuery tables physically exist with the columns and types the binding requires, before extraction wastes AI.GENERATE tokens.

Different from validate_extracted_graph in GoogleCloudPlatform#76: that one validates extracted graph output against the ResolvedGraph spec. This one validates the binding against live BigQuery schemas. The two share the same public report ergonomics (ok / failures / typed codes) but keep separate Failure/Warning types because their context fields differ (binding_path/bq_ref here vs. node_id/edge_id/event_id/FallbackScope there), per the working plan's cross-PR consistency note on GoogleCloudPlatform#96.

Module: src/bigquery_agent_analytics/binding_validation.py

Public surface:
- validate_binding_against_bigquery(ontology, binding, bq_client, strict=False) -> BindingValidationReport
- BindingValidationReport(failures, warnings); ok property is True iff failures is empty (warnings do not flip ok).
- BindingValidationFailure / BindingValidationWarning carry code, binding_element, binding_path (binding.entities[N].properties[M].column style), bq_ref, expected, observed, detail.
- FailureCode enum: 7 default-mode codes (MISSING_TABLE, MISSING_COLUMN, TYPE_MISMATCH, ENDPOINT_TYPE_MISMATCH, UNEXPECTED_REPEATED_MODE, MISSING_DATASET, INSUFFICIENT_PERMISSIONS) + 1 strict-only code (KEY_COLUMN_NULLABLE).

Internal flow:
1. resolve(ontology, binding) -> ResolvedGraph (so the validator honors fully-qualified entity.source overrides via _qualify_source at resolved_spec.py:141; cross-project bindings work).
2. For each entity: get_table -> column-name index -> per-property type / mode checks -> per-key-column REPEATED + strict-only nullable check.
3. For each relationship: same property checks plus ENDPOINT_TYPE_MISMATCH for from_columns/to_columns whose BQ types do not match the referenced entity's primary-key column types.

Type compatibility uses the materializer's _DDL_TYPE_MAP (ontology_materializer.py:125) so consistency with SDK-generated DDL is automatic. Legacy BQ aliases (INTEGER/FLOAT/BOOLEAN) accepted.

Strict-mode contract:
- strict=False (default): KEY_COLUMN_NULLABLE emits a BindingValidationWarning; report.ok stays True. SDK-created tables (CREATE TABLE IF NOT EXISTS without NOT NULL on key columns) must validate clean by default.
- strict=True: same checks emit BindingValidationFailure with the same code; report.ok flips to False. Warnings are escalated, not duplicated.

Tests (16 unit tests, all pass):
- TestSdkCreatedTablesRegression: SDK-create_tables() output validates clean by default (catches the validator-rejects-SDK-tables trap); strict=True surfaces the same input as KEY_COLUMN_NULLABLE failures.
- One positive case per failure code: TestMissingTable, TestMissingColumn, TestTypeMismatch (+ legacy alias acceptance), TestEndpointTypeMismatch, TestUnexpectedRepeatedMode, TestMissingDataset, TestInsufficientPermissions, TestKeyColumnNullable (default warning + strict failure + required-keys clean).
- TestCrossProjectSource: a binding whose entity.source is fully qualified to a different project from binding.target.project validates against the entity's project, not the target's.
- TestReportShape: failure carries binding_element / binding_path / bq_ref / detail; ok is failures-empty.

PR 2b (CLI surface: bq-agent-sdk binding-validate + ontology-build --validate-binding[-strict] flags + live integration test) follows in a separate PR per the working plan.

…ote (GoogleCloudPlatform#105 PR 2a)

Addresses four review findings on PR GoogleCloudPlatform#109.

(1) binding_path correctness: index now reflects binding YAML order

Previously the validator used enumerate(entity.properties) / enumerate(rel.properties) on ResolvedEntity / ResolvedRelationship, whose ordering follows the ontology's effective-property order, not the binding YAML's order. With inheritance or a different binding ordering, a failure path like binding.entities[0].properties[1].column could point to the wrong YAML entry.

Fix: build {logical_property_name: yaml_index} maps from binding.entities[i].properties / binding.relationships[i].properties, then derive paths via prop.logical_name lookup. Falls back to a name-keyed path on the rare case where a resolved property has no matching binding YAML entry.

New regression test TestBindingPathYamlOrder::test_path_index_uses_binding_yaml_order_not_resolved_order asserts the path correctly resolves to the binding's own index when the binding lists properties in reverse ontology order.

(2) Physical cross-table endpoint check

ENDPOINT_TYPE_MISMATCH previously compared edge endpoint column type against the ontology-derived expected SDK type only. That misses the case where a node table has drifted away from its ontology declaration but the edge has not: the per-entity loop catches the node's TYPE_MISMATCH but the edge's storage-level disagreement with the node was invisible.

Fix: when the referenced node table is in the table cache, compare the edge endpoint's actual BQ field_type against the node table's actual key column field_type and emit a second ENDPOINT_TYPE_MISMATCH whose detail explicitly says 'physical cross-table mismatch'. Compares canonical aliases so INTEGER/INT64 are not flagged as mismatched. The two checks are complementary: spec-level fires when only the edge is wrong, physical fires when the node has drifted but the edge has not.

New test TestEndpointPhysicalCrossTableCheck::test_edge_endpoint_disagrees_with_node_actual_field_type asserts both layers fire when both diverge from the spec, and the existing TestEndpointTypeMismatch test now expects two complementary mismatches (one spec-level + one physical) and asserts the detail strings match each layer's wording.

(3) Docstring no longer references docs/ontology/binding-validation.md

That file lands in PR 2b. Module docstring now says the user-facing CLI surface and full failure-code documentation land in PR 2b of issue GoogleCloudPlatform#105.

(4) Stronger materializer-DDL regression

New test TestSdkCreatedTablesRegression::test_expected_types_match_materializer_ddl_type_map builds the expected schema directly from ontology_materializer._DDL_TYPE_MAP rather than a hand-written fixture. If a future change updates _DDL_TYPE_MAP (e.g. adds NUMERIC support), the test forces a corresponding update to _COMPATIBLE_BQ_TYPES in binding_validation, preventing silent regressions.

100/100 tests pass across test_binding_validation.py, test_ontology_materializer.py, test_resolved_spec.py.

caohy1988 · 2026-05-03T06:49:26Z

Review fixes folded in (commit `fe31c38`)

(1) binding_path correctness — paths now reflect binding YAML order. Previously enumerate(entity.properties) / enumerate(rel.properties) walked the ResolvedEntity / ResolvedRelationship order, which follows ontology effective-property order — not the binding YAML's order. With inheritance or a different binding ordering, a path like binding.entities[0].properties[1].column could land on the wrong YAML entry. Fix: per-element {logical_property_name: yaml_index} maps built from binding.entities[i].properties / binding.relationships[i].properties, with paths derived via prop.logical_name lookup. New test TestBindingPathYamlOrder::test_path_index_uses_binding_yaml_order_not_resolved_order asserts the path resolves to the binding's own index when the YAML lists properties in reverse ontology order.

(2) Physical cross-table endpoint check added. ENDPOINT_TYPE_MISMATCH previously did only a spec-level check (edge BQ type vs ontology-derived expected SDK type). That misses cases where the node table has drifted from its ontology declaration but the edge has not — the per-entity loop catches the node's TYPE_MISMATCH but the edge ↔ node storage-level disagreement was invisible. Fix: when the referenced node table is in the cache, also compare the edge endpoint's actual BQ field_type against the node table's actual key column field_type and emit a second ENDPOINT_TYPE_MISMATCH whose detail explicitly says "physical cross-table mismatch". Canonical-alias-aware so INTEGER/INT64 aren't flagged. The two checks are complementary: spec-level fires when only the edge is wrong; physical fires when the node has drifted but the edge has not. New test TestEndpointPhysicalCrossTableCheck::test_edge_endpoint_disagrees_with_node_actual_field_type covers the node-drifted-edge-correct case.

(3) Docstring no longer references deferred docs. docs/ontology/binding-validation.md lands in PR 2b. Module docstring now says the user-facing CLI surface and full failure-code documentation land in PR 2b.

(4) Stronger materializer-DDL regression. New test TestSdkCreatedTablesRegression::test_expected_types_match_materializer_ddl_type_map builds the expected schema directly from ontology_materializer._DDL_TYPE_MAP rather than a hand-written fixture. If a future change updates _DDL_TYPE_MAP (e.g. adds NUMERIC support), the test forces a corresponding update to _COMPATIBLE_BQ_TYPES in binding_validation, preventing silent regressions.

Verified: 100/100 tests pass across test_binding_validation.py, test_ontology_materializer.py, test_resolved_spec.py. Autoformat clean.

…dup (GoogleCloudPlatform#105 PR 2a)

Five review findings folded in.

(1) Validate SDK metadata columns (session_id, extracted_at)

The materializer's _entity_columns() / _relationship_columns() hard-code session_id STRING + extracted_at TIMESTAMP for every entity and relationship table (ontology_materializer.py:159, 164), and routing writes those fields unconditionally on every materialize() call (ontology_materializer.py:258, 333). Without this check, a user-predefined table missing either column would validate clean, then fail at load_table_from_json / INSERT time.

The validator now checks each entity and relationship table for both metadata columns: presence (MISSING_COLUMN), non-REPEATED mode (UNEXPECTED_REPEATED_MODE), and type compatibility (TYPE_MISMATCH for STRING / TIMESTAMP). Failure binding_path uses the form binding.entities[i].<metadata>.session_id so users can distinguish metadata-column failures from property failures.

New tests:
- TestMetadataColumns::test_missing_session_id_on_entity_flagged
- TestMetadataColumns::test_missing_extracted_at_on_relationship_flagged
- TestMetadataColumns::test_metadata_column_with_wrong_type_flagged

(2) Key column failure paths now real YAML paths

For KEY_COLUMN_NULLABLE and UNEXPECTED_REPEATED_MODE failures on entity primary keys, the path now reuses the binding YAML index of the matching bound property: paths like binding.entities[0].properties[0].column instead of pseudo paths like binding.entities[0].<key>.decision_id. Falls back to the pseudo path only when the key is not also a bound property (defensive; ontology generally requires keys to be properties).

(3) Suppress redundant physical cross-table endpoint mismatch

ENDPOINT_TYPE_MISMATCH previously double-reported in the common edge-only-drift case (edge wrong, node correct): both the spec-level (1) and physical (2) checks fired with the same expected/observed pair, just different detail wording.

The physical check now fires only when the node table has actually drifted from the ontology spec AND the edge disagrees with the node's actual storage. In the edge-only case, (1) already conveys the disagreement; emitting (2) was pure noise. In the node-drifted case, (2) adds genuinely new information (the node's actual type is what the edge needs to match for joins, not the ontology declaration).

TestEndpointTypeMismatch::test_edge_endpoint_type_does_not_match_referenced_entity_key updated to expect 1 mismatch (spec-level only). The dedicated TestEndpointPhysicalCrossTableCheck::test_edge_endpoint_disagrees_with_node_actual_field_type case still asserts both layers fire when both diverge.

(4) Exhaustive _DDL_TYPE_MAP / _COMPATIBLE_BQ_TYPES coverage

New TestTypeMapExhaustiveCoverage::test_every_ddl_type_is_in_compatible_bq_types loops every value in ontology_materializer._DDL_TYPE_MAP and asserts (a) it appears as a key in _COMPATIBLE_BQ_TYPES, and (b) the canonical type accepts itself as compatible. If the materializer adds NUMERIC support without a corresponding binding-validation update, this test catches the silent miss.

(5) Composite-key tests

GoogleCloudPlatform#105 calls out composite endpoint keys explicitly. Two new tests under TestCompositeKey cover the two-column primary-key topology:
- test_composite_primary_key_validates: clean baseline with matching positional types on edge and node.
- test_composite_key_second_column_type_mismatch: the second composite-key column has the wrong type on the edge; the validator must flag ENDPOINT_TYPE_MISMATCH at from_columns[1].

Test infrastructure: added _meta_fields() helper that returns fresh metadata-field instances so test mutations on one table do not silently affect others. _good_schemas() uses it per-table.

106/106 tests pass across test_binding_validation.py, test_ontology_materializer.py, test_resolved_spec.py.

caohy1988 · 2026-05-03T07:05:46Z

Review fixes folded in (commit `2342165`)

(1, P1) SDK metadata columns now validated. The materializer hard-codes session_id STRING + extracted_at TIMESTAMP for every entity and relationship table (ontology_materializer.py:159, 164) and writes those columns on every materialize() call (ontology_materializer.py:258, 333). Without this check, a user-predefined table missing either column would validate clean and fail at INSERT time. The validator now checks both metadata columns on every table — presence, non-REPEATED mode, and type compatibility — using paths like binding.entities[i].<metadata>.session_id so failures are distinguishable from property failures. Three new tests under TestMetadataColumns cover missing session_id, missing extracted_at, and wrong-type metadata.
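A minimal sketch of the metadata-column check, written against a stand-in `Field` dataclass rather than the real BigQuery schema type. The function name, the tuple-shaped findings, and the choice to report at most one finding per column are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Field:
    # Minimal stand-in for a BigQuery schema field (name, type, mode).
    name: str
    field_type: str
    mode: str = "NULLABLE"


# Metadata columns the materializer writes on every table, per the PR.
_METADATA_COLUMNS = (("session_id", "STRING"), ("extracted_at", "TIMESTAMP"))


def check_metadata_columns(entity_index, fields):
    """Return (code, binding_path) pairs for missing or malformed metadata."""
    by_name = {f.name: f for f in fields}
    findings = []
    for column, expected_type in _METADATA_COLUMNS:
        # The <metadata> segment distinguishes these from property paths.
        path = f"binding.entities[{entity_index}].<metadata>.{column}"
        field = by_name.get(column)
        if field is None:
            findings.append(("MISSING_COLUMN", path))
        elif field.mode == "REPEATED":
            findings.append(("UNEXPECTED_REPEATED_MODE", path))
        elif field.field_type.upper() != expected_type:
            findings.append(("TYPE_MISMATCH", path))
    return findings
```

A table that lacks `extracted_at` and stores `session_id` as INT64 would produce one TYPE_MISMATCH and one MISSING_COLUMN, each with a `<metadata>` path.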

(2, P2) Key failure paths are real YAML paths. KEY_COLUMN_NULLABLE and UNEXPECTED_REPEATED_MODE failures on primary keys now reuse the matching bound property's binding YAML index — binding.entities[0].properties[0].column instead of binding.entities[0].<key>.decision_id. Falls back to the pseudo path only when the key isn't a bound property (defensive).

(3, P2) Endpoint physical check no longer double-reports. Previously ENDPOINT_TYPE_MISMATCH fired twice in the common edge-only-drift case (edge wrong, node correct) with the same expected/observed pair. The physical (2) check now only fires when the node table has actually drifted from the ontology spec AND the edge disagrees with the node's actual storage. In the edge-only case, only the spec-level (1) check fires; in the node-drifted case, (2) adds genuinely new information (the node's actual type is what edges must match for joins). TestEndpointTypeMismatch updated to expect 1 mismatch; TestEndpointPhysicalCrossTableCheck still asserts both fire when both diverge.

(4, P3) Exhaustive _DDL_TYPE_MAP ↔ _COMPATIBLE_BQ_TYPES coverage. New test TestTypeMapExhaustiveCoverage::test_every_ddl_type_is_in_compatible_bq_types loops every value in ontology_materializer._DDL_TYPE_MAP and asserts (a) it's a key in _COMPATIBLE_BQ_TYPES, and (b) the canonical type accepts itself. If the materializer adds NUMERIC support without a corresponding validator update, this test catches the silent miss.
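The two assertions in that test can be expressed as a small helper that returns the gaps it finds. `coverage_gaps` is a hypothetical name, and the dictionaries in the usage note are illustrative values, not the SDK's actual maps.

```python
def coverage_gaps(ddl_type_map, compatible_bq_types):
    """List every DDL type the compatibility table fails to cover.

    ddl_type_map: ontology type -> BigQuery DDL type.
    compatible_bq_types: canonical DDL type -> set of accepted type names.
    """
    gaps = []
    for bq_type in sorted(set(ddl_type_map.values())):
        accepted = compatible_bq_types.get(bq_type)
        if accepted is None:
            # (a) every DDL type must be a key in the compatibility table.
            gaps.append(f"{bq_type} is not a key in the compatibility table")
        elif bq_type not in accepted:
            # (b) the canonical type must accept itself.
            gaps.append(f"{bq_type} does not accept itself")
    return gaps
```

For example, adding `"decimal": "NUMERIC"` to the type map without touching the compatibility table makes `coverage_gaps` return a single NUMERIC entry, which is the silent miss the test is meant to catch.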

(5, P3) Composite-key tests added. TestCompositeKey::test_composite_primary_key_validates covers a clean two-column primary-key topology with matching positional types. TestCompositeKey::test_composite_key_second_column_type_mismatch confirms the validator flags ENDPOINT_TYPE_MISMATCH at from_columns[1] when the second composite-key column has the wrong type on the edge.
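The positional comparison behind the composite-key case can be sketched as follows. The helper name is hypothetical, alias handling is omitted for brevity, and the sketch assumes the edge lists as many endpoint columns as the node has key columns.

```python
def composite_endpoint_mismatches(edge_types, node_key_types, side="from_columns"):
    """Return the endpoint positions whose types disagree with the node key.

    edge_types: types of the edge's endpoint columns, in declared order.
    node_key_types: types of the node's primary-key columns, in key order.
    """
    mismatched = []
    # Composite keys are matched by position: column i on the edge must
    # agree with key column i on the node.
    for i, (edge_type, key_type) in enumerate(zip(edge_types, node_key_types)):
        if edge_type.upper() != key_type.upper():
            mismatched.append(f"{side}[{i}]")
    return mismatched
```

With a two-column key of (STRING, INT64) and an edge that stores (STRING, STRING), only the second position is reported, matching the `from_columns[1]` expectation in the test.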

Test-infrastructure fix: _good_schemas() was sharing the same metadata-field instances across tables, so a test mutating session_id.field_type on one table would silently mutate the others. Now each table gets fresh instances via the _meta_fields() helper.
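The aliasing problem is easy to reproduce in isolation. In the sketch below, `Field`, `shared_schemas`, `good_schemas`, and the table names are illustrative stand-ins; only the `_meta_fields()` helper name comes from this PR.

```python
from dataclasses import dataclass


@dataclass
class Field:
    # Mutable stand-in for a schema field used by the fake BQ client.
    name: str
    field_type: str


def _meta_fields():
    """Return fresh metadata-field instances on every call."""
    return [Field("session_id", "STRING"), Field("extracted_at", "TIMESTAMP")]


_SHARED = _meta_fields()


def shared_schemas():
    # Buggy shape: both tables hold references to the same Field objects,
    # so mutating one table's field changes the other table too.
    return {"decisions": list(_SHARED), "edges": list(_SHARED)}


def good_schemas():
    # Fixed shape: each table gets its own instances.
    return {"decisions": _meta_fields(), "edges": _meta_fields()}
```

Copying the list with `list(...)` is not enough, because the copy is shallow and the `Field` objects themselves are still shared; building new instances per table is what isolates the tests.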

Verified: 106/106 tests pass across test_binding_validation.py, test_ontology_materializer.py, test_resolved_spec.py. Autoformat clean.

…udPlatform#105 PR 2a)

Two comment-only cleanups; no behavior change.

(1) tests/test_binding_validation.py:431: the TestEndpointTypeMismatch::test_edge_endpoint_type_does_not_match_referenced_entity_key docstring previously said two ENDPOINT_TYPE_MISMATCH entries are expected. The implementation suppresses the physical (2) check in edge-only-drift cases as of commit 2342165, so only one entry is expected now. Docstring updated to match, with a pointer to TestEndpointPhysicalCrossTableCheck for the node-drifted case where (2) genuinely adds information.

(2) src/bigquery_agent_analytics/binding_validation.py:23: the module docstring referenced bigquery_agent_analytics.graph_validation.validate_extracted_graph as if it existed. That symbol lands with issue GoogleCloudPlatform#76 and is not on this PR. Reworded to plain text: 'Different from the planned extracted-graph validator in issue GoogleCloudPlatform#76', with an explicit warning that no graph_validation module exists yet so callers do not try to import it.

25/25 unit tests pass.

…Query (GoogleCloudPlatform#105 PR 2a)

Adds TestBindingValidationLive::test_validator_end_to_end_against_real_bigquery to tests/test_integration_ontology_binding.py, gated on RUN_LIVE_BIGQUERY_TESTS=1 alongside the other live tests in this module.

Self-contained: uses its own per-test scratch dataset (function-scope fixtures, not the module-scoped ones the rest of the file shares) because phase 4 of the test deliberately drops a column via ALTER TABLE. Running destructive SQL against the shared dataset would interfere with other tests in this file.

Phases:
1. Materialize real tables via OntologyMaterializer (executes real CREATE TABLE IF NOT EXISTS for entity + relationship tables, including SDK metadata columns).
2. Default-mode validation: report.ok must be True; warnings contain only KEY_COLUMN_NULLABLE entries (because CREATE TABLE IF NOT EXISTS emits NULLABLE keys without NOT NULL constraints).
3. Strict-mode validation: same input must surface those four warnings as KEY_COLUMN_NULLABLE failures, with warnings empty (escalated, not duplicated).
4. Drop the 'confidence' column via real ALTER TABLE; default-mode re-validation must emit exactly one MISSING_COLUMN failure pointing at binding.entities[0].properties[1].column (binding YAML order: decision_id at [0], confidence at [1]).

Verified live against test-project-0728-467323 (raincoatrun@): PASSED in 13.28s. Skipped automatically without RUN_LIVE_BIGQUERY_TESTS=1.

caohy1988 · 2026-05-03T16:53:36Z

Live integration test added (commit `6ececb1`)

Adapted /tmp/run_validator_live.py (the one-off runner from the live verification) into a proper pytest test in tests/test_integration_ontology_binding.py, alongside the file's other live tests. Gated on RUN_LIVE_BIGQUERY_TESTS=1.

TestBindingValidationLive::test_validator_end_to_end_against_real_bigquery:

Self-contained: uses its own per-test (function-scope) scratch dataset rather than the module-scoped fixture, because phase 4 deliberately drops a column via real ALTER TABLE. Running destructive SQL against the shared dataset would interfere with the other tests in this file.
Four phases:
1. OntologyMaterializer.from_ontology_binding(...).create_tables() runs real CREATE TABLE IF NOT EXISTS for entity + relationship tables (including SDK metadata columns).
2. Default-mode validation → report.ok=True; the only signal is 4 advisory KEY_COLUMN_NULLABLE warnings (entity primary keys + relationship endpoint keys, all NULLABLE because CREATE TABLE IF NOT EXISTS emits no NOT NULL).
3. Strict-mode validation → same input produces 4 KEY_COLUMN_NULLABLE failures, report.ok=False, warnings=() (escalated, not duplicated).
4. ALTER TABLE ... DROP COLUMN confidence → default-mode re-validation produces exactly 1 MISSING_COLUMN failure at binding.entities[0].properties[1].column. Path index reflects binding YAML order (decision_id at [0], confidence at [1]).

Live run results (against test-project-0728-467323, raincoatrun@gmail.com): PASSED in 13.28s. Without RUN_LIVE_BIGQUERY_TESTS=1: skipped instantly.

This was originally scoped for PR 2b per the working plan, but since you wanted to verify against real GCP now, folding it into PR 2a's coverage was the cleanest move. PR 2b's remaining scope shrinks to: bq-agent-sdk binding-validate [--strict] standalone CLI command, ontology-build --validate-binding[-strict] opt-in flags, and docs/ontology/binding-validation.md.

…oogleCloudPlatform#105 PR 2a)

Two small cleanups; no behavior change.

(1) src/bigquery_agent_analytics/binding_validation.py: module docstring previously read like a PR-review note (told callers not to import a graph_validation module that does not exist yet). Replaced with a public-doc-shaped sentence that just describes the relationship to GoogleCloudPlatform#76's planned extracted-graph validator.

(2) tests/test_integration_ontology_binding.py: the live test TestBindingValidationLive depends on two function-scope fixtures (isolated_scratch and isolated_ontology_and_binding), and the second's binding YAML embeds the first's dataset id. If either fixture is later flipped to module scope by accident, the binding would point at a stale dataset. New assertion binding.target.dataset == ds_id catches that drift before any BQ call runs.

Live test still PASSES against test-project-0728-467323 (13.24s).

caohy1988 added 2 commits May 2, 2026 23:38

caohy1988 added 2 commits May 3, 2026 00:12

caohy1988 wants to merge 6 commits into GoogleCloudPlatform:main from caohy1988:feat/binding-validator-core
