
Feat: bq-agent-sdk ontology-build --skip-property-graph to populate base tables only #104

@caohy1988

Goal

Add an opt-in flag to the ontology-build CLI that runs phases 1–4 (load spec → extract → create tables → materialize) and stops before phase 5 (CREATE OR REPLACE PROPERTY GRAPH). This lets users who already have a BigQuery property graph populate its base tables from BQ AA traces without having their graph DDL overwritten on every run.

Motivation

build_ontology_graph(...) (ontology_orchestrator.py:291) runs five phases in sequence:

  1. Load spec.
  2. Extract ExtractedGraph via AI.GENERATE on agent_events.
  3. OntologyMaterializer.create_tables(): CREATE TABLE IF NOT EXISTS per entity/relationship table. Idempotent against pre-existing tables.
  4. OntologyMaterializer.materialize(...): staging table → DELETE-by-session → INSERT FROM staging. Non-destructive, and schema-aware via bq_client.get_table(...).schema.
  5. OntologyPropertyGraphCompiler.compile_property_graph_ddl(...), then execute the resulting DDL → CREATE OR REPLACE PROPERTY GRAPH (ontology_property_graph.py:307).
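To contrast phase 4 with phase 5, the session-scoped replace in phase 4 can be modeled in a few lines of plain Python. This is a hypothetical in-memory sketch, not the materializer's code: it assumes rows are dicts carrying a `session_id` key, and `materialize_sessions` is an illustrative name, whereas the real implementation issues BigQuery statements against a staging table. It shows why rerunning phase 4 only touches the sessions being loaded.

```python
from typing import Dict, List

Row = Dict[str, str]


def materialize_sessions(table: List[Row], staging: List[Row]) -> List[Row]:
    """Hypothetical model of phase 4: replace only the sessions present in staging.

    Assumes every row has a "session_id" key. The real materializer does this
    with BigQuery DELETE and INSERT statements, not Python lists.
    """
    staged_sessions = {row["session_id"] for row in staging}
    # DELETE-by-session: drop existing rows only for sessions being reloaded.
    kept = [row for row in table if row["session_id"] not in staged_sessions]
    # INSERT FROM staging: append the freshly extracted rows.
    return kept + list(staging)


existing = [
    {"session_id": "s0", "entity": "old-s0"},
    {"session_id": "s1", "entity": "old-s1"},
]
staging = [{"session_id": "s1", "entity": "new-s1"}]

result = materialize_sessions(existing, staging)
# Session s0 is untouched; s1 is replaced. Rerunning with the same staging
# rows yields the same table, so the step is safe to repeat.
assert materialize_sessions(result, staging) == result
```

Phase 5 has no such scoping: CREATE OR REPLACE PROPERTY GRAPH swaps the entire graph definition, which is what makes it destructive for user-managed DDL.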

Phase 5 is destructive against a user-defined property graph. A user who maintains their own CREATE PROPERTY GRAPH DDL (managed by Terraform or dbt, or hand-authored to express graph-object features the SDK doesn't generate yet) has it overwritten on every ontology-build run.

The clean workaround today is to drop into Python and call OntologyGraphManager.extract_graph(...) and OntologyMaterializer.materialize(...) directly, bypassing the orchestrator. That works, but it loses the CLI surface (--ontology, --binding, --session-ids, output formatting, error handling).

Proposed change

One new flag on the ontology-build command in cli.py:1200:

bq-agent-sdk ontology-build \
  --ontology my.ontology.yaml \
  --binding my-bq-prod.binding.yaml \
  --session-ids s1,s2,s3 \
  --project-id my-project \
  --dataset-id my-dataset \
  --skip-property-graph

When --skip-property-graph is set:

  • Phase 5 is not invoked.
  • The result dict reports property_graph_created: False with skipped_reason: "user_requested" (distinct from today's False, which means attempted-and-failed).
  • The CLI's printed/JSON output also exposes a disambiguating field. Today's curated CLI output dict (around cli.py:1266–1274) includes only property_graph_created. Add a property_graph_status field with one of "created", "failed", or "skipped:user_requested". Without it, JSON consumers see property_graph_created: false with exit 0 and no signal explaining why: consistent with the result dict, but ambiguous to the user.
  • The CLI exit-1 branch at cli.py:1277–1284 (which today raises typer.Exit(code=1) whenever property_graph_created is False) must be updated to short-circuit when result.get("skipped_reason") == "user_requested". In that case, exit 0 with no error message. The "Property Graph creation failed" message stays for the attempted-and-failed branch.
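The status and exit-code rules above reduce to a small pure function of the result dict. A minimal sketch, assuming the result dict uses exactly the keys named in this issue; `property_graph_status` and `exit_code` are hypothetical helper names, not existing functions in cli.py. In the real CLI, a return value of 1 would correspond to `raise typer.Exit(code=1)` plus the existing error message.

```python
from typing import Any, Mapping


def property_graph_status(result: Mapping[str, Any]) -> str:
    """Hypothetical helper: derive the proposed property_graph_status value."""
    if result.get("property_graph_created"):
        return "created"
    if result.get("skipped_reason") == "user_requested":
        return "skipped:user_requested"
    # property_graph_created is False and nothing explains it: phase 5 failed.
    return "failed"


def exit_code(result: Mapping[str, Any]) -> int:
    """Hypothetical helper: exit 1 only when phase 5 was attempted and failed."""
    return 1 if property_graph_status(result) == "failed" else 0


for r in (
    {"property_graph_created": True},
    {"property_graph_created": False, "skipped_reason": "user_requested"},
    {"property_graph_created": False},
):
    print(property_graph_status(r), exit_code(r))
```

Deriving both the printed field and the exit code from one function keeps them from disagreeing, which is the ambiguity the property_graph_status field is meant to remove.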

The default stays False to preserve current behavior.

Implementation sketch

build_ontology_graph gains a skip_property_graph: bool = False parameter. The phase-5 block becomes:

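skipped_reason = None  # populated only when phase 5 is skipped
if skip_property_graph:
    # Leave the user's property graph DDL untouched.
    property_graph_created = False
    skipped_reason = "user_requested"
else:
    # existing CREATE OR REPLACE PROPERTY GRAPH path
    ...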

The CLI threads the flag through to build_ontology_graph. The result dict includes skipped_reason only when it is populated.
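A slightly fuller, runnable version of that phase-5 branch, under stated assumptions: the compiler is created through an injected factory, DDL execution is an injected callable, and a failed phase 5 surfaces as an exception that is turned into property_graph_created: False. `run_phase_5` and `FakeCompiler` are hypothetical names; the real orchestrator may be structured differently.

```python
from typing import Any, Callable, Dict, List


class FakeCompiler:
    """Hypothetical stand-in for OntologyPropertyGraphCompiler."""

    def compile_property_graph_ddl(self) -> str:
        return "CREATE OR REPLACE PROPERTY GRAPH ..."


def run_phase_5(
    skip_property_graph: bool,
    compiler_factory: Callable[[], FakeCompiler],
    execute: Callable[[str], None],
) -> Dict[str, Any]:
    """Hypothetical phase-5 branch; skipped_reason appears only when set."""
    result: Dict[str, Any] = {}
    if skip_property_graph:
        # Skip without constructing the compiler or touching the graph DDL.
        result["property_graph_created"] = False
        result["skipped_reason"] = "user_requested"
        return result
    try:
        ddl = compiler_factory().compile_property_graph_ddl()
        execute(ddl)
        result["property_graph_created"] = True
    except Exception:
        # Assumption: failures raise. No skipped_reason, so this reads as "failed".
        result["property_graph_created"] = False
    return result


calls: List[str] = []
executed: List[str] = []


def factory() -> FakeCompiler:
    calls.append("constructed")
    return FakeCompiler()


skipped = run_phase_5(True, factory, executed.append)
created = run_phase_5(False, factory, executed.append)
```

Injecting the factory is one way to make the "compiler is not constructed" acceptance criterion easy to test; patching the class where the orchestrator module looks it up would work equally well.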

Acceptance criteria

  • bq-agent-sdk ontology-build --skip-property-graph ... populates entity/relationship tables and exits 0 without invoking CREATE OR REPLACE PROPERTY GRAPH.
  • The CLI exit-handling branch at cli.py:1277 is updated: when property_graph_created is False and skipped_reason == "user_requested", the CLI exits 0 with no error printed. When property_graph_created is False for any other reason, the existing exit-1-with-error-message behavior is preserved.
  • The CLI's curated output dict (around cli.py:1266–1274) gains a property_graph_status field with values "created", "failed", or "skipped:user_requested", exposed in both text and json formats. A test asserts that JSON consumers can distinguish skipped from failed without reading stderr.
  • Without the flag, behavior is unchanged (regression test asserts exit 1 + the existing error message when phase 5 fails after successful materialization).
  • A unit test asserts OntologyPropertyGraphCompiler is not constructed when the flag is set.
  • An integration test (gated on RUN_LIVE_BIGQUERY_TESTS=1, matching the existing ontology integration test pattern at tests/test_integration_ontology_binding.py:44) creates a pre-existing property graph, runs ontology-build --skip-property-graph against pre-existing base tables, and verifies the user's graph definition is unchanged after the run.
  • docs/ontology/ (or wherever the orchestrator is documented) explains the flag and the use case.
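The "not constructed" unit test could take roughly this shape with `unittest.mock`. Everything here is a hypothetical stand-in (`build_graph_stub`, `OntologyPropertyGraphCompilerStub`); against the real code, the test would instead use `mock.patch` on `OntologyPropertyGraphCompiler` at the name the orchestrator module looks it up under.

```python
import unittest
from unittest import mock


class OntologyPropertyGraphCompilerStub:
    """Hypothetical stand-in; the real class lives in ontology_property_graph.py."""

    def __init__(self, *args, **kwargs):
        pass


def build_graph_stub(skip_property_graph=False, compiler_cls=OntologyPropertyGraphCompilerStub):
    """Hypothetical orchestrator stand-in: phases 1-4 elided, phase 5 gated."""
    result = {"property_graph_created": False}
    if skip_property_graph:
        result["skipped_reason"] = "user_requested"
    else:
        compiler_cls()  # phase 5 would compile and execute the DDL here
        result["property_graph_created"] = True
    return result


class SkipPropertyGraphTest(unittest.TestCase):
    def test_compiler_not_constructed_when_skipped(self):
        compiler_cls = mock.MagicMock()
        result = build_graph_stub(skip_property_graph=True, compiler_cls=compiler_cls)
        compiler_cls.assert_not_called()
        self.assertEqual(result["skipped_reason"], "user_requested")

    def test_compiler_constructed_by_default(self):
        compiler_cls = mock.MagicMock()
        result = build_graph_stub(compiler_cls=compiler_cls)
        compiler_cls.assert_called_once()
        self.assertNotIn("skipped_reason", result)
```

Saved as a `test_*.py` module, it runs under `python -m unittest`.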

Out of scope

  • Validating that the user's pre-existing property graph is consistent with the ontology+binding the SDK is materializing into. That is a separate pre-flight concern — see the companion issue for binding-vs-physical-schema validation.
  • Skipping phase 3 (create_tables). It's already a no-op against pre-existing tables via CREATE TABLE IF NOT EXISTS (ontology_materializer.py:207, 210, 221, 241). No flag needed.
  • Auto-deriving an ontology+binding from a user's existing property graph DDL.
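The phase-3 no-op claim can be checked locally with SQLite, used here only as a stand-in because it ships with Python; BigQuery's CREATE TABLE IF NOT EXISTS likewise has no effect when a table with that name already exists. The table name `agent_entity` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A user-managed base table that already exists and holds data.
conn.execute("CREATE TABLE agent_entity (session_id TEXT, name TEXT)")
conn.execute("INSERT INTO agent_entity VALUES ('s0', 'pre-existing')")

# Phase-3-style DDL with a different column list: a no-op because the table exists.
conn.execute(
    "CREATE TABLE IF NOT EXISTS agent_entity (session_id TEXT, name TEXT, extra TEXT)"
)

rows = conn.execute("SELECT * FROM agent_entity").fetchall()
# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
columns = [info[1] for info in conn.execute("PRAGMA table_info(agent_entity)")]
print(rows, columns)
conn.close()
```

The existing row survives and the extra column in the second statement is ignored, so phase 3 leaves a pre-existing table's schema and data alone.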

Related

Effort

~0.5 eng-day. CLI threading + one orchestrator branch + tests.