Merged
1 change: 1 addition & 0 deletions docs/README.md
@@ -36,6 +36,7 @@ architecture, rationale, and implementation plans behind key SDK features.
| [ontology/compilation.md](ontology/compilation.md) | Compilation — resolving ontology + binding into backend DDL |
| [ontology/cli.md](ontology/cli.md) | CLI design for the `gm` tool (validate, compile, import-owl) |
| [ontology/owl-import.md](ontology/owl-import.md) | OWL import — converting OWL ontologies to YAML format |
| [ontology/ontology-build.md](ontology/ontology-build.md) | `bq-agent-sdk ontology-build` orchestrator + `--skip-property-graph` reference |

## Deployment Surfaces

88 changes: 88 additions & 0 deletions docs/ontology/ontology-build.md
@@ -0,0 +1,88 @@
# `bq-agent-sdk ontology-build` — End-to-End Orchestrator

`bq-agent-sdk ontology-build` runs the SDK's full ontology pipeline end-to-end against a populated `agent_events` table:

1. Load the spec (`--ontology X.yaml --binding Y.yaml`).
2. Extract an `ExtractedGraph` from agent telemetry via `AI.GENERATE`.
3. Create physical entity/relationship tables (`CREATE TABLE IF NOT EXISTS`).
4. Materialize extracted nodes/edges into those tables.
5. Run `CREATE OR REPLACE PROPERTY GRAPH` to wire the BigQuery property graph object.

The Python entry point is `bigquery_agent_analytics.ontology_orchestrator.build_ontology_graph(...)`. The CLI is a thin wrapper.

## Skipping property-graph DDL

Use `--skip-property-graph` when **the caller owns their own `CREATE PROPERTY GRAPH` DDL** — e.g., the property graph is provisioned via Terraform, dbt, or hand-authored SQL — and only wants the SDK to populate base tables.

```
bq-agent-sdk ontology-build \
  --project-id my-project \
  --dataset-id my-dataset \
  --ontology my.ontology.yaml \
  --binding my-bq-prod.binding.yaml \
  --session-ids sess-1,sess-2 \
  --skip-property-graph
```
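
For callers that drive the CLI from a script, the invocation above could be wrapped roughly as follows. This is a minimal sketch, assuming `bq-agent-sdk` is on `PATH` and that stdout carries only the JSON document (the default `--format` is `json`). The helper names `build_argv` and `run_ontology_build` are hypothetical and not part of the SDK.

```python
import json
import subprocess


def build_argv(project_id, dataset_id, ontology, binding, session_ids,
               skip_property_graph=False):
    """Assemble argv for `bq-agent-sdk ontology-build` (hypothetical helper)."""
    argv = [
        "bq-agent-sdk", "ontology-build",
        "--project-id", project_id,
        "--dataset-id", dataset_id,
        "--ontology", ontology,
        "--binding", binding,
        "--session-ids", ",".join(session_ids),
    ]
    if skip_property_graph:
        argv.append("--skip-property-graph")
    return argv


def run_ontology_build(argv):
    """Run the CLI and return (exit_code, parsed JSON from stdout).

    Assumes stdout contains only the JSON output document.
    """
    proc = subprocess.run(argv, capture_output=True, text=True, check=False)
    return proc.returncode, json.loads(proc.stdout)
```

Keeping argv construction separate from execution makes the flag handling testable without a BigQuery project.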

Behavior with the flag set:

- Phase 5 short-circuits: no `OntologyPropertyGraphCompiler` is constructed and no `CREATE OR REPLACE PROPERTY GRAPH` job runs, so the user's existing graph object is left unchanged.
- Phases 1–4 run normally. Tables are created (`CREATE TABLE IF NOT EXISTS` is a no-op against pre-existing tables) and rows are materialized.
- The CLI exits 0.
- The output dict reports:

```json
{
  "property_graph_created": false,
  "property_graph_status": "skipped:user_requested",
  ...
}
```

JSON consumers should read `property_graph_status` (not just `property_graph_created`) to distinguish a deliberate skip from a creation failure.
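
One way a JSON consumer might apply this is sketched below. The function name `graph_outcome` is hypothetical. The fallback for output that lacks `property_graph_status` mirrors the default the CLI applies in this change, and treating any `skipped:` prefix as a skip is an assumption (only `skipped:user_requested` is documented).

```python
import json


def graph_outcome(output: dict) -> str:
    """Collapse CLI output into "created", "failed", or "skipped"."""
    status = output.get("property_graph_status")
    if status is None:
        # Output without the status field: derive it from the boolean.
        status = "created" if output["property_graph_created"] else "failed"
    # Assumption: every skip status carries the "skipped:" prefix.
    if status.startswith("skipped:"):
        return "skipped"
    return status


raw = (
    '{"property_graph_created": false, '
    '"property_graph_status": "skipped:user_requested"}'
)
print(graph_outcome(json.loads(raw)))  # skipped
```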

## Status field reference

The CLI's `property_graph_status` field takes one of three values:

| `property_graph_status` | `property_graph_created` | Exit code | Meaning |
|---|---|---|---|
| `"created"` | `true` | 0 | Phase 5 ran and BigQuery confirmed the graph object. |
| `"failed"` | `false` | 1 | Phase 5 ran but the graph object was not created. The CLI prints "Property Graph creation failed" to stderr. Tables and rows were still materialized. |
| `"skipped:user_requested"` | `false` | 0 | `--skip-property-graph` was set. Phase 5 did not run. No error message. |

Without `--skip-property-graph`, the existing exit-1 behavior on graph-create failure is preserved exactly.
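
The table can also be encoded as data, for example to sanity-check a CLI run in CI. This is a minimal sketch; `STATUS_TABLE` and `is_consistent` are illustrative names, not SDK API.

```python
# status -> (expected property_graph_created, expected exit code), per the table.
STATUS_TABLE = {
    "created": (True, 0),
    "failed": (False, 1),
    "skipped:user_requested": (False, 0),
}


def is_consistent(status: str, created: bool, exit_code: int) -> bool:
    """True when the observed triple matches a row of the status table."""
    return STATUS_TABLE.get(status) == (created, exit_code)
```

An unknown status string yields `False`, which surfaces a new status value instead of silently accepting it.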

## When to use this

- **You already manage `CREATE PROPERTY GRAPH` in Terraform / dbt / a SQL file.** The SDK's `CREATE OR REPLACE PROPERTY GRAPH` would clobber your DDL on every run.
- **Your property graph definition uses DDL details the SDK compiler doesn't emit.** For example, you hand-authored the graph DDL to express custom labels that the SDK's compiler doesn't generate.
- **You want to populate your tables on a different cadence than you redefine the graph.** The graph definition rarely changes; the data is refreshed continuously.

For all other cases, leave the flag off and let the SDK manage the property graph end-to-end.

## Python API

The flag is also available on `build_ontology_graph(...)`:

```python
from bigquery_agent_analytics.ontology_orchestrator import build_ontology_graph

result = build_ontology_graph(
    spec=resolved_spec,
    session_ids=["sess-1"],
    project_id="my-project",
    dataset_id="my-dataset",
    skip_property_graph=True,  # phase 5 skipped
)

assert result["property_graph_status"] == "skipped:user_requested"
assert result["skipped_reason"] == "user_requested"
assert result["property_graph_created"] is False
```

`skipped_reason` is only present when the phase was skipped; it is omitted when phase 5 ran (whether or not it succeeded).
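
Judging by the orchestrator code in this change, a failed phase 5 is reported through the result dict rather than an exit code, so Python callers that want fail-fast behavior have to check it themselves. A minimal sketch follows; `PropertyGraphBuildError` and `require_graph_or_skip` are hypothetical and not part of the SDK.

```python
class PropertyGraphBuildError(RuntimeError):
    """Hypothetical error for a phase 5 that ran but did not create the graph."""


def require_graph_or_skip(result: dict) -> dict:
    """Return result unchanged on success or deliberate skip; raise on failure."""
    # skipped_reason is absent when phase 5 ran, so use .get() instead of [].
    if result.get("skipped_reason") is not None:
        return result
    if not result["property_graph_created"]:
        raise PropertyGraphBuildError(
            f"property graph {result.get('graph_ref')!r} was not created"
        )
    return result
```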

## Known limitation: `result["graph_ref"]` in split source/target setups

`build_ontology_graph(...)` accepts a single `dataset_id` and uses it both for extraction (where `agent_events` lives) and for the `graph_ref` reported in the result dict (`{project_id}.{dataset_id}.{name}`). When `--skip-property-graph` is set and the caller's actual property graph lives in `binding.target.dataset`, and that dataset differs from the `dataset_id` used for extraction, `result["graph_ref"]` reports the **extraction dataset**, not the user-owned graph's dataset. The materialized base tables still go to `binding.target.dataset` per the resolved spec; only the reported `graph_ref` string is affected. This is tracked as a follow-up and does not block `--skip-property-graph` itself, since the user already knows where their authored graph lives.
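
Until that follow-up lands, a caller can derive the reference locally. This is a minimal sketch, assuming the user-owned graph shares `result["graph_name"]` unless a name is passed explicitly; `owned_graph_ref` and its parameters are hypothetical.

```python
from typing import Optional


def owned_graph_ref(result: dict, project_id: str, target_dataset: str,
                    graph_name: Optional[str] = None) -> str:
    """Build `{project}.{dataset}.{name}` for the dataset holding the caller's graph."""
    # Assumption: the hand-authored graph uses the SDK's graph name by default.
    name = graph_name or result["graph_name"]
    return f"{project_id}.{target_dataset}.{name}"
```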
21 changes: 21 additions & 0 deletions src/bigquery_agent_analytics/cli.py
@@ -1238,6 +1238,16 @@ def ontology_build(
no_ai_generate: bool = typer.Option(
False, help="Skip AI.GENERATE; fetch raw payloads instead."
),
skip_property_graph: bool = typer.Option(
False,
"--skip-property-graph",
help=(
"Skip CREATE OR REPLACE PROPERTY GRAPH. Use when the caller "
"owns their own property-graph DDL and only wants the SDK to "
"populate base tables. CLI exits 0 with "
"property_graph_status='skipped:user_requested'."
),
),
fmt: str = typer.Option(
"json",
"--format",
@@ -1261,6 +1271,7 @@ def ontology_build(
table_id=table_id,
endpoint=endpoint,
use_ai_generate=not no_ai_generate,
skip_property_graph=skip_property_graph,
)

output = {
@@ -1271,9 +1282,19 @@ def ontology_build(
"tables_created": result["tables_created"],
"rows_materialized": result["rows_materialized"],
"property_graph_created": result["property_graph_created"],
"property_graph_status": result.get(
"property_graph_status",
"created" if result["property_graph_created"] else "failed",
),
}
typer.echo(format_output(output, fmt))

# Distinguish "user-requested skip" (exit 0) from "creation failed"
# (exit 1). Same property_graph_created=False, different operator
# intent — JSON consumers read property_graph_status to tell them
# apart without parsing stderr.
if result.get("skipped_reason") == "user_requested":
return
if not result["property_graph_created"]:
typer.echo(
"Error: Property Graph creation failed. "
52 changes: 39 additions & 13 deletions src/bigquery_agent_analytics/ontology_orchestrator.py
@@ -300,14 +300,16 @@ def build_ontology_graph(
     endpoint: str = "gemini-2.5-flash",
     use_ai_generate: bool = True,
     location: Optional[str] = None,
+    skip_property_graph: bool = False,
 ) -> dict[str, Any]:
   """Run the full ontology graph pipeline end-to-end.

   1. Load the YAML spec (or use pre-loaded ``spec``).
   2. Extract an ``ExtractedGraph`` from agent telemetry.
   3. Create physical tables (if not exists).
   4. Materialize extracted nodes/edges into tables.
-  5. Create the BigQuery Property Graph.
+  5. Create the BigQuery Property Graph (skipped when
+     ``skip_property_graph=True``).

   Args:
     session_ids: Sessions to extract from.
@@ -323,10 +325,22 @@ def build_ontology_graph(
endpoint: AI.GENERATE model endpoint.
use_ai_generate: If True, uses server-side AI extraction.
location: BigQuery location.
skip_property_graph: When True, skip phase 5 (do not run
``CREATE OR REPLACE PROPERTY GRAPH``). Use this when the
caller owns their own property-graph DDL and only wants
the SDK to populate base tables. The result dict reports
``property_graph_created=False`` with
``skipped_reason="user_requested"`` and
``property_graph_status="skipped:user_requested"``, which
callers (and the CLI) use to distinguish a deliberate
skip from a creation failure.

Returns:
A dict with keys: ``spec``, ``graph``, ``tables_created``,
``rows_materialized``, ``property_graph_created``,
``property_graph_status`` (one of ``"created"``, ``"failed"``,
``"skipped:user_requested"``), ``skipped_reason`` (only set
when phase 5 was skipped, e.g. ``"user_requested"``),
``graph_name``, ``graph_ref``.
"""
from .ontology_graph import OntologyGraphManager
@@ -391,24 +405,36 @@ def build_ontology_graph(
   rows_materialized = materializer.materialize(graph, session_ids)
   logger.info("Rows materialized: %s", rows_materialized)

-  # 5. Create property graph.
-  compiler = OntologyPropertyGraphCompiler(
-      project_id=project_id,
-      dataset_id=dataset_id,
-      spec=spec,
-      location=location,
-  )
-  pg_created = compiler.create_property_graph(graph_name=name)
-
   graph_ref = f"{project_id}.{dataset_id}.{name}"
-  logger.info("Property Graph %r created=%s.", graph_ref, pg_created)

-  return {
+  # 5. Create property graph (or skip when caller owns the DDL).
+  result: dict[str, Any] = {
       "spec": spec,
       "graph": graph,
       "tables_created": tables_created,
       "rows_materialized": rows_materialized,
-      "property_graph_created": pg_created,
       "graph_name": name,
       "graph_ref": graph_ref,
   }
+  if skip_property_graph:
+    logger.info(
+        "Property Graph creation skipped (skip_property_graph=True); "
+        "caller owns the DDL for graph %r.",
+        graph_ref,
+    )
+    result["property_graph_created"] = False
+    result["skipped_reason"] = "user_requested"
+    result["property_graph_status"] = "skipped:user_requested"
+  else:
+    compiler = OntologyPropertyGraphCompiler(
+        project_id=project_id,
+        dataset_id=dataset_id,
+        spec=spec,
+        location=location,
+    )
+    pg_created = compiler.create_property_graph(graph_name=name)
+    logger.info("Property Graph %r created=%s.", graph_ref, pg_created)
+    result["property_graph_created"] = pg_created
+    result["property_graph_status"] = "created" if pg_created else "failed"
+
+  return result
148 changes: 148 additions & 0 deletions tests/test_cli.py
@@ -2472,3 +2472,151 @@ def test_bad_spec_path_exit_2(self):
],
)
assert result.exit_code == 2

@patch("bigquery_agent_analytics.ontology_orchestrator.build_ontology_graph")
def test_skip_property_graph_exits_zero_with_status(self, mock_build):
"""--skip-property-graph: exit 0, status='skipped:user_requested'."""
from bigquery_agent_analytics.ontology_models import ExtractedGraph

mock_build.return_value = {
"graph_name": "g",
"graph_ref": "proj.ds.g",
"graph": ExtractedGraph(name="test"),
"tables_created": {"mako_DecisionPoint": "p.d.decision_points"},
"rows_materialized": {"mako_DecisionPoint": 2},
"property_graph_created": False,
"skipped_reason": "user_requested",
"property_graph_status": "skipped:user_requested",
"spec": MagicMock(),
}

result = runner.invoke(
app,
[
"ontology-build",
"--project-id=proj",
"--dataset-id=ds",
f"--spec-path={self._SPEC_PATH}",
"--session-ids=sess1",
"--env=p.d",
"--skip-property-graph",
],
)
assert result.exit_code == 0
# Skip path must NOT print the "Property Graph creation failed" stderr.
assert "Property Graph creation failed" not in result.output
parsed = json.loads(result.output)
assert parsed["property_graph_created"] is False
assert parsed["property_graph_status"] == "skipped:user_requested"

# Flag is threaded through to the orchestrator.
_, kwargs = mock_build.call_args
assert kwargs["skip_property_graph"] is True

@patch("bigquery_agent_analytics.ontology_orchestrator.build_ontology_graph")
def test_default_invocation_omits_skip_flag(self, mock_build):
"""Default invocation passes skip_property_graph=False."""
from bigquery_agent_analytics.ontology_models import ExtractedGraph

mock_build.return_value = {
"graph_name": "g",
"graph_ref": "proj.ds.g",
"graph": ExtractedGraph(name="test"),
"tables_created": {},
"rows_materialized": {},
"property_graph_created": True,
"property_graph_status": "created",
"spec": MagicMock(),
}

result = runner.invoke(
app,
[
"ontology-build",
"--project-id=proj",
"--dataset-id=ds",
f"--spec-path={self._SPEC_PATH}",
"--session-ids=sess1",
"--env=p.d",
],
)
assert result.exit_code == 0
parsed = json.loads(result.output)
assert parsed["property_graph_status"] == "created"

_, kwargs = mock_build.call_args
assert kwargs["skip_property_graph"] is False

@patch("bigquery_agent_analytics.ontology_orchestrator.build_ontology_graph")
def test_skip_property_graph_status_visible_in_text_format(self, mock_build):
"""--format=text exposes property_graph_status to non-JSON consumers.

Pins the contract that property_graph_status is not JSON-only:
--format=table renders dict keys; --format=text falls back to a
readable representation. The status string must appear in either.
"""
from bigquery_agent_analytics.ontology_models import ExtractedGraph

mock_build.return_value = {
"graph_name": "g",
"graph_ref": "proj.ds.g",
"graph": ExtractedGraph(name="test"),
"tables_created": {},
"rows_materialized": {},
"property_graph_created": False,
"skipped_reason": "user_requested",
"property_graph_status": "skipped:user_requested",
"spec": MagicMock(),
}

result = runner.invoke(
app,
[
"ontology-build",
"--project-id=proj",
"--dataset-id=ds",
f"--spec-path={self._SPEC_PATH}",
"--session-ids=sess1",
"--env=p.d",
"--skip-property-graph",
"--format=text",
],
)
assert result.exit_code == 0
# The status string must appear in the text-format output so non-
# JSON consumers can see why the graph was not created.
assert "skipped:user_requested" in result.output

@patch("bigquery_agent_analytics.ontology_orchestrator.build_ontology_graph")
def test_property_graph_failure_status_failed(self, mock_build):
"""When the orchestrator reports failure, exit 1 with status='failed'.

Distinguishes the failure path from the user-requested-skip path by
asserting the status field, not just the exit code.
"""
from bigquery_agent_analytics.ontology_models import ExtractedGraph

mock_build.return_value = {
"graph_name": "g",
"graph_ref": "proj.ds.g",
"graph": ExtractedGraph(name="test"),
"tables_created": {},
"rows_materialized": {},
"property_graph_created": False,
"property_graph_status": "failed",
"spec": MagicMock(),
}

result = runner.invoke(
app,
[
"ontology-build",
"--project-id=proj",
"--dataset-id=ds",
f"--spec-path={self._SPEC_PATH}",
"--session-ids=sess1",
"--env=p.d",
],
)
assert result.exit_code == 1
assert "Property Graph creation failed" in result.output