Skip to content

Commit 5e53b61

Browse files
committed
test+docs: harden DDL-detection filter, soften DDL claims (#104)
Addresses three review findings on PR #108: (1) Live test DDL-detection blind spot The previous filter required the regressed CREATE OR REPLACE PROPERTY GRAPH to target _PROJECT.<scratch_dataset>.<spec.name>. But if skip_property_graph regressed, the compiler would actually target _PROJECT._DATASET.<spec.name> (the orchestrator's dataset_id argument is _DATASET in this test, used for extraction of agent_events). The blind spot: a regression could fire DDL that the test would not catch. Fixed by replacing the fully-qualified-graph-ref filter with two narrower constraints that catch the regression in either dataset: - graph name (spec.name) — present in the DDL string regardless of which dataset the compiler targets - sdk_feature='ontology-gql' label — only SDK-issued property-graph jobs carry this label per ontology_property_graph.py:465; the test's setup CREATE PROPERTY GRAPH (issued via direct SQL) does not, so it does not trip the assertion (2) docs/ontology/ontology-build.md: document graph_ref limitation Added a "Known limitation" section noting that result["graph_ref"] reports the extraction dataset, not the binding's target dataset, in split source/target setups. The materialized base tables themselves still go to the binding's target dataset per the resolved spec; only the reported string is affected. (3) docs/ontology/ontology-build.md: soften DDL-options wording "additional indexes, dialect-specific options" was overreaching for BigQuery property graphs; tightened to "custom labels or other DDL details the SDK's compiler doesn't generate." 136/136 tests pass.
1 parent 5480465 commit 5e53b61

2 files changed

Lines changed: 35 additions & 11 deletions

File tree

docs/ontology/ontology-build.md

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -56,7 +56,7 @@ Without `--skip-property-graph`, the existing exit-1 behavior on graph-create fa
5656
## When to use this
5757

5858
- **You already manage `CREATE PROPERTY GRAPH` in Terraform / dbt / a SQL file.** The SDK's `CREATE OR REPLACE PROPERTY GRAPH` would clobber your DDL on every run.
59-
- **Your property graph definition uses options the SDK doesn't generate.** You hand-authored the graph DDL to express features (custom labels, additional indexes, dialect-specific options) the SDK's compiler doesn't emit.
59+
- **Your property graph definition uses DDL details the SDK compiler doesn't emit.** You hand-authored the graph DDL to express custom labels or other DDL details the SDK's compiler doesn't generate.
6060
- **You want to populate your tables on a different cadence than you redefine the graph.** The graph definition rarely changes; the data is refreshed continuously.
6161

6262
For all other cases, leave the flag off and let the SDK manage the property graph end-to-end.
@@ -82,3 +82,7 @@ assert result["property_graph_created"] is False
8282
```
8383

8484
`skipped_reason` is only present when the phase was skipped; it is omitted when phase 5 ran (whether or not it succeeded).
85+
86+
## Known limitation: `result["graph_ref"]` in split source/target setups
87+
88+
`build_ontology_graph(...)` accepts a single `dataset_id` and uses it both for extraction (where `agent_events` lives) and for the `graph_ref` reported in the result dict (`{project_id}.{dataset_id}.{name}`). When `--skip-property-graph` is set and the caller's actual property graph lives in `binding.target.dataset` (different from the `dataset_id` used for extraction), `result["graph_ref"]` reports the **extraction dataset**, not the user-owned graph's dataset. The materialized base tables themselves still go to `binding.target.dataset` per the resolved spec — this only affects the reported `graph_ref` string. Tracked as a follow-up; not blocking for `--skip-property-graph` itself since the user already knows where their authored graph lives.

tests/test_integration_ontology_binding.py

Lines changed: 30 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -431,19 +431,39 @@ def test_skip_property_graph_issues_no_create_graph_job(
431431
)
432432

433433
# Step 4: assert no CREATE OR REPLACE PROPERTY GRAPH job ran for
434-
# *this scratch dataset's graph* in the post-timestamp window.
435-
# Filter by both the DDL keyword and the fully-qualified graph
436-
# reference so the test does not false-fail on an unrelated
437-
# CREATE OR REPLACE PROPERTY GRAPH issued by another developer
438-
# or test running concurrently in the same project.
439-
expected_graph_ref = f"{_PROJECT}.{scratch_dataset}.{spec.name}"
434+
# *this test's graph* in the post-timestamp window.
435+
#
436+
# Filter design:
437+
# 1. timestamp > the post-DDL baseline (closes the trap from
438+
# #107 cell 1.3 where the user's own setup CREATE PROPERTY
439+
# GRAPH would otherwise be caught).
440+
# 2. DDL keyword.
441+
# 3. graph name (spec.name) — the graph name is in the DDL
442+
# string regardless of which dataset the compiler would
443+
# target. If skip_property_graph regresses, the compiler
444+
# runs with dataset_id=_DATASET (the orchestrator's
445+
# argument), so the regressed DDL would target
446+
# _PROJECT._DATASET.<spec.name>, NOT
447+
# _PROJECT.<scratch_dataset>.<spec.name>. Filtering on the
448+
# graph name (rather than the fully-qualified ref) catches
449+
# the regression in either dataset.
450+
# 4. sdk_feature='ontology-gql' label — only SDK-issued
451+
# property-graph jobs carry this label
452+
# (ontology_property_graph.py:465), so unrelated user-
453+
# authored CREATE PROPERTY GRAPH DDLs (including the test's
454+
# own setup job in step 1, which was not labeled this way)
455+
# do not trip the assertion.
440456
region_qual = f"`region-{_LOCATION.lower()}`"
441457
jobs_query = f"""
442458
SELECT job_id, query, creation_time
443-
FROM {region_qual}.INFORMATION_SCHEMA.JOBS_BY_PROJECT
459+
FROM {region_qual}.INFORMATION_SCHEMA.JOBS_BY_PROJECT AS j
444460
WHERE creation_time > @before
445461
AND UPPER(query) LIKE '%CREATE OR REPLACE PROPERTY GRAPH%'
446-
AND query LIKE @graph_ref_pattern
462+
AND query LIKE @graph_name_pattern
463+
AND EXISTS (
464+
SELECT 1 FROM UNNEST(j.labels) AS l
465+
WHERE l.key = 'sdk_feature' AND l.value = 'ontology-gql'
466+
)
447467
"""
448468
job = client.query(
449469
jobs_query,
@@ -453,9 +473,9 @@ def test_skip_property_graph_issues_no_create_graph_job(
453473
"before", "TIMESTAMP", before_skip_build_ts
454474
),
455475
bigquery.ScalarQueryParameter(
456-
"graph_ref_pattern",
476+
"graph_name_pattern",
457477
"STRING",
458-
f"%{expected_graph_ref}%",
478+
f"%{spec.name}%",
459479
),
460480
]
461481
),

0 commit comments

Comments
 (0)