Commit 75d4f77

docs: add ontology compilation reference doc
1 parent aef7967 commit 75d4f77

1 file changed

Lines changed: 251 additions & 0 deletions

src/ontology/docs/compilation.md

@@ -0,0 +1,251 @@
# Compilation — Core Design (v0)

Status: draft
Scope: how an ontology (`ontology.md`) plus a binding (`binding.md`) are
resolved and emitted as backend DDL (`CREATE PROPERTY GRAPH` on BigQuery or
Spanner).

**v0 compiles flat ontologies only.** Ontologies that use `extends` on
entities or relationships are rejected at compile time. Inheritance
lowering (substitutability, per-label property projections, cross-table
identity, overlapping siblings) is the subject of a separate future
design. Deployment, credentials, and measures are out of scope.

## 1. Goals

- **Single-shot compile.** Ontology plus binding in, DDL text out. No
  intermediate on-disk artifact.
- **Deterministic output.** Same inputs → byte-identical DDL.
- **Backend-neutral pipeline, backend-specific emitter.** Resolution is
  shared across backends; emission is per-backend.
- **Output is just text.** What consumes it (deploy tool, `bq query`,
  Terraform, a human) is outside this spec.

## 2. Pipeline

```
ontology.yaml ──┐
                ├──► Resolver ──► ResolvedGraph ──► Emitter ──► DDL
binding.yaml  ──┘                                  (BQ|Spanner)
```

Stages:

1. **Load.** Parse and validate ontology and binding independently against
   their specs.
2. **Resolve.** Cross-check names and wire derived expressions to bound
   columns. Produce an in-memory `ResolvedGraph`.
3. **Emit.** Walk the `ResolvedGraph` and produce backend-specific DDL.

## 2a. Type overview (resolved model)

```yaml
# ResolvedGraph
name: <string>                  # graph name, from ontology
target: <Target>                # from binding
node_tables: [<NodeTable>, ...]
edge_tables: [<EdgeTable>, ...]
```

```yaml
# NodeTable
label: <string>                 # entity name
key_columns: [<string>, ...]
source: <string>                # fully qualified
properties: [<ResolvedProperty>, ...]
```

```yaml
# EdgeTable
label: <string>                 # relationship name
source: <string>
from_key_columns: [<string>, ...]
to_key_columns: [<string>, ...]
from_node_table: <string>       # which node table this edge's source points to
to_node_table: <string>
properties: [<ResolvedProperty>, ...]
```

```yaml
# ResolvedProperty
name: <string>                  # logical property name
type: <string>                  # GoogleSQL type
sql: <string>                   # column name, or substituted expression for derived
```

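One way to hold the resolved model in memory is as immutable records. The sketch below mirrors the four shapes above as Python dataclasses. The choice of Python, the frozen dataclasses, the representation of `Target` as a plain string, and the `"bigquery"` value are assumptions made for illustration; this design does not prescribe an implementation language.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ResolvedProperty:
    name: str  # logical property name
    type: str  # GoogleSQL type
    sql: str   # column name, or substituted expression for derived


@dataclass(frozen=True)
class NodeTable:
    label: str                      # entity name
    key_columns: Tuple[str, ...]
    source: str                     # fully qualified
    properties: Tuple[ResolvedProperty, ...]


@dataclass(frozen=True)
class EdgeTable:
    label: str                      # relationship name
    source: str
    from_key_columns: Tuple[str, ...]
    to_key_columns: Tuple[str, ...]
    from_node_table: str
    to_node_table: str
    properties: Tuple[ResolvedProperty, ...]


@dataclass(frozen=True)
class ResolvedGraph:
    name: str                       # graph name, from ontology
    target: str                     # assumption: Target modelled as a string
    node_tables: Tuple[NodeTable, ...]
    edge_tables: Tuple[EdgeTable, ...]


# The Account node table from the worked example in §4.
# Mapping the ontology's `string`/`timestamp` to STRING/TIMESTAMP is assumed.
account = NodeTable(
    label="Account",
    key_columns=("acct_id",),
    source="raw.accounts",
    properties=(
        ResolvedProperty("account_id", "STRING", "acct_id"),
        ResolvedProperty("opened_at", "TIMESTAMP", "created_ts"),
    ),
)
graph = ResolvedGraph("finance", "bigquery", (account,), ())
```

Frozen records with tuple fields make it harder for an emitter to mutate the graph by accident, which fits the deterministic-output goal in §1.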
## 3. Resolution

### Substitute derived expressions

For each derived property, substitute each name referenced in `expr:` with
the column name from the binding. References to other derived properties
are resolved recursively; cycles are a compile-time error.

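The substitution step can be sketched as a recursive rewrite with a stack for cycle detection. This is a simplified illustration, not the compiler's implementation: it tokenizes with a regular expression instead of a SQL parser, leaves unknown identifiers alone on the assumption that they are SQL keywords or function names, and wraps nested derived expressions in parentheses. The function name `substitute` and its argument shapes are hypothetical.

```python
import re

# Matches a single-quoted SQL string literal or a bare identifier.
TOKEN = re.compile(r"'(?:[^']|'')*'|[A-Za-z_][A-Za-z0-9_]*")


def substitute(name, columns, derived, stack=()):
    """Return the SQL text for logical property `name`.

    columns: logical name -> bound column (from the binding)
    derived: logical name -> `expr:` text (from the ontology)
    """
    if name in stack:
        chain = " -> ".join(stack + (name,))
        raise ValueError(f"cycle among derived properties: {chain}")
    if name in columns:
        return columns[name]

    def replace(match):
        token = match.group(0)
        if token.startswith("'"):
            return token  # leave string literals untouched
        if token in columns:
            return columns[token]
        if token in derived:
            # Parenthesize nested derived expressions to keep precedence.
            return "(" + substitute(token, columns, derived, stack + (name,)) + ")"
        return token  # assumed to be a SQL keyword or function name

    return TOKEN.sub(replace, derived[name])


columns = {"first_name": "given_name", "last_name": "family_name"}
derived = {"full_name": "first_name || ' ' || last_name"}
print(substitute("full_name", columns, derived))
# given_name || ' ' || family_name
```

A production resolver would need a real expression parser to enforce validation rule 2 in §6, since this sketch cannot tell an unresolved property name from a function name.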
### Resolve endpoints

For each relationship, look up the single node table for each endpoint
entity. Because v0 does not lower inheritance, each endpoint entity is
bound to exactly one node table, so endpoint resolution is direct.

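Because each entity maps to exactly one node table in v0, endpoint resolution reduces to two dictionary lookups. The sketch below assumes a relationship is a dict with `name`, `from`, and `to` keys and that node tables are identified by alias; those shapes, and the `OWNS` relationship, are hypothetical and not taken from `ontology.md`.

```python
def resolve_endpoints(relationship, node_table_by_entity):
    """Return (from_node_table, to_node_table) for one relationship.

    relationship: dict with hypothetical keys "name", "from", "to"
    node_table_by_entity: entity name -> node table alias (one per entity in v0)
    """
    resolved = []
    for end in ("from", "to"):
        entity = relationship[end]
        if entity not in node_table_by_entity:
            raise LookupError(
                f"{relationship['name']}: endpoint entity {entity!r} "
                "has no bound node table"
            )
        resolved.append(node_table_by_entity[entity])
    return tuple(resolved)


tables = {"Person": "persons", "Account": "accounts"}
owns = {"name": "OWNS", "from": "Person", "to": "Account"}  # illustrative only
print(resolve_endpoints(owns, tables))  # ('persons', 'accounts')
```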
## 4. Emission

Both backends produce `CREATE PROPERTY GRAPH` statements. Node tables and
edge tables are listed in deterministic alphabetical order. Property lists
follow the ontology declaration order of the owning entity / relationship.

### BigQuery

#### Worked example

Ontology fragment:

```yaml
entities:
  - name: Person
    keys: { primary: [person_id] }
    properties:
      - { name: person_id, type: string }
      - { name: name, type: string }
      - { name: first_name, type: string }
      - { name: last_name, type: string }
      - { name: full_name, type: string,
          expr: "first_name || ' ' || last_name" }
  - name: Account
    keys: { primary: [account_id] }
    properties:
      - { name: account_id, type: string }
      - { name: opened_at, type: timestamp }
```

Binding fragment:

```yaml
entities:
  - name: Person
    source: raw.persons
    properties:
      - { name: person_id, column: person_id }
      - { name: name, column: display_name }
      - { name: first_name, column: given_name }
      - { name: last_name, column: family_name }
  - name: Account
    source: raw.accounts
    properties:
      - { name: account_id, column: acct_id }
      - { name: opened_at, column: created_ts }
```

Emitted DDL:

```sql
CREATE PROPERTY GRAPH finance
  NODE TABLES (
    raw.accounts AS accounts
      KEY (acct_id)
      LABEL Account PROPERTIES (acct_id AS account_id, created_ts AS opened_at),
    raw.persons AS persons
      KEY (person_id)
      LABEL Person PROPERTIES (
        person_id,
        display_name AS name,
        given_name AS first_name,
        family_name AS last_name,
        (given_name || ' ' || family_name) AS full_name
      )
  )
  EDGE TABLES (
    raw.holdings AS holdings
      SOURCE KEY (account_id) REFERENCES accounts (acct_id)
      DESTINATION KEY (security_id) REFERENCES securities (cusip)
      LABEL HOLDS PROPERTIES (snapshot_date AS as_of, qty AS quantity)
  );
```

Derived expressions become SQL expressions in the `PROPERTIES` list;
column renames become `AS` clauses. Note that the `HOLDS` edge table relies
on a relationship, a binding, and a `securities` node table that the
fragments above do not show.

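The three property forms visible in the example (bare column, `AS` rename, parenthesized derived expression) can be produced by a small formatter. The sketch below is illustrative only: the function names are hypothetical, and the `derived` flag is an assumption, since `ResolvedProperty` as specified in §2a carries only `name`, `type`, and `sql`.

```python
def property_spec(name, sql, derived=False):
    """Format one entry of a PROPERTIES list."""
    if derived:
        return f"({sql}) AS {name}"   # derived expression, parenthesized
    if sql == name:
        return name                    # column and logical name coincide
    return f"{sql} AS {name}"          # column rename


def label_clause(label, props):
    """Format `LABEL <name> PROPERTIES (<list>)` on a single line."""
    specs = ", ".join(property_spec(*p) for p in props)
    return f"LABEL {label} PROPERTIES ({specs})"


print(label_clause("Account", [("account_id", "acct_id"),
                               ("opened_at", "created_ts")]))
# LABEL Account PROPERTIES (acct_id AS account_id, created_ts AS opened_at)
```

The sketch always emits a single line; the multi-line layout used for `Person` in the example would need a line-wrapping rule that this document does not specify.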
### Spanner

Spanner uses the same `CREATE PROPERTY GRAPH` / `NODE TABLES` /
`EDGE TABLES` form, with minor syntactic differences.

### Relationship to the GCP reference grammar

The resolved model maps to the `CREATE PROPERTY GRAPH` grammar as follows:

| Resolved model | GCP grammar |
|---|---|
| `NodeTable.source` | `<source> [AS <alias>]` |
| `NodeTable.key_columns` | `KEY (<cols>)` |
| `NodeTable.label` + `properties` | `LABEL <name> PROPERTIES (<spec_list>)` |
| `EdgeTable.from_key_columns` + `from_node_table` | `SOURCE KEY (<cols>) REFERENCES <node>` |
| `EdgeTable.to_key_columns` + `to_node_table` | `DESTINATION KEY (<cols>) REFERENCES <node>` |

The resolved model collapses the grammar's variant forms to a single
canonical shape. We always emit the explicit
`LABEL <name> PROPERTIES (<list>)` form and do not emit:

- `DEFAULT LABEL`: our properties are always enumerated.
- `PROPERTIES ARE ALL COLUMNS`: same reason.
- `LABEL <name> NO PROPERTIES`: every label projects at least one
  property.
- `DYNAMIC LABEL` / `DYNAMIC PROPERTIES`: our ontology is closed-world
  with declared labels and properties. See §8.

References:
[Spanner graph schema statements](https://cloud.google.com/spanner/docs/reference/standard-sql/graph-schema-statements),
[BigQuery graph creation](https://cloud.google.com/bigquery/docs/graph-create).

## 5. Derived expressions in DDL

Derived properties appear as `<substituted_expr> AS <name>` in the
`PROPERTIES` list. No intermediate view is created. See the `full_name`
example in §4.

## 6. Compile-time validation

On top of ontology-level (`ontology.md` §10) and binding-level
(`binding.md` §9) rules:

1. **No `extends`.** No entity or relationship in the ontology uses
   `extends`. Compilation of hierarchical ontologies is reserved for a
   future design.
2. Every name in a derived expression resolves to a bound or derived
   property on the same entity or relationship.
3. No cycles among derived properties.
4. Every logical property type is supported by the target backend
   (`ontology.md` §7).

Warnings: bound entity referenced by no relationship.

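Rule 1 is the simplest of these checks to illustrate. The sketch below assumes the parsed ontology is a dict with an `entities` list (as in the §4 fragment) and a `relationships` list, and that inheritance appears as an `extends` key on an item; the `relationships` key, the error wording, and the `Employee` entity are hypothetical.

```python
def check_no_extends(ontology):
    """Return one error message per entity or relationship using `extends`."""
    errors = []
    for section in ("entities", "relationships"):  # "relationships" key is assumed
        for item in ontology.get(section, []):
            if "extends" in item:
                errors.append(
                    f"{section}: {item['name']} uses `extends`; "
                    "v0 compiles flat ontologies only"
                )
    return errors


flat = {"entities": [{"name": "Person"}, {"name": "Account"}]}
hier = {"entities": [{"name": "Person"},
                     {"name": "Employee", "extends": "Person"}]}
print(check_no_extends(flat))  # []
print(check_no_extends(hier))
```

Collecting every violation before failing, instead of raising on the first, lets a single compile run report all offending entities at once.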
## 7. Determinism and output shape

One `CREATE PROPERTY GRAPH` per compile. Node tables sorted alphabetically,
then edge tables sorted alphabetically. Property lists follow ontology
declaration order.

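The ordering rule can be made concrete with a small sketch. It assumes tables are sorted by label (the text says "alphabetically" without naming the sort key) and uses a SHA-256 digest as one possible way to compare two compiles byte for byte; both choices, and the function names, are illustrative.

```python
import hashlib


def canonical_order(node_tables, edge_tables):
    """Sort tables by label; leave each table's property list untouched."""
    by_label = lambda table: table["label"]  # sort key is an assumption
    return sorted(node_tables, key=by_label), sorted(edge_tables, key=by_label)


def digest(ddl):
    """Fingerprint of the emitted text, for comparing two compiles."""
    return hashlib.sha256(ddl.encode("utf-8")).hexdigest()


# Input order does not affect the canonical order.
nodes_a = [{"label": "Person"}, {"label": "Account"}]
nodes_b = [{"label": "Account"}, {"label": "Person"}]
assert canonical_order(nodes_a, []) == canonical_order(nodes_b, [])
```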
## 8. Open questions

- **Multi-graph output.** One `CREATE PROPERTY GRAPH` per compile.
  Multi-graph from one ontology is a composition concern.
- **`DYNAMIC LABEL`.** Spanner and BigQuery support a string column as a
  runtime-assigned label (one node table and one edge table per schema).
  We don't emit it today: a closed-world ontology with declared labels is
  enough. Revisit if an importer or user surfaces a real need.

## 9. Out of scope

- **Inheritance lowering.** Compilation of ontologies with `extends`:
  substitutability, per-label property projections, fanout vs union-view
  vs label-ref strategies, cross-table identity, overlapping siblings,
  merged-node lowering. Separate future design.
- **CLI surface.** Command names, flag names, and output destinations
  belong in a separate doc.
- **Applying DDL to a live backend.** Credentials, transactions, rollback,
  drift detection. Any tool that can accept DDL text can consume this
  compiler's output.
- **Measures and aggregations.** Not part of the property graph DDL.
- **Composition.** Multi-file ontology assembly, shared binding defaults,
  overlay graphs.
- **Schema evolution and migration.** Diffing two compiled outputs and
  emitting `ALTER` statements is a separate concern.
