|
| 1 | +# Compilation — Core Design (v0) |
| 2 | + |
| 3 | +Status: draft |
| 4 | +Scope: how an ontology (`ontology.md`) plus a binding (`binding.md`) are |
| 5 | +resolved and emitted as backend DDL (`CREATE PROPERTY GRAPH` on BigQuery or |
| 6 | +Spanner). |
| 7 | + |
| 8 | +**v0 compiles flat ontologies only.** Ontologies that use `extends` on |
| 9 | +entities or relationships are rejected at compile time. Inheritance |
| 10 | +lowering — substitutability, per-label property projections, cross-table |
| 11 | +identity, overlapping siblings — is the subject of a separate future |
| 12 | +design. Deployment, credentials, and measures are out of scope. |
| 13 | + |
| 14 | +## 1. Goals |
| 15 | + |
| 16 | +- **Single-shot compile.** Ontology plus binding in, DDL text out. No |
| 17 | + intermediate on-disk artifact. |
| 18 | +- **Deterministic output.** Same inputs → byte-identical DDL. |
| 19 | +- **Backend-neutral pipeline, backend-specific emitter.** Resolution is |
| 20 | + shared across backends; emission is per-backend. |
| 21 | +- **Output is just text.** What consumes it (deploy tool, `bq query`, |
| 22 | + Terraform, a human) is outside this spec. |
| 23 | + |
| 24 | +## 2. Pipeline |
| 25 | + |
| 26 | +``` |
| 27 | +ontology.yaml ──┐ |
| 28 | + ├──► Resolver ──► ResolvedGraph ──► Emitter ──► DDL |
| 29 | +binding.yaml ──┘ (BQ|Spanner) |
| 30 | +``` |
| 31 | + |
| 32 | +Stages: |
| 33 | + |
| 34 | +1. **Load.** Parse and validate ontology and binding independently against |
| 35 | + their specs. |
| 36 | +2. **Resolve.** Cross-check names, wire derived expressions to bound |
| 37 | + columns. Produce an in-memory `ResolvedGraph`. |
| 38 | +3. **Emit.** Walk the `ResolvedGraph` and produce backend-specific DDL. |
| 39 | + |
| 40 | +## 2a. Type overview (resolved model) |
| 41 | + |
| 42 | +```yaml |
| 43 | +# ResolvedGraph |
| 44 | +name: <string> # graph name, from ontology |
| 45 | +target: <Target> # from binding |
| 46 | +node_tables: [<NodeTable>, ...] |
| 47 | +edge_tables: [<EdgeTable>, ...] |
| 48 | +``` |
| 49 | +
|
| 50 | +```yaml |
| 51 | +# NodeTable |
| 52 | +label: <string> # entity name |
| 53 | +key_columns: [<string>, ...] |
| 54 | +source: <string> # fully qualified |
| 55 | +properties: [<ResolvedProperty>, ...] |
| 56 | +``` |
| 57 | +
|
| 58 | +```yaml |
| 59 | +# EdgeTable |
| 60 | +label: <string> # relationship name |
| 61 | +source: <string> |
| 62 | +from_key_columns: [<string>, ...] |
| 63 | +to_key_columns: [<string>, ...] |
| 64 | +from_node_table: <string> # which node table this edge's source points to |
| 65 | +to_node_table: <string> |
| 66 | +properties: [<ResolvedProperty>, ...] |
| 67 | +``` |
| 68 | +
|
| 69 | +```yaml |
| 70 | +# ResolvedProperty |
| 71 | +name: <string> # logical property name |
| 72 | +type: <string> # GoogleSQL type |
| 73 | +sql: <string> # column name, or substituted expression for derived |
| 74 | +``` |
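For readers who prefer code to schema listings, the four resolved-model types above can be sketched as Python dataclasses. This is an illustrative sketch, not the compiler's actual types: `target` is modeled as a plain string because the shape of `<Target>` is defined by the binding spec and is not shown here, and the `"bigquery"` value and `STRING` type names in the usage lines are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ResolvedProperty:
    name: str  # logical property name
    type: str  # GoogleSQL type
    sql: str   # column name, or substituted expression for derived


@dataclass(frozen=True)
class NodeTable:
    label: str                       # entity name
    key_columns: tuple[str, ...]
    source: str                      # fully qualified
    properties: tuple[ResolvedProperty, ...] = ()


@dataclass(frozen=True)
class EdgeTable:
    label: str                       # relationship name
    source: str
    from_key_columns: tuple[str, ...]
    to_key_columns: tuple[str, ...]
    from_node_table: str
    to_node_table: str
    properties: tuple[ResolvedProperty, ...] = ()


@dataclass(frozen=True)
class ResolvedGraph:
    name: str                        # graph name, from ontology
    target: str                      # assumption: stand-in for <Target>
    node_tables: tuple[NodeTable, ...] = ()
    edge_tables: tuple[EdgeTable, ...] = ()


# Usage: a one-node graph with a plain column and a derived property.
person = NodeTable(
    label="Person",
    key_columns=("person_id",),
    source="raw.persons",
    properties=(
        ResolvedProperty("person_id", "STRING", "person_id"),
        ResolvedProperty("full_name", "STRING", "(given_name || ' ' || family_name)"),
    ),
)
graph = ResolvedGraph(name="finance", target="bigquery", node_tables=(person,))
```

Frozen dataclasses with tuple fields keep the resolved graph immutable, which fits the determinism goal in §1: the emitter cannot reorder or mutate what the resolver produced.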
| 75 | + |
| 76 | +## 3. Resolution |
| 77 | + |
| 78 | +### Substitute derived expressions |
| 79 | + |
| 80 | +For each derived property, substitute each name referenced in `expr:` with |
| 81 | +the column name from the binding. References to other derived properties |
| 82 | +are resolved recursively; cycles are a compile-time error. |
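The substitution rule can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the resolver's actual implementation: it assumes expressions contain only property names, operators, and single-quoted string literals (a real resolver would parse the expression rather than tokenize it with a regex), and the function name `substitute` and its parameters are hypothetical.

```python
import re

# Matches either a single-quoted SQL string literal or an identifier.
# Literals are matched first so names inside them are left untouched.
TOKEN = re.compile(r"'(?:[^']|'')*'|[A-Za-z_][A-Za-z0-9_]*")


def substitute(name, columns, derived, _stack=()):
    """Return the SQL text for logical property `name`.

    columns: logical name -> bound column name (from the binding)
    derived: logical name -> `expr:` text (from the ontology)
    """
    if name in columns:
        return columns[name]
    if name not in derived:
        raise KeyError(f"unresolved name: {name}")
    if name in _stack:
        path = " -> ".join(_stack + (name,))
        raise ValueError(f"cycle among derived properties: {path}")

    def replace(match):
        token = match.group(0)
        if token.startswith("'"):
            return token  # string literal: keep verbatim
        return substitute(token, columns, derived, _stack + (name,))

    # Parenthesize so a nested derived expression keeps its precedence.
    return "(" + TOKEN.sub(replace, derived[name]) + ")"


columns = {"first_name": "given_name", "last_name": "family_name"}
derived = {"full_name": "first_name || ' ' || last_name"}
full_name_sql = substitute("full_name", columns, derived)
```

With the `Person` binding from the worked example in §4, `full_name_sql` is `(given_name || ' ' || family_name)`. A pair of derived properties that reference each other raises `ValueError`, which corresponds to the compile-time cycle error.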
| 83 | + |
| 84 | +### Resolve endpoints |
| 85 | + |
| 86 | +For each relationship, look up the single node table for each endpoint |
| 87 | +entity. Because v0 does not lower inheritance, each endpoint entity is |
| 88 | +bound to exactly one node table, so endpoint resolution is direct. |
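Because each endpoint maps to exactly one node table, endpoint resolution reduces to a dictionary lookup. The Python sketch below illustrates this; the `from` / `to` field names on a relationship and the `Security` entity name are hypothetical, since the relationship schema is defined in `ontology.md` and not shown here.

```python
def resolve_endpoints(relationships, node_table_by_entity):
    """Map each relationship name to its (from_node_table, to_node_table) pair.

    relationships: list of dicts with hypothetical keys "name", "from", "to"
    node_table_by_entity: entity name -> the single node table bound to it
    """
    resolved = {}
    for rel in relationships:
        for side in ("from", "to"):
            entity = rel[side]
            if entity not in node_table_by_entity:
                raise KeyError(
                    f"relationship {rel['name']}: endpoint entity {entity} is not bound"
                )
        resolved[rel["name"]] = (
            node_table_by_entity[rel["from"]],
            node_table_by_entity[rel["to"]],
        )
    return resolved


node_tables = {"Account": "accounts", "Security": "securities"}
relationships = [{"name": "HOLDS", "from": "Account", "to": "Security"}]
endpoints = resolve_endpoints(relationships, node_tables)
```

Under inheritance an endpoint entity could map to several node tables and this lookup would no longer be one-to-one, which is one reason lowering is deferred to a separate design.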
| 89 | + |
| 90 | +## 4. Emission |
| 91 | + |
| 92 | +Both backends produce `CREATE PROPERTY GRAPH` statements. Node tables and |
| 93 | +edge tables are listed in deterministic alphabetical order. Property lists |
| 94 | +follow the ontology declaration order of the owning entity / relationship. |
| 95 | + |
| 96 | +### BigQuery |
| 97 | + |
| 98 | +#### Worked example |
| 99 | + |
| 100 | +Ontology fragment: |
| 101 | + |
| 102 | +```yaml |
| 103 | +entities: |
| 104 | + - name: Person |
| 105 | + keys: { primary: [person_id] } |
| 106 | + properties: |
| 107 | + - { name: person_id, type: string } |
| 108 | + - { name: name, type: string } |
| 109 | + - { name: first_name, type: string } |
| 110 | + - { name: last_name, type: string } |
| 111 | + - { name: full_name, type: string, |
| 112 | + expr: "first_name || ' ' || last_name" } |
| 113 | + - name: Account |
| 114 | + keys: { primary: [account_id] } |
| 115 | + properties: |
| 116 | + - { name: account_id, type: string } |
| 117 | + - { name: opened_at, type: timestamp } |
| 118 | +``` |
| 119 | + |
| 120 | +Binding fragment: |
| 121 | + |
| 122 | +```yaml |
| 123 | +entities: |
| 124 | + - name: Person |
| 125 | + source: raw.persons |
| 126 | + properties: |
| 127 | + - { name: person_id, column: person_id } |
| 128 | + - { name: name, column: display_name } |
| 129 | + - { name: first_name, column: given_name } |
| 130 | + - { name: last_name, column: family_name } |
| 131 | + - name: Account |
| 132 | + source: raw.accounts |
| 133 | + properties: |
| 134 | + - { name: account_id, column: acct_id } |
| 135 | + - { name: opened_at, column: created_ts } |
| 136 | +``` |
| 137 | + |
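| 138 | +Emitted DDL (abridged: the `HOLDS` relationship and the `securities` node table it references are not part of the fragments above): |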
| 139 | + |
| 140 | +```sql |
| 141 | +CREATE PROPERTY GRAPH finance |
| 142 | + NODE TABLES ( |
| 143 | + raw.accounts AS accounts |
| 144 | + KEY (acct_id) |
| 145 | + LABEL Account PROPERTIES (acct_id AS account_id, created_ts AS opened_at), |
| 146 | + raw.persons AS persons |
| 147 | + KEY (person_id) |
| 148 | + LABEL Person PROPERTIES ( |
| 149 | + person_id, |
| 150 | + display_name AS name, |
| 151 | + given_name AS first_name, |
| 152 | + family_name AS last_name, |
| 153 | + (given_name || ' ' || family_name) AS full_name |
| 154 | + ) |
| 155 | + ) |
| 156 | + EDGE TABLES ( |
| 157 | + raw.holdings AS holdings |
| 158 | + SOURCE KEY (account_id) REFERENCES accounts (acct_id) |
| 159 | + DESTINATION KEY (security_id) REFERENCES securities (cusip) |
| 160 | + LABEL HOLDS PROPERTIES (snapshot_date AS as_of, qty AS quantity) |
| 161 | + ); |
| 162 | +``` |
| 163 | + |
| 164 | +Derived expressions become SQL expressions in the `PROPERTIES` list; |
| 165 | +column renames become `AS` clauses. |
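The rule for a single `LABEL ... PROPERTIES (...)` clause can be sketched in Python as below. The helper names are hypothetical and the sketch ignores line wrapping; it only shows when a bare column is emitted and when an `AS` clause is needed.

```python
def property_spec(name, sql):
    """Render one entry of a PROPERTIES list.

    name: logical property name
    sql:  bound column name, or the substituted expression for a derived property
    """
    # A column whose name already equals the logical name needs no alias.
    return sql if sql == name else f"{sql} AS {name}"


def label_clause(label, properties):
    """Render `LABEL <name> PROPERTIES (<list>)` from (name, sql) pairs,
    preserving the order in which the pairs are given."""
    specs = ", ".join(property_spec(name, sql) for name, sql in properties)
    return f"LABEL {label} PROPERTIES ({specs})"


account_clause = label_clause(
    "Account",
    [("account_id", "acct_id"), ("opened_at", "created_ts")],
)
person_clause = label_clause(
    "Person",
    [("person_id", "person_id"),
     ("full_name", "(given_name || ' ' || family_name)")],
)
```

`account_clause` reproduces the `Account` line of the emitted DDL above. In `person_clause`, `person_id` is emitted bare and `full_name` carries its parenthesized expression.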
| 166 | + |
| 167 | +### Spanner |
| 168 | + |
| 169 | +Spanner uses the same `CREATE PROPERTY GRAPH` / `NODE TABLES` / |
| 170 | +`EDGE TABLES` form, with minor syntactic differences. |
| 171 | + |
| 172 | +### Relationship to the GCP reference grammar |
| 173 | + |
| 174 | +The resolved model maps to the `CREATE PROPERTY GRAPH` grammar as follows: |
| 175 | + |
| 176 | +| Resolved model | GCP grammar | |
| 177 | +|---|---| |
| 178 | +| `NodeTable.source` | `<source> [AS <alias>]` | |
| 179 | +| `NodeTable.key_columns` | `KEY (<cols>)` | |
| 180 | +| `NodeTable.label` + `properties` | `LABEL <name> PROPERTIES (<spec_list>)` | |
| 181 | +| `EdgeTable.from_key_columns` + `from_node_table` | `SOURCE KEY (<cols>) REFERENCES <node>` | |
| 182 | +| `EdgeTable.to_key_columns` + `to_node_table` | `DESTINATION KEY (<cols>) REFERENCES <node>` | |
| 183 | + |
| 184 | +The resolved model collapses the grammar's variant forms to a single |
| 185 | +canonical shape. We always emit the explicit |
| 186 | +`LABEL <name> PROPERTIES (<list>)` form and do not emit: |
| 187 | + |
| 188 | +- `DEFAULT LABEL` — our properties are always enumerated. |
| 189 | +- `PROPERTIES ARE ALL COLUMNS` — same reason. |
| 190 | +- `LABEL <name> NO PROPERTIES` — every label projects at least one |
| 191 | + property. |
| 192 | +- `DYNAMIC LABEL` / `DYNAMIC PROPERTIES` — our ontology is closed-world |
| 193 | + with declared labels and properties. See §8. |
| 194 | + |
| 195 | +References: |
| 196 | +[Spanner graph schema statements](https://cloud.google.com/spanner/docs/reference/standard-sql/graph-schema-statements), |
| 197 | +[BigQuery graph creation](https://cloud.google.com/bigquery/docs/graph-create). |
| 198 | + |
| 199 | +## 5. Derived expressions in DDL |
| 200 | + |
| 201 | +Derived properties appear as `<substituted_expr> AS <name>` in the |
| 202 | +`PROPERTIES` list. No intermediate view is created. See the `full_name` |
| 203 | +example in §4. |
| 204 | + |
| 205 | +## 6. Compile-time validation |
| 206 | + |
| 207 | +On top of ontology-level (`ontology.md` §10) and binding-level |
| 208 | +(`binding.md` §9) rules: |
| 209 | + |
| 210 | +1. **No `extends`.** No entity or relationship in the ontology uses |
| 211 | + `extends`. Compilation of hierarchical ontologies is reserved for a |
| 212 | + future design. |
| 213 | +2. Every name in a derived expression resolves to a bound or derived |
| 214 | + property on the same entity or relationship. |
| 215 | +3. No cycles among derived properties. |
| 216 | +4. Every logical property type is supported by the target backend |
| 217 | + (`ontology.md` §7). |
| 218 | + |
| 219 | +Warning (non-fatal): a bound entity that no relationship references. |
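Rule 1 and the unreferenced-entity warning can be sketched in Python as follows. This is an illustration only: the ontology is assumed to be loaded as plain dictionaries, the `relationships` key and the `from` / `to` endpoint fields are hypothetical, and the message wording is invented for the example.

```python
def check_no_extends(ontology):
    """Return one error per entity or relationship that uses `extends`."""
    errors = []
    for kind, key in (("entity", "entities"), ("relationship", "relationships")):
        for item in ontology.get(key, []):
            if "extends" in item:
                errors.append(
                    f"{kind} {item['name']}: `extends` is not supported in v0"
                )
    return errors


def unreferenced_entities(bound_entities, relationships):
    """Return bound entity names that no relationship endpoint references,
    sorted so that warning output is stable across runs."""
    referenced = set()
    for rel in relationships:
        referenced.update((rel["from"], rel["to"]))
    return sorted(set(bound_entities) - referenced)


ontology = {
    "entities": [{"name": "Person"}, {"name": "Employee", "extends": "Person"}],
    "relationships": [{"name": "HOLDS", "from": "Account", "to": "Security"}],
}
errors = check_no_extends(ontology)
warnings = unreferenced_entities(
    ["Person", "Account", "Security"], ontology["relationships"]
)
```

Here `errors` contains a single entry for `Employee`, and `warnings` is `["Person"]` because only `Account` and `Security` appear as endpoints.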
| 220 | + |
| 221 | +## 7. Determinism and output shape |
| 222 | + |
| 223 | +One `CREATE PROPERTY GRAPH` per compile. Node tables sorted alphabetically, |
| 224 | +then edge tables sorted alphabetically. Property lists follow ontology |
| 225 | +declaration order. |
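The determinism property can be checked mechanically: shuffling the input order must not change the emitted bytes. The Python sketch below uses a toy emitter to illustrate the idea. It assumes "alphabetically" means ordering by label with code-point comparison, which this document does not specify, and the emitted line format is simplified.

```python
import hashlib


def emit_node_tables(node_tables):
    """Toy emitter: one line per node table, in a canonical order."""
    # Assumption: sort by label, compared by Unicode code point.
    ordered = sorted(node_tables, key=lambda table: table["label"])
    return ",\n".join(
        f"{table['source']} LABEL {table['label']}" for table in ordered
    )


def digest(text):
    """SHA-256 of the UTF-8 bytes, for byte-identical comparison."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


tables = [
    {"label": "Person", "source": "raw.persons"},
    {"label": "Account", "source": "raw.accounts"},
]
forward = emit_node_tables(tables)
backward = emit_node_tables(list(reversed(tables)))
same_bytes = digest(forward) == digest(backward)
```

`same_bytes` is `True`, and `Account` is emitted before `Person` regardless of input order. Property lists are deliberately left unsorted, because they follow ontology declaration order.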
| 226 | + |
| 227 | +## 8. Open questions |
| 228 | + |
| 229 | +- **Multi-graph output.** One `CREATE PROPERTY GRAPH` per compile. |
| 230 | + Multi-graph from one ontology is a composition concern. |
| 231 | +- **`DYNAMIC LABEL`.** Spanner and BigQuery support a string column as a |
| 232 | + runtime-assigned label (one node table and one edge table per schema). |
| 233 | + We don't emit it today — closed-world ontology with declared labels is |
| 234 | + enough. Revisit if an importer or user surfaces a real need. |
| 235 | + |
| 236 | +## 9. Out of scope |
| 237 | + |
| 238 | +- **Inheritance lowering.** Compilation of ontologies with `extends` — |
| 239 | + substitutability, per-label property projections, fanout vs union-view |
| 240 | + vs label-ref strategies, cross-table identity, overlapping siblings, |
| 241 | + merged-node lowering. Separate future design. |
| 242 | +- **CLI surface.** Command names, flag names, output destinations — a |
| 243 | + separate doc. |
| 244 | +- **Applying DDL to a live backend.** Credentials, transactions, rollback, |
| 245 | + drift detection. Any tool that can accept DDL text can consume this |
| 246 | + compiler's output. |
| 247 | +- **Measures and aggregations.** Not part of the property graph DDL. |
| 248 | +- **Composition.** Multi-file ontology assembly, shared binding defaults, |
| 249 | + overlay graphs. |
| 250 | +- **Schema evolution and migration.** Diffing two compiled outputs and |
| 251 | + emitting `ALTER` statements — separate concern. |