| 1 | +# gel-python Architecture & Onboarding Guide |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +Gel Python (referred to internally as "gel-python") is a Python client library for Gel that provides a type-safe API with a query builder, ORM features, and integration with Pydantic models. This document is an onboarding guide for engineers contributing to the project. |
| 6 | + |
| 7 | +## Project Structure |
| 8 | + |
| 9 | +All non-public code lives under `gel/_internal/`, and module names there are prefixed with an underscore to mark them as internal implementation: |
| 10 | + |
| 11 | +``` |
| 12 | +gel/ |
| 13 | +├── _internal/ |
| 14 | +│   ├── _qb/                  # Query builder AST and code generation |
| 15 | +│   ├── _qbmodel/             # Query builder model implementation |
| 16 | +│   │   ├── _abstract/        # Abstract model layer |
| 17 | +│   │   └── _pydantic/        # Pydantic-specific bindings |
| 18 | +│   ├── _reflection/          # Schema reflection system |
| 19 | +│   ├── _codegen/             # Code generation for reflected schemas |
| 20 | +│   │   └── _models/          # Model generation |
| 21 | +│   │       └── _pydantic.py  # Main generator (~6000 lines) |
| 22 | +│   ├── _save.py              # Save/persistence implementation |
| 23 | +│   ├── _tracked_list.py      # Multi property tracking for changes |
| 24 | +│   └── _link_set.py          # Multi links & multi links with props |
| 25 | +``` |
| 26 | + |
| 27 | +## Core Components |
| 28 | + |
| 29 | +### 1. Query Builder (`_qb/`) |
| 30 | + |
| 31 | +The query builder is implemented as an AST (Abstract Syntax Tree) with self-contained code generation: |
| 32 | + |
| 33 | +- **`_abstract.py`**: Base query builder expressions (AST nodes) |
| 34 | +- **`_expressions.py`**: Expression types and operations |
| 35 | +- **`_generics.py`**: Custom implementation of Python's `Annotated` for type metadata |
| 36 | +- **`_protocols.py`**: Two key protocols: |
| 37 | + - `__edgeql_qbexpr__`: Classes implementing this return AST nodes |
| 38 | + - `__edgeql_expr__`: Code generation protocol returning EdgeQL strings |
| 39 | + |
| 40 | +Key insight: Query builder nodes implement their own code generation, making them self-contained units that produce EdgeQL. |
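
The two protocols can be illustrated with a minimal sketch. The node classes (`PathExpr`, `FilterExpr`) and the helper `to_edgeql` are hypothetical, and the method signatures are simplified assumptions; the real ones in `_protocols.py` may take extra arguments.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class PathExpr:
    """Hypothetical AST node: a path such as `User.friends.name`."""

    source: str
    steps: tuple[str, ...] = ()

    def __edgeql_qbexpr__(self) -> PathExpr:
        # An AST node is already a query builder expression.
        return self

    def __edgeql_expr__(self) -> str:
        # The node renders itself; no external visitor is involved.
        return ".".join((self.source, *self.steps))


@dataclass(frozen=True)
class FilterExpr:
    """Hypothetical AST node: `select <subject> filter <condition>`."""

    subject: PathExpr
    condition: str

    def __edgeql_qbexpr__(self) -> FilterExpr:
        return self

    def __edgeql_expr__(self) -> str:
        # Code generation recurses into child nodes via the same protocol.
        return f"select {self.subject.__edgeql_expr__()} filter {self.condition}"


def to_edgeql(obj: Any) -> str:
    """Resolve any object implementing both protocols to EdgeQL text."""
    node = obj.__edgeql_qbexpr__()
    return node.__edgeql_expr__()


query = FilterExpr(PathExpr("User"), ".name = 'Alice'")
print(to_edgeql(query))  # select User filter .name = 'Alice'
```

Because each node carries its own rendering logic, adding a new expression type in this style means adding one class rather than extending a central compiler.
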
| 41 | + |
| 42 | +### 2. Model System (`_qbmodel/`) |
| 43 | + |
| 44 | +The model system has two layers: |
| 45 | + |
| 46 | +#### Abstract Layer (`_abstract/`) |
| 47 | +- **Platform-agnostic** implementation of query builder methods |
| 48 | +- Defines `GelModel` base class for object types |
| 49 | +- Implements descriptors for property/link access patterns |
| 50 | + |
| 51 | +#### Pydantic Layer (`_pydantic/`) |
| 52 | +- Contains necessary workarounds to make Pydantic work with database models |
| 53 | +- Handles partial data loading (database models may have missing required fields) |
| 54 | +- Implements custom validation and JSON schema generation |
| 55 | +- Makes link properties fit the Pydantic and Python object models |
| 56 | + |
| 57 | +### 3. Code Generation (`_codegen/_models/_pydantic.py`) |
| 58 | + |
| 59 | +This file is responsible for: |
| 60 | + |
| 61 | +- Generating type-safe Python models from Gel schema |
| 62 | +- Creating overloads for generic functions |
| 63 | +- Managing imports and module structure |
| 64 | +- Handling complex type mappings (Gel → Python) |
| 65 | + |
| 66 | +Key challenges: |
| 67 | +- **Function overloads**: Must generate specific overloads for each type to maintain type safety |
| 68 | +- **Implicit casting**: Gel's implicit casts must be carefully ordered to avoid mypy overlap errors |
| 69 | +- **Type checking**: Implements a rudimentary type checker for callable types |
| 70 | + |
| 71 | +### 4. Save Implementation (`_save.py`) |
| 72 | + |
| 73 | +The save system traverses object graphs and generates EdgeQL mutations: |
| 74 | + |
| 75 | +1. **`make_plan()`**: Analyzes objects and creates a delta tree of changes |
| 76 | +2. **Change nodes**: Each operation (property change, link addition, etc.) has a corresponding node type |
| 77 | +3. **Batching**: Groups similar operations for efficiency (up to 100 per batch) |
| 78 | +4. **Transaction handling**: Query generation is structured so that the save queries can be executed within a transaction |
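
The batching step above can be sketched as follows. This is only an illustration of grouping similar operations with a cap of 100 per batch: `Change` and `batch_changes` are hypothetical names, and the grouping key (operation kind plus type name) is an assumption, not how `_save.py` necessarily decides what counts as "similar".

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Iterator

MAX_BATCH_SIZE = 100  # the per-batch cap mentioned above


@dataclass(frozen=True)
class Change:
    """Hypothetical change node: one pending operation on one object."""

    kind: str        # e.g. "insert", "update", "link_add"
    type_name: str   # the object type the operation applies to
    payload: Any


def batch_changes(
    changes: list[Change], size: int = MAX_BATCH_SIZE
) -> Iterator[list[Change]]:
    """Group changes by (kind, type) and yield chunks of at most `size`."""
    groups: dict[tuple[str, str], list[Change]] = defaultdict(list)
    for change in changes:
        groups[(change.kind, change.type_name)].append(change)
    for group in groups.values():
        for start in range(0, len(group), size):
            yield group[start:start + size]


changes = [Change("update", "User", {"n": i}) for i in range(250)]
changes.append(Change("insert", "Post", {"title": "hi"}))
sizes = [len(batch) for batch in batch_changes(changes)]
print(sizes)  # [100, 100, 50, 1]
```
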
| 79 | + |
| 80 | +### 5. Model Classes (`_qbmodel/_pydantic/_models.py`) |
| 81 | + |
| 82 | +The model hierarchy: |
| 83 | + |
| 84 | +```python |
| 85 | +GelSourceModel # Base Pydantic wrapper with change tracking |
| 86 | + ├── GelModel # Handles objects |
| 87 | + ├── GelLinkModel # Handles link properties |
| 88 | + └── ProxyModel # The nightmare - wraps objects with link properties |
| 89 | +``` |
| 90 | + |
| 91 | +**ProxyModel is the most complex part** - it's technically a Pydantic model but doesn't behave like one, routing attributes to wrapped objects and link properties dynamically. |
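
The routing idea can be shown with a plain-Python sketch that leaves Pydantic out entirely. `LinkProxy` is a hypothetical stand-in, not the real `ProxyModel`, and exposing link properties as flat attributes on the proxy is an assumption made to keep the example short.

```python
class LinkProxy:
    """Hypothetical proxy: wraps an object and adds link properties."""

    __slots__ = ("_wrapped", "_linkprops")

    def __init__(self, wrapped, **linkprops):
        # Bypass our own __setattr__, which expects the slots to exist.
        object.__setattr__(self, "_wrapped", wrapped)
        object.__setattr__(self, "_linkprops", dict(linkprops))

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for anything
        # other than the two slots above.
        linkprops = object.__getattribute__(self, "_linkprops")
        if name in linkprops:
            return linkprops[name]
        return getattr(object.__getattribute__(self, "_wrapped"), name)

    def __setattr__(self, name, value):
        # Link properties stay on the proxy; everything else is
        # forwarded to the wrapped object.
        if name in self._linkprops:
            self._linkprops[name] = value
        else:
            setattr(self._wrapped, name, value)


class User:
    def __init__(self, name):
        self.name = name


alice = User("Alice")
friend = LinkProxy(alice, since_date="2020-01-01")
print(friend.name, friend.since_date)  # Alice 2020-01-01

friend.name = "Alicia"   # forwarded to the wrapped User
print(alice.name)        # Alicia
```

Even this toy version shows why the real class is awkward: the proxy has to look like the wrapped type to callers while keeping a second set of fields that the wrapped object knows nothing about.
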
| 92 | + |
| 93 | +## Link Properties: The Complexity Multiplier |
| 94 | + |
| 95 | +Link properties are attributes on relationships (e.g., a "friendship" link with a "since_date" property). They complicate everything: |
| 96 | + |
| 97 | +- **No native Python concept** for properties on properties |
| 98 | +- **ProxyModel hack**: Wraps objects to add link property support |
| 99 | +- **Type safety challenges**: Must maintain transparency when link properties are added |
| 100 | +- **Collection complexity**: Multi-links with properties require custom collection implementations |
| 101 | + |
| 102 | +Without link properties, the codebase would be roughly **3x simpler** (`_save.py` would be about 3x shorter, and queries about 10x smaller). |
| 103 | + |
| 104 | +## Testing Infrastructure |
| 105 | + |
| 106 | +### Type-Safe Testing |
| 107 | + |
| 108 | +Tests use a custom `@tb.typecheck` decorator that: |
| 109 | +1. Extracts test code into separate files |
| 110 | +2. Runs mypy on each test individually |
| 111 | +3. Supports `assertEqual(reveal_type(...), ...)` to ensure correct type inference |
| 112 | + |
| 113 | +Most tests are in `tests/test_model_generator.py`; query builder tests are in `tests/test_qb.py`. |
| 114 | + |
| 115 | +### Test Models Generation |
| 116 | + |
| 117 | +Run `python tools/gen_models.py` to generate test models into your virtual environment's site-packages. This enables IDE support for test development. |
| 118 | + |
| 119 | +## Development Setup |
| 120 | + |
| 121 | +### Prerequisites |
| 122 | + |
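| 123 | +1. A Gel development VM with the Gel server binary in `PATH` |
| 124 | +2. Install with: `pip install -e .` |
| 125 | +3. **Critical**: Write a `gel.pth` file into site-packages by hand, because the editable wheel install is broken. Run this from the repository root so that `import gel` resolves to your checkout: |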
| 126 | + |
| 127 | + ```bash |
| 128 | + python -c 'import pathlib, gel; print(pathlib.Path(gel.__path__[0]).parent)' > \ |
| 129 | + "$(python -c 'import site; print(site.getsitepackages()[0])')/gel.pth" |
| 130 | + ``` |
| 131 | + |
| 132 | +4. Set environment variables for the Gel server path if needed: |
| 133 | + |
| 134 | + * To use the dev server from your Gel development environment: |
| 135 | + |
| 136 | + ``` |
| 137 | + export GEL_SERVER_BINARY=<your-server-venv>/bin/gel-server |
| 138 | + ``` |
| 139 | + |
| 140 | + then run the tests with `env __EDGEDB_DEVMODE=1 pytest`. |
| 141 | + |
| 142 | + * Or, to use a server installed through the `gel` CLI (version 6 in this example): |
| 143 | + |
| 144 | + ``` |
| 145 | + export GEL_SERVER_BINARY=$(gel server info --bin-path --version '6') |
| 146 | + ``` |
| 147 | + |
| 148 | + after which plain `pytest` should work. |
| 149 | + |
| 150 | + |
| 151 | +### Running Tests |
| 152 | + |
| 153 | +```bash |
| 154 | +# Basic test run |
| 155 | +pytest tests/test_qb.py |
| 156 | + |
| 157 | +# Parallel execution (requires pytest-xdist) |
| 158 | +pytest -n 5 # Warning: High RAM usage, each process starts own DB |
| 159 | + |
| 160 | +# Run failing tests first |
| 161 | +pytest --ff |
| 162 | + |
| 163 | +# Run specific tests |
| 164 | +pytest -k "test_name or other_test" |
| 165 | +``` |
| 166 | + |
| 167 | +## Key Technical Decisions |
| 168 | + |
| 169 | +### Equality and Hashing |
| 170 | + |
| 171 | +- Objects with same ID are equal regardless of data differences |
| 172 | +- New objects (no ID) are only equal to themselves |
| 173 | +- Link properties are ignored in equality comparisons |
| 174 | +- Objects with IDs are hashable; new objects are not |
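
These rules map directly onto `__eq__` and `__hash__`. The sketch below uses a hypothetical `Obj` class rather than the real model base class, and it leaves link properties out entirely, which matches the rule that they do not take part in comparisons.

```python
import uuid
from typing import Any, Optional


class Obj:
    """Hypothetical model object following the equality rules above."""

    def __init__(self, id: Optional[uuid.UUID] = None, **data: Any) -> None:
        self.id = id
        self.data = data

    def __eq__(self, other: object) -> bool:
        if self is other:
            return True  # a new object is still equal to itself
        if not isinstance(other, Obj):
            return NotImplemented
        if self.id is None or other.id is None:
            return False  # new objects never equal anything else
        return self.id == other.id  # data differences are ignored

    def __hash__(self) -> int:
        if self.id is None:
            raise TypeError("objects without an id are not hashable")
        return hash(self.id)


same_id = uuid.uuid4()
a = Obj(same_id, name="Alice")
b = Obj(same_id, name="Alice (stale copy)")
new1, new2 = Obj(name="Bob"), Obj(name="Bob")

print(a == b)        # True
print(new1 == new2)  # False
print(len({a, b}))   # 1
```

Refusing to hash new objects avoids a classic bug: an object placed in a set before saving would otherwise change its hash once the database assigns it an ID.
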
| 175 | + |
| 176 | +### Descriptor Magic |
| 177 | + |
| 178 | +The system heavily uses Python descriptors for the query builder: |
| 179 | + |
| 180 | +- Accessing `User.friends` on the class returns a query builder path |
| 181 | +- Accessing `user.friends` on an instance returns actual data |
| 182 | +- This duality enables intuitive API: `User.friends.name` for queries |
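
The class-versus-instance duality relies on the `instance is None` case of `__get__`. Here is a minimal sketch; `Path` and `Link` are hypothetical stand-ins for the real query builder path and descriptor types.

```python
class Path:
    """Hypothetical query builder path such as `User.friends.name`."""

    def __init__(self, *steps: str) -> None:
        self.steps = steps

    def __getattr__(self, name: str) -> "Path":
        # Only reached for attributes that do not exist on Path itself.
        if name.startswith("_"):
            raise AttributeError(name)
        return Path(*self.steps, name)

    def __str__(self) -> str:
        return ".".join(self.steps)


class Link:
    """Hypothetical descriptor for a link or property."""

    def __set_name__(self, owner: type, name: str) -> None:
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class: return a query builder path.
            return Path(owner.__name__, self.name)
        # Accessed on an instance: return the stored data.
        try:
            return instance.__dict__[self.name]
        except KeyError:
            raise AttributeError(self.name) from None

    def __set__(self, instance, value) -> None:
        instance.__dict__[self.name] = value


class User:
    friends = Link()

    def __init__(self, name: str, friends: list) -> None:
        self.name = name
        self.friends = friends


bob = User("Bob", friends=[])
alice = User("Alice", friends=[bob])

print(User.friends.name)      # User.friends.name
print(alice.friends[0].name)  # Bob
```
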
| 183 | + |
| 184 | +### Type System Integration |
| 185 | + |
| 186 | +- Returns proper Pydantic models from queries |
| 187 | +- Codec layer adapted to accept return types from query builder |
| 188 | +- Type inference works through the entire pipeline |
| 189 | + |
| 190 | +## Common Pitfalls and Gotchas |
| 191 | + |
| 192 | +1. **UUID Performance**: Type IDs use integers instead of UUIDs due to Python's slow UUID constructor |
| 193 | +2. **Type IDs Issue**: Currently rely on database-specific IDs (needs fixing to use type names) |
| 194 | +3. **Pydantic Validation**: Custom validation pipeline through Rust layer requires careful schema generation |
| 195 | +4. **Stack Inspection**: Used to work around Pydantic limitations - fragile but necessary |
| 196 | +5. **Collection Tracking**: Custom collections track changes for save operations |
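
To make the last point concrete, here is a minimal sketch of a change-tracking list. `TrackedList` below is hypothetical and tracks only `append` and `remove`; the real implementation in `_tracked_list.py` has to cover the full list API.

```python
class TrackedList(list):
    """Hypothetical list that records additions and removals."""

    def __init__(self, items=()):
        super().__init__(items)
        self.added = []    # items appended since the last commit
        self.removed = []  # pre-existing items removed since then

    def append(self, item):
        super().append(item)
        self.added.append(item)

    def remove(self, item):
        super().remove(item)
        if item in self.added:
            # Added and removed before saving: nothing to persist.
            self.added.remove(item)
        else:
            self.removed.append(item)

    def commit(self):
        """Forget recorded changes, e.g. after a successful save."""
        self.added.clear()
        self.removed.clear()


tags = TrackedList(["a", "b"])
tags.append("c")
tags.remove("a")
tags.append("d")
tags.remove("d")  # cancels out the append above

print(list(tags))    # ['b', 'c']
print(tags.added)    # ['c']
print(tags.removed)  # ['a']
```

A save operation built on this would only need to emit an addition for `c` and a removal for `a`, instead of rewriting the whole collection.
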
| 197 | + |
| 198 | +## Where to Start Contributing |
| 199 | + |
| 200 | +### Easier Areas |
| 201 | +- Query builder bugs (once familiar with the system) |
| 202 | +- Test coverage improvements |
| 203 | +- Documentation and comments |
| 204 | + |
| 205 | +### Medium Complexity |
| 206 | +- Code generation improvements |
| 207 | +- Save compiler refactoring (currently template-based, needs proper compiler) |
| 208 | +- Performance optimizations |
| 209 | + |
| 210 | +### High Complexity |
| 211 | +- Link property handling |
| 212 | +- Pydantic integration layer |
| 213 | +- Type system mapping |
| 214 | + |
| 215 | +## Important Files to Understand |
| 216 | + |
| 217 | +1. **`_codegen/_models/_pydantic.py`**: Main code generator |
| 218 | +2. **`_qbmodel/_pydantic/_models.py`**: Model implementations with all the hacks |
| 219 | +3. **`_save.py`**: Persistence layer |
| 220 | +4. **`_qbmodel/_pydantic/_fields.py`**: Field type definitions |
| 221 | +5. **`_tracked_list.py` & link collections**: Change tracking implementations |
| 222 | + |
| 223 | +The complexity comes from: |
| 224 | +- Link properties (the #1 complexity source) |
| 225 | +- Working around Pydantic limitations |
| 226 | +- Ensuring complete type safety |
| 227 | +- Gel-specific edge cases |
| 228 | + |
| 229 | +Remember: When stuck, ask in Slack. The team is responsive and the learning curve, while steep initially, becomes manageable once you understand the core patterns. |