Commit 2415f9d (parent 0989b03): "Add a high-level internals outline / onboarding guide". Adds `gel/_internal/arch.md` (1 file changed, 229 additions, 0 deletions).

# gel-python Architecture & Onboarding Guide

## Overview

Gel Python (referred to internally as "gel-python") is a Python client library for Gel. It provides a fully type-safe API with query builder capabilities, ORM features, and integration with Pydantic models. This document is an onboarding guide for engineers contributing to the project.

## Project Structure

All non-public API code lives under `gel/_internal/`, and module names are prefixed with an underscore to mark them as internal implementation details:

```
gel/
├── _internal/
│   ├── _qb/                  # Query builder AST and code generation
│   ├── _qbmodel/             # Query builder model implementation
│   │   ├── _abstract/        # Abstract model layer
│   │   └── _pydantic/        # Pydantic-specific bindings
│   ├── _reflection/          # Schema reflection system
│   ├── _codegen/             # Code generation for reflected schemas
│   │   └── _models/          # Model generation
│   │       └── _pydantic.py  # Main generator (~6000 lines)
│   ├── _save.py              # Save/persistence implementation
│   ├── _tracked_list.py      # Change tracking for multi properties
│   └── _link_set.py          # Multi links, with and without link properties
```

## Core Components

### 1. Query Builder (`_qb/`)

The query builder is implemented as an AST (Abstract Syntax Tree) with self-contained code generation:

- **`_abstract.py`**: Base query builder expressions (AST nodes)
- **`_expressions.py`**: Expression types and operations
- **`_generics.py`**: Custom implementation of Python's `Annotated` for type metadata
- **`_protocols.py`**: Two key protocols:
  - `__edgeql_qbexpr__`: Classes implementing this return AST nodes
  - `__edgeql_expr__`: Code generation protocol returning EdgeQL strings

Key insight: query builder nodes implement their own code generation, making them self-contained units that produce EdgeQL.

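The following is a minimal, self-contained sketch of how the two protocols could divide the work. It is not the code in `_protocols.py`: the node classes (`PathNode`, `StrLiteral`, `Eq`), `UserNamePath`, and `to_edgeql` are hypothetical, and the real protocol methods may take extra arguments (such as a compilation context) that are omitted here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


class Expr(Protocol):
    # Hypothetical: an AST node that renders itself to EdgeQL.
    def __edgeql_expr__(self) -> str: ...


class ExprCompatible(Protocol):
    # Hypothetical: a user-facing object that can produce an AST node.
    def __edgeql_qbexpr__(self) -> Expr: ...


@dataclass(frozen=True)
class PathNode:
    """AST node for a path such as `User.name`."""

    source: str
    name: str

    def __edgeql_expr__(self) -> str:
        return f"{self.source}.{self.name}"


@dataclass(frozen=True)
class StrLiteral:
    """AST node for an EdgeQL string literal."""

    value: str

    def __edgeql_expr__(self) -> str:
        # Escape backslashes and single quotes, then wrap in single quotes.
        escaped = self.value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"


@dataclass(frozen=True)
class Eq:
    """AST node for `left = right`; renders its children recursively."""

    left: Expr
    right: Expr

    def __edgeql_expr__(self) -> str:
        return f"({self.left.__edgeql_expr__()} = {self.right.__edgeql_expr__()})"


class UserNamePath:
    """Not an AST node itself, but knows which node it stands for."""

    def __edgeql_qbexpr__(self) -> Expr:
        return PathNode("User", "name")


def to_edgeql(obj: ExprCompatible) -> str:
    # First ask for the AST node, then let the node generate EdgeQL.
    return obj.__edgeql_qbexpr__().__edgeql_expr__()
```

For example, `Eq(UserNamePath().__edgeql_qbexpr__(), StrLiteral("Alice")).__edgeql_expr__()` produces `(User.name = 'Alice')`. Because every node renders its own children, adding a new kind of expression means adding one class, with no central code generator to update.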
### 2. Model System (`_qbmodel/`)

The model system has two layers:

#### Abstract Layer (`_abstract/`)
- **Platform-agnostic** implementation of query builder methods
- Defines the `GelModel` base class for object types
- Implements descriptors for property and link access patterns

#### Pydantic Layer (`_pydantic/`)
- Contains the workarounds needed to make Pydantic work with database models
- Handles partial data loading (models loaded from the database may be missing fields that are required in the schema)
- Implements custom validation and JSON schema generation
- Makes link properties fit the Pydantic and Python object models

### 3. Code Generation (`_codegen/_models/_pydantic.py`)

This file is responsible for:

- Generating type-safe Python models from the Gel schema
- Creating overloads for generic functions
- Managing imports and module structure
- Handling complex type mappings (Gel → Python)

Key challenges:

- **Function overloads**: Specific overloads must be generated for each type to maintain type safety
- **Implicit casting**: Overloads derived from Gel's implicit casts must be carefully ordered to avoid MyPy overlapping-overload errors
- **Type checking**: The generator implements a rudimentary type checker for callable types

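One way to derive a safe overload order is sketched below, under stated assumptions. The cast table is a simplified, hypothetical subset, and `PY_NAMES`, `overload_order`, and `emit_overloads` are invented for illustration. The idea: MyPy picks the first matching overload, so if the generated types let an `Int64` argument satisfy a `Float64` parameter (which is how an implicit cast would surface to the type checker), a `Float64` overload placed first would make the `Int64` overload unreachable. A topological sort of the cast graph puts every type before the types it casts to.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified implicit-cast graph: each key can be implicitly
# cast to every type in its value set. This is NOT Gel's full cast table.
IMPLICIT_CASTS: dict[str, set[str]] = {
    "int16": {"int32", "int64", "float64"},
    "int32": {"int64", "float64"},
    "int64": {"float64"},
    "float64": set(),
}

# Hypothetical names of the generated Python classes for each Gel scalar.
PY_NAMES = {"int16": "Int16", "int32": "Int32", "int64": "Int64", "float64": "Float64"}


def overload_order(casts: dict[str, set[str]]) -> list[str]:
    """Order types so that each type comes before every type it casts to."""
    # TopologicalSorter maps node -> predecessors. A source type must be
    # emitted before its cast targets, so it is a predecessor of each target.
    graph: dict[str, set[str]] = {name: set() for name in casts}
    for source, targets in casts.items():
        for target in targets:
            graph.setdefault(target, set()).add(source)
    return list(TopologicalSorter(graph).static_order())


def emit_overloads(func: str, casts: dict[str, set[str]]) -> str:
    """Emit one `@overload` stub per type, narrowest type first."""
    lines: list[str] = []
    for gel_type in overload_order(casts):
        py_type = PY_NAMES[gel_type]
        lines.append("@overload")
        lines.append(f"def {func}(x: {py_type}) -> {py_type}: ...")
    return "\n".join(lines)
```

With the table above, `emit_overloads("std_abs", IMPLICIT_CASTS)` (the function name is also hypothetical) yields four stubs in the order `Int16`, `Int32`, `Int64`, `Float64`.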
### 4. Save Implementation (`_save.py`)

The save system traverses object graphs and generates EdgeQL mutations:

1. **`make_plan()`**: Analyzes objects and creates a delta tree of changes
2. **Change nodes**: Each operation (property change, link addition, etc.) has a corresponding node type
3. **Batching**: Groups similar operations for efficiency (up to 100 per batch)
4. **Transaction handling**: Query generation is structured so that the save queries can be executed in a transaction

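The batching step can be pictured with the simplified sketch below. `Change`, `chunked`, and `make_batches` are hypothetical names; the grouping key `(kind, type_name)` is an assumption about what "similar operations" means; and the sketch ignores the dependency ordering a real save plan needs (for example, inserting an object before linking to it).

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, Iterator

BATCH_SIZE = 100  # upper bound per batch, as described above


@dataclass(frozen=True)
class Change:
    """Hypothetical change node produced by a planning step."""

    kind: str       # e.g. "insert", "update_prop", "add_link"
    type_name: str  # e.g. "default::User"
    obj_id: int     # stand-in for the object's identity


def chunked(items: list[Change], size: int) -> Iterator[list[Change]]:
    """Yield consecutive slices of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def make_batches(changes: Iterable[Change]) -> list[list[Change]]:
    """Group changes with the same (kind, type) and split each group."""
    groups: defaultdict[tuple[str, str], list[Change]] = defaultdict(list)
    for change in changes:
        groups[(change.kind, change.type_name)].append(change)
    batches: list[list[Change]] = []
    for group in groups.values():  # first-seen order is preserved
        batches.extend(chunked(group, BATCH_SIZE))
    return batches
```

With 250 property updates on `default::User` followed by 3 inserts of `default::Post`, `make_batches` returns batches of sizes 100, 100, 50 and 3, each containing a single kind of operation on a single type.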
### 5. Model Classes (`_qbmodel/_pydantic/_models.py`)

The model hierarchy:

```
GelSourceModel      # Base Pydantic wrapper with change tracking
├── GelModel        # Handles objects
├── GelLinkModel    # Handles link properties
└── ProxyModel      # The nightmare: wraps objects with link properties
```

**ProxyModel is the most complex part**: it is technically a Pydantic model but does not behave like one, dynamically routing attribute access to the wrapped object or to its link properties.

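The routing idea behind `ProxyModel` can be shown in plain Python with Pydantic left out entirely. This is a hypothetical sketch, not the real class: `User`, `LinkProps`, and `Proxy` are invented, and the attribute name `__linkprops__` is chosen for illustration only.

```python
from typing import Any


class User:
    """Stand-in for a generated model class (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name


class LinkProps:
    """Holds the properties stored on the link itself."""

    def __init__(self, **props: Any) -> None:
        self.__dict__.update(props)


class Proxy:
    """Wrap a target object and attach link properties to it.

    Ordinary attribute reads and writes are forwarded to the wrapped object;
    link properties live in a separate namespace so they cannot clash with
    the target's own fields.
    """

    def __init__(self, obj: Any, **link_props: Any) -> None:
        # Bypass our own __setattr__, which forwards to the wrapped object.
        object.__setattr__(self, "_wrapped", obj)
        object.__setattr__(self, "__linkprops__", LinkProps(**link_props))

    def __getattr__(self, name: str) -> Any:
        # Called only when normal lookup on the proxy itself fails.
        return getattr(object.__getattribute__(self, "_wrapped"), name)

    def __setattr__(self, name: str, value: Any) -> None:
        setattr(object.__getattribute__(self, "_wrapped"), name, value)
```

Given `friend = Proxy(User("Alice"), since_date="2020-01-01")`, `friend.name` reads through to the user and `friend.__linkprops__.since_date` reads the link property. Python looks up special methods such as `__eq__` and `__repr__` on the type rather than the instance, so `__getattr__` never forwards them; that is one reason a proxy that must also pass as a Pydantic model needs much more machinery.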
## Link Properties: The Complexity Multiplier

Link properties are attributes stored on a link between two objects rather than on either object (e.g., a "friendship" link with a `since_date` property). They complicate everything:

- **No native Python concept** for properties on a relationship between objects
- **ProxyModel hack**: Wraps objects to add link property support
- **Type safety challenges**: Wrapped objects must remain transparent to the type checker once link properties are added
- **Collection complexity**: Multi-links with properties require custom collection implementations

Without link properties, the codebase would be roughly **3x simpler** (`_save.py` would be about 3x shorter, and queries about 10x smaller).

## Testing Infrastructure

### Type-Safe Testing

Tests use a custom `@tb.typecheck` decorator that:

1. Extracts the test code into separate files
2. Runs MyPy on each test individually
3. Supports `assertEqual(reveal_type(...), ...)` to check that types are inferred correctly

Most tests are in `tests/test_model_generator.py`; query builder tests are in `tests/test_qb.py`.

### Test Models Generation

Run `python tools/gen_models.py` to generate test models into your virtual environment's site-packages. This enables IDE support for test development.

## Development Setup

### Prerequisites

1. A Gel development VM with the `gel-server` binary in your `PATH`
2. Install with: `pip install -e .`
3. **Critical**: Editable wheel installs are broken, so write a `.pth` file by hand. Run this from the repository root so that `import gel` resolves to the source checkout:

   ```bash
   python -c 'import pathlib, gel; print(pathlib.Path(gel.__path__[0]).parent)' > \
       "$(python -c 'import site; print(site.getsitepackages()[0])')/gel.pth"
   ```

4. Set environment variables for the Gel server path if needed:

   * To use the dev server in your Gel development environment:

     ```bash
     export GEL_SERVER_BINARY=<your-server-venv>/bin/gel-server
     ```

     then run the tests with `env __EDGEDB_DEVMODE=1 pytest`.

   * Or, to use a Gel 6 server binary:

     ```bash
     export GEL_SERVER_BINARY=$(gel server info --bin-path --version '6')
     ```

     after which a plain `pytest` should work.

### Running Tests

```bash
# Basic test run
pytest tests/test_qb.py

# Parallel execution (requires pytest-xdist)
pytest -n 5  # Warning: high RAM usage; each worker starts its own database

# Run tests that failed last time first
pytest --ff

# Run specific tests
pytest -k "test_name or other_test"
```

## Key Technical Decisions

### Equality and Hashing

- Objects with the same ID are equal regardless of differences in their data
- New objects (without an ID) are only equal to themselves
- Link properties are ignored in equality comparisons
- Objects with IDs are hashable; new objects are not

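A minimal sketch of these rules, using a hypothetical `Model` class in place of the generated models. It does not model link properties at all, so they are trivially ignored. Refusing to hash objects without an ID is shown as raising `TypeError`; the presumed motivation (an assumption) is that an ID assigned later by a save would otherwise change the object's hash while it sits in a set or dict.

```python
import uuid
from typing import Any, Optional


class Model:
    """Hypothetical stand-in illustrating the identity rules above."""

    def __init__(self, id: Optional[uuid.UUID] = None, **data: Any) -> None:
        self.id = id
        self.data = data

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Model):
            return NotImplemented
        if self.id is None or other.id is None:
            # An object without an ID is only equal to itself.
            return self is other
        # Saved objects compare by ID alone; field data is ignored.
        return self.id == other.id

    def __hash__(self) -> int:
        if self.id is None:
            # No stable identity yet, so refuse to be hashed.
            raise TypeError("unhashable: object has no id yet")
        return hash(self.id)
```

Two `Model` instances sharing an ID but holding different data compare equal and collapse to one element in a set, while two new objects with identical data remain distinct.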
### Descriptor Magic

The system relies heavily on Python descriptors for the query builder:

- Accessing `User.friends` on the class returns a query builder path
- Accessing `user.friends` on an instance returns the actual data
- This duality enables an intuitive API, such as `User.friends.name` in queries

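The class-versus-instance duality comes down to the `instance is None` branch of a descriptor's `__get__`. The sketch below is hypothetical (`Path`, `Field`, and this `User` are invented, and the real descriptors also have to cooperate with Pydantic and static typing), but it is a minimal working instance of the pattern.

```python
from __future__ import annotations

from typing import Any, Optional


class Path:
    """Hypothetical query builder path; attribute access extends it."""

    def __init__(self, *steps: str) -> None:
        self.steps = steps

    def __getattr__(self, name: str) -> Path:
        if name.startswith("__"):
            # Don't pretend to implement dunder protocols (copy, pickle, ...).
            raise AttributeError(name)
        return Path(*self.steps, name)

    def __repr__(self) -> str:
        return "Path(" + ".".join(self.steps) + ")"


class Field:
    """Descriptor: class access yields a Path, instance access yields data."""

    def __set_name__(self, owner: type, name: str) -> None:
        self.owner_name = owner.__name__
        self.name = name

    def __get__(self, instance: Optional[object], owner: type) -> Any:
        if instance is None:
            # Accessed on the class, e.g. `User.friends`.
            return Path(self.owner_name, self.name)
        # Accessed on an instance: return the stored value.
        return instance.__dict__[self.name]

    def __set__(self, instance: object, value: Any) -> None:
        instance.__dict__[self.name] = value


class User:
    name = Field()
    friends = Field()

    def __init__(self, name: str, friends: list[User]) -> None:
        self.name = name
        self.friends = friends
```

`User.friends.name` evaluates to `Path(User.friends.name)`, while for `bob = User("Bob", [alice])`, `bob.friends` is the plain list `[alice]`. Because `Field` defines `__set__`, it is a data descriptor and takes precedence over the instance `__dict__` entry of the same name.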
### Type System Integration

- Queries return proper Pydantic models
- The codec layer is adapted to accept return types from the query builder
- Type inference works through the entire pipeline

## Common Pitfalls and Gotchas

1. **UUID Performance**: Type IDs are stored as integers instead of UUIDs because Python's `UUID` constructor is slow
2. **Type IDs Issue**: Type IDs currently rely on database-specific IDs (this needs fixing to use type names)
3. **Pydantic Validation**: The custom validation pipeline runs through Pydantic's Rust layer, so schema generation must be done carefully
4. **Stack Inspection**: Used to work around Pydantic limitations; fragile but necessary
5. **Collection Tracking**: Custom collections track changes for save operations

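Below is a minimal sketch of a change-tracking collection, assuming the simplest possible bookkeeping: keep lists of items added and removed since the last save, and let an add and a remove of the same item cancel out. `TrackedList` and its `added`, `removed`, and `commit` members are hypothetical and not the API of `_tracked_list.py`; slice assignment is deliberately left out.

```python
from typing import Any, Iterable, List, MutableSequence, TypeVar

T = TypeVar("T")


class TrackedList(MutableSequence[T]):
    """A list that records additions and removals since the last commit."""

    def __init__(self, items: Iterable[T] = ()) -> None:
        self._items: List[T] = list(items)
        self.added: List[T] = []
        self.removed: List[T] = []

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, index: Any) -> Any:
        return self._items[index]

    def __setitem__(self, index: Any, value: Any) -> None:
        if isinstance(index, slice):
            raise TypeError("slice assignment is not supported in this sketch")
        self._record_remove(self._items[index])
        self._items[index] = value
        self._record_add(value)

    def __delitem__(self, index: Any) -> None:
        if isinstance(index, slice):
            raise TypeError("slice deletion is not supported in this sketch")
        self._record_remove(self._items.pop(index))

    def insert(self, index: int, value: T) -> None:
        # MutableSequence.append/extend are implemented on top of insert.
        self._items.insert(index, value)
        self._record_add(value)

    def _record_add(self, value: T) -> None:
        # Re-adding an item removed since the last commit cancels out.
        if value in self.removed:
            self.removed.remove(value)
        else:
            self.added.append(value)

    def _record_remove(self, value: T) -> None:
        # Removing an item added since the last commit cancels out.
        if value in self.added:
            self.added.remove(value)
        else:
            self.removed.append(value)

    def commit(self) -> None:
        """Forget pending changes, e.g. after a successful save."""
        self.added.clear()
        self.removed.clear()
```

Subclassing `MutableSequence` means only five methods need tracking logic; `append`, `extend`, `pop`, and `remove` come from the mixin and route through them, so no mutation path can bypass the bookkeeping.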
## Where to Start Contributing

### Easier Areas
- Query builder bugs (once familiar with the system)
- Test coverage improvements
- Documentation and comments

### Medium Complexity
- Code generation improvements
- Save compiler refactoring (currently template-based; needs a proper compiler)
- Performance optimizations

### High Complexity
- Link property handling
- Pydantic integration layer
- Type system mapping

## Important Files to Understand

1. **`_codegen/_models/_pydantic.py`**: Main code generator
2. **`_qbmodel/_pydantic/_models.py`**: Model implementations with all the hacks
3. **`_save.py`**: Persistence layer
4. **`_qbmodel/_pydantic/_fields.py`**: Field type definitions
5. **`_tracked_list.py` and link collections**: Change-tracking implementations

Most of the complexity comes from:
- Link properties (the #1 source of complexity)
- Working around Pydantic limitations
- Ensuring complete type safety
- Gel-specific edge cases

Remember: when stuck, ask in Slack. The team is responsive, and the learning curve, while steep at first, becomes manageable once you understand the core patterns.
