Skip to content

Commit 6d1917d

Browse files
rustyconoverclaude
andcommitted
Add access log docs and navigation for access-log + conformance pages
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 8bbb161 commit 6d1917d

2 files changed

Lines changed: 209 additions & 0 deletions

File tree

docs/access-log.md

Lines changed: 207 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,207 @@
1+
# Access Log
2+
3+
VGI servers emit a structured access log entry for every RPC request via
4+
the `vgi_rpc.access` Python logger. When `--log-format json` is enabled,
5+
each entry is a single JSON line on stderr containing transport-level
6+
metrics, authentication context, and the full serialized request parameters.
7+
8+
## Enabling JSON Output
9+
10+
```bash
11+
vgi-serve my_worker:MyWorker --http --log-format json
12+
vgi-example-http --log-format json
13+
```
14+
15+
For pipe (subprocess) transport, the access log is written to stderr
16+
alongside other worker logs.
17+
18+
## Fields
19+
20+
Every access log entry includes these fields:
21+
22+
### Transport Fields (from `vgi_rpc`)
23+
24+
| Field | Type | Description |
25+
|-------|------|-------------|
26+
| `timestamp` | string | ISO 8601 timestamp |
27+
| `server_id` | string | Short hex identifier for this server instance |
28+
| `protocol` | string | Protocol name (e.g. `VgiProtocol`) |
29+
| `method` | string | RPC method name (e.g. `bind`, `init`, `catalog_schemas`) |
30+
| `method_type` | string | `unary` or `stream` |
31+
| `principal` | string | Caller identity (empty if unauthenticated) |
32+
| `auth_domain` | string | Authentication scheme (e.g. `bearer`, `jwt`, empty if none) |
33+
| `authenticated` | bool | Whether the caller was authenticated |
34+
| `remote_addr` | string | Client IP address (HTTP only) |
35+
| `duration_ms` | float | Request duration in milliseconds |
36+
| `status` | string | `ok` or `error` |
37+
| `error_type` | string | Exception class name (empty on success) |
38+
| `request_id` | string | Per-request correlation ID (16-char hex) |
39+
| `http_status` | int | HTTP status code (HTTP only) |
40+
| `input_batches` | int | Number of input Arrow batches |
41+
| `output_batches` | int | Number of output Arrow batches |
42+
| `input_rows` | int | Total input rows |
43+
| `output_rows` | int | Total output rows |
44+
| `input_bytes` | int | Approximate input bytes (Arrow buffer size) |
45+
| `output_bytes` | int | Approximate output bytes (Arrow buffer size) |
46+
47+
### Enrichment Fields
48+
49+
| Field | Type | When Present | Description |
50+
|-------|------|-------------|-------------|
51+
| `server_version` | string | Always | VGI package version |
52+
| `request_data` | string | Dispatch methods | Base64-encoded Arrow IPC bytes of the request batch. Contains all call parameters: function name, arguments, schemas, pushdown filters, catalog names, etc. |
53+
| `stream_id` | string | Stream methods | UUID hex correlating all HTTP requests in one logical stream execution |
54+
| `claims` | object | Authenticated requests with claims | JWT/OAuth claims dict |
55+
| `error_message` | string | Error entries | Raw exception message (truncated to 500 chars) |
56+
| `request_state` | string | HTTP stream continuations | Base64-encoded state token received from the client |
57+
| `response_state` | string | HTTP exchange stream init/continuations | Base64-encoded serialized state returned to the client |
58+
59+
## Request Types and Their Fields
60+
61+
### `bind` (unary)
62+
63+
Schema resolution and argument validation. Has `request_data` containing
64+
the function name, type, arguments, input schema, and settings.
65+
66+
### `init` (stream)
67+
68+
Stream initialization. Has `request_data` with the full `InitRequest`
69+
(including bind call, output schema, pushdown filters, projection IDs,
70+
order-by hints), plus `stream_id` for correlation.
71+
72+
For exchange streams (scalar, table-in-out), also has `response_state`
73+
with the initial serialized state token.
74+
75+
### `init` continuations (HTTP stream exchanges)
76+
77+
Subsequent HTTP requests in the same stream. Have `stream_id` (from the
78+
state token) and `request_state` (the token the client sent). Exchange
79+
continuations also have `response_state` (the updated token). The
80+
`request_data` field is absent on continuations since the call parameters
81+
were logged on the initial `init` entry.
82+
83+
For producer streams (table functions) that complete in a single HTTP
84+
request, there are no continuations — all data is returned inline.
85+
86+
### `table_function_cardinality` (unary)
87+
88+
Cardinality estimation. Has `request_data` with the bind call.
89+
90+
### `catalog_*` (unary)
91+
92+
Catalog operations (attach, schemas, table_get, etc.). Have `request_data`
93+
with the method parameters (attach_id, schema_name, table_name, etc.).
94+
95+
## Correlating Stream Requests
96+
97+
Over HTTP transport, a single DuckDB query may generate multiple HTTP
98+
requests: one `bind`, one `init`, and zero or more exchange/produce
99+
continuations. The `stream_id` field (a UUID) is the same across all
100+
requests in one stream, allowing analysis tools to reconstruct the
101+
full execution:
102+
103+
```sql
104+
-- Find all requests for a specific stream
105+
SELECT * FROM access_log WHERE stream_id = 'abcdef1234567890'
106+
ORDER BY timestamp;
107+
```
108+
109+
The initial `init` entry has the full `request_data` with function name,
110+
arguments, and pushdown info. Continuation entries have per-request
111+
`duration_ms`, `input_rows`, `output_rows`, etc.
112+
113+
## Decoding `request_data`
114+
115+
The `request_data` field contains base64-encoded Arrow IPC bytes of the
116+
request batch. Each column in the batch is a method parameter. To decode:
117+
118+
```python
119+
import base64
120+
import pyarrow as pa
121+
122+
request_data = "QVJST1cxAAA..." # from the log entry
123+
batch_bytes = base64.b64decode(request_data)
124+
reader = pa.ipc.open_stream(batch_bytes)
125+
batch = reader.read_all().to_batches()[0]
126+
print(batch.to_pydict())
127+
# {'function_name': ['sequence'], 'arguments': [...], 'function_type': [1], ...}
128+
```
129+
130+
Note: `BindRequest` contains a `secrets` field which may include sensitive
131+
data (database passwords, API keys). If your access logs are forwarded to
132+
a log aggregation system, use an external pipeline step to strip the
133+
`secrets` column before ingestion.
134+
135+
## Conformance Validation
136+
137+
The `vgi_rpc.access_log_conformance` module validates that a server's
138+
access log output meets the required field presence rules. This is
139+
language-agnostic: any VGI server implementation can be validated.
140+
141+
```bash
142+
# Validate a log file
143+
python -m vgi_rpc.access_log_conformance /tmp/vgi-http-test-server.log
144+
145+
# Pipe from stdin
146+
cat server.log | python -m vgi_rpc.access_log_conformance
147+
```
148+
149+
Rules enforced:
150+
151+
- All entries must have `server_version` and `authenticated`
152+
- Dispatch methods (`bind`, `init`, `catalog_*`, `table_function_cardinality`) must have `request_data`
153+
- Stream methods (`init`) must have `stream_id`
154+
- Error entries must have `error_message`
155+
156+
## Example Entry
157+
158+
```json
159+
{
160+
"timestamp": "2026-04-12T15:30:00.123",
161+
"level": "INFO",
162+
"logger": "vgi_rpc.access",
163+
"message": "VgiProtocol.init ok",
164+
"server_id": "a85cd92650b7",
165+
"protocol": "VgiProtocol",
166+
"method": "init",
167+
"method_type": "stream",
168+
"principal": "alice@example.com",
169+
"auth_domain": "jwt",
170+
"authenticated": true,
171+
"remote_addr": "10.0.0.1",
172+
"duration_ms": 12.3,
173+
"status": "ok",
174+
"error_type": "",
175+
"server_version": "0.2.0",
176+
"request_id": "f41f090e23b84789",
177+
"http_status": 200,
178+
"request_data": "QVJST1cxAAA...",
179+
"stream_id": "85b7099a1e9b4a6b9d1fc917f27212ee",
180+
"claims": {"sub": "alice@example.com", "aud": "vgi"},
181+
"response_state": "AAAA/////6gA...",
182+
"input_batches": 1,
183+
"output_batches": 2,
184+
"input_rows": 1,
185+
"output_rows": 1,
186+
"input_bytes": 5864,
187+
"output_bytes": 41
188+
}
189+
```
190+
191+
## Central Publishing
192+
193+
The access log uses Python's standard `logging` module. Attach any
194+
handler to `vgi_rpc.access` for forwarding to external systems:
195+
196+
```python
197+
import logging
198+
199+
# Forward to an HTTP endpoint
200+
handler = logging.handlers.HTTPHandler(
201+
"analytics.example.com", "/ingest", method="POST"
202+
)
203+
logging.getLogger("vgi_rpc.access").addHandler(handler)
204+
```
205+
206+
Container log collectors (fluentd, Vector, promtail) can parse the JSON
207+
lines from stderr directly when `--log-format json` is enabled.

mkdocs.yml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -128,6 +128,8 @@ nav:
128128
- Logging: api/logging.md
129129
- OpenTelemetry: api/otel.md
130130
- CLI: api/cli.md
131+
- Access Log: access-log.md
132+
- Conformance: cross-language-conformance.md
131133
- Examples: examples.md
132134
- Hosting: hosting.md
133135
- Benchmarks: benchmarks.md

0 commit comments

Comments
 (0)