|
| 1 | +# Access Log |
| 2 | + |
| 3 | +VGI servers emit a structured access log entry for every RPC request via |
| 4 | +the `vgi_rpc.access` Python logger. When `--log-format json` is enabled, |
| 5 | +each entry is a single JSON line on stderr containing transport-level |
| 6 | +metrics, authentication context, and the full serialized request parameters. |
| 7 | + |
| 8 | +## Enabling JSON Output |
| 9 | + |
| 10 | +```bash |
| 11 | +vgi-serve my_worker:MyWorker --http --log-format json |
| 12 | +vgi-example-http --log-format json |
| 13 | +``` |
| 14 | + |
| 15 | +For pipe (subprocess) transport, the access log is written to stderr |
| 16 | +alongside other worker logs. |
| 17 | + |
| 18 | +## Fields |
| 19 | + |
| 20 | +Every access log entry includes these fields: |
| 21 | + |
| 22 | +### Transport Fields (from `vgi_rpc`) |
| 23 | + |
| 24 | +| Field | Type | Description | |
| 25 | +|-------|------|-------------| |
| 26 | +| `timestamp` | string | ISO 8601 timestamp | |
| 27 | +| `server_id` | string | Short hex identifier for this server instance | |
| 28 | +| `protocol` | string | Protocol name (e.g. `VgiProtocol`) | |
| 29 | +| `method` | string | RPC method name (e.g. `bind`, `init`, `catalog_schemas`) | |
| 30 | +| `method_type` | string | `unary` or `stream` | |
| 31 | +| `principal` | string | Caller identity (empty if unauthenticated) | |
| 32 | +| `auth_domain` | string | Authentication scheme (e.g. `bearer`, `jwt`, empty if none) | |
| 33 | +| `authenticated` | bool | Whether the caller was authenticated | |
| 34 | +| `remote_addr` | string | Client IP address (HTTP only) | |
| 35 | +| `duration_ms` | float | Request duration in milliseconds | |
| 36 | +| `status` | string | `ok` or `error` | |
| 37 | +| `error_type` | string | Exception class name (empty on success) | |
| 38 | +| `request_id` | string | Per-request correlation ID (16-char hex) | |
| 39 | +| `http_status` | int | HTTP status code (HTTP only) | |
| 40 | +| `input_batches` | int | Number of input Arrow batches | |
| 41 | +| `output_batches` | int | Number of output Arrow batches | |
| 42 | +| `input_rows` | int | Total input rows | |
| 43 | +| `output_rows` | int | Total output rows | |
| 44 | +| `input_bytes` | int | Approximate input bytes (Arrow buffer size) | |
| 45 | +| `output_bytes` | int | Approximate output bytes (Arrow buffer size) | |
| 46 | + |
| 47 | +### Enrichment Fields |
| 48 | + |
| 49 | +| Field | Type | When Present | Description | |
| 50 | +|-------|------|-------------|-------------| |
| 51 | +| `server_version` | string | Always | VGI package version | |
| 52 | +| `request_data` | string | Dispatch methods | Base64-encoded Arrow IPC bytes of the request batch. Contains all call parameters: function name, arguments, schemas, pushdown filters, catalog names, etc. | |
| 53 | +| `stream_id` | string | Stream methods | UUID hex correlating all HTTP requests in one logical stream execution | |
| 54 | +| `claims` | object | Authenticated requests with claims | JWT/OAuth claims dict | |
| 55 | +| `error_message` | string | Error entries | Raw exception message (truncated to 500 chars) | |
| 56 | +| `request_state` | string | HTTP stream continuations | Base64-encoded state token received from the client | |
| 57 | +| `response_state` | string | HTTP exchange stream init/continuations | Base64-encoded serialized state returned to the client | |
| 58 | + |
| 59 | +## Request Types and Their Fields |
| 60 | + |
| 61 | +### `bind` (unary) |
| 62 | + |
| 63 | +Schema resolution and argument validation. Has `request_data` containing |
| 64 | +the function name, type, arguments, input schema, and settings. |
| 65 | + |
| 66 | +### `init` (stream) |
| 67 | + |
| 68 | +Stream initialization. Has `request_data` with the full `InitRequest` |
| 69 | +(including bind call, output schema, pushdown filters, projection IDs, |
| 70 | +order-by hints), plus `stream_id` for correlation. |
| 71 | + |
| 72 | +For exchange streams (scalar, table-in-out), also has `response_state` |
| 73 | +with the initial serialized state token. |
| 74 | + |
| 75 | +### `init` continuations (HTTP stream exchanges) |
| 76 | + |
| 77 | +Subsequent HTTP requests in the same stream. Have `stream_id` (from the |
| 78 | +state token) and `request_state` (the token the client sent). Exchange |
| 79 | +continuations also have `response_state` (the updated token). The |
| 80 | +`request_data` field is absent on continuations since the call parameters |
| 81 | +were logged on the initial `init` entry. |
| 82 | + |
| 83 | +For producer streams (table functions) that complete in a single HTTP |
| 84 | +request, there are no continuations — all data is returned inline. |
| 85 | + |
| 86 | +### `table_function_cardinality` (unary) |
| 87 | + |
| 88 | +Cardinality estimation. Has `request_data` with the bind call. |
| 89 | + |
| 90 | +### `catalog_*` (unary) |
| 91 | + |
| 92 | +Catalog operations (attach, schemas, table_get, etc.). Have `request_data` |
| 93 | +with the method parameters (attach_id, schema_name, table_name, etc.). |
| 94 | + |
| 95 | +## Correlating Stream Requests |
| 96 | + |
| 97 | +Over HTTP transport, a single DuckDB query may generate multiple HTTP |
| 98 | +requests: one `bind`, one `init`, and zero or more exchange/produce |
| 99 | +continuations. The `stream_id` field (a UUID) is the same across all |
| 100 | +requests in one stream, allowing analysis tools to reconstruct the |
| 101 | +full execution: |
| 102 | + |
| 103 | +```sql |
| 104 | +-- Find all requests for a specific stream |
| 105 | +SELECT * FROM access_log WHERE stream_id = 'abcdef1234567890' |
| 106 | +ORDER BY timestamp; |
| 107 | +``` |
| 108 | + |
| 109 | +The initial `init` entry has the full `request_data` with function name, |
| 110 | +arguments, and pushdown info. Continuation entries have per-request |
| 111 | +`duration_ms`, `input_rows`, `output_rows`, etc. |
| 112 | + |
| 113 | +## Decoding `request_data` |
| 114 | + |
| 115 | +The `request_data` field contains base64-encoded Arrow IPC bytes of the |
| 116 | +request batch. Each column in the batch is a method parameter. To decode: |
| 117 | + |
| 118 | +```python |
| 119 | +import base64 |
| 120 | +import pyarrow as pa |
| 121 | + |
| 122 | +request_data = "QVJST1cxAAA..." # from the log entry |
| 123 | +batch_bytes = base64.b64decode(request_data) |
| 124 | +reader = pa.ipc.open_stream(batch_bytes) |
| 125 | +batch = reader.read_all().to_batches()[0] |
| 126 | +print(batch.to_pydict()) |
| 127 | +# {'function_name': ['sequence'], 'arguments': [...], 'function_type': [1], ...} |
| 128 | +``` |
| 129 | + |
| 130 | +Note: `BindRequest` contains a `secrets` field which may include sensitive |
| 131 | +data (database passwords, API keys). If your access logs are forwarded to |
| 132 | +a log aggregation system, use an external pipeline step to strip the |
| 133 | +`secrets` column before ingestion. |
| 134 | + |
| 135 | +## Conformance Validation |
| 136 | + |
| 137 | +The `vgi_rpc.access_log_conformance` module validates that a server's |
| 138 | +access log output meets the required field presence rules. This is |
| 139 | +language-agnostic: any VGI server implementation can be validated. |
| 140 | + |
| 141 | +```bash |
| 142 | +# Validate a log file |
| 143 | +python -m vgi_rpc.access_log_conformance /tmp/vgi-http-test-server.log |
| 144 | + |
| 145 | +# Pipe from stdin |
| 146 | +cat server.log | python -m vgi_rpc.access_log_conformance |
| 147 | +``` |
| 148 | + |
| 149 | +Rules enforced: |
| 150 | + |
| 151 | +- All entries must have `server_version` and `authenticated` |
| 152 | +- Dispatch methods (`bind`, `init`, `catalog_*`, `table_function_cardinality`) must have `request_data` |
| 153 | +- Stream methods (`init`) must have `stream_id` |
| 154 | +- Error entries must have `error_message` |
| 155 | + |
| 156 | +## Example Entry |
| 157 | + |
| 158 | +```json |
| 159 | +{ |
| 160 | + "timestamp": "2026-04-12T15:30:00.123", |
| 161 | + "level": "INFO", |
| 162 | + "logger": "vgi_rpc.access", |
| 163 | + "message": "VgiProtocol.init ok", |
| 164 | + "server_id": "a85cd92650b7", |
| 165 | + "protocol": "VgiProtocol", |
| 166 | + "method": "init", |
| 167 | + "method_type": "stream", |
| 168 | + "principal": "alice@example.com", |
| 169 | + "auth_domain": "jwt", |
| 170 | + "authenticated": true, |
| 171 | + "remote_addr": "10.0.0.1", |
| 172 | + "duration_ms": 12.3, |
| 173 | + "status": "ok", |
| 174 | + "error_type": "", |
| 175 | + "server_version": "0.2.0", |
| 176 | + "request_id": "f41f090e23b84789", |
| 177 | + "http_status": 200, |
| 178 | + "request_data": "QVJST1cxAAA...", |
| 179 | + "stream_id": "85b7099a1e9b4a6b9d1fc917f27212ee", |
| 180 | + "claims": {"sub": "alice@example.com", "aud": "vgi"}, |
| 181 | + "response_state": "AAAA/////6gA...", |
| 182 | + "input_batches": 1, |
| 183 | + "output_batches": 2, |
| 184 | + "input_rows": 1, |
| 185 | + "output_rows": 1, |
| 186 | + "input_bytes": 5864, |
| 187 | + "output_bytes": 41 |
| 188 | +} |
| 189 | +``` |
| 190 | + |
| 191 | +## Central Publishing |
| 192 | + |
| 193 | +The access log uses Python's standard `logging` module. Attach any |
| 194 | +handler to `vgi_rpc.access` for forwarding to external systems: |
| 195 | + |
| 196 | +```python |
| 197 | +import logging |
| 198 | + |
| 199 | +# Forward to an HTTP endpoint |
| 200 | +handler = logging.handlers.HTTPHandler( |
| 201 | + "analytics.example.com", "/ingest", method="POST" |
| 202 | +) |
| 203 | +logging.getLogger("vgi_rpc.access").addHandler(handler) |
| 204 | +``` |
| 205 | + |
| 206 | +Container log collectors (fluentd, Vector, promtail) can parse the JSON |
| 207 | +lines from stderr directly when `--log-format json` is enabled. |
0 commit comments