| 1 | +# SDK Usage Tracking via `INFORMATION_SCHEMA.JOBS` |
| 2 | + |
| 3 | +Every BigQuery job the SDK submits is labeled. Those labels land in |
| 4 | +BigQuery's native `INFORMATION_SCHEMA.JOBS` views, so you can attribute |
| 5 | +spend and usage back to the SDK without running a separate telemetry |
| 6 | +pipeline. |
| 7 | + |
| 8 | +This document is the operator cookbook: what labels exist, how to read |
| 9 | +them, and ready-to-run SQL. |
| 10 | + |
| 11 | +--- |
| 12 | + |
| 13 | +## Label schema |
| 14 | + |
| 15 | +Applied by the SDK to every query job (`QueryJobConfig.labels`) and |
| 16 | +load job (`LoadJobConfig.labels`) it submits. |
| 17 | + |
| 18 | +| Key | Value | Scope | |
| 19 | +| ------------------ | ------------------------------------------ | ----- | |
| 20 | +| `sdk` | constant `bigquery-agent-analytics` | every SDK job | |
| 21 | +| `sdk_version` | `__version__`, BQ-safe (e.g. `0-4-0`) | every SDK job | |
| 22 | +| `sdk_surface` | `python` \| `cli` \| `remote-function` | every SDK job | |
| 23 | +| `sdk_feature` | `trace-read` \| `eval-code` \| `eval-llm-judge` \| `eval-categorical` \| `insights` \| `drift` \| `memory` \| `context-graph` \| `ontology-build` \| `ontology-gql` \| `views` \| `ai-ml` \| `feedback` | per-call site | |
| 24 | +| `sdk_ai_function` | `ai-generate` \| `ai-embed` \| `ai-classify` \| `ai-forecast` \| `ai-detect-anomalies` \| `ml-generate-text` \| `ml-generate-embedding` \| `ml-detect-anomalies` \| `ml-forecast` | AI/ML invocations only | |
| 25 | + |
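The `sdk_version` row shows a dotted version rendered as `0-4-0`. A minimal sketch of one way to produce such a value follows; the helper name `to_label_value` is hypothetical, and the SDK's real normalization may differ (for example in how it handles uppercase or overlong input).

```python
import re


def to_label_value(raw: str) -> str:
    """Normalize a string into the [a-z0-9_-]{1,63} label-value shape.

    Hypothetical helper: lowercases, replaces every disallowed
    character with '-', and truncates to 63 characters.
    """
    cleaned = re.sub(r"[^a-z0-9_-]", "-", raw.lower())
    if not cleaned:
        raise ValueError("label value must not be empty")
    return cleaned[:63]


print(to_label_value("0.4.0"))  # 0-4-0
```

Replacing rather than dropping the dots keeps `0.4.0` and `0.40` distinguishable after normalization.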
| 26 | +**Reserved namespace.** All `sdk*` keys are managed by the SDK. If a |
| 27 | +caller pre-sets any of these on a `QueryJobConfig.labels` dict passed |
| 28 | +to the SDK, the SDK overrides them and logs a one-shot `WARNING`. This |
| 29 | +keeps telemetry trustworthy. Non-`sdk*` user labels (e.g. |
| 30 | +`team=search`) are preserved unchanged and show up alongside the SDK |
| 31 | +labels in `INFORMATION_SCHEMA` — useful for joining SDK spend against |
| 32 | +your own cost-center dimensions. |
| 33 | + |
| 34 | +**Privacy.** SDK labels never contain `user_id`, `session_id`, |
| 35 | +`trace_id`, or any trace-extracted value. `INFORMATION_SCHEMA.JOBS` is |
| 36 | +readable by anyone with `bigquery.jobs.listAll`, so the SDK enforces a |
| 37 | +`[a-z0-9_-]{1,63}` label-value format, which BigQuery accepts and which |
| 38 | +rejects many PII shapes such as emails. Lowercase UUIDs with dashes |
| 39 | +still pass, so avoid adding trace-derived values to any custom labels |
| 40 | +you set. |
| 41 | + |
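To see what the `[a-z0-9_-]{1,63}` format does and does not filter, here is a small check using Python's `re` module. The function name `is_valid_label_value` is hypothetical, and the sample values are made up for illustration.

```python
import re

LABEL_VALUE_RE = re.compile(r"[a-z0-9_-]{1,63}")


def is_valid_label_value(value: str) -> bool:
    """True only if the whole string matches the label-value format."""
    return LABEL_VALUE_RE.fullmatch(value) is not None


samples = [
    "0-4-0",                                 # version label: accepted
    "alice@example.com",                     # email: rejected ('@' and '.')
    "123e4567-e89b-12d3-a456-426614174000",  # lowercase UUID: accepted
]
for s in samples:
    print(s, is_valid_label_value(s))
```

The UUID case is why the format alone is not a privacy guarantee: an identifier made only of lowercase hex digits and dashes passes the check.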
| 42 | +**Out of scope.** Streaming inserts via `insert_rows_json` / |
| 43 | +`tabledata.insertAll` are **not** jobs, do not support labels, and do |
| 44 | +not appear in `INFORMATION_SCHEMA.JOBS`. To observe those, use Cloud |
| 45 | +Audit Logs. |
| 46 | + |
| 47 | +--- |
| 48 | + |
| 49 | +## Prerequisites |
| 50 | + |
| 51 | +- Read access to `INFORMATION_SCHEMA.JOBS_BY_PROJECT` or |
| 52 | + `INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION`: typically `bigquery.jobs.listAll` |
| 53 | + granted at the project or organization level, respectively. |
| 54 | +- Replace `region-us` in the queries below with your BigQuery region |
| 55 | + (e.g. `region-eu`, `region-asia-northeast1`). The region is the |
| 56 | + BigQuery **multi-region or location** of the dataset where jobs run. |
| 57 | + |
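If you template the queries from code, the region qualifier can be derived from a dataset location string. The sketch below assumes a hypothetical helper `jobs_view`, which lowercases the location and prefixes it with `region-`.

```python
def jobs_view(location: str, scope: str = "PROJECT") -> str:
    """Build the region-qualified JOBS view name for a BigQuery location."""
    if scope not in ("PROJECT", "ORGANIZATION"):
        raise ValueError(f"unsupported scope: {scope!r}")
    region = "region-" + location.strip().lower()
    return f"`{region}`.INFORMATION_SCHEMA.JOBS_BY_{scope}"


print(jobs_view("US"))
print(jobs_view("asia-northeast1", scope="ORGANIZATION"))
```

Querying a region other than the one where the jobs ran returns no rows for those jobs, so the location passed here should match the location used when the SDK client was created.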
| 58 | +--- |
| 59 | + |
| 60 | +## Queries |
| 61 | + |
| 62 | +### 1. Feature adoption over the last 30 days |
| 63 | + |
| 64 | +Which SDK features are being used, from which surface, and how much |
| 65 | +do they cost? |
| 66 | + |
| 67 | +```sql |
| 68 | +SELECT |
| 69 | + (SELECT value FROM UNNEST(labels) WHERE key = 'sdk_feature') AS feature, |
| 70 | + (SELECT value FROM UNNEST(labels) WHERE key = 'sdk_surface') AS surface, |
| 71 | + COUNT(*) AS jobs, |
| 72 | + SUM(total_bytes_billed) / POW(2, 40) AS tib_billed, |
| 73 | + SUM(TIMESTAMP_DIFF(end_time, start_time, MILLISECOND)) / 1000.0 / 60 |
| 74 | + AS total_minutes |
| 75 | +FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT |
| 76 | +WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY) |
| 77 | + AND EXISTS (SELECT 1 FROM UNNEST(labels) WHERE key = 'sdk') |
| 78 | +GROUP BY feature, surface |
| 79 | +ORDER BY jobs DESC; |
| 80 | +``` |
| 81 | + |
| 82 | +### 2. AI/ML function cost breakdown |
| 83 | + |
| 84 | +Where is your `AI.GENERATE` / `AI.EMBED` / `AI.FORECAST` spend going? |
| 85 | + |
| 86 | +```sql |
| 87 | +SELECT |
| 88 | + (SELECT value FROM UNNEST(labels) WHERE key = 'sdk_ai_function') |
| 89 | + AS ai_function, |
| 90 | + (SELECT value FROM UNNEST(labels) WHERE key = 'sdk_feature') AS feature, |
| 91 | + COUNT(*) AS jobs, |
| 92 | + SUM(total_bytes_billed) / POW(2, 40) AS tib_billed, |
| 93 | + AVG(TIMESTAMP_DIFF(end_time, start_time, MILLISECOND)) AS avg_ms |
| 94 | +FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT |
| 95 | +WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY) |
| 96 | + AND EXISTS ( |
| 97 | + SELECT 1 FROM UNNEST(labels) WHERE key = 'sdk_ai_function' |
| 98 | + ) |
| 99 | +GROUP BY ai_function, feature |
| 100 | +ORDER BY tib_billed DESC; |
| 101 | +``` |
| 102 | + |
| 103 | +### 3. Slowest feature per day (p50 / p95 latency) |
| 104 | + |
| 105 | +Which features are degrading or have runaway outliers? |
| 106 | + |
| 107 | +```sql |
| 108 | +SELECT |
| 109 | + DATE(creation_time) AS day, |
| 110 | + (SELECT value FROM UNNEST(labels) WHERE key = 'sdk_feature') AS feature, |
| 111 | + COUNT(*) AS jobs, |
| 112 | + APPROX_QUANTILES( |
| 113 | + TIMESTAMP_DIFF(end_time, start_time, MILLISECOND), 100 |
| 114 | + )[OFFSET(50)] AS p50_ms, |
| 115 | + APPROX_QUANTILES( |
| 116 | + TIMESTAMP_DIFF(end_time, start_time, MILLISECOND), 100 |
| 117 | + )[OFFSET(95)] AS p95_ms |
| 118 | +FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT |
| 119 | +WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 14 DAY) |
| 120 | + AND EXISTS (SELECT 1 FROM UNNEST(labels) WHERE key = 'sdk') |
| 121 | + AND state = 'DONE' |
| 122 | +GROUP BY day, feature |
| 123 | +HAVING jobs >= 5 |
| 124 | +ORDER BY day DESC, p95_ms DESC; |
| 125 | +``` |
| 126 | + |
| 127 | +### 4. Version adoption after a release |
| 128 | + |
| 129 | +How many jobs are still on the old version after you cut a new one? |
| 130 | + |
| 131 | +```sql |
| 132 | +SELECT |
| 133 | + (SELECT value FROM UNNEST(labels) WHERE key = 'sdk_version') AS sdk_version, |
| 134 | + DATE(creation_time) AS day, |
| 135 | + COUNT(*) AS jobs |
| 136 | +FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT |
| 137 | +WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 14 DAY) |
| 138 | + AND EXISTS (SELECT 1 FROM UNNEST(labels) WHERE key = 'sdk') |
| 139 | +GROUP BY sdk_version, day |
| 140 | +ORDER BY day DESC, jobs DESC; |
| 141 | +``` |
| 142 | + |
| 143 | +### 5. Surface attribution (who is calling the SDK?) |
| 144 | + |
| 145 | +Split spend across direct Python users, CLI invocations, and the |
| 146 | +deployed remote-function runtime. |
| 147 | + |
| 148 | +```sql |
| 149 | +SELECT |
| 150 | + (SELECT value FROM UNNEST(labels) WHERE key = 'sdk_surface') AS surface, |
| 151 | + COUNT(*) AS jobs, |
| 152 | + SUM(total_bytes_billed) / POW(2, 40) AS tib_billed |
| 153 | +FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT |
| 154 | +WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY) |
| 155 | + AND EXISTS (SELECT 1 FROM UNNEST(labels) WHERE key = 'sdk') |
| 156 | +GROUP BY surface |
| 157 | +ORDER BY tib_billed DESC; |
| 158 | +``` |
| 159 | + |
| 160 | +### 6. Errors by feature |
| 161 | + |
| 162 | +Are any SDK features failing disproportionately? |
| 163 | + |
| 164 | +```sql |
| 165 | +SELECT |
| 166 | + (SELECT value FROM UNNEST(labels) WHERE key = 'sdk_feature') AS feature, |
| 167 | + error_result.reason AS reason, |
| 168 | + COUNT(*) AS failed_jobs |
| 169 | +FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT |
| 170 | +WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY) |
| 171 | + AND EXISTS (SELECT 1 FROM UNNEST(labels) WHERE key = 'sdk') |
| 172 | + AND state = 'DONE' |
| 173 | + AND error_result.reason IS NOT NULL |
| 174 | +GROUP BY feature, reason |
| 175 | +ORDER BY failed_jobs DESC; |
| 176 | +``` |
| 177 | + |
| 178 | +### 7. Custom caller labels joined with SDK labels |
| 179 | + |
| 180 | +If your callers add their own labels (e.g. `team=search`, |
| 181 | +`env=prod`) before handing a `QueryJobConfig` to the SDK, those |
| 182 | +survive and co-exist with the SDK's labels. You can slice SDK usage |
| 183 | +by your own cost-center dimensions: |
| 184 | + |
| 185 | +```sql |
| 186 | +SELECT |
| 187 | + (SELECT value FROM UNNEST(labels) WHERE key = 'team') AS team, |
| 188 | + (SELECT value FROM UNNEST(labels) WHERE key = 'sdk_feature') AS feature, |
| 189 | + COUNT(*) AS jobs, |
| 190 | + SUM(total_bytes_billed) / POW(2, 40) AS tib_billed |
| 191 | +FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT |
| 192 | +WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY) |
| 193 | + AND EXISTS (SELECT 1 FROM UNNEST(labels) WHERE key = 'sdk') |
| 194 | + AND EXISTS (SELECT 1 FROM UNNEST(labels) WHERE key = 'team') |
| 195 | +GROUP BY team, feature |
| 196 | +ORDER BY tib_billed DESC; |
| 197 | +``` |
| 198 | + |
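Queries 5 and 7 share one shape: pull one or more label values out of `labels`, then aggregate jobs and bytes billed. A minimal sketch of generating that SQL for arbitrary label keys follows. The helper names `label_expr` and `usage_by_labels_sql` are hypothetical, and the key check is an assumption meant to keep interpolated strings safe, not a statement of the SDK's behavior.

```python
import re

_KEY_RE = re.compile(r"[a-z][a-z0-9_-]{0,62}")
_REGION_RE = re.compile(r"region-[a-z0-9-]+")


def label_expr(key: str) -> str:
    """Scalar subquery that extracts one label value by key."""
    if not _KEY_RE.fullmatch(key):
        raise ValueError(f"not a valid label key: {key!r}")
    return f"(SELECT value FROM UNNEST(labels) WHERE key = '{key}')"


def usage_by_labels_sql(keys, days=30, region="region-us"):
    """Jobs and TiB billed for SDK jobs, grouped by the given label keys."""
    if not keys:
        raise ValueError("at least one label key is required")
    if not _REGION_RE.fullmatch(region):
        raise ValueError(f"unexpected region qualifier: {region!r}")
    # Dashes are legal in label keys but not in bare SQL aliases.
    aliases = [k.replace("-", "_") for k in keys]
    cols = ",\n  ".join(
        f"{label_expr(k)} AS {a}" for k, a in zip(keys, aliases)
    )
    return (
        "SELECT\n"
        f"  {cols},\n"
        "  COUNT(*) AS jobs,\n"
        "  SUM(total_bytes_billed) / POW(2, 40) AS tib_billed\n"
        f"FROM `{region}`.INFORMATION_SCHEMA.JOBS_BY_PROJECT\n"
        "WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), "
        f"INTERVAL {int(days)} DAY)\n"
        "  AND EXISTS (SELECT 1 FROM UNNEST(labels) WHERE key = 'sdk')\n"
        f"GROUP BY {', '.join(aliases)}\n"
        "ORDER BY tib_billed DESC;"
    )


print(usage_by_labels_sql(["team", "sdk_feature"]))
```

Unlike query 7, the generated SQL does not require the custom label to be present, so jobs without a `team` label appear in a `NULL` group.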
| 199 | +--- |
| 200 | + |
| 201 | +## Opting in and out |
| 202 | + |
| 203 | +### By default: opted in |
| 204 | + |
| 205 | +Constructing the SDK the normal way gets you labels on every job: |
| 206 | + |
| 207 | +```python |
| 208 | +from bigquery_agent_analytics import Client |
| 209 | + |
| 210 | +# sdk_surface defaults to "python"; bq_client is lazily built via |
| 211 | +# make_bq_client, which returns a LabeledBigQueryClient. |
| 212 | +client = Client(project_id="my-proj", dataset_id="analytics") |
| 213 | +``` |
| 214 | + |
| 215 | +### Explicitly construct the labeled client |
| 216 | + |
| 217 | +If you need your own `google.cloud.bigquery.Client` configuration |
| 218 | +(custom `client_info`, `default_query_job_config`, transport, etc.) |
| 219 | +but still want SDK labels, use `make_bq_client`: |
| 220 | + |
| 221 | +```python |
| 222 | +from bigquery_agent_analytics import make_bq_client, Client |
| 223 | + |
| 224 | +bq = make_bq_client(project="my-proj", location="US", sdk_surface="python") |
| 225 | +# ... mutate bq.default_query_job_config, etc., if you want. |
| 226 | + |
| 227 | +client = Client(project_id="my-proj", dataset_id="analytics", bq_client=bq) |
| 228 | +``` |
| 229 | + |
| 230 | +### Pass your own client — labels are NOT applied |
| 231 | + |
| 232 | +If you pass a vanilla `bigquery.Client` to `Client(bq_client=...)`, |
| 233 | +the SDK honors it as-is (no reconstruction, so your |
| 234 | +`default_query_job_config` and other settings survive) and logs a |
| 235 | +one-shot `WARNING` noting that SDK labels will not be applied: |
| 236 | + |
| 237 | +```python |
| 238 | +from google.cloud import bigquery |
| 239 | +from bigquery_agent_analytics import Client |
| 240 | + |
| 241 | +client = Client( |
| 242 | + project_id="my-proj", |
| 243 | + dataset_id="analytics", |
| 244 | + bq_client=bigquery.Client(project="my-proj"), |
| 245 | + # Jobs from this Client will NOT carry sdk_* labels. |
| 246 | + # The SDK logs one WARNING explaining how to opt in. |
| 247 | +) |
| 248 | +``` |
| 249 | + |
| 250 | +--- |
| 251 | + |
| 252 | +## Related |
| 253 | + |
| 254 | +- See `SDK.md` for the full consumption-layer API reference. |
| 255 | +- See [issue #52 on GoogleCloudPlatform/BigQuery-Agent-Analytics-SDK][issue-52] |
| 256 | + for the design discussion and rollout history. |
| 257 | + |
| 258 | +[issue-52]: https://github.com/GoogleCloudPlatform/BigQuery-Agent-Analytics-SDK/issues/52 |