Commit 198a4ae

srnichols and Copilot committed
docs(foundry-quota): add quota-preflight integration guide + CHANGELOG
Phase-FOUNDRY-QUOTA-PREFLIGHT Slice 5: Docs + CHANGELOG.

- docs/integrations/foundry-quota-preflight.md: operator guide covering activation
  (warn/block modes), threshold reference (safe/warning/critical/unknown), Cognitive
  Services Usages Reader RBAC role, 5-minute TTL cache, quota response shape,
  az role assignment example, troubleshooting table.
- CHANGELOG.md: Phase-FOUNDRY-QUOTA-PREFLIGHT entry under [Unreleased] documenting
  foundry-quota.mjs, tests, and the integration doc.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 18fc888 commit 198a4ae

2 files changed

Lines changed: 204 additions & 0 deletions

CHANGELOG.md

Lines changed: 19 additions & 0 deletions
@@ -9,6 +9,25 @@ Format follows [Keep a Changelog](https://keepachangelog.com/).

---

### Phase-FOUNDRY-QUOTA-PREFLIGHT: Azure AI Foundry deployment quota preflight

> **One-liner**: Adds a quota preflight step to `forge_run_plan` for Microsoft Foundry / BYO Azure OpenAI deployments. It fetches TPM capacity from the Azure Cognitive Services control-plane REST API, caches results for 5 minutes, compares the slice token estimate against available headroom (safe ≥ 30 %, warning ≥ 10 % and < 30 %, critical < 10 %), and logs a structured `[foundry-quota]` annotation on every slice. Fail-open: any quota fetch error produces `status: "unknown"` and never blocks execution. Block mode (`PFORGE_FOUNDRY_QUOTA_PREFLIGHT=block`) halts execution on `critical` status.

#### Added
- `pforge-mcp/foundry-quota.mjs`: Core quota module. Exports `getDeploymentQuota()` (async REST call to `management.azure.com` Cognitive Services control-plane), `quotaCacheGet` / `quotaCacheSet` (5-minute in-process TTL cache keyed by `sub/rg/account/deployment`), and `compareSliceEstimate()` (synchronous comparator returning `{ status, headroomPct, message }`).
- `pforge-mcp/tests/foundry-quota.test.mjs`: 20 unit tests covering TTL cache behaviour (get/set/expire/overwrite), missing-param validation, credential/token error paths, all HTTP error codes (401, 403, 429, 503, generic), success path with field parsing, cache-hit skip, network failure / timeout fail-open, and all four `compareSliceEstimate` threshold bands including negative headroom.
- `docs/integrations/foundry-quota-preflight.md`: Operator guide covering activation (`PFORGE_FOUNDRY_QUOTA_PREFLIGHT=warn|block`), threshold reference, required Azure RBAC role (**Cognitive Services Usages Reader**), cache behaviour, quota response shape, `az role assignment create` example, and troubleshooting table.

#### Notes
- **Fail-open guarantee**: `timeout`, `rate_limited`, `forbidden`, `network_error`, and all other error reasons return `status: "unknown"` and never block execution regardless of mode.
- **Required RBAC role**: `Cognitive Services Usages Reader` (built-in) on the AOAI account or resource group; read-only `Microsoft.CognitiveServices/*/read`, no data-plane permissions.
- Token scope: `https://management.azure.com/.default` (commercial) or `https://management.azure.us/.default` (Azure Government, detected via endpoint suffix `.azure.us`).
- PTU (provisioned throughput) deployments do not report `tpmCapacity` on this endpoint; those slices receive `status: "unknown"` and proceed normally.
- `costForLeg()` and `priceSlice()` in `cost-service.mjs` are untouched.
- No release in this phase.

---

## [2.92.0] — 2026-05-08 — Docs UX lift (BCDR patterns adopted)

> **One-liner**: Documentation-only minor that adopts three reusable UX patterns from the BCDR-Digital-Twin sibling repo — a book-style manual spine, scroll-snap briefing decks, and an architecture hub — plus a shared design-token layer and site-wide navigation include. 14 slices executed via gh-copilot subscription path in 27.8 minutes; $0.14 declared / $0.00 wall. Zero `pforge-mcp/` or `pforge-master/` code touched.

docs/integrations/foundry-quota-preflight.md

Lines changed: 185 additions & 0 deletions
@@ -0,0 +1,185 @@
# Foundry Quota Preflight

> **Applies to**: Plan Forge v2.92.1-dev+
> **Source**: Phase-FOUNDRY-QUOTA-PREFLIGHT (enterprise-fleet-readiness.md §11.6)

Before Plan Forge sends tokens to your Azure OpenAI / Azure AI Foundry deployment, it can
check the deployment's TPM capacity and compare it against the slice token estimate. This
**quota preflight** keeps plan execution from hitting a rate-limit wall mid-run.

---

## How It Works

1. `forge_run_plan` calls `getDeploymentQuota()` at the start of each slice (before the
   worker is dispatched).
2. The result is passed to `compareSliceEstimate()`, which classifies headroom as
   **safe / warning / critical / unknown**.
3. A `[foundry-quota]` annotation is injected into the slice log. If the status is
   `critical`, the orchestrator emits a `quota-warning` event and, when
   `PFORGE_FOUNDRY_QUOTA_PREFLIGHT=block` is set, halts execution with an actionable error.

```
[foundry-quota] safe — 68.3% headroom (eastus-prod-gpt-4.1).
Cap=100,000 tpm, used=0 tpm, slice est=31,700 tokens.
```

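
The headroom figure in the sample annotation can be reproduced by hand: 100,000 minus 0 minus 31,700 leaves 68,300 tokens, which is 68.3 % of capacity. A minimal sketch of that arithmetic follows. The helper name `headroomPct`, the one-decimal rounding, and the treatment of unreported usage as 0 are assumptions for illustration; the actual `compareSliceEstimate()` in `foundry-quota.mjs` may compute or round differently.

```javascript
// Hypothetical helper, not the real compareSliceEstimate() implementation.
// Returns the percentage of TPM capacity left after current usage and the
// slice estimate are subtracted, rounded to one decimal place.
function headroomPct(tpmCapacity, tpmUsage, sliceEstimate) {
  if (!Number.isFinite(tpmCapacity) || tpmCapacity <= 0) {
    return null; // capacity unknown (for example a PTU deployment)
  }
  // Assumption: usage that the endpoint does not report counts as 0.
  const used = Number.isFinite(tpmUsage) ? tpmUsage : 0;
  const remaining = tpmCapacity - used - sliceEstimate;
  return Math.round((remaining * 1000) / tpmCapacity) / 10;
}

// Values from the sample annotation: cap 100,000, used 0, estimate 31,700.
console.log(headroomPct(100000, 0, 31700)); // 68.3
// An over-budget slice yields negative headroom.
console.log(headroomPct(100000, 90000, 13200)); // -3.2
```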
---

## Prerequisites

- An Azure OpenAI Service or Azure AI Foundry deployment already configured per
  `docs/integrations/byo-azure-openai.md`.
- A credential that satisfies `credential.getToken("https://management.azure.com/.default")`:
  - **Entra / Managed Identity**: set `AZURE_AUTH_MODE=entra` (requires `@azure/identity`).
  - **Service Principal**: set `AZURE_AUTH_MODE=managed-identity` with env vars
    `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`.

> **Required Azure RBAC role**: The identity used must hold the
> **Cognitive Services Usages Reader** role (built-in) on the Azure OpenAI account or its
> resource group. This role grants read-only access to the control-plane quota endpoint
> (`Microsoft.CognitiveServices/accounts/deployments/read`) without allowing any data-plane
> or model-serving operations.

---

## Activation

### Warn-only mode (default)

Set the feature flag to enable the preflight. Warn-only is the default mode once the flag
is set: quota checks run and headroom is logged, but execution never blocks.

```bash
export PFORGE_FOUNDRY_QUOTA_PREFLIGHT=warn   # any other non-empty value except "block", "false", or "off" also selects warn
```

Or in `.forge/secrets.json`:

```json
{
  "PFORGE_FOUNDRY_QUOTA_PREFLIGHT": "warn"
}
```

### Block mode

Stop the run before a slice whose estimate would leave less than 10 % headroom (`critical`):

```bash
export PFORGE_FOUNDRY_QUOTA_PREFLIGHT=block
```

With `block` mode, execution halts on `critical` status and the following structured error
is returned:

```json
{
  "ok": false,
  "reason": "quota_preflight_critical",
  "message": "[foundry-quota] critical — -3.2% headroom …",
  "deployment": "eastus-prod-gpt-4.1"
}
```

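
How the mode and the comparator status combine can be summarised in a few lines. The sketch below is an illustration under stated assumptions: `decidePreflight`, `halt`, and `emitQuotaWarning` are hypothetical names, and only the `ok`, `reason`, `message`, and `deployment` fields are taken from the structured error in this guide.

```javascript
// Hypothetical decision helper; the real orchestrator logic may differ.
// mode: "warn" | "block"; comparison: { status, message } from the comparator.
function decidePreflight(mode, comparison, deployment) {
  const isCritical = comparison.status === "critical";
  // A critical status always emits the quota-warning event, in either mode.
  const decision = { halt: false, emitQuotaWarning: isCritical };
  if (mode === "block" && isCritical) {
    decision.halt = true;
    decision.error = {
      ok: false,
      reason: "quota_preflight_critical",
      message: comparison.message,
      deployment,
    };
  }
  // safe, warning, and unknown never halt, so fetch errors stay fail-open.
  return decision;
}

const critical = { status: "critical", message: "[foundry-quota] critical" };
console.log(decidePreflight("block", critical, "eastus-prod-gpt-4.1").halt); // true
console.log(decidePreflight("warn", critical, "eastus-prod-gpt-4.1").halt); // false
console.log(decidePreflight("block", { status: "unknown" }, "eastus-prod-gpt-4.1").halt); // false
```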
### Disable

```bash
unset PFORGE_FOUNDRY_QUOTA_PREFLIGHT   # or set to empty string / "false" / "off"
```

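
Taken together, the three subsections imply a simple mapping from the variable's value to a mode. A rough sketch follows, using the hypothetical name `parsePreflightMode` and assuming that values are trimmed and compared case-insensitively; the guide does not state either behaviour.

```javascript
// Hypothetical parser for PFORGE_FOUNDRY_QUOTA_PREFLIGHT. Trimming and
// case-insensitive matching are assumptions, not documented behaviour.
function parsePreflightMode(rawValue) {
  const value = (rawValue ?? "").trim().toLowerCase();
  if (value === "" || value === "false" || value === "off") {
    return "disabled";
  }
  if (value === "block") {
    return "block";
  }
  return "warn"; // any other non-empty value
}

console.log(parsePreflightMode(undefined)); // disabled
console.log(parsePreflightMode("off")); // disabled
console.log(parsePreflightMode("block")); // block
console.log(parsePreflightMode("warn")); // warn
console.log(parsePreflightMode("1")); // warn
```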
---

## Threshold Reference

| Status | Headroom (share of TPM capacity left after subtracting current usage and the slice estimate) |
|---|---|
| `safe` | ≥ 30 % |
| `warning` | ≥ 10 % and < 30 % |
| `critical` | < 10 % (including negative, i.e. over budget) |
| `unknown` | Quota unavailable (fail-open; execution continues) |

**Fail-open guarantee**: any error fetching quota (`timeout`, `rate_limited`, `forbidden`,
`network_error`, etc.) returns `status: "unknown"` and never blocks execution, regardless of
the `PFORGE_FOUNDRY_QUOTA_PREFLIGHT` mode.

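
The bands in the table can be written as a small classifier. This is a sketch rather than the shipped comparator: `classifyHeadroom` is a hypothetical name, and it reads the table literally, so exactly 30 % counts as `safe` and exactly 10 % as `warning` (anything below 10 % is `critical`).

```javascript
// Hypothetical classifier mirroring the threshold table; not the shipped code.
function classifyHeadroom(headroomPct) {
  if (typeof headroomPct !== "number" || Number.isNaN(headroomPct)) {
    return "unknown"; // quota unavailable: fail open
  }
  if (headroomPct >= 30) return "safe";
  if (headroomPct >= 10) return "warning";
  return "critical"; // includes negative (over-budget) headroom
}

// Print the band for a few values around each boundary.
for (const pct of [68.3, 30, 29.9, 10, 9.9, -3.2, null]) {
  console.log(pct, classifyHeadroom(pct));
}
```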
---

## Cache Behaviour

Quota values are cached in-process for **5 minutes** (configurable via the `ttlMs`
parameter in `foundry-quota.mjs`). This means:

- A plan with 10 slices hitting the same deployment makes **at most 1** control-plane call
  per 5-minute window, not 10.
- If you resize a deployment mid-run, the new capacity is reflected within 5 minutes.

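
For readers unfamiliar with TTL caching, the behaviour described here can be sketched with a `Map` and an expiry timestamp. The names `cacheKey`, `cacheSet`, and `cacheGet` and their signatures are assumptions for illustration; the real `quotaCacheGet` / `quotaCacheSet` exports may look different. The key format `sub/rg/account/deployment` is the one named in the CHANGELOG entry.

```javascript
// Hypothetical in-process TTL cache; the real quotaCacheGet / quotaCacheSet
// signatures in foundry-quota.mjs may differ.
const DEFAULT_TTL_MS = 5 * 60 * 1000; // 5 minutes
const cache = new Map();

function cacheKey({ sub, rg, account, deployment }) {
  return `${sub}/${rg}/${account}/${deployment}`;
}

// `now` is injectable so expiry can be exercised without waiting.
function cacheSet(key, value, now = Date.now(), ttlMs = DEFAULT_TTL_MS) {
  cache.set(key, { value, expiresAt: now + ttlMs });
}

function cacheGet(key, now = Date.now()) {
  const entry = cache.get(key);
  if (!entry) return undefined;
  if (now >= entry.expiresAt) {
    cache.delete(key); // expired: the next caller refetches from the control plane
    return undefined;
  }
  return entry.value;
}

const key = cacheKey({ sub: "sub-1", rg: "rg-1", account: "aoai-1", deployment: "gpt-4.1" });
cacheSet(key, { tpmCapacity: 100000 }, 0);
console.log(cacheGet(key, 60_000)); // hit: one minute after the entry was stored
console.log(cacheGet(key, 300_000)); // undefined: the 5-minute TTL has elapsed
```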
---

## Required Azure Permissions

| Action | Required role |
|---|---|
| Read deployment quota (`GET /deployments/{name}`) | **Cognitive Services Usages Reader** |
| Acquire token for `management.azure.com` | Any Entra identity / service principal |

The `Cognitive Services Usages Reader` role is a built-in Azure role that grants
`Microsoft.CognitiveServices/*/read` without any write or data-plane permissions. Assign it
at the **resource-group** or **subscription** level to cover all AOAI accounts in scope.

```bash
az role assignment create \
  --assignee "<service-principal-client-id>" \
  --role "Cognitive Services Usages Reader" \
  --scope "/subscriptions/<sub-id>/resourceGroups/<rg-name>"
```

---

## Quota Response Shape

`getDeploymentQuota()` returns either a success object or a fail-open error:

```ts
// Success
{
  ok: true,
  deploymentName: string,
  model: string,               // e.g. "gpt-4.1"
  tpmCapacity: number | null,  // tokens-per-minute capacity from control plane
  tpmUsage: number | null,     // current usage (null = not reported by this endpoint)
  ptuCapacity: number | null,  // provisioned throughput capacity (future)
  ptuUsage: number | null,
  sku: string | null,
  fetchedAt: string,           // ISO 8601 timestamp
}

// Fail-open
{
  ok: false,
  reason: "missing_required_params" | "no_credential" | "no_token" | "token_error"
        | "rate_limited" | "forbidden" | "service_unavailable" | "timeout"
        | "network_error" | "http_<code>",
}
```

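
A caller has to handle three cases: a fail-open error, a success with no TPM capacity (as with PTU deployments), and a success with usable numbers. One way to branch on them is sketched below. `usableCapacity` and its return fields are hypothetical, and treating a `null` `tpmUsage` as 0 is an assumption, not documented behaviour.

```javascript
// Hypothetical consumer of a getDeploymentQuota() result; field names on the
// returned object (usable, why) are invented for this sketch.
function usableCapacity(result) {
  if (!result.ok) {
    // Fail-open error: the caller should treat the quota status as "unknown".
    return { usable: false, why: result.reason };
  }
  if (result.tpmCapacity === null) {
    // For example a PTU deployment, which does not report tpmCapacity here.
    return { usable: false, why: "tpmCapacity is null" };
  }
  return {
    usable: true,
    tpmCapacity: result.tpmCapacity,
    tpmUsage: result.tpmUsage ?? 0, // assumption: unreported usage counts as 0
  };
}

console.log(usableCapacity({ ok: false, reason: "timeout" }));
console.log(usableCapacity({ ok: true, tpmCapacity: null, tpmUsage: null }));
console.log(usableCapacity({ ok: true, tpmCapacity: 100000, tpmUsage: null }));
```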
---

## Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| `reason: "no_credential"` | `credential` not provided | Set `AZURE_AUTH_MODE` to `entra` or `managed-identity` |
| `reason: "forbidden"` | Missing RBAC role | Assign **Cognitive Services Usages Reader** to the identity |
| `reason: "rate_limited"` | Too many control-plane calls | Cache TTL is already 5 min; check for multiple concurrent workers |
| `reason: "timeout"` | Control-plane slow or unreachable | Check network connectivity to `management.azure.com`; quota check fails open |
| `status: "unknown"` on every slice | Any of the above | Execution continues; review the `[foundry-quota]` log annotation for the `reason` field |
| `tpmCapacity: null` | Deployment uses PTU (provisioned) | PTU capacity is not reported on the same endpoint; status will be `unknown` |

---

## Related Docs

- `docs/integrations/byo-azure-openai.md`: BYO AOAI / Foundry provider setup
- `docs/integrations/foundry-toolbox-mcp.md`: Foundry Toolbox MCP server wiring
- `pforge-mcp/foundry-quota.mjs`: Implementation (`getDeploymentQuota`, `compareSliceEstimate`, cache)
- `pforge-mcp/tests/foundry-quota.test.mjs`: 20 unit tests covering all error codes and threshold boundaries
