+ "details": "## Summary\n\nAll rate limit buckets for a single entity share the same DynamoDB partition key (`namespace/ENTITY#{id}`). A high-traffic entity can exceed DynamoDB's per-partition throughput limits (~1,000 WCU/sec), causing throttling that degrades service for that entity — and potentially co-located entities in the same partition.\n\n## Details\n\nEach `acquire()` call performs a `TransactWriteItems` (or `UpdateItem` in speculative mode) against items sharing the same partition key. For cascade entities, this doubles to 2-4 writes per request (child + parent). At sustained rates above ~500 req/sec for a single entity, DynamoDB's adaptive capacity may not redistribute fast enough, causing `ProvisionedThroughputExceededException`.\n\nThe library has no built-in mitigation:\n- No partition key sharding/salting\n- No write coalescing or batching\n- No client-side admission control before hitting DynamoDB\n- `RateLimiterUnavailable` is raised but the caller has already been delayed\n\n## Impact\n\n- **Availability**: High-traffic entities experience elevated latency and rejected requests beyond what their rate limits specify\n- **Fairness**: Other entities sharing the same DynamoDB partition may experience collateral throttling\n- **Multi-tenant risk**: In a shared LLM proxy scenario, one tenant's burst traffic could degrade service for others\n\n## Reproduction\n\n1. Create an entity with high rate limits (e.g., 100,000 rpm)\n2. Send sustained traffic at 1,000+ req/sec to a single entity\n3. Observe DynamoDB `ThrottledRequests` CloudWatch metric increasing\n4. Observe `acquire()` latency spikes and `RateLimiterUnavailable` exceptions\n\n## Remediation Design: Pre-Shard Buckets\n\n- Move buckets to `PK={ns}/BUCKET#{entity}#{resource}#{shard}, SK=#STATE` — one partition per (entity, resource, shard)\n- Auto-inject `wcu:1000` reserved limit on every bucket — tracks DynamoDB partition write pressure in-band (name may change during implementation)\n- Shard doubling (1→2→4→8) triggered by client on `wcu` exhaustion or proactively by aggregator\n- Shard 0 at suffix `#0` is source of truth for `shard_count`. Aggregator propagates to other shards\n- Original limits stored on bucket, effective limits derived: `original / shard_count`. Infrastructure limits (`wcu`) not divided\n- Shard selection: random/round-robin. On application limit exhaustion, retry on another shard (max 2 retries)\n- Lazy shard creation on first access\n- Bucket discovery via GSI3 (KEYS_ONLY) + BatchGetItem. GSI2 for resource aggregation unchanged\n- Cascade: parent unaware, protected by own `wcu`\n- Aggregator: parse new PK format, key by shard_id, effective limits for refill, filter `wcu` from snapshots\n- Clean break migration: schema version bump, old buckets ignored, new buckets created on first access\n- **$0.625/M preserved on hot path**",