Merged
40 commits
91e887e
Add OHTTP-style anonymous inference endpoint
claude May 13, 2026
5dcbdc8
Update test_ohttp.py
adambalogh May 15, 2026
a8c5c89
lint
May 15, 2026
6425c31
Add OHTTP anonymous chat completions with x402 payment integration (#71)
adambalogh May 15, 2026
e1a7204
Relay-pays OHTTP: x-payment from outer header, surface usage to relay
claude May 15, 2026
9d55a8e
Potential fix for pull request finding
adambalogh May 15, 2026
38419fd
Chunked OHTTP: stream SSE inference responses end-to-end
claude May 16, 2026
b34baa7
README: document /v1/ohttp anonymous inference + chunked streaming
claude May 16, 2026
9fc7134
Add scripts/test_ohttp.py local smoke-test client
claude May 16, 2026
fea4d01
Forward Authorization header on OHTTP sub-request
claude May 16, 2026
b12e70f
OHTTP: use a fixed dummy Authorization on the inner sub-request
claude May 16, 2026
58908aa
Add TEE_GATEWAY_DEV_SKIP_X402 dev escape hatch
claude May 16, 2026
d18be75
test_ohttp.py: dump the outgoing OHTTP request so you can eyeball it
claude May 16, 2026
366d2f5
Revert "Add TEE_GATEWAY_DEV_SKIP_X402 dev escape hatch"
claude May 16, 2026
87eec3c
tee_manager: refuse to register if HPKE pubkey is missing
claude May 16, 2026
4dbd35e
ohttp: normalize decap failures to ValueError with a generic message
claude May 16, 2026
30e8737
ohttp_controller: fix inaccurate privacy claim in docstring
claude May 16, 2026
2eae9e9
README: fix the Anonymous Inference trust-split wording
claude May 16, 2026
deae243
docs: clarify how usage stats reach the relay vs. how x402 settles
claude May 16, 2026
de38f89
ci: add tee_gateway/test/test_ohttp.py to the CI test list
claude May 16, 2026
879e953
ci: discover tee_gateway/test/ as a directory
claude May 16, 2026
2a0c4b5
ohttp_controller: preserve duplicate forwarded headers (no dict colla…
claude May 16, 2026
ac51a35
ohttp_controller: drop unused OHTTP_MEDIA_TYPE constant
claude May 16, 2026
fe780d0
pricing
May 16, 2026
678ab00
size limit
May 16, 2026
c2be6e1
lint
May 16, 2026
c81fe6e
Potential fix for pull request finding
adambalogh May 16, 2026
46ffb56
Potential fix for pull request finding
adambalogh May 16, 2026
f211297
usage
May 16, 2026
399c0bb
todo
May 16, 2026
7cf86da
simplify pricing
May 16, 2026
2130650
cost
May 16, 2026
5f7942e
Potential fix for pull request finding
adambalogh May 16, 2026
625354c
controller test
May 16, 2026
16d9fcb
Merge branch 'claude/anonymous-inference-privacy-SgzWN' of github.com…
May 16, 2026
dd19905
lint
May 16, 2026
61471cd
Potential fix for pull request finding
adambalogh May 16, 2026
d44ab8a
Potential fix for pull request finding
adambalogh May 16, 2026
95a17ee
OHTTP updates (#74)
dixitaniket May 18, 2026
715b075
readme
May 18, 2026
7 changes: 6 additions & 1 deletion .github/workflows/test.yml
@@ -15,7 +15,12 @@ jobs:
- uses: actions/checkout@v4
- uses: astral-sh/setup-uv@v5
- name: Run unit tests
run: uv run --group test pytest tee_gateway/test/test_tool_forwarding.py tee_gateway/test/test_tee_core.py tee_gateway/test/test_price_feed.py tests/test_pricing.py -v --import-mode=importlib
# Discover the whole tee_gateway/test/ directory so new test files
# aren't silently excluded from CI — the previous explicit list had
# left test_chat_controller, test_completions_controller, and
# test_ohttp out. test_price_feed_integration.py self-gates on
# RUN_INTEGRATION_TESTS and skips cleanly without it.
run: uv run --group test pytest tee_gateway/test/ tests/test_pricing.py tests/test_opengradient_field.py -v --import-mode=importlib
# To also run integration tests (real CoinGecko network calls), add:
# env:
# RUN_INTEGRATION_TESTS: "1"
35 changes: 35 additions & 0 deletions README.md
@@ -121,6 +121,8 @@ The `measurements.txt` checked into this repository reflects the OpenGradient-op
| `/signing-key` | GET | TEE public key (PEM format) and tee_id |
| `/v1/completions` | POST | Text completion (signed) |
| `/v1/chat/completions` | POST | Chat completion (signed) |
| `/v1/ohttp` | POST | Anonymous chat completion (OHTTP-encapsulated, relay-paid) |
| `/v1/ohttp/config` | GET | HPKE key configuration (RFC 9458) for OHTTP clients |

### Request Format

@@ -170,6 +172,39 @@ The `tee_*` fields provide cryptographic proof of the response:
- **`tee_timestamp`** — Unix timestamp when the response was signed (proves freshness)
- **`tee_id`** — keccak256 of the enclave's DER-encoded public key (stable identifier for this enclave instance)

## Anonymous Inference (Oblivious HTTP)

`/v1/ohttp` is a thin wrapper around `/v1/chat/completions` that adds **client unlinkability** via [RFC 9458 OHTTP](https://www.rfc-editor.org/rfc/rfc9458) + [draft-ietf-ohai-chunked-ohttp-08](https://datatracker.ietf.org/doc/draft-ietf-ohai-chunked-ohttp/). The HPKE ciphersuite is fixed: DHKEM(X25519,HKDF-SHA256) / HKDF-SHA256 / ChaCha20-Poly1305.

**Flow:**

1. Client fetches `/v1/ohttp/config` (HPKE pubkey, key_id, suite IDs) and verifies it against the Nitro attestation.
2. Client HPKE-encapsulates a normal chat-completion JSON body and POSTs the ciphertext to a **relay**. The client carries no payment material.
3. Relay forwards the ciphertext to `/v1/ohttp` and attaches its own `X-Payment: <x402 payload>` header. **`/v1/ohttp` is the x402-paid boundary** — verification and settlement happen on this outer request, against the relay's payment.
4. Enclave decrypts the request and re-issues it in-process to `/v1/chat/completions` against the pre-x402 WSGI app. Connexion routing, validation, TEE signing and the LLM call still run, but x402 does **not** fire a second time and the relay's `X-Payment` is **not** forwarded into the inner dispatch. The response is then sealed back to the client.
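
As a rough illustration of step 1, the sketch below parses a single RFC 9458 §3.1 key configuration and checks that it offers the fixed suite named above. It assumes `/v1/ohttp/config` returns the binary RFC 9458 encoding (the README does not say whether the body is binary or JSON); `parse_key_config` and `KeyConfig` are hypothetical names, and attestation verification is deliberately out of scope.

```python
import struct
from dataclasses import dataclass

# HPKE registry IDs (RFC 9180) for the suite the README describes as fixed.
KEM_X25519_HKDF_SHA256 = 0x0020
KDF_HKDF_SHA256 = 0x0001
AEAD_CHACHA20_POLY1305 = 0x0003
X25519_PUBLIC_KEY_LEN = 32  # Npk for DHKEM(X25519, HKDF-SHA256)


@dataclass(frozen=True)
class KeyConfig:  # hypothetical container, not part of the gateway code
    key_id: int
    kem_id: int
    public_key: bytes
    suites: tuple  # (kdf_id, aead_id) pairs


def parse_key_config(raw: bytes) -> KeyConfig:
    """Parse one key configuration (assumed binary RFC 9458 layout)."""
    header_len = 1 + 2 + X25519_PUBLIC_KEY_LEN + 2
    if len(raw) < header_len:
        raise ValueError("key config truncated")
    # Key Identifier (8 bits), then HPKE KEM ID (16 bits), network byte order.
    key_id, kem_id = struct.unpack_from("!BH", raw, 0)
    if kem_id != KEM_X25519_HKDF_SHA256:
        raise ValueError(f"unexpected KEM id 0x{kem_id:04x}")
    public_key = raw[3:3 + X25519_PUBLIC_KEY_LEN]
    # Length in bytes of the (KDF ID, AEAD ID) list that follows.
    (algs_len,) = struct.unpack_from("!H", raw, 3 + X25519_PUBLIC_KEY_LEN)
    algs = raw[header_len:]
    if algs_len != len(algs) or algs_len == 0 or algs_len % 4:
        raise ValueError("bad symmetric algorithms length")
    suites = tuple(
        struct.unpack_from("!HH", algs, i) for i in range(0, algs_len, 4)
    )
    if (KDF_HKDF_SHA256, AEAD_CHACHA20_POLY1305) not in suites:
        raise ValueError("HKDF-SHA256 / ChaCha20-Poly1305 not offered")
    return KeyConfig(key_id, kem_id, public_key, suites)
```

A client would still need to compare `public_key` with the key bound in the Nitro attestation before encapsulating anything to it.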

**Two response modes** (chosen by the inner `stream` flag):

| Mode | Outer content-type | Body |
|------|---|---|
| `stream=false` | `message/ohttp-res` | Single-shot sealed body (RFC 9458 §4.5) |
| `stream=true` | `message/ohttp-chunked-res` | `response_nonce \|\| (varint(len) \|\| sealed_ct)+ \|\| varint(0) \|\| sealed_final_ct` — one OHTTP chunk per SSE event, AAD=`b"final"` on the last chunk (chunked-ohttp draft §3) |
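
To make the chunked layout concrete, here is a minimal client-side sketch that splits a fully buffered `message/ohttp-chunked-res` body into its nonce, non-final chunks, and final chunk. It assumes `varint` means the QUIC variable-length integer of RFC 9000 §16 and that the response nonce is 32 bytes (max(Nn, Nk) for ChaCha20-Poly1305); `read_varint` and `split_chunked_response` are hypothetical helpers, and HPKE opening of each chunk is not shown.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a QUIC variable-length integer; return (value, next_pos)."""
    if pos >= len(buf):
        raise ValueError("truncated varint")
    first = buf[pos]
    length = 1 << (first >> 6)  # top two bits select 1, 2, 4 or 8 bytes
    if pos + length > len(buf):
        raise ValueError("truncated varint")
    value = first & 0x3F
    for b in buf[pos + 1:pos + length]:
        value = (value << 8) | b
    return value, pos + length


def split_chunked_response(body: bytes, nonce_len: int = 32):
    """Return (response_nonce, non_final_chunks, final_chunk).

    nonce_len=32 is an assumption for the ChaCha20-Poly1305 suite.
    """
    if len(body) < nonce_len:
        raise ValueError("body shorter than response nonce")
    nonce, pos = body[:nonce_len], nonce_len
    chunks = []
    while True:
        size, pos = read_varint(body, pos)
        if size == 0:
            # varint(0) marks the final chunk, which runs to end of body
            # and is the one sealed with AAD b"final".
            return nonce, chunks, body[pos:]
        if pos + size > len(body):
            raise ValueError("truncated chunk")
        chunks.append(body[pos:pos + size])
        pos += size
```

A real streaming client would decode incrementally as bytes arrive rather than buffering the whole body; the framing logic is the same.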

**Billing channel for the relay.** Both modes settle the actual cost via x402 against the relay's `X-Payment` (`upto` scheme); the gateway is the source of truth for the amount.

- `stream=false`: outer response exposes billing/cost headers — `X-Inference-Cost-OPG`, `X-Inference-Cost-USD`, `X-Inference-Price-OPG-USD` — for the relay's own bookkeeping. Per-token `usage` detail is carried in the sealed body for the client, not in outer `X-Usage-*` headers.
- `stream=true`: **no** per-token detail in outer headers (they're flushed before any body chunk, so we can't know token counts at header-write time) and the sealed chunks are opaque to the relay. The relay reads the actual settled amount from x402 — either by querying the facilitator with its `X-Upto-Session`, or via `X-Payment-Response` on its next call. The client still sees per-token detail in the final SSE event inside the decrypted stream.

On non-2xx (e.g. 402 payment required) the body is forwarded plaintext so the relay can read x402 payment requirements and retry — those bodies never contain prompts or completions.
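
The relay-side handling described above might look like the following sketch. It only inspects the outer status code and headers named in this section; `classify_gateway_response` is a hypothetical helper, and treating the cost header values as decimal strings is an assumption.

```python
from decimal import Decimal

# Outer billing headers listed in the README for stream=false responses.
COST_HEADERS = (
    "X-Inference-Cost-OPG",
    "X-Inference-Cost-USD",
    "X-Inference-Price-OPG-USD",
)


def classify_gateway_response(status: int, headers: dict[str, str]) -> dict:
    """Decide how a relay should treat the gateway's outer response."""
    h = {k.lower(): v for k, v in headers.items()}
    if not 200 <= status < 300:
        # Non-2xx bodies are plaintext; a 402 carries x402 requirements.
        return {"kind": "plaintext_error", "retry_with_payment": status == 402}
    ctype = h.get("content-type", "").split(";")[0].strip().lower()
    if ctype == "message/ohttp-res":
        # Assumption: header values parse as decimal numbers.
        costs = {
            name: Decimal(h[name.lower()])
            for name in COST_HEADERS
            if name.lower() in h
        }
        return {"kind": "sealed", "costs": costs}
    if ctype == "message/ohttp-chunked-res":
        # No per-token detail is available here; the settled amount
        # has to come from x402 instead.
        return {"kind": "sealed_stream", "costs": {}}
    raise ValueError(f"unexpected content type: {ctype!r}")
```

In both sealed cases the relay forwards the body to the client untouched, since it cannot decrypt it.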

**Trust split:**

- **Relay** terminates the client's TCP/TLS connection, so it unavoidably sees the client's IP. What it doesn't see is content: it handles only OHTTP ciphertext, its own wallet's `X-Payment` material, and the outer billing/cost headers used to settle and reconcile charges.
- **Enclave** sees plaintext prompts/completions (it has to run the LLM call) but at the network layer only sees the relay's IP, never the client's. This is the unlinkability claim — the enclave can't tie a plaintext request to a specific end user.
- **Client** decrypts and verifies the TEE signature embedded in the response body against the attested public key.

Unlinkability between a client identity and a plaintext request holds unless relay and enclave collude (the relay would have to share its client-IP log alongside the enclave's plaintext log). Streaming additionally leaks per-chunk timing and length — clients who can't accept that signal should use `stream=false`.

## Verification

### 1. Verify Attestation
1 change: 1 addition & 0 deletions pyproject.toml
@@ -34,6 +34,7 @@ dependencies = [
"psutil>=7.2.1",
"python-dotenv>=1.2.1",
"eth-account>=0.13.0",
"pyhpke>=0.6.0",
]

[dependency-groups]