A secure LLM routing service designed to run inside an AWS Nitro Enclave TEE (Trusted Execution Environment). It provides cryptographically verifiable LLM responses with remote attestation, so clients can prove that responses were generated inside a trusted enclave and were not tampered with.
When using third-party LLM providers, you typically must trust:
- The routing service operator isn't modifying your requests/responses
- Responses actually came from the claimed LLM provider
- Your requests weren't logged or intercepted
The gateway solves this by running inside a hardware-isolated Nitro Enclave where:
- Every response is cryptographically signed with a key generated inside the enclave
- The signing key is bound to remote attestation proving the enclave's code integrity
- Clients can verify signatures to ensure responses weren't tampered with
- Multi-provider routing - OpenAI, Anthropic, Google Gemini, xAI Grok, ByteDance (BytePlus ModelArk)
- Remote attestation - AWS Nitro attestation documents with PCR measurements
- Response signing - RSA-PSS signatures on all inference results
- Request integrity - keccak256 hash of the original request included in the signed response
- Streaming support - SSE streaming for chat completions
- Tool/function calling - Full support for LLM tool use
| Provider | Models |
|---|---|
| OpenAI | gpt-4.1, gpt-5, gpt-5-mini, o4-mini |
| Anthropic | claude-sonnet-4-5, claude-sonnet-4-6, claude-haiku-4-5, claude-opus-4-5, claude-opus-4-6 |
| Google | gemini-2.5-flash, gemini-2.5-flash-lite, gemini-2.5-pro, gemini-3-pro-preview, gemini-3-flash-preview |
| xAI | grok-4, grok-4-fast, grok-4-1-fast, grok-4-1-fast-non-reasoning |
| ByteDance | seed-1.6, seed-1.8, seed-2.0-lite |
# Install dependencies
pip install -r requirements.txt
# Set API keys
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export GOOGLE_API_KEY=...
export XAI_API_KEY=...
export ARK_API_KEY=... # BytePlus / ByteDance ModelArk
# Run server (starts the Flask/connexion app on port 8000)
make test-local
# or: python3 -m tee_gateway
# Chat completion
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4.1",
"messages": [{"role": "user", "content": "Hello!"}],
"temperature": 0.7
}'
# Streaming
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-N \
-d '{
"model": "gemini-2.5-flash",
"messages": [{"role": "user", "content": "Write a haiku"}],
"stream": true
}'
# Text completion
curl -X POST http://127.0.0.1:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "claude-3.7-sonnet",
"prompt": "Explain quantum computing in one sentence"
}'
Requires an EC2 instance with Nitro Enclave support (e.g., m5.xlarge with enclave enabled).
# Build enclave image
make image
# Build EIF and run enclave
make run
The enclave runs with:
- 2 CPUs
- 8GB memory
- Port 443 (HTTPS via nitriding)
- Port 8000 (internal server)
PCR (Platform Configuration Register) measurements uniquely fingerprint the enclave image — they change whenever the code or build environment changes. They are automatically written to measurements.txt by scripts/run-enclave.sh when the enclave starts.
The measurements.txt checked into this repository reflects the OpenGradient-operated deployment. If you build and run your own enclave image, your PCR values will differ. After running make run, your measurements.txt will be updated with your enclave's measurements. Share this file with your clients so they can verify attestation documents match your specific build.
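A client-side comparison of published and attested PCR values might look like the sketch below. It assumes both sides have already been parsed into dictionaries mapping a PCR index to a hex string; the exact layout of measurements.txt is not shown here, so the parsing step is left out, and the helper names `normalize_pcr` and `mismatched_pcrs` are hypothetical. PCR0, PCR1 and PCR2 are checked by default because those are the registers Nitro derives from the enclave image itself.

```python
from typing import Dict, Iterable, List


def normalize_pcr(value: str) -> str:
    """Lower-case a hex PCR value and drop surrounding whitespace or a 0x prefix."""
    return value.strip().lower().removeprefix("0x")


def mismatched_pcrs(
    expected: Dict[int, str],
    attested: Dict[int, str],
    indices: Iterable[int] = (0, 1, 2),
) -> List[int]:
    """Return the PCR indices that are missing on either side or differ."""
    mismatches = []
    for index in indices:
        want = expected.get(index)
        got = attested.get(index)
        if want is None or got is None or normalize_pcr(want) != normalize_pcr(got):
            mismatches.append(index)
    return mismatches


# Illustrative values only; real PCRs are 48-byte SHA-384 digests.
published = {0: "AB" * 48, 1: "cd" * 48, 2: "ef" * 48}
from_attestation = {0: "ab" * 48, 1: "cd" * 48, 2: "00" * 48}
bad = mismatched_pcrs(published, from_attestation)
```

An empty result means every checked register matches; anything else should be treated as a failed verification.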
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check (status, version, tee_enabled) |
| /enclave/attestation?nonce={nonce} | GET | Nitro-enclave TEE attestation with public key hash and PCR information |
| /signing-key | GET | TEE public key (PEM format) and tee_id |
| /v1/completions | POST | Text completion (signed) |
| /v1/chat/completions | POST | Chat completion (signed) |
| /v1/ohttp | POST | Anonymous chat completion (OHTTP-encapsulated, relay-paid) |
| /v1/ohttp/config | GET | HPKE key configuration (RFC 9458) for OHTTP clients |
{
"model": "gpt-4.1",
"messages": [
{"role": "system", "content": "You are helpful."},
{"role": "user", "content": "Hello!"}
],
"temperature": 0.7,
"max_tokens": 100,
"tools": [...] // optional
}
{
"id": "chatcmpl-...",
"object": "chat.completion",
"created": 1747000000,
"model": "gpt-4.1",
"choices": [{
"index": 0,
"message": {"role": "assistant", "content": "Hello! How can I help?"},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 8,
"total_tokens": 18
},
"tee_signature": "PLyCgScL1Jr6OSb7wazEbor4yhBYJpau...",
"tee_request_hash": "3cd5e62557ea16dc77aef5c2c66188d1...",
"tee_output_hash": "a7f3d91c4b08e2f50c3a6d8e...",
"tee_timestamp": 1747000000,
"tee_id": "0x4a2b..."
}
The tee_* fields provide cryptographic proof of the response:
- tee_request_hash: keccak256 of the canonicalized request JSON (proves input wasn't modified)
- tee_output_hash: keccak256 of the response content (proves output wasn't modified)
- tee_signature: RSA-PSS-SHA256 signature over keccak256(requestHash || outputHash || timestamp)
- tee_timestamp: Unix timestamp when the response was signed (proves freshness)
- tee_id: keccak256 of the enclave's DER-encoded public key (stable identifier for this enclave instance)
/v1/ohttp is a thin wrapper around /v1/chat/completions that adds client unlinkability via RFC 9458 OHTTP + draft-ietf-ohai-chunked-ohttp-08. HPKE ciphersuite is fixed: DHKEM(X25519,HKDF-SHA256) / HKDF-SHA256 / ChaCha20-Poly1305.
Flow:
- Client fetches /v1/ohttp/config (HPKE pubkey, key_id, suite IDs) and verifies it against the Nitro attestation.
- Client HPKE-encapsulates a normal chat-completion JSON body and POSTs the ciphertext to a relay. The client carries no payment material.
- Relay forwards the ciphertext to /v1/ohttp and attaches its own X-Payment: <x402 payload> header. /v1/ohttp is the x402-paid boundary: verification and settlement happen on this outer request, against the relay's payment.
- Enclave decrypts → re-issues the request in-process to /v1/chat/completions against the pre-x402 WSGI app (so connexion routing, validation, TEE signing and the LLM call still run, but x402 does not fire a second time and the relay's X-Payment is not forwarded into the inner dispatch) → response is sealed back to the client.
Two response modes (chosen by the inner stream flag):
| Mode | Outer content-type | Body |
|---|---|---|
| stream=false | message/ohttp-res | Single-shot sealed body (RFC 9458 §4.5) |
| stream=true | message/ohttp-chunked-res | response_nonce \|\| (varint(len) \|\| sealed_ct)+ \|\| varint(0) \|\| sealed_final_ct: one OHTTP chunk per SSE event, AAD=b"final" on the last chunk (chunked-ohttp draft §3) |
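To make the streaming framing concrete, here is a minimal sketch that splits a complete message/ohttp-chunked-res body into its nonce, intermediate chunks and final chunk. It does not decrypt anything. Two assumptions are baked in: varints use the QUIC variable-length integer encoding (RFC 9000 §16) that the chunked-OHTTP draft references, and the response nonce is 32 bytes, which is max(Nn, Nk) for ChaCha20-Poly1305. The function names are hypothetical and not part of the gateway.

```python
from typing import List, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one QUIC-style varint at pos; return (value, next position)."""
    if pos >= len(buf):
        raise ValueError("truncated varint")
    first = buf[pos]
    length = 1 << (first >> 6)  # top two bits select 1, 2, 4 or 8 bytes
    if pos + length > len(buf):
        raise ValueError("truncated varint")
    value = first & 0x3F  # remaining six bits start the value
    for byte in buf[pos + 1 : pos + length]:
        value = (value << 8) | byte
    return value, pos + length


def split_chunked_response(
    body: bytes, nonce_len: int = 32
) -> Tuple[bytes, List[bytes], bytes]:
    """Split a chunked OHTTP response into (nonce, sealed chunks, sealed final chunk)."""
    if len(body) < nonce_len:
        raise ValueError("body shorter than response nonce")
    nonce, pos = body[:nonce_len], nonce_len
    chunks: List[bytes] = []
    while True:
        length, pos = read_varint(body, pos)
        if length == 0:
            # varint(0) marks the final chunk, which runs to the end of the body.
            return nonce, chunks, body[pos:]
        if pos + length > len(body):
            raise ValueError("truncated chunk")
        chunks.append(body[pos : pos + length])
        pos += length


# Synthetic body: 32-byte nonce, a 3-byte chunk, a 70-byte chunk, then the final chunk.
example = bytes(32) + b"\x03abc" + b"\x40\x46" + b"x" * 70 + b"\x00" + b"fin"
nonce, chunks, final = split_chunked_response(example)
```

A real client would open each sealed chunk in order and use the "final" AAD only for the last one; a streaming client would also parse incrementally rather than buffering the whole body.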
Billing channel for the relay. Both modes settle the actual cost via x402 against the relay's X-Payment (upto scheme); the gateway is the source of truth for the amount.
- stream=false: outer response exposes billing/cost headers (X-Inference-Cost-OPG, X-Inference-Cost-USD, X-Inference-Price-OPG-USD) for the relay's own bookkeeping. Per-token usage detail is carried in the sealed body for the client, not in outer X-Usage-* headers.
- stream=true: no per-token detail in outer headers (they're flushed before any body chunk, so token counts are not known at header-write time) and the sealed chunks are opaque to the relay. The relay reads the actual settled amount from x402, either by querying the facilitator with its X-Upto-Session, or via X-Payment-Response on its next call. The client still sees per-token detail in the final SSE event inside the decrypted stream.
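For relay bookkeeping in the non-streaming case, the three cost headers can be pulled out of the outer response as sketched below. The header names come from the description above; treating their values as plain decimal strings is an assumption, and `read_billing_headers` is a hypothetical helper rather than part of the gateway.

```python
from decimal import Decimal, InvalidOperation
from typing import Dict, Mapping, Optional

# Field name used by the relay -> outer response header set by the gateway.
BILLING_HEADERS = {
    "cost_opg": "X-Inference-Cost-OPG",
    "cost_usd": "X-Inference-Cost-USD",
    "price_opg_usd": "X-Inference-Price-OPG-USD",
}


def read_billing_headers(headers: Mapping[str, str]) -> Dict[str, Optional[Decimal]]:
    """Return the billing values found in headers; absent or unparsable ones become None."""
    lowered = {name.lower(): value for name, value in headers.items()}
    result: Dict[str, Optional[Decimal]] = {}
    for field, header_name in BILLING_HEADERS.items():
        raw = lowered.get(header_name.lower())
        try:
            result[field] = Decimal(raw.strip()) if raw is not None else None
        except InvalidOperation:
            result[field] = None
    return result


# Illustrative header values only.
billing = read_billing_headers(
    {"x-inference-cost-opg": "0.0125", "X-Inference-Cost-USD": "0.001"}
)
```

Decimal is used instead of float so that small per-request amounts are not rounded during reconciliation. For stream=true these headers carry no per-token detail, so the settled amount should come from x402 as described above.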
On non-2xx (e.g. 402 payment required) the body is forwarded plaintext so the relay can read x402 payment requirements and retry — those bodies never contain prompts or completions.
Trust split:
- Relay terminates the client's TCP/TLS connection, so it does see the client's IP; that's unavoidable. What it doesn't see is content: only OHTTP ciphertext + its own wallet's x-payment material + the outer billing/cost headers used to settle and reconcile charges.
- Enclave sees plaintext prompts/completions (it has to run the LLM call) but at the network layer only sees the relay's IP, never the client's. This is the unlinkability claim: the enclave can't tie a plaintext request to a specific end user.
- Client decrypts and verifies the TEE signature embedded in the response body against the attested public key.
Unlinkability between a client identity and a plaintext request holds unless relay and enclave collude (the relay would have to share its client-IP log alongside the enclave's plaintext log). Streaming additionally leaks per-chunk timing and length — clients who can't accept that signal should use stream=false.
Get the attestation document and verify it against AWS Nitro root certificate:
curl "https://your-enclave:443/enclave/attestation?nonce=your-nonce"
See examples/verify_attestation.py for full verification including:
- PCR measurement validation
- Certificate chain verification
- Nonce verification
- Public key extraction
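Nonce verification only helps if the client picks a fresh, unpredictable nonce for every attestation request and later checks that the same value appears in the returned document. The sketch below generates such a nonce and builds the request URL. The 20-byte hex format is an assumption about what the attestation endpoint accepts, and both helper names are hypothetical; examples/verify_attestation.py remains the reference for the full check.

```python
import secrets
from urllib.parse import urlencode


def new_nonce(num_bytes: int = 20) -> str:
    """Return a fresh random nonce as a hex string (20 bytes assumed here)."""
    return secrets.token_hex(num_bytes)


def attestation_url(base_url: str, nonce: str) -> str:
    """Build the /enclave/attestation URL with the nonce safely encoded."""
    return f"{base_url.rstrip('/')}/enclave/attestation?{urlencode({'nonce': nonce})}"


nonce = new_nonce()
url = attestation_url("https://your-enclave:443", nonce)
```

Keep the nonce in memory until the attestation document has been parsed, then compare it against the nonce field inside the document before trusting the embedded public key hash.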
After getting a response, verify the signature using the attested public key:
import base64, json
from eth_hash.auto import keccak
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.hazmat.primitives import hashes, serialization
# Load attested public key (from /signing-key endpoint)
public_key = serialization.load_pem_public_key(public_key_pem.encode())
# Reconstruct the msg_hash the server signed:
# keccak256(abi.encodePacked(inputHash, outputHash, timestamp))
request_hash = bytes.fromhex(response["tee_request_hash"])
output_hash = bytes.fromhex(response["tee_output_hash"])
timestamp_bytes = response["tee_timestamp"].to_bytes(32, "big")
msg_hash = keccak(request_hash + output_hash + timestamp_bytes)
# Verify RSA-PSS-SHA256 signature (salt_length=32 matches server)
public_key.verify(
base64.b64decode(response["tee_signature"]),
msg_hash,
padding.PSS(
mgf=padding.MGF1(hashes.SHA256()),
salt_length=32,
),
hashes.SHA256(),
)
See examples/verify_signature_example.py for a complete example.
The tee_request_hash proves your original request wasn't modified:
from eth_hash.auto import keccak
import json
# Canonical request (same fields the server serializes, sorted keys)
original_request = {
"model": "gpt-4.1",
"messages": [{"role": "user", "content": "Hello!"}],
"temperature": 0.7,
}
request_bytes = json.dumps(original_request, sort_keys=True).encode()
computed_hash = keccak(request_bytes).hex()
assert computed_hash == response["tee_request_hash"]
┌─────────────────────────────────────────────────────────────┐
│                        Nitro Enclave                        │
│  ┌─────────────────┐    ┌─────────────────────────────────┐ │
│  │   nitriding     │    │   tee_gateway/                  │ │
│  │   (TLS/443)     │───▶│   TEEKeyManager (RSA keys)      │ │
│  │                 │    │   LangChain routing             │ │
│  │   /enclave/*    │    │   Response signing              │ │
│  └─────────────────┘    └─────────────────────────────────┘ │
│           │                              │                  │
│           │ Register key hash            │ LLM API calls    │
│           ▼                              ▼                  │
│    PCR measurements            OpenAI/Anthropic/etc         │
└─────────────────────────────────────────────────────────────┘
                              │
                              │ HTTPS (port 443)
                              ▼
                      gvproxy (EC2 host) ◀──── Internet
Flow:
- On startup, TEEKeyManager generates an RSA-2048 key pair
- Public key hash is registered with nitriding for attestation binding
- Incoming requests routed to LLM provider via LangChain
- Response signed with private key (includes request hash + timestamp)
- Clients verify attestation → get public key → verify signatures
This gateway uses x402 micropayments for access control. Clients pay per request using on-chain EVM transactions (USDC or OPG on supported networks).
To operate your own gateway:
- Set EVM_PAYMENT_ADDRESS to your wallet address in .env
- Set FACILITATOR_URL to point to your facilitator service (or use the default)
- Configure payment amounts in tee_gateway/definitions.py (CHAT_COMPLETIONS_USDC_AMOUNT, etc.)
Clients use an x402-compatible client (e.g., the x402 SDK) to authorize payments and include them in request headers.
| Variable | Default | Description |
|---|---|---|
| API_SERVER_PORT | 8000 | Internal server port |
| API_SERVER_HOST | 0.0.0.0 | Server bind address |
| OPENAI_API_KEY | - | OpenAI API key |
| ANTHROPIC_API_KEY | - | Anthropic API key |
| GOOGLE_API_KEY | - | Google AI API key |
| XAI_API_KEY | - | xAI API key |
| ARK_API_KEY | - | BytePlus / ByteDance ModelArk API key (injected as bytedance_api_key) |
| EVM_PAYMENT_ADDRESS | - | Wallet address to receive x402 payments |
| FACILITATOR_URL | see tee_gateway/__main__.py | x402 payment facilitator endpoint |
API keys can also be injected at runtime via POST /v1/keys (preferred for TEE deployments to avoid baking secrets into the image).
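For local runs, a small preflight check can show which settings are in effect before the server starts. The sketch below reads the variables from the table above, applies the documented defaults for host and port, and reports which provider keys are unset. It is an illustration only: `GatewayEnv` and `read_gateway_env` are hypothetical names, and this is not how the gateway itself loads configuration.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping

PROVIDER_KEYS = (
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "GOOGLE_API_KEY",
    "XAI_API_KEY",
    "ARK_API_KEY",
)


@dataclass
class GatewayEnv:
    host: str
    port: int
    missing_provider_keys: List[str]


def read_gateway_env(env: Mapping[str, str] = os.environ) -> GatewayEnv:
    """Collect host, port and the names of unset provider keys from env."""
    host = env.get("API_SERVER_HOST", "0.0.0.0")
    port = int(env.get("API_SERVER_PORT", "8000"))
    missing = [name for name in PROVIDER_KEYS if not env.get(name)]
    return GatewayEnv(host=host, port=port, missing_provider_keys=missing)


# Example with an explicit mapping instead of the real process environment.
settings = read_gateway_env({"API_SERVER_PORT": "9000", "OPENAI_API_KEY": "sk-..."})
```

A missing key is not necessarily an error, since keys may be injected later through POST /v1/keys; the list is only a hint about which providers would be unavailable at startup.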
See LICENSE file for details.