
feat(provider): add LiteLLM as embedded AI gateway provider #782

Open
RheagalFire wants to merge 2 commits into evalstate:main from RheagalFire:feat/add-litellm-provider

Conversation

@RheagalFire commented May 1, 2026

Closes #127.

Summary

Adds LiteLLM as a first-class provider alongside the existing Anthropic / OpenAI / Google / Azure / Bedrock / TensorZero options. Embedded SDK mode (no proxy required) — every call goes through litellm.acompletion(model=...), which routes to 100+ underlying providers (Anthropic, OpenAI, AWS Bedrock, Vertex AI, Cohere, Mistral, Groq, Perplexity, Together, Fireworks, Cerebras, Databricks, IBM Watsonx, AI21, Replicate, DeepInfra, NVIDIA NIM, xAI, Sambanova, …) using each backing's standard auth conventions.

The interactive model picker gets a new LiteLLM [available] (20 curated) row at the top of the Providers column. Selecting it shows 20 curated entries spanning the major backings; pressing c swaps to the "all" scope which discovers ~2.3k models from litellm.models_by_provider at runtime. The ✓/✗ marker on each row reflects whether that backing's credentials are present in the user's environment, so users see at a glance which models will actually chat vs. which will fail at runtime with a missing-key error.
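
For reference, the all-scope discovery can be reproduced with the catalog that ships in the SDK. A rough sketch (not the PR's picker code; the env-key table here is just an excerpt):

import os
import litellm

# excerpt of a backing -> env-var table; the PR keeps its full map in
# model_picker_common.py
ENV_KEY_FOR = {"anthropic": "ANTHROPIC_API_KEY", "openai": "OPENAI_API_KEY"}

# litellm.models_by_provider maps each backing to its known model names
for backing, models in sorted(litellm.models_by_provider.items()):
    mark = "✓" if os.environ.get(ENV_KEY_FOR.get(backing, "")) else "✗"
    for name in models[:2]:  # first couple per backing, for brevity
        print(f"{mark} litellm.{backing}/{name}")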

📸 Wizard preview: LiteLLM row at the top of the Providers column with curated models in the right panel

Three credential modes work today (precedence: env → config → proxy):

# Path 1 — env vars (LiteLLM convention; zero config)
# export ANTHROPIC_API_KEY=sk-ant-...
# export OPENAI_API_KEY=sk-...
# export COHERE_API_KEY=...
# (etc. for any backing)

# Path 2 — fastagent.config.yaml backing-provider sections
# (auto-bridged into env vars at LiteLLMLLM init so litellm.acompletion picks them up)
anthropic:
  api_key: sk-ant-...
openai:
  api_key: sk-...

# Path 3 — LiteLLM proxy server (multi-tenant deployments)
litellm:
  api_base: http://localhost:4000
  api_key: sk-fastagent-proxy-1234

Quickstart (60 seconds)

# 1. Install
pip install fast-agent-mcp[litellm]

# 2. Set the credential for whatever backing you want to call
export ANTHROPIC_API_KEY=sk-ant-...      # or OPENAI_API_KEY, COHERE_API_KEY, etc.

# 3. Open the picker
fast-agent go

#   In the picker:
#   - LiteLLM is the first row in the Providers column
#   - Press → to focus the Models column
#   - Press c to toggle scope between curated (20 popular models) and all (~2.3k)
#   - Press Enter on the row you want; chat starts immediately

# Or skip the picker entirely:
fast-agent go --model=litellm.anthropic/claude-sonnet-4-6 --message="hello"

Why

A LiteLLM provider lets users:

  • Call providers fast-agent doesn't have native support for (Cohere, Mistral, Together, Replicate, DeepInfra, Fireworks, Cerebras, Sambanova, IBM Watsonx, NVIDIA NIM, AI21, Perplexity, Databricks, Cloudflare Workers AI, Vercel AI Gateway, …) without one-off provider PRs.
  • Switch a model spec across providers by changing a single string (litellm.openai/gpt-4o → litellm.anthropic/claude-sonnet-4-6 → litellm.bedrock/...) without touching any other config (see the sketch after this list).
  • Use existing LiteLLM proxy infrastructure (centralized key management, audit logging, cost tracking, rate limiting, fallback chains) by setting two config fields.
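
To make the single-string switch concrete, here is what it looks like at the raw SDK layer (a sketch using plain litellm.acompletion; fast-agent wraps the same call behind --model=litellm.<backing>/<model>):

import asyncio
import litellm

async def ask(model: str) -> str:
    # only the model string changes; credentials resolve from each
    # backing's standard env var (ANTHROPIC_API_KEY, OPENAI_API_KEY, ...)
    resp = await litellm.acompletion(
        model=model,
        messages=[{"role": "user", "content": "hello"}],
    )
    return resp.choices[0].message.content

print(asyncio.run(ask("openai/gpt-4o")))
print(asyncio.run(ask("anthropic/claude-sonnet-4-6")))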

The picker UX (curated + dynamic discovery + per-row availability markers) keeps fast-agent's "model selection is painless" story intact — users don't have to memorize 2k+ model names.

Prior art

Installation & usage

Install

pip install fast-agent-mcp[litellm]            # persistent install
pip install fast-agent-mcp[all-providers]      # everything
uvx --with litellm fast-agent                  # one-shot via uvx

Path 1 — env vars (simplest, zero config)

The LiteLLM SDK reads each backing's standard env var (ANTHROPIC_API_KEY, OPENAI_API_KEY, COHERE_API_KEY, MISTRAL_API_KEY, …). Export whichever backings you want to use, then:

export ANTHROPIC_API_KEY=sk-ant-...
fast-agent go

In the picker, navigate to LiteLLM, focus the Models column, pick Anthropic Claude Sonnet → litellm.anthropic/claude-sonnet-4-6 (✓ marker), Enter, chat.

Path 2 — config file (no shell exports)

fastagent.config.yaml:

anthropic:
  api_key: sk-ant-...
  base_url: https://api.anthropic.com   # optional, for proxies/Foundry/Bedrock-compat
openai:
  api_key: sk-...
google:                                  # → GEMINI_API_KEY for litellm.gemini/...
  api_key: AIza...

Path 2 reads from the same <provider>: sections fast-agent already uses for native providers. LiteLLMLLM.__init__ bridges them into env vars at startup so the LiteLLM SDK's per-backing auth resolution picks them up. Env vars always win if both are set.

Bridged backings: anthropic → ANTHROPIC_API_KEY / ANTHROPIC_BASE_URL, openai → OPENAI_API_KEY / OPENAI_BASE_URL, google → GEMINI_API_KEY, xai → XAI_API_KEY, groq → GROQ_API_KEY, deepseek → DEEPSEEK_API_KEY, openrouter → OPENROUTER_API_KEY. Other backings (Cohere, Mistral, Perplexity, Bedrock, Vertex, …) still use env vars or the proxy.
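
The bridge itself is simple in spirit. A minimal sketch of the idea (illustrative names, not the exact code in llm_litellm.py):

import os

# excerpt of the config-section -> env-var table described above
_CONFIG_TO_ENV = {
    "anthropic": {"api_key": "ANTHROPIC_API_KEY", "base_url": "ANTHROPIC_BASE_URL"},
    "openai": {"api_key": "OPENAI_API_KEY", "base_url": "OPENAI_BASE_URL"},
    "google": {"api_key": "GEMINI_API_KEY"},
}

def bridge_config_to_env(config: dict | None) -> None:
    """Copy fastagent.config.yaml creds into env vars without clobbering
    anything the user already exported (env always wins)."""
    for section, fields in _CONFIG_TO_ENV.items():
        values = (config or {}).get(section) or {}
        for field, env_name in fields.items():
            if values.get(field):
                os.environ.setdefault(env_name, str(values[field]))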

Path 3 — LiteLLM proxy server (centralized auth)

Run a LiteLLM proxy with model deployments:

# litellm_proxy.yaml
model_list:
  - model_name: claude-sonnet-4-6
    litellm_params:
      model: anthropic/claude-sonnet-4-6
      api_key: os.environ/ANTHROPIC_API_KEY
general_settings:
  master_key: sk-fastagent-proxy-1234

litellm --config litellm_proxy.yaml --port 4000

Point fast-agent at it:

# fastagent.config.yaml
litellm:
  api_base: http://localhost:4000
  api_key: sk-fastagent-proxy-1234

fast-agent go --model=litellm/claude-sonnet-4-6

In proxy mode the model spec is litellm/<deployment-name> (the name from the proxy's model_list), not the upstream litellm.<backing>/<model> shape.
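
Because the proxy speaks the OpenAI wire protocol, the deployment name behaves like an ordinary OpenAI model id. A sketch against the example proxy above, using the openai SDK directly:

import asyncio
from openai import AsyncOpenAI

async def main() -> None:
    client = AsyncOpenAI(
        base_url="http://localhost:4000",    # litellm.api_base
        api_key="sk-fastagent-proxy-1234",   # litellm.api_key (the master_key)
    )
    resp = await client.chat.completions.create(
        model="claude-sonnet-4-6",           # deployment name from model_list
        messages=[{"role": "user", "content": "hello"}],
    )
    print(resp.choices[0].message.content)

asyncio.run(main())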

Optional litellm: config knobs

litellm:
  api_key:        sk-...                    # for proxy mode only
  api_base:       http://localhost:4000     # for proxy mode only
  default_model:  anthropic/claude-sonnet-4-6   # used when --model is omitted
  drop_params:    true                      # default: true; lets LiteLLM strip unsupported kwargs per backing
  extra_kwargs:                             # forwarded verbatim to litellm.acompletion
    metadata:
      tags: ["fast-agent"]
  default_headers:                          # forwarded as extra_headers
    X-Trace: my-trace-id
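
Roughly, these knobs fan out into the litellm.acompletion call like this (a sketch; the settings object's attribute names here simply mirror the YAML keys):

import litellm

async def complete(settings, messages, model=None):
    return await litellm.acompletion(
        model=model or settings.default_model,
        messages=messages,
        api_base=settings.api_base,            # proxy mode only (may be None)
        api_key=settings.api_key,              # proxy mode only
        drop_params=settings.drop_params,      # let LiteLLM strip unsupported kwargs
        extra_headers=settings.default_headers,
        **(settings.extra_kwargs or {}),       # forwarded verbatim
    )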

Architecture

LiteLLMLLM extends the existing OpenAILLM and only swaps the underlying client. LiteLLM normalizes every backing's response into OpenAI's ChatCompletion shape, so the existing OpenAI streaming, tool-call accumulation, structured-output, reasoning-effort, and cache-token handling all work unchanged.

fast-agent agent
       │
       ▼
LiteLLMLLM (extends OpenAILLM)
       │  reuses _process_stream_manual, _prepare_api_request, tool/structured-output
       ▼
_LiteLLMClientShim   (AsyncOpenAI-shaped; only chat.completions.create needed)
       │
       ▼
litellm.acompletion(model=..., **kwargs)
       │
       ▼
[Anthropic | OpenAI | Bedrock | Vertex | Cohere | Mistral | ...]

The shim is intentionally narrow — it implements only __aenter__ / __aexit__ and chat.completions.create. Adding the full AsyncOpenAI surface (files, embeddings, audio, etc.) is left out of scope; calls to those raise NotImplementedError so the failure mode is loud rather than silent.
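
In sketch form (illustrative, not the PR's exact class bodies), the shim is little more than:

import litellm

class _ChatCompletions:
    def __init__(self, defaults: dict):
        self._defaults = defaults

    async def create(self, **kwargs):
        # caller-supplied kwargs win over shim defaults
        return await litellm.acompletion(**{**self._defaults, **kwargs})

class _Chat:
    def __init__(self, defaults: dict):
        self.completions = _ChatCompletions(defaults)

class LiteLLMClientShim:
    """AsyncOpenAI-shaped: exposes only chat.completions.create."""

    def __init__(self, **defaults):
        self.chat = _Chat(defaults)

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc):
        return False

    def __getattr__(self, name):
        # the rest of the AsyncOpenAI surface fails loudly, not silently
        raise NotImplementedError(f"shim does not implement {name!r}")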

For wizard rendering, per-row credential markers use a static map (_LITELLM_BACKING_ENV_KEYS) instead of litellm.validate_environment(...) because the latter triggers OAuth device-code flows for some backings (e.g. github_copilot) and would block the picker render for 60+ seconds.
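
The marker logic reduces to an env lookup against that map. A sketch (excerpted table, hypothetical helper name):

import os

_LITELLM_BACKING_ENV_KEYS = {   # excerpt, not the full table
    "anthropic": ("ANTHROPIC_API_KEY",),
    "openai": ("OPENAI_API_KEY",),
    "cohere": ("COHERE_API_KEY",),
}

def backing_available(spec: str) -> bool | None:
    """Feed the ✓/✗ marker for a litellm.<backing>/<model> spec."""
    if not spec.startswith("litellm."):
        return None  # non-litellm rows keep the default availability logic
    backing = spec.split(".", 1)[1].split("/", 1)[0]
    keys = _LITELLM_BACKING_ENV_KEYS.get(backing, ())
    return any(os.environ.get(k) for k in keys)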

Files

  • src/fast_agent/llm/provider/litellm/__init__.py (1) — package marker.
  • src/fast_agent/llm/provider/litellm/llm_litellm.py (193) — LiteLLMLLM(OpenAILLM), _LiteLLMClientShim, _bridge_fastagent_config_to_litellm_env.
  • src/fast_agent/llm/provider_types.py (+1) — LITELLM = ("litellm", "LiteLLM").
  • src/fast_agent/llm/model_factory.py (+5) — provider-class dispatch.
  • src/fast_agent/llm/provider_key_manager.py (+13) — keyless registration; LITELLM_API_KEY is read for proxy mode but isn't required.
  • src/fast_agent/llm/provider_model_catalog.py (+47) — LiteLLMModelCatalogAdapter returns the full LiteLLM catalog (~2.3k specs).
  • src/fast_agent/llm/model_selection.py (+118) — 20 curated entries spanning 16 backings.
  • src/fast_agent/ui/model_picker.py (+10) — per-row ✓/✗ marker honors model.backing_available override.
  • src/fast_agent/ui/model_picker_common.py (+135) — Provider.LITELLM first in PICKER_PROVIDER_ORDER, _provider_is_active, litellm_backing_creds_present static map, ModelOption.backing_available plumbing into curated + dynamic discovery paths.
  • src/fast_agent/llm/provider/openai/llm_openai.py (+1) — adds Provider.LITELLM to providers using _process_stream_manual (LiteLLM chunks omit fields like delta.refusal).
  • src/fast_agent/config.py (+48) — LiteLLMSettings schema.
  • pyproject.toml (+5) — new litellm = ["litellm>=1.60,<1.85"] optional extra; also added to all-providers.
  • README.md (+1 phrase) — providers paragraph mentions the new optional extra.
  • tests/unit/fast_agent/llm/test_litellm_provider.py (new, 280) — 24 tests.
  • tests/unit/fast_agent/llm/test_model_factory.py (+5) — skip LiteLLM in test_curated_catalog_aliases_are_parseable (same pattern as anthropic-vertex; LiteLLM aliases intentionally mirror native short names).

Tests

Unit tests (24 / 24 pass)

$ pytest tests/unit/fast_agent/llm/test_litellm_provider.py -v
24 passed in 1.04s

Coverage:

  • Provider enum + factory dispatch + Provider.LITELLM in PICKER_PROVIDER_ORDER
  • _provider_is_active true when litellm importable, false when not
  • Keyless API-key handling; env + config api-key precedence
  • Dynamic catalog adapter shape (>1k models, prefixed correctly, popular specs round-trip cleanly, returns empty when litellm is missing)
  • Per-row backing-available flag (env present / env absent / non-litellm spec returns None)
  • Shim is async-context-manager
  • api_base / api_key / timeout / drop_params / extra_headers forwarded to litellm.acompletion
  • Caller-supplied kwargs win over shim defaults
  • Config bridge to env vars: sets when missing, doesn't overwrite when present, handles None config (sketched after this list)
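
As a flavor of the bridge tests, a pytest-style sketch (the real tests live in test_litellm_provider.py; bridge_config_to_env is the hypothetical helper from the Path 2 sketch above):

import os

def test_bridge_does_not_overwrite_existing_env(monkeypatch):
    monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-from-env")
    bridge_config_to_env({"anthropic": {"api_key": "sk-from-config"}})
    assert os.environ["ANTHROPIC_API_KEY"] == "sk-from-env"  # env wins

def test_bridge_handles_none_config():
    bridge_config_to_env(None)  # must be a no-op, not a crash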

Regression (1499 / 1499 adjacent unit tests pass)

$ pytest tests/unit/fast_agent/llm/ tests/unit/fast_agent/ui/ -q
1499 passed in 8.76s

Type checking (ty) clean on touched files

$ ty check src/fast_agent/llm/provider/litellm/ \
    src/fast_agent/llm/provider_types.py \
    src/fast_agent/llm/provider_model_catalog.py \
    src/fast_agent/llm/provider_key_manager.py \
    src/fast_agent/ui/model_picker_common.py \
    src/fast_agent/ui/model_picker.py
All checks passed!

One narrow # ty: ignore[invalid-method-override] on LiteLLMLLM._openai_client (intentional — the shim is a duck-typed substitute for AsyncOpenAI; OpenAILLM only calls chat.completions.create on it).

Lint clean

$ ruff check <touched files>
All checks passed!
$ ruff format --check <touched files>
files already formatted

Live E2E

Path 1 — env vars + wizard pick:

$ ANTHROPIC_API_KEY=... ANTHROPIC_BASE_URL=... fast-agent go
   [picker: navigate to LiteLLM (top of Providers column), pick Anthropic Claude Sonnet]

▎▶ agent ────────
hi there, how are you
▎◀ agent claude-sonnet-4-6
Hi! I'm doing well, thanks for asking! ...

agent  108 input / 82 output / 190 total

Path 2 — config file, no env vars exported:

$ cat fastagent.config.yaml
anthropic:
  api_key: ...
  base_url: ...
$ unset ANTHROPIC_API_KEY ANTHROPIC_BASE_URL
$ fast-agent go --model=litellm.anthropic/claude-sonnet-4-6 \
    --message="who are you?"
... real reply, same model ...

Path 3 — LiteLLM proxy: verified locally with litellm --config litellm_proxy.yaml --port 4000 and litellm.api_base: http://localhost:4000 in fast-agent config. Round-trips identically.

Troubleshooting

litellm.AuthenticationError: Missing <Provider> API Key

The chosen model's backing provider doesn't have credentials in your environment. Look up the env var that backing needs (e.g. OPENAI_API_KEY for litellm.openai/..., COHERE_API_KEY for litellm.cohere/...), then export it (Path 1) or add it under the matching <provider>: section in fastagent.config.yaml (Path 2, for bridged backings only).
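
To find out which env var a given backing expects, litellm's own helper works for one-off debugging (the PR avoids it in the picker because some backings trigger OAuth flows, but interactively that's fine):

import litellm

report = litellm.validate_environment(model="cohere/command-r-plus")
print(report["keys_in_environment"])  # False if creds are missing
print(report["missing_keys"])         # e.g. ['COHERE_API_KEY']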

litellm.NotFoundError: ... DeploymentNotFound / model not found

The backing provider authenticated successfully, but the specific model name isn't deployed at that endpoint. Common causes:

  • A private proxy / Foundry / Bedrock account that only deploys a subset of model versions; pick a model your account actually has access to.
  • The model spec is from LiteLLM's general catalog but the backing is region-specific (e.g. some Vertex models are only in us-central1).

Workaround: press c in the picker to switch to the all-scope and find the deployed name, or pass --model=litellm.<backing>/<exact-name> directly.

Wizard shows ✗ on a model whose key is set

  • Your env var name might not match LiteLLM's expectation (e.g. OPENAI_KEY instead of OPENAI_API_KEY).
  • The picker only checks env vars at render time — restart fast-agent go after exporting new keys.
  • For backings not in the bridged list (Cohere, Mistral, Perplexity, …), only env vars and proxy work — config-file creds aren't read for those.

Out of scope / future work

  • Documentation site PR — the docs live in the [fast-agent-docs](https://github.com/evalstate/fast-agent-docs) submodule. Happy to follow up with a docs PR there once this one lands.
  • Path 2 bridging for backings without a fast-agent native config section (Cohere, Mistral, Perplexity, …). They still work via Path 1 (env vars) or Path 3 (proxy).

@RheagalFire (Author)

cc @evalstate

