
feat(provider): add LiteLLM as embedded AI gateway provider #782

Open
RheagalFire wants to merge 2 commits into evalstate:main from RheagalFire:feat/add-litellm-provider

Conversation

@RheagalFire commented May 1, 2026

Closes #127.

Summary

Adds LiteLLM as a first-class provider alongside the existing Anthropic / OpenAI / Google / Azure / Bedrock / TensorZero options. Embedded SDK mode (no proxy required) — every call goes through litellm.acompletion(model=...), which routes to 100+ underlying providers (Anthropic, OpenAI, AWS Bedrock, Vertex AI, Cohere, Mistral, Groq, Perplexity, Together, Fireworks, Cerebras, Databricks, IBM Watsonx, AI21, Replicate, DeepInfra, NVIDIA NIM, xAI, Sambanova, …) using each backing's standard auth conventions.

The interactive model picker gets a new LiteLLM [available] (20 curated) row at the top of the Providers column. Selecting it shows 20 curated entries spanning the major backings; pressing c swaps to the "all" scope which discovers ~2.3k models from litellm.models_by_provider at runtime. The ✓/✗ marker on each row reflects whether that backing's credentials are present in the user's environment, so users see at a glance which models will actually chat vs. which will fail at runtime with a missing-key error.
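
For reference, the all-scope discovery can be reproduced with the catalog that ships in the SDK. A rough sketch (not the PR's picker code; the env-key table here is just an excerpt):

import os
import litellm

# excerpt of a backing -> env-var table; the PR keeps its full map in
# model_picker_common.py
ENV_KEY_FOR = {"anthropic": "ANTHROPIC_API_KEY", "openai": "OPENAI_API_KEY"}

# litellm.models_by_provider maps each backing to its known model names
for backing, models in sorted(litellm.models_by_provider.items()):
    mark = "✓" if os.environ.get(ENV_KEY_FOR.get(backing, "")) else "✗"
    for name in models[:2]:  # first couple per backing, for brevity
        print(f"{mark} litellm.{backing}/{name}")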

📸 Wizard preview: LiteLLM row at the top of the Providers column with curated models in the right panel

Three credential modes work today (precedence: env → config → proxy):

# Path 1 — env vars (LiteLLM convention; zero config)
# export ANTHROPIC_API_KEY=sk-ant-...
# export OPENAI_API_KEY=sk-...
# export COHERE_API_KEY=...
# (etc. for any backing)

# Path 2 — fastagent.config.yaml backing-provider sections
# (auto-bridged into env vars at LiteLLMLLM init so litellm.acompletion picks them up)
anthropic:
  api_key: sk-ant-...
openai:
  api_key: sk-...

# Path 3 — LiteLLM proxy server (multi-tenant deployments)
litellm:
  api_base: http://localhost:4000
  api_key: sk-fastagent-proxy-1234

Quickstart (60 seconds)

# 1. Install
pip install fast-agent-mcp[litellm]

# 2. Set the credential for whatever backing you want to call
export ANTHROPIC_API_KEY=sk-ant-...      # or OPENAI_API_KEY, COHERE_API_KEY, etc.

# 3. Open the picker
fast-agent go

#   In the picker:
#   - LiteLLM is the first row in the Providers column
#   - Press → to focus the Models column
#   - Press c to toggle scope between curated (20 popular models) and all (~2.3k)
#   - Press Enter on the row you want; chat starts immediately

# Or skip the picker entirely:
fast-agent go --model=litellm.anthropic/claude-sonnet-4-6 --message="hello"

Why

A LiteLLM provider lets users:

  • Call providers fast-agent doesn't have native support for (Cohere, Mistral, Together, Replicate, DeepInfra, Fireworks, Cerebras, Sambanova, IBM Watsonx, NVIDIA NIM, AI21, Perplexity, Databricks, Cloudflare Workers AI, Vercel AI Gateway, …) without one-off provider PRs.
  • Switch a model spec across providers by changing a single string (litellm.openai/gpt-4o → litellm.anthropic/claude-sonnet-4-6 → litellm.bedrock/...) without touching any other config (see the sketch after this list).
  • Use existing LiteLLM proxy infrastructure (centralized key management, audit logging, cost tracking, rate limiting, fallback chains) by setting two config fields.
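
To make the single-string switch concrete, here is what it looks like at the raw SDK layer (a sketch using plain litellm.acompletion; fast-agent wraps the same call behind --model=litellm.<backing>/<model>):

import asyncio
import litellm

async def ask(model: str) -> str:
    # only the model string changes; credentials resolve from each
    # backing's standard env var (ANTHROPIC_API_KEY, OPENAI_API_KEY, ...)
    resp = await litellm.acompletion(
        model=model,
        messages=[{"role": "user", "content": "hello"}],
    )
    return resp.choices[0].message.content

print(asyncio.run(ask("openai/gpt-4o")))
print(asyncio.run(ask("anthropic/claude-sonnet-4-6")))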

The picker UX (curated + dynamic discovery + per-row availability markers) keeps fast-agent's "model selection is painless" story intact — users don't have to memorize 2k+ model names.

Prior art

Installation & usage

Install

pip install fast-agent-mcp[litellm]            # persistent install
pip install fast-agent-mcp[all-providers]      # everything
uvx --with litellm fast-agent                  # one-shot via uvx

Path 1 — env vars (simplest, zero config)

The LiteLLM SDK reads each backing's standard env var (ANTHROPIC_API_KEY, OPENAI_API_KEY, COHERE_API_KEY, MISTRAL_API_KEY, …). Export whichever backings you want to use, then:

export ANTHROPIC_API_KEY=sk-ant-...
fast-agent go

In the picker, navigate to LiteLLM, focus the Models column, pick Anthropic Claude Sonnet → litellm.anthropic/claude-sonnet-4-6 (✓ marker), Enter, chat.

Path 2 — config file (no shell exports)

fastagent.config.yaml:

anthropic:
  api_key: sk-ant-...
  base_url: https://api.anthropic.com   # optional, for proxies/Foundry/Bedrock-compat
openai:
  api_key: sk-...
google:                                  # → GEMINI_API_KEY for litellm.gemini/...
  api_key: AIza...

Path 2 reads from the same <provider>: sections fast-agent already uses for native providers. LiteLLMLLM.__init__ bridges them into env vars at startup so the LiteLLM SDK's per-backing auth resolution picks them up. Env vars always win if both are set.

Bridged backings: anthropic → ANTHROPIC_API_KEY / ANTHROPIC_BASE_URL, openai → OPENAI_API_KEY / OPENAI_BASE_URL, google → GEMINI_API_KEY, xai → XAI_API_KEY, groq → GROQ_API_KEY, deepseek → DEEPSEEK_API_KEY, openrouter → OPENROUTER_API_KEY. Other backings (Cohere, Mistral, Perplexity, Bedrock, Vertex, …) still use env vars or the proxy.
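
The bridge itself is simple in spirit. A minimal sketch of the idea (illustrative names, not the exact code in llm_litellm.py):

import os

# excerpt of the config-section -> env-var table described above
_CONFIG_TO_ENV = {
    "anthropic": {"api_key": "ANTHROPIC_API_KEY", "base_url": "ANTHROPIC_BASE_URL"},
    "openai": {"api_key": "OPENAI_API_KEY", "base_url": "OPENAI_BASE_URL"},
    "google": {"api_key": "GEMINI_API_KEY"},
}

def bridge_config_to_env(config: dict | None) -> None:
    """Copy fastagent.config.yaml creds into env vars without clobbering
    anything the user already exported (env always wins)."""
    for section, fields in _CONFIG_TO_ENV.items():
        values = (config or {}).get(section) or {}
        for field, env_name in fields.items():
            if values.get(field):
                os.environ.setdefault(env_name, str(values[field]))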

Path 3 — LiteLLM proxy server (centralized auth)

Run a LiteLLM proxy with model deployments:

# litellm_proxy.yaml
model_list:
  - model_name: claude-sonnet-4-6
    litellm_params:
      model: anthropic/claude-sonnet-4-6
      api_key: os.environ/ANTHROPIC_API_KEY
general_settings:
  master_key: sk-fastagent-proxy-1234

litellm --config litellm_proxy.yaml --port 4000

Point fast-agent at it:

# fastagent.config.yaml
litellm:
  api_base: http://localhost:4000
  api_key: sk-fastagent-proxy-1234

fast-agent go --model=litellm/claude-sonnet-4-6

In proxy mode the model spec is litellm/<deployment-name> (the name from the proxy's model_list), not the upstream litellm.<backing>/<model> shape.
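
Because the proxy speaks the OpenAI wire protocol, the deployment name behaves like an ordinary OpenAI model id. A sketch against the example proxy above, using the openai SDK directly:

import asyncio
from openai import AsyncOpenAI

async def main() -> None:
    client = AsyncOpenAI(
        base_url="http://localhost:4000",    # litellm.api_base
        api_key="sk-fastagent-proxy-1234",   # litellm.api_key (the master_key)
    )
    resp = await client.chat.completions.create(
        model="claude-sonnet-4-6",           # deployment name from model_list
        messages=[{"role": "user", "content": "hello"}],
    )
    print(resp.choices[0].message.content)

asyncio.run(main())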

Optional litellm: config knobs

litellm:
  api_key:        sk-...                    # for proxy mode only
  api_base:       http://localhost:4000     # for proxy mode only
  default_model:  anthropic/claude-sonnet-4-6   # used when --model is omitted
  drop_params:    true                      # default: true; lets LiteLLM strip unsupported kwargs per backing
  extra_kwargs:                             # forwarded verbatim to litellm.acompletion
    metadata:
      tags: ["fast-agent"]
  default_headers:                          # forwarded as extra_headers
    X-Trace: my-trace-id
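
Roughly, these knobs fan out into the litellm.acompletion call like this (a sketch; the settings object's attribute names here simply mirror the YAML keys):

import litellm

async def complete(settings, messages, model=None):
    return await litellm.acompletion(
        model=model or settings.default_model,
        messages=messages,
        api_base=settings.api_base,            # proxy mode only (may be None)
        api_key=settings.api_key,              # proxy mode only
        drop_params=settings.drop_params,      # let LiteLLM strip unsupported kwargs
        extra_headers=settings.default_headers,
        **(settings.extra_kwargs or {}),       # forwarded verbatim
    )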

Architecture

LiteLLMLLM extends the existing OpenAILLM and only swaps the underlying client. LiteLLM normalizes every backing's response into OpenAI's ChatCompletion shape, so the existing OpenAI streaming, tool-call accumulation, structured-output, reasoning-effort, and cache-token handling all work unchanged.

fast-agent agent
       │
       ▼
LiteLLMLLM (extends OpenAILLM)
       │  reuses _process_stream_manual, _prepare_api_request, tool/structured-output
       ▼
_LiteLLMClientShim   (AsyncOpenAI-shaped; only chat.completions.create needed)
       │
       ▼
litellm.acompletion(model=..., **kwargs)
       │
       ▼
[Anthropic | OpenAI | Bedrock | Vertex | Cohere | Mistral | ...]

The shim is intentionally narrow — it implements only __aenter__ / __aexit__ and chat.completions.create. Adding the full AsyncOpenAI surface (files, embeddings, audio, etc.) is left out of scope; calls to those raise NotImplementedError so the failure mode is loud rather than silent.
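
In sketch form (illustrative, not the PR's exact class bodies), the shim is little more than:

import litellm

class _ChatCompletions:
    def __init__(self, defaults: dict):
        self._defaults = defaults

    async def create(self, **kwargs):
        # caller-supplied kwargs win over shim defaults
        return await litellm.acompletion(**{**self._defaults, **kwargs})

class _Chat:
    def __init__(self, defaults: dict):
        self.completions = _ChatCompletions(defaults)

class LiteLLMClientShim:
    """AsyncOpenAI-shaped: exposes only chat.completions.create."""

    def __init__(self, **defaults):
        self.chat = _Chat(defaults)

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc):
        return False

    def __getattr__(self, name):
        # the rest of the AsyncOpenAI surface fails loudly, not silently
        raise NotImplementedError(f"shim does not implement {name!r}")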

For wizard rendering, per-row credential markers use a static map (_LITELLM_BACKING_ENV_KEYS) instead of litellm.validate_environment(...) because the latter triggers OAuth device-code flows for some backings (e.g. github_copilot) and would block the picker render for 60+ seconds.
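
The marker logic reduces to an env lookup against that map. A sketch (excerpted table, hypothetical helper name):

import os

_LITELLM_BACKING_ENV_KEYS = {   # excerpt, not the full table
    "anthropic": ("ANTHROPIC_API_KEY",),
    "openai": ("OPENAI_API_KEY",),
    "cohere": ("COHERE_API_KEY",),
}

def backing_available(spec: str) -> bool | None:
    """Feed the ✓/✗ marker for a litellm.<backing>/<model> spec."""
    if not spec.startswith("litellm."):
        return None  # non-litellm rows keep the default availability logic
    backing = spec.split(".", 1)[1].split("/", 1)[0]
    keys = _LITELLM_BACKING_ENV_KEYS.get(backing, ())
    return any(os.environ.get(k) for k in keys)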

Files

  • src/fast_agent/llm/provider/litellm/__init__.py (1) — package marker.
  • src/fast_agent/llm/provider/litellm/llm_litellm.py (193) — LiteLLMLLM(OpenAILLM), _LiteLLMClientShim, _bridge_fastagent_config_to_litellm_env.
  • src/fast_agent/llm/provider_types.py (+1) — LITELLM = ("litellm", "LiteLLM").
  • src/fast_agent/llm/model_factory.py (+5) — provider-class dispatch.
  • src/fast_agent/llm/provider_key_manager.py (+13) — keyless registration; LITELLM_API_KEY is read for proxy mode but isn't required.
  • src/fast_agent/llm/provider_model_catalog.py (+47) — LiteLLMModelCatalogAdapter returns the full LiteLLM catalog (~2.3k specs).
  • src/fast_agent/llm/model_selection.py (+118) — 20 curated entries spanning 16 backings.
  • src/fast_agent/ui/model_picker.py (+10) — per-row ✓/✗ marker honors model.backing_available override.
  • src/fast_agent/ui/model_picker_common.py (+135) — Provider.LITELLM first in PICKER_PROVIDER_ORDER, _provider_is_active, litellm_backing_creds_present static map, ModelOption.backing_available plumbing into curated + dynamic discovery paths.
  • src/fast_agent/llm/provider/openai/llm_openai.py (+1) — adds Provider.LITELLM to providers using _process_stream_manual (LiteLLM chunks omit fields like delta.refusal).
  • src/fast_agent/config.py (+48) — LiteLLMSettings schema.
  • pyproject.toml (+5) — new litellm = ["litellm>=1.60,<1.85"] optional extra; also added to all-providers.
  • README.md (+1 phrase) — providers paragraph mentions the new optional extra.
  • tests/unit/fast_agent/llm/test_litellm_provider.py (new, 280) — 24 tests.
  • tests/unit/fast_agent/llm/test_model_factory.py (+5) — skip LiteLLM in test_curated_catalog_aliases_are_parseable (same pattern as anthropic-vertex; LiteLLM aliases intentionally mirror native short names).

Tests

Unit tests (24 / 24 pass)

$ pytest tests/unit/fast_agent/llm/test_litellm_provider.py -v
24 passed in 1.04s

Coverage:

  • Provider enum + factory dispatch + Provider.LITELLM in PICKER_PROVIDER_ORDER
  • _provider_is_active true when litellm importable, false when not
  • Keyless API-key handling; env + config api-key precedence
  • Dynamic catalog adapter shape (>1k models, prefixed correctly, popular specs round-trip cleanly, returns empty when litellm is missing)
  • Per-row backing-available flag (env present / env absent / non-litellm spec returns None)
  • Shim is async-context-manager
  • api_base / api_key / timeout / drop_params / extra_headers forwarded to litellm.acompletion
  • Caller-supplied kwargs win over shim defaults
  • Config bridge to env vars: sets when missing, doesn't overwrite when present, handles None config (sketched after this list)
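
As a flavor of the bridge tests, a pytest-style sketch (the real tests live in test_litellm_provider.py; bridge_config_to_env is the hypothetical helper from the Path 2 sketch above):

import os

def test_bridge_does_not_overwrite_existing_env(monkeypatch):
    monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-from-env")
    bridge_config_to_env({"anthropic": {"api_key": "sk-from-config"}})
    assert os.environ["ANTHROPIC_API_KEY"] == "sk-from-env"  # env wins

def test_bridge_handles_none_config():
    bridge_config_to_env(None)  # must be a no-op, not a crash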

Regression (1499 / 1499 adjacent unit tests pass)

$ pytest tests/unit/fast_agent/llm/ tests/unit/fast_agent/ui/ -q
1499 passed in 8.76s

Type checking (ty) clean on touched files

$ ty check src/fast_agent/llm/provider/litellm/ \
    src/fast_agent/llm/provider_types.py \
    src/fast_agent/llm/provider_model_catalog.py \
    src/fast_agent/llm/provider_key_manager.py \
    src/fast_agent/ui/model_picker_common.py \
    src/fast_agent/ui/model_picker.py
All checks passed!

One narrow # ty: ignore[invalid-method-override] on LiteLLMLLM._openai_client (intentional — the shim is a duck-typed substitute for AsyncOpenAI; OpenAILLM only calls chat.completions.create on it).

Lint clean

$ ruff check <touched files>
All checks passed!
$ ruff format --check <touched files>
files already formatted

Live E2E

Path 1 — env vars + wizard pick:

$ ANTHROPIC_API_KEY=... ANTHROPIC_BASE_URL=... fast-agent go
   [picker: navigate to LiteLLM (top of Providers column), pick Anthropic Claude Sonnet]

▎▶ agent ────────
hi there, how are you
▎◀ agent claude-sonnet-4-6
Hi! I'm doing well, thanks for asking! ...

agent  108 input / 82 output / 190 total

Path 2 — config file, no env vars exported:

$ cat fastagent.config.yaml
anthropic:
  api_key: ...
  base_url: ...
$ unset ANTHROPIC_API_KEY ANTHROPIC_BASE_URL
$ fast-agent go --model=litellm.anthropic/claude-sonnet-4-6 \
    --message="who are you?"
... real reply, same model ...

Path 3 — LiteLLM proxy: verified locally with litellm --config litellm_proxy.yaml --port 4000 and litellm.api_base: http://localhost:4000 in fast-agent config. Round-trips identically.

Troubleshooting

litellm.AuthenticationError: Missing <Provider> API Key

The chosen model's backing provider doesn't have credentials in your environment. Look up the env var that backing needs (e.g. OPENAI_API_KEY for litellm.openai/..., COHERE_API_KEY for litellm.cohere/...), then export it (Path 1) or add it under the matching <provider>: section in fastagent.config.yaml (Path 2, for bridged backings only).
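
To find out which env var a given backing expects, litellm's own helper works for one-off debugging (the PR avoids it in the picker because some backings trigger OAuth flows, but interactively that's fine):

import litellm

report = litellm.validate_environment(model="cohere/command-r-plus")
print(report["keys_in_environment"])  # False if creds are missing
print(report["missing_keys"])         # e.g. ['COHERE_API_KEY']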

litellm.NotFoundError: ... DeploymentNotFound / model not found

The backing provider authenticated successfully, but the specific model name isn't deployed at that endpoint. Common causes:

  • A private proxy / Foundry / Bedrock account that only deploys a subset of model versions; pick a model your account actually has access to.
  • The model spec is from LiteLLM's general catalog but the backing is region-specific (e.g. some Vertex models are only in us-central1).

Workaround: press c in the picker to switch to the all-scope and find the deployed name, or pass --model=litellm.<backing>/<exact-name> directly.

Wizard shows ✗ on a model whose key is set

  • Your env var name might not match LiteLLM's expectation (e.g. OPENAI_KEY instead of OPENAI_API_KEY).
  • The picker only checks env vars at render time — restart fast-agent go after exporting new keys.
  • For backings not in the bridged list (Cohere, Mistral, Perplexity, …), only env vars and proxy work — config-file creds aren't read for those.

Out of scope / future work

  • Documentation site PR — the docs live in the [fast-agent-docs](https://github.com/evalstate/fast-agent-docs) submodule. Happy to follow up with a docs PR there once this one lands.
  • Path 2 bridging for backings without a fast-agent native config section (Cohere, Mistral, Perplexity, …). They still work via Path 1 (env vars) or Path 3 (proxy).

@RheagalFire (Author)

cc @evalstate

