
Improve DMR support#2351

Open
krissetto wants to merge 1 commit into docker:main from krissetto:improve-dmr-support

Conversation

@krissetto
Contributor

@krissetto krissetto commented Apr 8, 2026

Improve DMR support

  • provider_opts.context_size now sets the engine's context window; max_tokens stays strictly a per-request output limit instead of pulling double duty (which was confusing).
  • The _configure request is now structured, mirroring model-runner's BackendConfiguration (context-size, runtime-flags, speculative, llamacpp.reasoning-budget, vllm.{hf-overrides,gpu-memory-utilization}).
  • thinking_budget is routed properly per backend: reasoning-budget for llama.cpp, per-request thinking_token_budget for vLLM, and ignored on MLX/SGLang for now.
  • Fixed session-title generation on reasoning models: DMR now honors NoThinking() by sending chat_template_kwargs.enable_thinking=false.
  • Clarified in the docs that sampling params belong on the regular model config, not in provider_opts.runtime_flags.

@krissetto krissetto force-pushed the improve-dmr-support branch 2 times, most recently from 2e7bbfe to a6a876c on April 17, 2026 16:18
- fixes session title generation
- adds a 'context_size' provider_opt for DMR instead of giving 'max_tokens' double responsibility, to avoid confusion
- improves thinking budget support and fixes NoThinking()
- improves how flags are sent to the DMR model/runtime configuration endpoint
- clarifies docs on sampling/runtime params

Signed-off-by: Christopher Petito <chrisjpetito@gmail.com>
@krissetto krissetto force-pushed the improve-dmr-support branch from a6a876c to d2eac5f on April 17, 2026 17:11
@krissetto krissetto marked this pull request as ready for review April 17, 2026 17:30
@krissetto krissetto requested a review from a team as a code owner April 17, 2026 17:30
