Commit 7fae353 (parent a79d925), authored by ilopezluna and committed by github-actions[bot]: "Add model card for aistaging/qwen3-coder-next-vllm"

# Qwen3-Coder-Next

Qwen3-Coder-Next is an open-weight language model designed specifically for coding agents and local development. With an innovative architecture utilizing only 3B activated parameters (from a total of 80B parameters), it achieves performance comparable to models with 10–20x more active parameters, making it highly cost-effective for agent deployment. The model excels at long-horizon reasoning, complex tool usage, and recovery from execution failures, ensuring robust performance in dynamic coding tasks.

Built with a 256K context length and adaptability to various scaffold templates, Qwen3-Coder-Next enables seamless integration with different CLI/IDE platforms such as Claude Code, Qwen Code, Qoder, Kilo, Trae, and Cline. Its hybrid architecture combines Gated DeltaNet and Mixture of Experts (MoE) layers, providing exceptional efficiency and versatility for real-world development environments. The model operates in non-thinking mode and is optimized for agentic coding workflows with advanced tool-calling capabilities.

Released under the Apache 2.0 license, Qwen3-Coder-Next represents a significant advancement in efficient, production-ready coding assistance, delivering enterprise-grade performance while maintaining computational efficiency through its sparse activation architecture.

---

## Characteristics

| Attribute | Value |
|---|---|
| **Provider** | Qwen (Alibaba Cloud) |
| **Architecture** | Qwen3NextForCausalLM (Hybrid Gated DeltaNet + MoE) |
| **Cutoff date** | TBD |
| **Languages** | Multilingual (optimized for code) |
| **Input modalities** | Text |
| **Output modalities** | Text |
| **License** | Apache 2.0 |

## Using this model with Docker Model Runner

```bash
docker model run qwen3-coder-next-vllm
```

For more information, check out the [Docker Model Runner docs](https://docs.docker.com/desktop/features/model-runner/).
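
Once the model is running, it can be queried through an OpenAI-compatible endpoint (see Considerations below). A minimal sketch of what such a request looks like, using the recommended sampling parameters; the base URL and model name here are assumptions for illustration, not values confirmed by this card:

```python
import json

# Assumption: Docker Model Runner exposes an OpenAI-compatible server locally.
# This URL is hypothetical -- check your Docker Model Runner setup for the real one.
BASE_URL = "http://localhost:12434/engines/v1"

def build_chat_request(prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload using the sampling
    parameters recommended in the Considerations section of this card."""
    return {
        "model": "qwen3-coder-next-vllm",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,
        "top_p": 0.95,
        # top_k is not part of the core OpenAI API, but OpenAI-compatible
        # servers such as vLLM's accept it as an extra sampling parameter.
        "top_k": 40,
    }

payload = build_chat_request("Write a function that reverses a string.")
print(json.dumps(payload, indent=2))
```

With the model running, the payload would be sent with an HTTP client, e.g. `requests.post(BASE_URL + "/chat/completions", json=payload)`.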

## Benchmarks

The model demonstrates performance comparable to significantly larger models on official coding benchmarks. Specific details on SWE-bench and other coding evaluation metrics can be found in the reference links below.

Key performance characteristics:

- **Efficiency**: 3B activated parameters achieve performance of models with 10–20x more active parameters
- **Context handling**: native support for 262,144 tokens (256K)
- **Agentic capabilities**: advanced tool usage and recovery from execution failures

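The efficiency claim can be made concrete with quick arithmetic; the 10–20x multiplier comes from the description above, so treat the resulting range as illustrative:

```python
# Back-of-envelope arithmetic for the sparse-activation claim.
total_params = 80e9   # 80B total parameters
active_params = 3e9   # ~3B activated per forward pass

# Fraction of the model that is active for any single token.
active_fraction = active_params / total_params
print(f"Active fraction per token: {active_fraction:.2%}")

# "10-20x more active parameters" corresponds to dense models with roughly
# this many active parameters (in billions):
comparable_dense = [active_params * k / 1e9 for k in (10, 20)]
print(f"Comparable dense range: {comparable_dense[0]:.0f}B to {comparable_dense[1]:.0f}B")
```

So each forward pass touches under 4% of the weights, while the claimed quality matches dense models in the 30B–60B active-parameter range.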
## Links

- https://huggingface.co/Qwen/Qwen3-Coder-Next

## Considerations

- Default context length is 256K tokens; consider reducing it to 32,768 tokens if you run into memory constraints
- Requires `vllm>=0.15.0` or `sglang>=v0.5.8` for optimal deployment
- Recommended sampling parameters for best performance: `temperature=1.0`, `top_p=0.95`, `top_k=40`
- Operates in non-thinking mode only and does not generate `<think></think>` blocks
- Tensor parallelism is recommended for deployment (e.g., `--tensor-parallel-size 2` for vLLM)
- Supports OpenAI-compatible API endpoints for seamless integration
- 80B total parameters with sparse activation (only 3B active per forward pass)
- Designed specifically for coding agents; may not be optimal for general-purpose tasks

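Several of the points above (the vLLM version floor, reduced context length, and tensor parallelism) come together in a single launch command. A sketch, assuming the Hugging Face model ID from the Links section and a two-GPU host; adapt the flag values to your hardware:

```shell
# Requires vllm>=0.15.0 (see above); assumes 2 GPUs are available.
vllm serve Qwen/Qwen3-Coder-Next \
    --tensor-parallel-size 2 \
    --max-model-len 32768   # reduced from the native 256K to limit memory use
```

The server this starts exposes the OpenAI-compatible endpoints mentioned above.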
### Generated by
This model card was automatically generated using [cagent-action](https://github.com/docker/cagent-action) with the [Docker Model Runner's model-card-generator](https://github.com/docker/model-runner).
