Commit 7fae353 (parent a79d925), authored by ilopezluna and committed by github-actions[bot]: "Add model card for aistaging/qwen3-coder-next-vllm"

# Qwen3-Coder-Next

Qwen3-Coder-Next is an open-weight language model designed specifically for coding agents and local development. With an innovative architecture utilizing only 3B activated parameters (from a total of 80B parameters), it achieves performance comparable to models with 10–20x more active parameters, making it highly cost-effective for agent deployment. The model excels at long-horizon reasoning, complex tool usage, and recovery from execution failures, ensuring robust performance in dynamic coding tasks.

Built with a 256K context length and adaptability to various scaffold templates, Qwen3-Coder-Next enables seamless integration with different CLI/IDE platforms such as Claude Code, Qwen Code, Qoder, Kilo, Trae, and Cline. Its hybrid architecture combines Gated DeltaNet and Mixture of Experts (MoE) layers, providing exceptional efficiency and versatility for real-world development environments. The model operates in non-thinking mode and is optimized for agentic coding workflows with advanced tool-calling capabilities.

Released under the Apache 2.0 license, Qwen3-Coder-Next represents a significant advancement in efficient, production-ready coding assistance, delivering enterprise-grade performance while maintaining computational efficiency through its sparse activation architecture.

---

## Characteristics

| Attribute | Value |
|---|---|
| **Provider** | Qwen (Alibaba Cloud) |
| **Architecture** | Qwen3NextForCausalLM (Hybrid Gated DeltaNet + MoE) |
| **Cutoff date** | TBD |
| **Languages** | Multilingual (optimized for code) |
| **Input modalities** | Text |
| **Output modalities** | Text |
| **License** | Apache 2.0 |

## Using this model with Docker Model Runner

```bash
docker model run qwen3-coder-next-vllm
```

For more information, check out the [Docker Model Runner docs](https://docs.docker.com/desktop/features/model-runner/).
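
Once the model is running, it can be queried through an OpenAI-compatible endpoint (see Considerations below). A minimal sketch of what such a request looks like, using the recommended sampling parameters; the base URL and model name here are assumptions for illustration, not values confirmed by this card:

```python
import json

# Assumption: Docker Model Runner exposes an OpenAI-compatible server locally.
# This URL is hypothetical -- check your Docker Model Runner setup for the real one.
BASE_URL = "http://localhost:12434/engines/v1"

def build_chat_request(prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload using the sampling
    parameters recommended in the Considerations section of this card."""
    return {
        "model": "qwen3-coder-next-vllm",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,
        "top_p": 0.95,
        # top_k is not part of the core OpenAI API, but OpenAI-compatible
        # servers such as vLLM's accept it as an extra sampling parameter.
        "top_k": 40,
    }

payload = build_chat_request("Write a function that reverses a string.")
print(json.dumps(payload, indent=2))
```

With the model running, the payload would be sent with an HTTP client, e.g. `requests.post(BASE_URL + "/chat/completions", json=payload)`.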

## Benchmarks

The model demonstrates performance comparable to significantly larger models on official coding benchmarks. Specific details on SWE-bench and other coding evaluation metrics can be found in the reference links below.

Key performance characteristics:

- **Efficiency**: 3B activated parameters achieve performance of models with 10–20x more active parameters
- **Context handling**: native support for 262,144 tokens (256K)
- **Agentic capabilities**: advanced tool usage and recovery from execution failures

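The efficiency claim can be made concrete with quick arithmetic; the 10–20x multiplier comes from the description above, so treat the resulting range as illustrative:

```python
# Back-of-envelope arithmetic for the sparse-activation claim.
total_params = 80e9   # 80B total parameters
active_params = 3e9   # ~3B activated per forward pass

# Fraction of the model that is active for any single token.
active_fraction = active_params / total_params
print(f"Active fraction per token: {active_fraction:.2%}")

# "10-20x more active parameters" corresponds to dense models with roughly
# this many active parameters (in billions):
comparable_dense = [active_params * k / 1e9 for k in (10, 20)]
print(f"Comparable dense range: {comparable_dense[0]:.0f}B to {comparable_dense[1]:.0f}B")
```

So each forward pass touches under 4% of the weights, while the claimed quality matches dense models in the 30B–60B active-parameter range.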
## Links

- https://huggingface.co/Qwen/Qwen3-Coder-Next

## Considerations

- Default context length is 256K tokens; consider reducing it to 32,768 tokens if you run into memory constraints
- Requires `vllm>=0.15.0` or `sglang>=v0.5.8` for optimal deployment
- Recommended sampling parameters for best performance: `temperature=1.0`, `top_p=0.95`, `top_k=40`
- Operates in non-thinking mode only and does not generate `<think></think>` blocks
- Tensor parallelism is recommended for deployment (e.g., `--tensor-parallel-size 2` for vLLM)
- Supports OpenAI-compatible API endpoints for seamless integration
- 80B total parameters with sparse activation (only 3B active per forward pass)
- Designed specifically for coding agents; may not be optimal for general-purpose tasks

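Several of the points above (the vLLM version floor, reduced context length, and tensor parallelism) come together in a single launch command. A sketch, assuming the Hugging Face model ID from the Links section and a two-GPU host; adapt the flag values to your hardware:

```shell
# Requires vllm>=0.15.0 (see above); assumes 2 GPUs are available.
vllm serve Qwen/Qwen3-Coder-Next \
    --tensor-parallel-size 2 \
    --max-model-len 32768   # reduced from the native 256K to limit memory use
```

The server this starts exposes the OpenAI-compatible endpoints mentioned above.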
### Generated by
This model card was automatically generated using [cagent-action](https://github.com/docker/cagent-action) with the [Docker Model Runner's model-card-generator](https://github.com/docker/model-runner).
