Releases · wanghqc/llama.cpp

23 Sep 19:20

f505bd8

b6560 Latest

ci : disable AMD workflows + update NVIDIA workflows (#16200)

* ci : disable AMD workflows + update NVIDIA workflows

* cont : fixes

* cont : update nvidia vulkan workflows

Assets 15

cudart-llama-bin-win-cuda-12.4-x64.zip

sha256:8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6

373 MB 2025-09-23T19:20:15Z

llama-b6560-bin-macos-arm64.zip

sha256:cdb6413778dad3d727e35033191989d45df3149445c7cf9d1bdd166c91a78eae

10.2 MB 2025-09-23T19:20:27Z

llama-b6560-bin-macos-x64.zip

sha256:6946d73d6c570402014482c61351802743236a385550f3c7fd3e0a26cf828a8a

27.6 MB 2025-09-23T19:20:28Z

llama-b6560-bin-ubuntu-vulkan-x64.zip

sha256:5e1b685f95b93f0e239f0e93df00c4bf8574a995a714c192d106f057450546e5

25.5 MB 2025-09-23T19:20:29Z

llama-b6560-bin-ubuntu-x64.zip

sha256:3b5f15eb3cf8e47ea33b484773f4e0dcbe5e78a10e95712efa016cf9a472e086

12.3 MB 2025-09-23T19:20:31Z

llama-b6560-bin-win-cpu-arm64.zip

sha256:109990ba03ba73777fa46a1b27c5508797b8495715055b1a00ef47cc8669f478

10.4 MB 2025-09-23T19:20:32Z

llama-b6560-bin-win-cpu-x64.zip

sha256:873b17617c1629db1618419fc933877cbf0849b17e7f1669f1be60750aab2b80

13.4 MB 2025-09-23T19:20:33Z

llama-b6560-bin-win-cuda-12.4-x64.zip

sha256:01719c711193c0cee126556d982773ca735a5456e9c37e7ef406a545a586a3be

146 MB 2025-09-23T19:20:34Z

llama-b6560-bin-win-hip-radeon-x64.zip

sha256:61907684b88db6c9c54301b539c3e5257851304aa2f4e22fa74222ba74fe63db

319 MB 2025-09-23T19:20:39Z

llama-b6560-bin-win-opencl-adreno-arm64.zip

sha256:ad553b07fb4fd9380cf37dd7a487ae3f861acc2f189114111c8dbee15a0601c8

10.8 MB 2025-09-23T19:20:47Z

Source code (zip)

2025-09-23T17:41:40Z

Source code (tar.gz)

2025-09-23T17:41:40Z
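Each binary asset above is listed with its SHA-256 digest. A minimal Python sketch for checking a downloaded archive against that digest, using only the standard library's `hashlib`; the helper names `sha256_of` and `verify` are hypothetical, and the usage line assumes the archive was saved under its release name in the current directory.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA-256 digest of the file at `path`.

    The file is read in 1 MiB chunks, so large archives (such as the
    373 MB CUDA runtime zip) never need to fit in memory at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str) -> bool:
    """Compare a file against a digest copied from the release page.

    Accepts either the bare hex string or the "sha256:<hex>" form,
    in upper or lower case.
    """
    expected = expected.strip().lower()
    if expected.startswith("sha256:"):
        expected = expected[len("sha256:"):]
    return sha256_of(path) == expected
```

For example, `verify("llama-b6560-bin-ubuntu-x64.zip", "sha256:3b5f15eb3cf8e47ea33b484773f4e0dcbe5e78a10e95712efa016cf9a472e086")` should return `True` only if the local file is byte-for-byte the published archive.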

19 Aug 22:28

github-actions

b6209

fb22dd0

b6209

opencl: mark `argsort` unsupported if cols exceed workgroup limit (#1…

Assets 15

19 Aug 00:03

github-actions

b6199

f08c4c0

b6199

mtmd : clean up clip_n_output_tokens (#15391)

Assets 15

05 Aug 23:22

github-actions

b6096

fd1234c

b6096

llama : add gpt-oss (#15091)

* oai moe

* compat with new checkpoint

* add attn sink impl

* add rope scaling yarn

* logits match with latest transformers code

* wip chat template

* rm trailing space

* use ggml_scale_bias

* rm redundant is_swa_all

* convert interleaved gate_up

* graph : fix activation function to match reference (#7)

* vocab : handle o200k_harmony special tokens

* ggml : add attention sinks support (#1)

* llama : add attn sinks

* ggml : add attn sinks

* cuda : add attn sinks

* vulkan : add support for sinks in softmax

remove unnecessary return

* ggml : add fused swiglu_oai op (#11)

* ggml : add fused swiglu_oai op

* Update ggml/src/ggml-cpu/ops.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* update CUDA impl

* cont : metal impl

* add vulkan impl

* test-backend-ops : more test cases, clean up

* llama : remove unfused impl

* remove extra lines

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>

* repack mxfp4 upon conversion

* clean up a bit

* enable thinking

* add quick hack to render only some special tokens

* fix bf16 conversion

* remove vocab hack

* webui ok

* support chat parsing for gpt-oss

* fix webui

* direct mapping mxfp4, FINALLY

* force using mxfp4

* properly use lazy tensor

* ggml : add mxfp4

ggml : use e8m0 conversion instead of powf

Co-authored-by: Diego Devesa <slarengh@gmail.com>

change kvalues_mxfp4 table to match e2m1 (#6)

metal : remove quantization for now (not used)

cuda : fix disabled CUDA graphs due to ffn moe bias

vulkan : add support for mxfp4

cont : add cm2 dequant

* ggml : add ggml_add_id (#13)

* ggml : add ggml_add_id

* add cuda impl

* llama : add weight support check for add_id

* perf opt

* add vulkan impl

* rename cuda files

* add metal impl

* allow in-place ggml_add_id

* llama : keep biases on CPU with --cpu-moe

* llama : fix compile error

ggml-ci

* cuda : add fallback for __nv_cvt_e8m0_to_bf16raw

ggml-ci

* cleanup

ggml-ci

* sycl : fix supports_op for MXFP4

ggml-ci

* fix Unknown reasoning format

* ggml-cpu : fix AVX build

ggml-ci

* fix hip build

ggml-ci

* cuda : add mxfp4 dequantization support for cuBLAS

ggml-ci

* ggml-cpu : fix mxfp4 fallback definitions for some architectures

ggml-ci

* cuda : fix version required for __nv_cvt_e8m0_to_bf16raw

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: slaren <slarengh@gmail.com>

Assets 15
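The gpt-oss change above adds MXFP4 weights and notes that the E8M0 scale is decoded "instead of powf" and that the value table was changed "to match e2m1". A minimal Python sketch of those two pieces, assuming the OCP Microscaling (MX) definitions: 32-element blocks share one E8M0 exponent byte whose value is 2^(e-127), and each element is a 4-bit E2M1 float. All names below are hypothetical, and the nibble order inside each byte is an assumption made for illustration; it is not necessarily the layout ggml uses for its MXFP4 blocks.

```python
import struct

# E2M1 (FP4) values indexed by the 4-bit code:
# bit 3 = sign, bits 2..1 = exponent (bias 1), bit 0 = mantissa.
E2M1_VALUES = [
    0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
    -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0,
]

MX_BLOCK_SIZE = 32  # elements sharing one E8M0 scale


def e8m0_to_float(e: int) -> float:
    """Decode an E8M0 scale byte to 2**(e - 127) without calling pow()."""
    if not 0 <= e <= 255:
        raise ValueError("E8M0 scale must be a single byte")
    if e == 255:
        return float("nan")  # 0xFF is reserved for NaN
    # For e >= 1 the byte is exactly the float32 exponent field.
    # e == 0 means 2**-127, which is subnormal in float32 (mantissa bit 22 set).
    bits = 0x00400000 if e == 0 else e << 23
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dequantize_mxfp4_block(scale_e8m0: int, packed: bytes) -> list:
    """Expand one 32-element MXFP4 block to Python floats.

    ASSUMED packing (illustrative only): byte i stores element 2*i in its
    low nibble and element 2*i + 1 in its high nibble.
    """
    if len(packed) != MX_BLOCK_SIZE // 2:
        raise ValueError("an MXFP4 block packs 32 four-bit codes into 16 bytes")
    scale = e8m0_to_float(scale_e8m0)
    values = []
    for byte in packed:
        values.append(E2M1_VALUES[byte & 0x0F] * scale)  # low nibble first
        values.append(E2M1_VALUES[byte >> 4] * scale)    # then high nibble
    return values
```

Writing the exponent bits directly is exact and avoids a transcendental call per block; the only special cases are the subnormal `e == 0` and the NaN code `0xFF`.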

18 Jul 19:31

github-actions

b5937

bf9087f

b5937

metal : fuse add, mul + add tests (#14596)

ggml-ci

Assets 15

30 Jun 22:25

github-actions

b5787

0a5a3b5

b5787

Add Conv2d for CPU (#14388)

* Conv2D: Add CPU version

* Half decent

* Tiled approach for F32

* remove file

* Fix tests

* Support F16 operations

* add assert about size

* Review: further formatting fixes, add assert and use CPU version of fp32->fp16

Assets 15
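The Conv2d change above adds a CPU implementation, tiled for F32 and with F16 support. Whatever tiling is used, the kernel must still produce the same sums as a plain direct convolution, so a naive reference is handy for checking it. A minimal Python sketch of such a reference, assuming a single channel-first image stored as nested lists, symmetric zero padding, no dilation, and no bias; like most ML frameworks it computes cross-correlation (the kernel is not flipped). The name `conv2d_direct` is hypothetical and is not ggml's API, and tiling itself is not shown.

```python
def conv2d_direct(x, w, stride=1, padding=0):
    """Direct (untiled) 2D convolution.

    x: input as nested lists [C_in][H][W]
    w: kernels as nested lists [C_out][C_in][KH][KW]
    Returns [C_out][OH][OW], with OH = (H + 2*padding - KH) // stride + 1
    and OW computed the same way from W and KW.
    """
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    c_out, kh, kw = len(w), len(w[0][0]), len(w[0][0][0])
    if len(w[0]) != c_in:
        raise ValueError("kernel input channels must match input channels")
    oh = (h + 2 * padding - kh) // stride + 1
    ow = (wd + 2 * padding - kw) // stride + 1
    out = [[[0.0] * ow for _ in range(oh)] for _ in range(c_out)]
    for co in range(c_out):
        for oy in range(oh):
            for ox in range(ow):
                acc = 0.0
                for ci in range(c_in):
                    for ky in range(kh):
                        iy = oy * stride + ky - padding
                        if iy < 0 or iy >= h:
                            continue  # padded rows are zero and add nothing
                        for kx in range(kw):
                            ix = ox * stride + kx - padding
                            if 0 <= ix < wd:  # skip padded columns
                                acc += x[ci][iy][ix] * w[co][ci][ky][kx]
                out[co][oy][ox] = acc
    return out
```

For example, applying the 2x2 kernel `[[1, 0], [0, 1]]` to the 3x3 input `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]` adds each pixel to its lower-right neighbor and gives `[[6, 8], [12, 14]]`.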
