EffortlessMetrics/BitNet-rs

BitNet-rs

Badges: CI, Codecov, ripr+, GitHub release, crates.io (pending), docs.rs (pending), MSRV, License: MIT OR Apache-2.0

Pre-alpha Rust-native local model runtime and validation workspace for small and efficient language models, including dense SLMs and BitNet / 1-bit model families.

Warning

Pre-alpha. Do not use in production.

BitNet-rs is still pre-alpha. The project currently focuses on loader, tokenizer, kernel, receipt, runtime-target, and hardware validation. Dense SLM paths are supported model families in their own right once their validation gates pass. Coherent BitNet answer quality is still under validation, so treat generated BitNet text as diagnostic output until a strict proof lane accepts it.

Development Intake Has Moved

Active feature, hardware, performance, diagnostic, and refactor work now happens in EffortlessMetrics/bitnet-rs-swarm. This repository is the release and publish repository for BitNet-rs. Open normal development PRs in bitnet-rs-swarm; PRs here are limited to release promotion, versioning, changelog, packaging, signing, publish, emergency security or release-blocking hotfixes, and documentation corrections needed for released artifacts.

What This Repo Is For

BitNet-rs is moving toward a Rust-native local model runner with strict proof surfaces. BitNet / 1-bit models are an important specialized model family inside that runner; dense SLMs such as Qwen are also useful supported model families when their artifact, tokenizer, backend, and receipt contracts pass. The repo is useful for contributors working on model loading, tokenization, quantization, kernel parity, runtime targets, hardware bring-up, receipts, and reproducible inference validation. It is not yet a polished end-user inference server.

What exists today:

  • strict GGUF loading and tokenizer metadata checks
  • I2_S / QK256 quantization and kernel infrastructure
  • scalar, AVX2, AVX-512, NEON, CUDA, OpenCL, OpenVINO, Metal, and NPU validation work
  • diagnostic answer-corpus and answer-parity receipts
  • receipts that record hardware identity, runtime identity, fallback behavior, and kernel coverage
  • dense SLM support, used both as a first-class local-answer model family and to validate shared generation surfaces while BitNet model-artifact work continues
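The I2_S entry refers to storing ternary weights in 2 bits each. The following is a minimal illustration, assuming the per-tensor absmean recipe described for BitNet b1.58 (scale = mean |w|, then round and clip to {-1, 0, +1}) and a hypothetical packing of four codes per byte. It does not model QK256 block-wise scales, and the bit order is an assumption, not the repository's on-disk I2_S layout.

```rust
/// Quantize weights to ternary codes {-1, 0, +1} with one absmean scale
/// (BitNet b1.58 style). Dequantize approximately as `code as f32 * scale`.
fn quantize_ternary(weights: &[f32]) -> (Vec<i8>, f32) {
    const EPS: f32 = 1e-6; // avoids division by zero for all-zero tensors
    let mean_abs = if weights.is_empty() {
        0.0
    } else {
        weights.iter().map(|w| w.abs()).sum::<f32>() / weights.len() as f32
    };
    let scale = mean_abs + EPS;
    let codes = weights
        .iter()
        .map(|&w| (w / scale).round().clamp(-1.0, 1.0) as i8)
        .collect();
    (codes, scale)
}

/// Pack four ternary codes per byte, 2 bits each, storing `q + 1` (0, 1, 2).
/// Hypothetical layout: element i of a chunk occupies bits 2*i..2*i+1.
/// Assumes every code is already in {-1, 0, +1}.
fn pack_2bit(codes: &[i8]) -> Vec<u8> {
    codes
        .chunks(4)
        .map(|chunk| {
            chunk
                .iter()
                .enumerate()
                .fold(0u8, |byte, (i, &q)| byte | (((q + 1) as u8) << (2 * i)))
        })
        .collect()
}

/// Inverse of `pack_2bit` for the first `len` codes.
fn unpack_2bit(packed: &[u8], len: usize) -> Vec<i8> {
    (0..len)
        .map(|i| ((packed[i / 4] >> (2 * (i % 4))) & 0b11) as i8 - 1)
        .collect()
}

fn main() {
    let weights = [0.9_f32, -0.05, -1.2, 0.4, 0.0, 2.0];
    let (codes, scale) = quantize_ternary(&weights);
    let packed = pack_2bit(&codes);
    assert_eq!(unpack_2bit(&packed, codes.len()), codes);
    println!("codes={codes:?} scale={scale:.4} packed={packed:?}");
}
```

Six weights fit in two bytes here; the 2-bit code space leaves the value 3 unused, which real formats may assign differently.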

Current Status

The repo has real inference infrastructure, but the Rust path does not yet produce coherent, supported BitNet answers. By contrast, the Microsoft BitNet.cpp reference path can answer the tiny suite with the official I2_S GGUF, provided the missing pre-tokenizer is supplied from Microsoft's tokenizer assets.

Backend receipts remain useful evidence for selected-device execution, tokenizer and prompt diagnostics, fallback behavior, and kernel coverage. By themselves, they are not evidence that Rust-generated text is a supported answer. See the model-artifact validation docs in docs/model-artifacts/.
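To make the distinction concrete, here is a minimal sketch of a receipt check. The `Receipt` fields, the `Verdict` enum, and the kernel name are hypothetical and do not reflect the bitnet-receipts schema. The point is that even a fully consistent receipt yields only an execution verdict, never an answer-quality verdict.

```rust
/// Hypothetical receipt shape; field names are illustrative, not the real schema.
#[derive(Debug, Clone)]
struct Receipt {
    requested_backend: String,
    executed_backend: String,
    fallback_used: bool,
    kernels_hit: Vec<String>,
}

#[derive(Debug, PartialEq)]
enum Verdict {
    /// Execution evidence is consistent; answer quality is still NOT established.
    ExecutionOnly,
    /// Execution evidence is missing or contradicts the request.
    Rejected(String),
}

fn check_execution(r: &Receipt) -> Verdict {
    if r.fallback_used {
        return Verdict::Rejected("fallback path was used".to_string());
    }
    if r.requested_backend != r.executed_backend {
        return Verdict::Rejected(format!(
            "requested {} but executed {}",
            r.requested_backend, r.executed_backend
        ));
    }
    if r.kernels_hit.is_empty() {
        return Verdict::Rejected("no kernel coverage recorded".to_string());
    }
    // Deliberately no "AnswerSupported" outcome: a receipt alone cannot
    // certify that the generated text is a supported answer.
    Verdict::ExecutionOnly
}

fn main() {
    let receipt = Receipt {
        requested_backend: "cuda".to_string(),
        executed_backend: "cuda".to_string(),
        fallback_used: false,
        kernels_hit: vec!["i2s_gemv".to_string()], // hypothetical kernel name
    };
    println!("{:?}", check_execution(&receipt));
}
```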

Capability Matrix

| Area | State | What it means today |
| --- | --- | --- |
| GGUF loading | Supported / hardening | Structural loading and metadata extraction are active work surfaces. |
| Tokenizer handling | Supported / hardening | Tokenizer metadata is checked strictly for answer-quality work. |
| I2_S BitNet32 CPU path | Diagnostic | CPU execution exists; coherent BitNet answer quality is still under validation. |
| I2_S QK256 CPU path | Diagnostic | Scalar, AVX2, and AVX-512 diagnostics have receipts; generated text quality is still under validation. |
| Scalar / SIMD parity | Diagnostic | Used for backend agreement checks and first-divergence debugging. |
| Dense SLM path | Early working | First-class supported model-family path where artifact, tokenizer, backend, and receipt gates pass; not a BitNet quality result. |
| RTX 5070 Ti CUDA | Execution path validated / diagnostic | Packed BitNet CUDA has receipts through CUDA-BITNET-009; coherent CUDA answers and speed are not established. |
| Metal / OpenCL / OpenVINO / NPU | Probe / smoke | Hardware identity and narrow execution receipts exist; full BitNet answer quality is not established. |
| WASM runtime target | First-class proof lane / scaffolded | bitnet-wasm is treated as a Rust-native runtime target for browser, Node, WASI, and embedded hosts; real inference claims require strict WASM receipts and do not displace native M4 Mac mini validation. |
| Cross-validation | Supported / hardening | Reference comparison infrastructure exists; model selection remains active work. |
| Honest-compute receipts | Supported | Receipts preserve backend, runtime, fallback, kernel, and timing metadata. |
| CLI run/chat | Diagnostic | Useful for exercising the pipeline; generated text is not yet a supported answer-quality surface. |
| Server / HTTP API | Incomplete | Health wiring exists; inference serving is not ready. |
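The Scalar / SIMD parity row mentions first-divergence debugging. Here is a minimal sketch of that idea, assuming two backends produce logit vectors for the same decode step and that an absolute tolerance is an acceptable agreement criterion. The function name and tolerance policy are illustrative; the project's actual comparison points and tolerances are not specified here.

```rust
/// Hypothetical helper: index of the first logit where `candidate` disagrees
/// with `reference` by more than `tol` (absolute), or None if they agree.
/// A NaN on either side counts as a divergence; a length mismatch is
/// reported at the first index past the shorter vector.
fn first_divergence(reference: &[f32], candidate: &[f32], tol: f32) -> Option<usize> {
    let shared = reference.len().min(candidate.len());
    // `!(x <= tol)` is true for NaN, unlike `x > tol`.
    let mismatch = reference
        .iter()
        .zip(candidate)
        .position(|(a, b)| !((a - b).abs() <= tol));
    match mismatch {
        Some(i) => Some(i),
        None if reference.len() != candidate.len() => Some(shared),
        None => None,
    }
}

fn main() {
    let scalar = [0.10_f32, -1.25, 3.50, 0.75];
    let simd = [0.10_f32, -1.25, 3.5004, 0.75];
    match first_divergence(&scalar, &simd, 1e-5) {
        Some(i) => println!("first divergence at logit {i}: {} vs {}", scalar[i], simd[i]),
        None => println!("backends agree within tolerance"),
    }
}
```

Reporting the first index, rather than a single pass/fail flag, points debugging at the earliest disagreement; the same idea extends across decode steps or layers.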

First Diagnostic Run

| Need | Start here |
| --- | --- |
| First token-generation walkthrough | docs/tutorials/first-inference.md |
| Real GGUF model walkthrough | docs/tutorials/real-gguf-model-inference.md |
| Model validation workflow | docs/howto/validate-models.md |
| GGUF loading details | docs/howto/gguf-model-validation-and-loading.md |
| CLI flags and receipt options | docs/reference/inference-cli-reference.md |

The commands below are a smoke path for contributors, not an answer-quality quickstart.

Build the CPU CLI:

```bash
cargo build --locked -p bitnet-cli --no-default-features --features cpu,full-cli
```

Download the official Microsoft BitNet GGUF:

```bash
cargo run --locked -p xtask --no-default-features -- download-model --id microsoft/bitnet-b1.58-2B-4T-gguf
```

Run a diagnostic CPU generation path:

```bash
RUST_LOG=warn cargo run --locked -p bitnet-cli \
  --no-default-features --features cpu,full-cli -- run \
  --model models/microsoft-bitnet-b1.58-2B-4T-gguf/ggml-model-i2_s.gguf \
  --tokenizer models/microsoft-bitnet-b1.58-2B-4T-gguf/tokenizer.json \
  --prompt "What is 2+2?" \
  --max-new-tokens 8 \
  --strict-loader \
  --strict-tokenizer \
  --json-out target/bitnet/receipts/first-run.json
```

This exercises the model, tokenizer, generation, and receipt path. Treat the output as diagnostic evidence, not as a supported chat answer.

Architecture

```text
bitnet-tokenizers --------------------------------------+
                                                        |
bitnet-models  (GGUF loader, I2_S detection, metadata)  |
  -> bitnet-quantization  (I2_S / TL1 / TL2 / IQ2_S)    |
        -> bitnet-kernels (scalar / AVX2 / AVX-512 / NEON / CUDA)
                                                        v
                        bitnet-inference  (autoregressive engine)
                          -> bitnet-logits
                          -> bitnet-sampling
                          -> bitnet-generation
                          -> bitnet-prompt-templates
                          -> bitnet-receipts
                                                        |
                                      +-----------------+----------------+
                                      v                                  v
                                  bitnet-cli                       bitnet-server
```

The workspace contains roughly 200 crates. See docs/architecture-overview.md.
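The bitnet-inference box is labeled an autoregressive engine. Here is a rough sketch of what that loop does. It assumes a hypothetical `NextTokenLogits` trait in place of the real model API, greedy argmax in place of bitnet-sampling, and a toy `Counter` model for demonstration.

```rust
/// Hypothetical stand-in for a model forward pass: given every token so far,
/// return one logit per vocabulary entry for the next position.
trait NextTokenLogits {
    fn logits(&self, tokens: &[u32]) -> Vec<f32>;
}

/// Index of the largest logit (greedy choice); None for an empty vector.
fn argmax(logits: &[f32]) -> Option<u32> {
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i as u32)
}

/// Greedy autoregressive decoding: feed the growing sequence back in and
/// append the top token until `max_new_tokens` or `eos` is reached.
fn generate_greedy<M: NextTokenLogits>(
    model: &M,
    prompt: &[u32],
    max_new_tokens: usize,
    eos: u32,
) -> Vec<u32> {
    let mut tokens = prompt.to_vec();
    for _ in 0..max_new_tokens {
        let Some(next) = argmax(&model.logits(&tokens)) else {
            break;
        };
        tokens.push(next);
        if next == eos {
            break;
        }
    }
    // Return only the newly generated tokens.
    tokens[prompt.len()..].to_vec()
}

/// Toy model (hypothetical): always prefers (last token + 1) modulo the vocabulary size.
struct Counter {
    vocab: usize,
}

impl NextTokenLogits for Counter {
    fn logits(&self, tokens: &[u32]) -> Vec<f32> {
        let last = tokens.last().copied().unwrap_or(0) as usize;
        let mut logits = vec![0.0; self.vocab];
        logits[(last + 1) % self.vocab] = 1.0;
        logits
    }
}

fn main() {
    let model = Counter { vocab: 5 };
    println!("{:?}", generate_greedy(&model, &[1], 8, 4));
}
```

A real engine also manages the KV cache, prompt templates, and sampling settings; this sketch recomputes from the full token list at each step for clarity.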

Hardware Validation

Hardware validation is organized by platform so backend identity, runtime identity, fallback status, and receipt coverage stay explicit.

| Platform | Role |
| --- | --- |
| Intel 258V CPU | Lead BitNet CPU reference and AVX2 diagnostics. |
| i5-8250U CPU | Dense SLM CPU lead and low-power comparison. |
| Ryzen 9950X3D | AVX-512 support and high-performance CPU diagnostics. |
| RTX 5070 Ti | CUDA packed BitNet validation and future answer path. |
| Apple M4 | Metal, MPSGraph, and CPU/NEON validation. |
| Arc A770 | Discrete Intel GPU OpenCL/OpenVINO validation. |
| Arc 140V | Lunar Lake iGPU OpenCL/OpenVINO validation. |
| Intel NPU | OpenVINO NPU static-shape validation. |

See docs/hardware/HARDWARE_MATRIX.md.
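Because the CPU rows span AVX2, AVX-512, and NEON machines, backend selection must happen at runtime, and any fallback must be recorded. Here is a minimal sketch using the standard library's runtime feature-detection macros. The tier names and the conservative fall-back-to-scalar policy in `resolve` are assumptions, not the repository's dispatch logic.

```rust
/// Widest CPU kernel tier the running machine supports, else "scalar".
/// Tier names are illustrative.
fn detect_cpu_tier() -> &'static str {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx512f") {
            return "avx512";
        }
        if std::arch::is_x86_feature_detected!("avx2") {
            return "avx2";
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return "neon";
        }
    }
    "scalar"
}

/// Returns (tier that will run, fallback flag). Policy assumption: a request
/// that does not match the detected tier conservatively runs scalar and is
/// flagged as a fallback, so a receipt never hides it.
fn resolve<'a>(requested: &'a str, detected: &'a str) -> (&'a str, bool) {
    if requested == detected || requested == "scalar" {
        (requested, false)
    } else {
        ("scalar", true)
    }
}

fn main() {
    let detected = detect_cpu_tier();
    let (tier, fallback) = resolve("avx2", detected);
    println!("detected={detected} running={tier} fallback={fallback}");
}
```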

Building

Before building, run the local environment doctor to confirm the pinned Rust toolchain, Cargo metadata, default feature detection, and optional helper tools:

```bash
make doctor
cargo build --locked --no-default-features --features cpu
cargo build --locked -p bitnet-cli --no-default-features --features cpu,full-cli
cargo build --locked --no-default-features --features gpu
```

Optimized CPU build:

```bash
RUSTFLAGS="-C target-cpu=native -C opt-level=3 -C lto=thin" \
  cargo build --locked --release -p bitnet-cli --no-default-features --features cpu,full-cli
```

Feature Flags

| Flag | Purpose |
| --- | --- |
| cpu | CPU inference and diagnostics. |
| cuda | CUDA backend surface. |
| gpu | GPU umbrella feature for accelerator backends currently wired through the workspace. |
| full-cli | Full CLI command set. |
| ffi | C++ FFI bridge for cross-validation. |
| fixtures | GGUF fixture-based integration tests. |
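Because the build commands pass --no-default-features, only the features listed after --features are compiled in. Here is a minimal sketch of how code can report which backend features it was built with, using `cfg!`. It assumes a crate that declares features with these names. When compiled standalone (for example with plain rustc), no features are set and the list is empty.

```rust
/// Return the backend features this binary was compiled with.
/// Assumes the crate's Cargo.toml declares features named cpu, cuda, and gpu.
fn compiled_backends() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    if cfg!(feature = "cpu") {
        enabled.push("cpu");
    }
    if cfg!(feature = "cuda") {
        enabled.push("cuda");
    }
    if cfg!(feature = "gpu") {
        enabled.push("gpu");
    }
    enabled
}

fn main() {
    let backends = compiled_backends();
    if backends.is_empty() {
        eprintln!("no backend feature enabled; rebuild with e.g. --features cpu");
    } else {
        println!("compiled backends: {}", backends.join(", "));
    }
}
```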

Nix: `nix develop && nix build .#bitnet-cli && nix flake check`; see the Nix guide.

Testing

For a fast local loop, use the development check wrapper:

```bash
scripts/dev-check.sh quick
scripts/dev-check.sh all
```

The same single-command CPU check is available as a Cargo alias:

```bash
cargo dev-check
```

The underlying CI-style commands remain available when you need to run each gate manually:

```bash
cargo nextest run --locked --workspace --no-default-features --features cpu
cargo fmt --all -- --check
cargo clippy --locked --workspace --all-targets --no-default-features --features cpu -- -D warnings
```

The repository contains unit, property, snapshot, fixture, fuzz, BDD, receipt, and hardware-specific tests. Some tests are intentionally ignored with justification strings where hardware, model artifacts, or long-running evidence is required. See docs/development/test-suite.md.
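Tests that need real model artifacts are ignored with a justification string. Here is a minimal sketch of that pattern. The `BITNET_GGUF` environment variable and the `model_artifact` helper are hypothetical names, not the repository's test conventions.

```rust
use std::path::PathBuf;

/// Hypothetical helper: turn an optional override (for example the value of
/// an environment variable) into a model path, or explain why there is none.
fn model_artifact(override_path: Option<&str>) -> Result<PathBuf, &'static str> {
    match override_path {
        Some(p) if !p.is_empty() => Ok(PathBuf::from(p)),
        _ => Err("model artifact not configured"),
    }
}

#[cfg(test)]
mod artifact_tests {
    use super::*;

    // Skipped by default; run explicitly with `cargo test -- --ignored`.
    #[test]
    #[ignore = "requires a real BitNet GGUF artifact on disk"]
    fn real_gguf_artifact_exists() {
        // BITNET_GGUF is a hypothetical variable name used for illustration.
        let path = model_artifact(std::env::var("BITNET_GGUF").ok().as_deref())
            .expect("set BITNET_GGUF to a GGUF path to run this test");
        assert!(path.exists(), "{} does not exist", path.display());
    }
}

fn main() {
    match model_artifact(std::env::var("BITNET_GGUF").ok().as_deref()) {
        Ok(path) => println!("model: {}", path.display()),
        Err(why) => println!("skipping real-model checks: {why}"),
    }
}
```

The justification string is shown in test output, so anyone who sees the skip knows which artifact unblocks it.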

Documentation

| Section | Contents |
| --- | --- |
| docs/tutorials/ | Getting started and first diagnostic runs. |
| docs/howto/ | Install, run, export, validate, and cross-check. |
| docs/explanation/ | Architecture and design notes. |
| docs/reference/ | CLI, environment variables, quantization, and receipts. |
| docs/model-artifacts/ | Model artifact status and validation. |
| docs/hardware/ | Hardware validation and benchmark protocol. |
| docs/tracking/ | Campaign state and active work. |

What We Are Working On

Near-term work is focused on:

  1. matching the Microsoft BitNet.cpp reference path from Rust CPU
  2. preserving the reference runner, tokenizer, pre-tokenizer, and prompt template chain
  3. enriching backend-neutral answer diagnostics and first-divergence receipts
  4. validating coherent BitNet answer quality against a deterministic corpus
  5. validating strict CPU/CUDA answer parity after the Rust CPU path works
  6. qualifying throughput after answer quality works
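Item 4 relies on a deterministic corpus. Here is a minimal sketch of scoring answers against such a corpus. It assumes that each case lists substrings an acceptable answer must contain, and that decoding is deterministic (for example greedy), so reruns produce the same text. The `Case` type and the substring criterion are hypothetical; the project's actual corpus format and pass criteria are not described here.

```rust
/// Hypothetical corpus entry: a prompt plus substrings an acceptable answer
/// must contain (compared case-insensitively).
struct Case {
    prompt: &'static str,
    must_contain: &'static [&'static str],
}

/// Run `answer` on every prompt and count passing cases: (passed, total).
fn score(corpus: &[Case], answer: impl Fn(&str) -> String) -> (usize, usize) {
    let passed = corpus
        .iter()
        .filter(|case| {
            let text = answer(case.prompt).to_lowercase();
            case.must_contain
                .iter()
                .all(|needle| text.contains(&needle.to_lowercase()))
        })
        .count();
    (passed, corpus.len())
}

fn main() {
    let corpus = [
        Case { prompt: "What is 2+2?", must_contain: &["4"] },
        Case { prompt: "What is the capital of France?", must_contain: &["paris"] },
    ];
    // Stand-in for a deterministic model run; a real harness would call the runtime.
    let canned = |prompt: &str| -> String {
        if prompt.contains("2+2") {
            "2+2 = 4".to_string()
        } else {
            "I am not sure.".to_string()
        }
    };
    let (passed, total) = score(&corpus, canned);
    println!("{passed}/{total} cases passed");
}
```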

Contributing

See CONTRIBUTING.md. Before opening a PR, run the fast local wrapper first and then the fuller local CI gate when applicable:

```bash
scripts/dev-check.sh all
./ci/local.sh
```

New internal maintenance commands belong in xtask. bitnet-task exists only to preserve legacy scripts/*.sh entrypoints while that migration is in flight.

See ROADMAP.md for project direction.

License

Dual-licensed under MIT and Apache 2.0.

About

Rust inference engine for 1-bit BitNet LLMs (GGUF + llama.cpp compatible).
