Pre-alpha Rust-native local model runtime and validation workspace for small and efficient language models, including dense SLMs and BitNet / 1-bit model families.
Warning
Pre-alpha. Do not use in production.
BitNet-rs is still pre-alpha. The project currently focuses on loader, tokenizer, kernel, receipt, runtime-target, and hardware validation. Dense SLM paths can be supported model families in their own right; coherent BitNet answer quality is still under validation, so generated BitNet text should be treated as diagnostic output until a strict proof lane accepts it.
Active feature, hardware, performance, diagnostic, and refactor work now happens
in EffortlessMetrics/bitnet-rs-swarm.
This repository is the release and publish repository for BitNet-rs. Open normal
development PRs in bitnet-rs-swarm; PRs here are limited to release promotion,
versioning, changelog, packaging, signing, publish, emergency security or
release-blocking hotfixes, and documentation corrections needed for released
artifacts.
BitNet-rs is moving toward a Rust-native local model runner with strict proof surfaces. BitNet / 1-bit models are an important specialized model family inside that runner; dense SLMs such as Qwen are also useful supported model families when their artifact, tokenizer, backend, and receipt contracts pass. The repo is useful for contributors working on model loading, tokenization, quantization, kernel parity, runtime targets, hardware bring-up, receipts, and reproducible inference validation. It is not yet a polished end-user inference server.
What exists today:
- strict GGUF loading and tokenizer metadata checks
- I2_S / QK256 quantization and kernel infrastructure
- scalar, AVX2, AVX-512, NEON, CUDA, OpenCL, OpenVINO, Metal, and NPU validation work
- diagnostic answer-corpus and answer-parity receipts
- receipts that record hardware identity, runtime identity, fallback behavior, and kernel coverage
- dense SLM work, both as a first-class local-answer model family and as a way to validate shared generation surfaces while BitNet model-artifact work continues
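BitNet b1.58 weights are ternary (-1, 0, +1), which is why 2-bit storage formats fit them. To illustrate the quantization bullet above, here is a minimal sketch of packing ternary weights four to a byte. The code assignment and bit order are assumptions chosen for readability; this is not claimed to match the on-disk I2_S or QK256 layout, and the function names are hypothetical.

```rust
/// Pack ternary weights (-1, 0, +1) into 2-bit codes, four weights per byte.
/// Illustrative encoding only: 0b00 = -1, 0b01 = 0, 0b10 = +1.
/// Returns None if any input value is not ternary.
pub fn pack_ternary(weights: &[i8]) -> Option<Vec<u8>> {
    let mut out = vec![0u8; (weights.len() + 3) / 4];
    for (i, &w) in weights.iter().enumerate() {
        let code: u8 = match w {
            -1 => 0b00,
            0 => 0b01,
            1 => 0b10,
            _ => return None, // not a ternary value
        };
        // Weight i lives in byte i / 4, at bit offset (i % 4) * 2.
        out[i / 4] |= code << ((i % 4) * 2);
    }
    Some(out)
}

/// Unpack `len` weights from a buffer produced by `pack_ternary`.
/// Panics if `packed` holds fewer than `len` weights. The unused code 0b11
/// would decode to 2; `pack_ternary` never emits it.
pub fn unpack_ternary(packed: &[u8], len: usize) -> Vec<i8> {
    (0..len)
        .map(|i| {
            let code = (packed[i / 4] >> ((i % 4) * 2)) & 0b11;
            code as i8 - 1
        })
        .collect()
}
```

With this encoding a block of 256 weights occupies 64 bytes. Real formats typically also carry per-block or per-tensor scale data, which this sketch omits.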
The repo has real inference infrastructure, but the Rust path does not yet produce coherent BitNet local answers that the project supports. The Microsoft BitNet.cpp reference path can answer the tiny suite with the official I2_S GGUF when the missing pre-tokenizer is supplied from Microsoft's tokenizer assets.
Backend receipts remain useful for selected-device execution, tokenizer and prompt diagnostics, fallback behavior, and kernel coverage. They are not, by themselves, evidence that the Rust-generated text is a supported answer. See the model-artifact validation docs.
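The distinction between execution evidence and answer evidence can be sketched as a small classification step. Everything in this example is hypothetical: the `Receipt` fields, the `Evidence` levels, and the rule that any fallback downgrades a run to diagnostic are assumptions for illustration and do not mirror the actual bitnet-receipts schema or policy.

```rust
/// Hypothetical, simplified receipt. Field names are illustrative only and
/// do not mirror the real bitnet-receipts schema.
#[derive(Debug, Clone)]
pub struct Receipt {
    pub backend: String,     // made-up label such as "cpu-avx2"
    pub hardware_id: String, // made-up host identifier
    pub fallback_used: bool,
    pub kernels_covered: u32,
    pub kernels_total: u32,
}

/// How much a run is allowed to claim (assumed levels, not project terms).
#[derive(Debug, PartialEq, Eq)]
pub enum Evidence {
    /// Useful for debugging only.
    Diagnostic,
    /// The selected device ran the expected kernels without fallback.
    ExecutionValidated,
    /// Execution was clean AND a separate proof lane accepted the text.
    SupportedAnswer,
}

/// A clean receipt alone never yields `SupportedAnswer`; acceptance by a
/// proof lane is a separate input.
pub fn classify(receipt: &Receipt, proof_lane_accepted: bool) -> Evidence {
    let clean_execution = !receipt.fallback_used
        && receipt.kernels_total > 0
        && receipt.kernels_covered == receipt.kernels_total;
    match (clean_execution, proof_lane_accepted) {
        (false, _) => Evidence::Diagnostic,
        (true, false) => Evidence::ExecutionValidated,
        (true, true) => Evidence::SupportedAnswer,
    }
}
```

The point of the shape is that `proof_lane_accepted` is an input the receipt cannot supply for itself.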
| Area | State | What it means today |
|---|---|---|
| GGUF loading | Supported / hardening | Structural loading and metadata extraction are active work surfaces. |
| Tokenizer handling | Supported / hardening | Tokenizer metadata is checked strictly for answer-quality work. |
| I2_S BitNet32 CPU path | Diagnostic | CPU execution exists; coherent BitNet answer quality is still under validation. |
| I2_S QK256 CPU path | Diagnostic | Scalar, AVX2, and AVX-512 diagnostics have receipts; generated text quality is still under validation. |
| Scalar / SIMD parity | Diagnostic | Used for backend agreement checks and first-divergence debugging. |
| Dense SLM path | Early working | First-class supported model-family path where artifact, tokenizer, backend, and receipt gates pass; not a BitNet quality result. |
| RTX 5070 Ti CUDA | Execution path validated / diagnostic | Packed BitNet CUDA has receipts through CUDA-BITNET-009; coherent CUDA answers and speed are not established. |
| Metal / OpenCL / OpenVINO / NPU | Probe / smoke | Hardware identity and narrow execution receipts exist; full BitNet answer quality is not established. |
| WASM runtime target | First-class proof lane / scaffolded | bitnet-wasm is treated as a Rust-native runtime target for browser, Node, WASI, and embedded hosts; real inference claims require strict WASM receipts and do not displace native M4 Mac mini validation. |
| Cross-validation | Supported / hardening | Reference comparison infrastructure exists; model selection remains active work. |
| Honest-compute receipts | Supported | Receipts preserve backend, runtime, fallback, kernel, and timing metadata. |
| CLI run/chat | Diagnostic | Useful for exercising the pipeline; generated text is not yet a supported answer-quality surface. |
| Server / HTTP API | Incomplete | Health wiring exists; inference serving is not ready. |
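The "Scalar / SIMD parity" row mentions first-divergence debugging. As a rough illustration of the idea, the sketch below compares a reference output (for example, scalar logits) against a candidate output (for example, a SIMD kernel) and reports the first element that differs by more than a tolerance. The type and function names are hypothetical and the absolute-tolerance rule is an assumption; the repository's own parity checks may use different criteria.

```rust
/// Location and values of the first element where two backends disagree.
#[derive(Debug, PartialEq)]
pub struct Divergence {
    pub index: usize,
    pub reference: f32,
    pub candidate: f32,
}

/// Return the first position where `candidate` differs from `reference` by
/// more than `tol` (absolute difference). Only the common prefix is compared,
/// so callers should check that the lengths match first.
pub fn first_divergence(reference: &[f32], candidate: &[f32], tol: f32) -> Option<Divergence> {
    reference
        .iter()
        .zip(candidate)
        .enumerate()
        .find_map(|(index, (&r, &c))| {
            // Written as a negated `<=` so that NaN on either side (or
            // inf - inf) counts as a disagreement instead of passing silently.
            let differs = !((r - c).abs() <= tol);
            differs.then(|| Divergence { index, reference: r, candidate: c })
        })
}
```

Reporting the first differing index, not just a pass/fail flag, lets a debugging session start at the earliest point where the two backends part ways.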
| Need | Start here |
|---|---|
| First token-generation walkthrough | docs/tutorials/first-inference.md |
| Real GGUF model walkthrough | docs/tutorials/real-gguf-model-inference.md |
| Model validation workflow | docs/howto/validate-models.md |
| GGUF loading details | docs/howto/gguf-model-validation-and-loading.md |
| CLI flags and receipt options | docs/reference/inference-cli-reference.md |
The commands below are a smoke path for contributors, not an answer-quality quickstart.
Build the CPU CLI:
cargo build --locked -p bitnet-cli --no-default-features --features cpu,full-cli
Download the official Microsoft BitNet GGUF:
cargo run --locked -p xtask --no-default-features -- download-model --id microsoft/bitnet-b1.58-2B-4T-gguf
Run a diagnostic CPU generation path:
RUST_LOG=warn cargo run --locked -p bitnet-cli \
--no-default-features --features cpu,full-cli -- run \
--model models/microsoft-bitnet-b1.58-2B-4T-gguf/ggml-model-i2_s.gguf \
--tokenizer models/microsoft-bitnet-b1.58-2B-4T-gguf/tokenizer.json \
--prompt "What is 2+2?" \
--max-new-tokens 8 \
--strict-loader \
--strict-tokenizer \
--json-out target/bitnet/receipts/first-run.json
This exercises the model, tokenizer, generation, and receipt path. Treat the output as diagnostic evidence, not as a supported chat answer.
bitnet-tokenizers --------------------------------------+
                                                        |
bitnet-models (GGUF loader, I2_S detection, metadata)   |
  -> bitnet-quantization (I2_S / TL1 / TL2 / IQ2_S)     |
       -> bitnet-kernels (scalar / AVX2 / AVX-512 / NEON / CUDA)
                                                        v
bitnet-inference (autoregressive engine)
  -> bitnet-logits
  -> bitnet-sampling
  -> bitnet-generation
  -> bitnet-prompt-templates
  -> bitnet-receipts
                    |
  +-----------------+----------------+
  v                                  v
bitnet-cli                     bitnet-server
The workspace contains roughly 200 crates. See docs/architecture-overview.md.
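To make the logits, sampling, and generation stages of the diagram concrete, here is a minimal sketch of a greedy autoregressive loop. It is not the bitnet-inference API: `forward` is a hypothetical stand-in for a model's forward pass, and greedy argmax is only one possible sampling strategy.

```rust
/// Greedy sampling: index of the largest logit. NaN entries are skipped and
/// the first maximum wins on ties. Returns None if no usable logit exists.
pub fn argmax(logits: &[f32]) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for (i, &x) in logits.iter().enumerate() {
        if x.is_nan() {
            continue;
        }
        match best {
            Some((_, b)) if x <= b => {} // keep the earlier maximum
            _ => best = Some((i, x)),
        }
    }
    best.map(|(i, _)| i)
}

/// Autoregressive loop: feed the tokens so far to `forward`, pick the next
/// token greedily, append it, and stop at `eos` or after `max_new_tokens`.
/// `forward` is a hypothetical model callback returning one logit per
/// vocabulary entry for the next position.
pub fn generate<F>(mut forward: F, prompt: &[u32], max_new_tokens: usize, eos: u32) -> Vec<u32>
where
    F: FnMut(&[u32]) -> Vec<f32>,
{
    let mut tokens = prompt.to_vec();
    for _ in 0..max_new_tokens {
        let logits = forward(&tokens);
        let Some(next) = argmax(&logits) else { break };
        let next = next as u32;
        tokens.push(next);
        if next == eos {
            break;
        }
    }
    tokens
}
```

A real engine would typically reuse a KV cache instead of re-running the full sequence each step; the sketch recomputes from the whole token list to keep the control flow visible.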
Hardware validation is organized by platform so backend identity, runtime identity, fallback status, and receipt coverage stay explicit.
| Platform | Role |
|---|---|
| Intel 258V CPU | Lead BitNet CPU reference and AVX2 diagnostics. |
| i5-8250U CPU | Dense SLM CPU lead and low-power comparison. |
| Ryzen 9950X3D | AVX-512 support and high-performance CPU diagnostics. |
| RTX 5070 Ti | CUDA packed BitNet validation and future answer path. |
| Apple M4 | Metal, MPSGraph, and CPU/NEON validation. |
| Arc A770 | Discrete Intel GPU OpenCL/OpenVINO validation. |
| Arc 140V | Lunar Lake iGPU OpenCL/OpenVINO validation. |
| Intel NPU | OpenVINO NPU static-shape validation. |
See docs/hardware/HARDWARE_MATRIX.md.
Before building, run the local environment doctor to confirm the pinned Rust toolchain, Cargo metadata, default feature detection, and optional helper tools:
make doctor
cargo build --locked --no-default-features --features cpu
cargo build --locked -p bitnet-cli --no-default-features --features cpu,full-cli
cargo build --locked --no-default-features --features gpu
Optimized CPU build:
RUSTFLAGS="-C target-cpu=native -C opt-level=3 -C lto=thin" \
cargo build --locked --release -p bitnet-cli --no-default-features --features cpu,full-cli
| Flag | Purpose |
|---|---|
| cpu | CPU inference and diagnostics. |
| cuda | CUDA backend surface. |
| gpu | GPU umbrella feature for accelerator backends currently wired through the workspace. |
| full-cli | Full CLI command set. |
| ffi | C++ FFI bridge for cross-validation. |
| fixtures | GGUF fixture-based integration tests. |
Nix: nix develop && nix build .#bitnet-cli && nix flake check - see Nix guide.
For a fast local loop, use the development check wrapper:
scripts/dev-check.sh quick
scripts/dev-check.sh all
The same single-command CPU check is available as a Cargo alias:
cargo dev-check
The underlying CI-style commands remain available when you need to run each gate manually:
cargo nextest run --locked --workspace --no-default-features --features cpu
cargo fmt --all -- --check
cargo clippy --locked --workspace --all-targets --no-default-features --features cpu -- -D warnings
The repository contains unit, property, snapshot, fixture, fuzz, BDD, receipt, and hardware-specific tests. Some tests are intentionally ignored with justification strings where hardware, model artifacts, or long-running evidence is required. See docs/development/test-suite.md.
| Section | Contents |
|---|---|
| docs/tutorials/ | Getting started and first diagnostic runs. |
| docs/howto/ | Install, run, export, validate, and cross-check. |
| docs/explanation/ | Architecture and design notes. |
| docs/reference/ | CLI, environment variables, quantization, and receipts. |
| docs/model-artifacts/ | Model artifact status and validation. |
| docs/hardware/ | Hardware validation and benchmark protocol. |
| docs/tracking/ | Campaign state and active work. |
Near-term work is focused on:
- matching the Microsoft BitNet.cpp reference path from Rust CPU
- preserving the reference runner, tokenizer, pre-tokenizer, and prompt template chain
- enriching backend-neutral answer diagnostics and first-divergence receipts
- validating coherent BitNet answer quality against a deterministic corpus
- validating strict CPU/CUDA answer parity after the Rust CPU path works
- qualifying throughput after answer quality works
See CONTRIBUTING.md. Before opening a PR, run the fast local wrapper first and then the fuller local CI gate when applicable:
scripts/dev-check.sh all
./ci/local.sh
New internal maintenance commands belong in xtask. bitnet-task exists only to preserve legacy scripts/*.sh entrypoints while that migration is in flight.
See ROADMAP.md for project direction.
Dual-licensed under MIT and Apache 2.0.