low-precision

Here are 11 public repositories matching this topic...

intel / neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

sparsity pruning quantization knowledge-distillation auto-tuning int8 low-precision quantization-aware-training post-training-quantization awq int4 large-language-models gptq smoothquant sparsegpt fp4 mxformat

Updated Apr 22, 2026
Python

Tiiiger / QPyTorch

Star

Low Precision Arithmetic Simulation in PyTorch

learning low-precision

Updated May 20, 2024
Python

gudovskiy / ShiftCNN

Star

A script to convert floating-point CNN models into generalized low-precision ShiftCNN representation

cnn dnn low-precision

Updated Jul 14, 2017
Python

sefaburakokcu / quantized-yolov5

Star

Low Precision(quantized) Yolov5

fpga yolov1 finn low-precision quantized-neural-networks pynq-z2 brevitas yolov5

Updated Mar 24, 2025
Python

KernelTuner / kernel_float

Star

CUDA/HIP header-only library for low-precision (16 bit, 8 bit) and vectorized GPU kernel development

performance cpp gpu cuda kernel-tuner hip vectorization floating-point half-precision mixed-precision low-precision bfloat16 header-only-library reduced-precision

Updated Apr 10, 2026
C++

graphcore-research / jax-scalify

Star

JAX Scalify: end-to-end scaled arithmetics

jax low-precision llm fp8

Updated Oct 30, 2024
Python

gudovskiy / fmap_compression

Star

Code for DNN feature map compression paper

compression caffe cnn dnn feature-map low-precision

Updated Nov 21, 2018
C++

hfp / libxs

Star

Small utility library

math emulation histogram malloc in-memory in-memory-database gemm memory-pool high-precision timer-clock low-precision ozaki

Updated Apr 20, 2026
C

gauranggupta0786 / Quantisation-on-reasoning-models

Star

Implemented post-training quantisation (PTQ) on transformer-based reasoning models using 8-bit and 4-bit weight quantisation (INT8, INT4) with frameworks like PyTorch and Hugging Face Transformers. Leveraged libraries such as bitsandbytes to reduce model size and accelerate inference, while evaluating performance degradation on reasoning tasks. Com

machine-learning deep-learning transformers pytorch quantization low-precision huggingface model-optimization post-training-quantization int8-quantization bitsandbytes reasoning-models int4-quantization

Updated Apr 21, 2026
Jupyter Notebook

abdulvahapmutlu / quantlab-8bit

Star

QuantLab-8bit is a reproducible benchmark of 8-bit quantization on compact vision backbones. It includes FP32 baselines, PTQ (dynamic & static), QAT, ONNX exports, parity checks, ORT CPU latency, and visual diagnostics.

benchmarking computer-vision deep-learning pytorch reproducibility quantization model-compression onnx gradcam low-precision edge-ai onnxruntime streamlit model-optimization quantization-aware-training post-training-quantization efficient-ai

Updated Sep 25, 2025
Python

AmanPriyanshu / LinearCosine

Sponsor

Star

LinearCosine: Adding beats multiplying for lower-precision efficient cosine similarity

nlp benchmarking machine-learning computer-vision deep-learning algorithms cpp optimization linear-algebra artificial-intelligence computation matrix-multiplication neural-networks cosine-similarity floating-point quantization energy-efficiency performance-optimization low-precision

Updated Oct 21, 2024
C++

Improve this page

Add a description, image, and links to the low-precision topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the low-precision topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

low-precision

Here are 11 public repositories matching this topic...

intel / neural-compressor

Tiiiger / QPyTorch

gudovskiy / ShiftCNN

sefaburakokcu / quantized-yolov5

KernelTuner / kernel_float

graphcore-research / jax-scalify

gudovskiy / fmap_compression

hfp / libxs

gauranggupta0786 / Quantisation-on-reasoning-models

abdulvahapmutlu / quantlab-8bit

AmanPriyanshu / LinearCosine

Improve this page

Add this topic to your repo