Commit c02c300

docs: add info about quantization and dimensionality reduction (#231)
* docs * add dim
1 parent a6ca71a commit c02c300

2 files changed

Lines changed: 50 additions & 0 deletions

File tree

README.md

Lines changed: 2 additions & 0 deletions
@@ -123,6 +123,8 @@ For advanced usage, please refer to our [usage documentation](https://github.com
## Updates & Announcements

- **01/05/2024**: We released backend support for `BPE` and `Unigram` tokenizers, along with quantization and dimensionality reduction. New Model2Vec models are now 50% of the original models, and can be quantized to int8 to be 25% of the size, without loss of performance.
- **12/02/2024**: We released **Model2Vec training**, allowing you to fine-tune your own classification models on top of Model2Vec models. Find out more in our [training documentation](https://github.com/MinishLab/model2vec/blob/main/model2vec/train/README.md) and [results](results/README.md#training-results).
- **30/01/2024**: We released two new models: [potion-base-32M](https://huggingface.co/minishlab/potion-base-32M) and [potion-retrieval-32M](https://huggingface.co/minishlab/potion-retrieval-32M). [potion-base-32M](https://huggingface.co/minishlab/potion-base-32M) is our most performant model to date, using a larger vocabulary and higher dimensions. [potion-retrieval-32M](https://huggingface.co/minishlab/potion-retrieval-32M) is a finetune of [potion-base-32M](https://huggingface.co/minishlab/potion-base-32M) that is optimized for retrieval tasks, and is the best performing static retrieval model currently available.

docs/usage.md

Lines changed: 48 additions & 0 deletions
@@ -126,6 +126,54 @@ m2v_model = distill(model_name=model_name, vocabulary=vocabulary, use_subword=Fa
**Important note:** we assume the passed vocabulary is sorted by frequency rank, i.e., we don't care about the actual word frequencies, but we do assume that the most frequent word is first and the least frequent word is last. If you're not sure whether this is the case, set `apply_zipf` to `False`. This disables the weighting, but will also make performance slightly worse.
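If you start from raw word counts, you can build a correctly ordered vocabulary before passing it to `distill`. This is a minimal sketch with made-up counts (`counts` is hypothetical; only the relative order matters):

```python
# Hypothetical word counts; the actual values don't matter, only the ordering
counts = {"embedding": 40, "the": 1000, "of": 800, "science": 120}

# Sort descending by frequency so the most frequent word comes first
vocabulary = [word for word, _ in sorted(counts.items(), key=lambda kv: -kv[1])]
print(vocabulary)
# ['the', 'of', 'science', 'embedding']
```

If you only have an unordered word list and no counts, pass `apply_zipf=False` instead.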
### Quantization

Models can be quantized to `float16` (the default) or `int8` during distillation, or when loading from disk.

```python
from model2vec.distill import distill

# Distill a Sentence Transformer model and quantize it to int8
m2v_model = distill(model_name="BAAI/bge-base-en-v1.5", quantize_to="int8")

# Save the model. This model is now 25% of the size of a normal model.
m2v_model.save_pretrained("m2v_model")
```
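To see why int8 gives a 4x reduction over float32 (and 2x over float16), here is a sketch of symmetric int8 quantization of an embedding matrix. This is illustrative only; model2vec's internal quantization scheme may differ:

```python
import numpy as np

# A random matrix standing in for a real float32 embedding table
emb = np.random.randn(1000, 256).astype(np.float32)

# Symmetric quantization: map the largest absolute value to 127
scale = np.abs(emb).max() / 127.0
quantized = np.round(emb / scale).astype(np.int8)

# int8 storage is one quarter the size of float32
print(quantized.nbytes / emb.nbytes)
# 0.25
```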
You can also quantize during loading:

```python
from model2vec import StaticModel

model = StaticModel.from_pretrained("minishlab/potion-base-8m", quantize_to="int8")
```
### Dimensionality reduction

Because almost all Model2Vec models have been distilled using PCA, and because PCA explicitly orders dimensions from most informative to least informative, we can perform dimensionality reduction during loading. This is very similar to how Matryoshka embeddings work.

```python
from model2vec import StaticModel

model = StaticModel.from_pretrained("minishlab/potion-base-8m", dimensionality=32)

print(model.embedding.shape)
# (29528, 32)
```
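Under the hood, reducing dimensionality at load time amounts to truncating columns of the embedding matrix; because PCA sorts dimensions by explained variance, the kept columns are the most informative ones. A rough sketch, with a random matrix standing in for the real embedding table:

```python
import numpy as np

# Random stand-in for a (vocab_size, 256) float32 embedding table
emb = np.random.randn(29528, 256).astype(np.float32)

# Keep only the first 32 (most informative, PCA-ordered) dimensions
reduced = emb[:, :32]
print(reduced.shape)
# (29528, 32)
```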
163+
164+
### Combining quantization and dimensionality reduction
165+
166+
Combining these tricks can lead to extremely small models. For example, using this, we can reduce the size of `potion-base-8m`, which is now 30MB, to only 1MB:
167+
168+
```python
169+
model = StaticModel.from_pretrained("minishlab/potion-base-8m",
170+
dimensionality=32,
171+
quantize_to="int8")
172+
print(model.embedding.nbytes)
173+
# 944896 bytes = 944kb
174+
```
175+
176+
This should be enough to satisfy even the strongest hardware constraints.
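The reported size follows directly from the shapes: 29528 tokens × 32 dimensions × 1 byte per int8 value:

```python
# Size arithmetic for the combined model above
vocab_size, dims, bytes_per_value = 29528, 32, 1
print(vocab_size * dims * bytes_per_value)
# 944896
```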
## Training
