## Updates & Announcements
- **01/05/2024**: We released backend support for `BPE` and `Unigram` tokenizers, along with quantization and dimensionality reduction. New Model2Vec models are now 50% of the size of the original models, and can be quantized to int8 to shrink them to 25% of the size, without loss of performance.
- **12/02/2024**: We released **Model2Vec training**, allowing you to fine-tune your own classification models on top of Model2Vec models. Find out more in our [training documentation](https://github.com/MinishLab/model2vec/blob/main/model2vec/train/README.md) and [results](results/README.md#training-results), and see the short sketch after this list.
- **30/01/2024**: We released two new models: [potion-base-32M](https://huggingface.co/minishlab/potion-base-32M) and [potion-retrieval-32M](https://huggingface.co/minishlab/potion-retrieval-32M). [potion-base-32M](https://huggingface.co/minishlab/potion-base-32M) is our most performant model to date, using a larger vocabulary and higher dimensions. [potion-retrieval-32M](https://huggingface.co/minishlab/potion-retrieval-32M) is a finetune of [potion-base-32M](https://huggingface.co/minishlab/potion-base-32M) that is optimized for retrieval tasks, and is the best-performing static retrieval model currently available.
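For a quick flavor of the training API, here is a minimal sketch based on the training documentation linked above; `StaticModelForClassification` and its `fit`/`predict` methods are assumed from that documentation, and the data is a toy placeholder:

```python
from model2vec.train import StaticModelForClassification

# Initialize a classifier on top of a pre-trained Model2Vec model.
classifier = StaticModelForClassification.from_pretrained(model_name="minishlab/potion-base-8m")

# Toy data; substitute your own labeled texts.
texts = ["great product, would buy again", "terrible support experience"]
labels = ["positive", "negative"]

classifier.fit(texts, labels)
print(classifier.predict(["really enjoyed using this"]))
```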
**Important note:** we assume the passed vocabulary is sorted by rank frequency, i.e., we don't care about the actual word frequencies, but we do assume that the most frequent word comes first and the least frequent word comes last. If you're not sure whether this is the case, set `apply_zipf` to `False`. This disables the weighting, but will also make performance a little bit worse.
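To make this concrete, here is a minimal sketch of distilling with a custom vocabulary whose frequency ranking is unknown; the `vocabulary` and `apply_zipf` parameters are assumed to match the custom-vocabulary distillation API this note refers to, and the model name and words are placeholders:

```python
from model2vec.distill import distill

# We are not sure this vocabulary is sorted by frequency rank,
# so we disable Zipf weighting to be safe.
vocabulary = ["the", "planet", "sassafras"]
m2v_model = distill(
    model_name="BAAI/bge-base-en-v1.5",  # example model name
    vocabulary=vocabulary,
    apply_zipf=False,
)
```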
### Quantization
Models can be quantized to `float16` (default) or `int8` during distillation, or when loading from disk.
```python
from model2vec.distill import distill
# Distill a Sentence Transformer model and quantize it to int8.
# (The model name below is an example; substitute any Sentence Transformer.)
m2v_model = distill(model_name="BAAI/bge-base-en-v1.5", quantize_to="int8")

# Save the model. This model is now 25% of the size of a normal model.
m2v_model.save_pretrained("m2v_model")
```
You can also quantize during loading.
```python
from model2vec import StaticModel
model = StaticModel.from_pretrained("minishlab/potion-base-8m", quantize_to="int8")
```
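As a quick sanity check (assuming, as the examples below also do, that the embedding matrix is exposed as a NumPy array via `model.embedding`), you can inspect the dtype after loading:

```python
from model2vec import StaticModel

model = StaticModel.from_pretrained("minishlab/potion-base-8m", quantize_to="int8")

# The stored embedding matrix should now use 1 byte per value.
print(model.embedding.dtype)
# int8
```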
### Dimensionality reduction
Because almost all Model2Vec models have been distilled using PCA, and because PCA explicitly orders dimensions from most informative to least informative, we can perform dimensionality reduction during loading. This is very similar to how matryoshka embeddings work.
```python
from model2vec import StaticModel
model = StaticModel.from_pretrained("minishlab/potion-base-8m", dimensionality=32)
print(model.embedding.shape)
# (29528, 32)
```
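Encoding works as usual after reduction; the vectors simply come out with the requested number of dimensions:

```python
from model2vec import StaticModel

model = StaticModel.from_pretrained("minishlab/potion-base-8m", dimensionality=32)

# Each embedding now has 32 dimensions instead of the model's full dimensionality.
embeddings = model.encode(["It's dangerous to go alone!"])
print(embeddings.shape)
# (1, 32)
```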
### Combining quantization and dimensionality reduction
Combining these tricks can lead to extremely small models. For example, we can reduce `potion-base-8m`, which is currently 30MB, to only 1MB:
```python
from model2vec import StaticModel

model = StaticModel.from_pretrained("minishlab/potion-base-8m",
                                    dimensionality=32,
                                    quantize_to="int8")

print(model.embedding.nbytes)
# 944896 bytes ≈ 945KB
```
This should be enough to satisfy even the most stringent hardware constraints.