Commit 69f24bf

Merge branch 'main' into wrong-component-pipeline-quant
2 parents e1084af + 195926b

17 files changed: 365 additions & 193 deletions

docs/source/en/api/pipelines/chroma.md

Lines changed: 49 additions & 17 deletions
Chroma can use all the same optimizations as Flux.

</Tip>

## Inference

The Diffusers version of Chroma is based on the [`unlocked-v37`](https://huggingface.co/lodestones/Chroma/blob/main/chroma-unlocked-v37.safetensors) version of the original model, which is available in the [Chroma repository](https://huggingface.co/lodestones/Chroma).

```python
import torch
from diffusers import ChromaPipeline

pipe = ChromaPipeline.from_pretrained("lodestones/Chroma", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

prompt = [
    "A high-fashion close-up portrait of a blonde woman in clear sunglasses. The image uses a bold teal and red color split for dramatic lighting. The background is a simple teal-green. The photo is sharp and well-composed, and is designed for viewing with anaglyph 3D glasses for optimal effect. It looks professionally done."
]
negative_prompt = ["low quality, ugly, unfinished, out of focus, deformed, disfigure, blurry, smudged, restricted palette, flat colors"]

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    generator=torch.Generator("cpu").manual_seed(433),
    num_inference_steps=40,
    guidance_scale=3.0,
    num_images_per_prompt=1,
).images[0]
image.save("chroma.png")
```

## Loading from a single file

To use updated model checkpoints that are not in the Diffusers format, you can use the `ChromaTransformer2DModel` class to load the model from a single file in the original format. This is also useful when trying to load finetunes or quantized versions of the models that have been published by the community.

The following example demonstrates how to run Chroma from a single file.

```python
import torch
from diffusers import ChromaTransformer2DModel, ChromaPipeline

model_id = "lodestones/Chroma"
dtype = torch.bfloat16

transformer = ChromaTransformer2DModel.from_single_file("https://huggingface.co/lodestones/Chroma/blob/main/chroma-unlocked-v37.safetensors", torch_dtype=dtype)

pipe = ChromaPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=dtype)
pipe.enable_model_cpu_offload()

prompt = [
    "A high-fashion close-up portrait of a blonde woman in clear sunglasses. The image uses a bold teal and red color split for dramatic lighting. The background is a simple teal-green. The photo is sharp and well-composed, and is designed for viewing with anaglyph 3D glasses for optimal effect. It looks professionally done."
]
negative_prompt = ["low quality, ugly, unfinished, out of focus, deformed, disfigure, blurry, smudged, restricted palette, flat colors"]

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    generator=torch.Generator("cpu").manual_seed(433),
    num_inference_steps=40,
    guidance_scale=3.0,
).images[0]

image.save("chroma-single-file.png")
```

## ChromaPipeline

[[autodoc]] ChromaPipeline
  - all
  - __call__

## ChromaImg2ImgPipeline

[[autodoc]] ChromaImg2ImgPipeline
  - all
  - __call__
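Since the single-file loader is also meant for community quantized checkpoints, a minimal sketch of loading a GGUF-quantized transformer is shown below. The repository URL is a placeholder (no specific community GGUF export is named in this commit), and `GGUFQuantizationConfig` is the Diffusers GGUF loading config; this is an illustration, not part of the documented example.

```python
import torch
from diffusers import ChromaPipeline, ChromaTransformer2DModel, GGUFQuantizationConfig

# Placeholder URL: substitute a real community GGUF export of Chroma.
ckpt_url = "https://huggingface.co/<user>/<repo>/blob/main/chroma-unlocked-v37-Q8_0.gguf"

# Load the quantized transformer from a single GGUF file, dequantizing to bf16 for compute.
transformer = ChromaTransformer2DModel.from_single_file(
    ckpt_url,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

# Reuse the remaining pipeline components from the Diffusers-format repository.
pipe = ChromaPipeline.from_pretrained(
    "lodestones/Chroma", transformer=transformer, torch_dtype=torch.bfloat16
)
```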

docs/source/en/optimization/memory.md

Lines changed: 7 additions & 0 deletions
</Tip>

### Offloading to disk

Group offloading can consume significant system RAM depending on the model size. In limited-RAM environments, it can be useful to offload to disk instead. You can do this by setting the `offload_to_disk_path` argument in either [`~ModelMixin.enable_group_offload`] or [`~hooks.apply_group_offloading`]. Refer to [the PR description](https://github.com/huggingface/diffusers/pull/11682#issue-3129365363) and [this comment](https://github.com/huggingface/diffusers/pull/11682#issuecomment-2955715126) for the expected speed-memory trade-offs with this option enabled.
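As a sketch of the option described above (assuming a CUDA device and a Chroma pipeline; the argument names besides `offload_to_disk_path` follow the existing group-offloading API):

```python
import torch
from diffusers import ChromaPipeline

pipe = ChromaPipeline.from_pretrained("lodestones/Chroma", torch_dtype=torch.bfloat16)

# Group-offload the transformer's weights, staging them on disk
# instead of holding the offloaded copies in system RAM.
pipe.transformer.enable_group_offload(
    onload_device=torch.device("cuda"),
    offload_device=torch.device("cpu"),
    offload_type="leaf_level",
    offload_to_disk_path="./offload",
)
```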
## Layerwise casting

Layerwise casting stores weights in a smaller data format (for example, `torch.float8_e4m3fn` and `torch.float8_e5m2`) to use less memory and upcasts those weights to a higher precision like `torch.float16` or `torch.bfloat16` for computation. Certain layers (normalization and modulation related weights) are skipped because storing them in fp8 can degrade generation quality.
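A minimal sketch of the API, assuming a loaded pipeline and a PyTorch build with fp8 dtypes (Chroma is used here only as an example model):

```python
import torch
from diffusers import ChromaPipeline

pipe = ChromaPipeline.from_pretrained("lodestones/Chroma", torch_dtype=torch.bfloat16)

# Store transformer weights in fp8; upcast to bf16 on the fly for each layer's compute.
# Normalization/modulation layers are skipped automatically by default.
pipe.transformer.enable_layerwise_casting(
    storage_dtype=torch.float8_e4m3fn,
    compute_dtype=torch.bfloat16,
)
```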

examples/server/requirements.txt

Lines changed: 46 additions & 3 deletions
```diff
@@ -10,6 +10,8 @@ annotated-types==0.7.0
     # via pydantic
 anyio==4.6.2.post1
     # via starlette
+async-timeout==4.0.3
+    # via aiohttp
 attrs==24.2.0
     # via aiohttp
 certifi==2024.8.30
@@ -18,13 +20,16 @@ charset-normalizer==3.4.0
     # via requests
 click==8.1.7
     # via uvicorn
+exceptiongroup==1.3.0
+    # via anyio
 fastapi==0.115.3
     # via -r requirements.in
 filelock==3.16.1
     # via
     #   huggingface-hub
     #   torch
     #   transformers
+    #   triton
 frozenlist==1.5.0
     # via
     #   aiohttp
@@ -54,10 +59,41 @@ multidict==6.1.0
     # via
     #   aiohttp
     #   yarl
-networkx==3.4.2
+networkx==3.2.1
     # via torch
-numpy==2.1.2
+numpy==2.0.2
     # via transformers
+nvidia-cublas-cu12==12.1.3.1
+    # via
+    #   nvidia-cudnn-cu12
+    #   nvidia-cusolver-cu12
+    #   torch
+nvidia-cuda-cupti-cu12==12.1.105
+    # via torch
+nvidia-cuda-nvrtc-cu12==12.1.105
+    # via torch
+nvidia-cuda-runtime-cu12==12.1.105
+    # via torch
+nvidia-cudnn-cu12==9.1.0.70
+    # via torch
+nvidia-cufft-cu12==11.0.2.54
+    # via torch
+nvidia-curand-cu12==10.3.2.106
+    # via torch
+nvidia-cusolver-cu12==11.4.5.107
+    # via torch
+nvidia-cusparse-cu12==12.1.0.106
+    # via
+    #   nvidia-cusolver-cu12
+    #   torch
+nvidia-nccl-cu12==2.20.5
+    # via torch
+nvidia-nvjitlink-cu12==12.9.86
+    # via
+    #   nvidia-cusolver-cu12
+    #   nvidia-cusparse-cu12
+nvidia-nvtx-cu12==12.1.105
+    # via torch
 packaging==24.1
     # via
     #   huggingface-hub
@@ -109,14 +145,21 @@ tqdm==4.66.5
     #   transformers
 transformers==4.46.1
     # via -r requirements.in
+triton==3.0.0
+    # via torch
 typing-extensions==4.12.2
     # via
+    #   anyio
+    #   exceptiongroup
     #   fastapi
     #   huggingface-hub
+    #   multidict
     #   pydantic
     #   pydantic-core
+    #   starlette
     #   torch
-urllib3==2.2.3
+    #   uvicorn
+urllib3==2.5.0
     # via requests
 uvicorn==0.32.0
     # via -r requirements.in
```
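The `# via` annotations indicate this lockfile is generated by pip-tools rather than edited by hand. A regeneration sketch, assuming the `requirements.in` referenced in the annotations sits alongside the lockfile:

```shell
# Re-pin the lockfile from the top-level requirements using pip-tools.
pip install pip-tools
pip-compile examples/server/requirements.in -o examples/server/requirements.txt
```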

0 commit comments