Commit afebcba
Document vLLM integration as opt-in in README

The eager loading strategy is slower than mmap with a warm page cache on most container providers. Document it as off by default, with clear guidance on when it actually helps (cold reads from NFS/JuiceFS).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1 parent 1f3d10e · commit afebcba

1 file changed: README.md, 16 additions & 0 deletions
@@ -167,6 +167,22 @@ cache.list_entries() # Show cached models
 cache.auto_evict(max_size_bytes=50e9) # LRU eviction to stay under 50GB
 ```
 
+### vLLM Integration
+
+A custom model loader for vLLM that switches safetensors loading from mmap to eager reads on network filesystems (NFS, JuiceFS, CIFS), where mmap page faults are 30-50x slower.
+
+**Not enabled by default.** On most container providers (RunPod, Vast.ai), the kernel page cache makes mmap fast enough. The eager path only helps on cold reads from truly slow network storage.
+
+```bash
+# Opt in via --load-format
+vllm serve Qwen/Qwen2.5-7B --load-format zerostart
+
+# Or via env var
+ZEROSTART_EAGER=1 vllm serve Qwen/Qwen2.5-7B --load-format zerostart
+```
+
+Auto-registers via vLLM's plugin system when zerostart is installed (`pip install zerostart`).
+
 ### Serving Integration
 
 For custom serving stacks:
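The mmap-vs-eager tradeoff behind this change can be illustrated with a minimal stdlib sketch. This is not zerostart's implementation, just the two I/O patterns side by side: one large sequential read versus page-by-page faulting through a memory map.

```python
import mmap
import os
import tempfile

def read_eager(path: str) -> bytes:
    # One large sequential read: a single-request pattern that network
    # filesystems (NFS, JuiceFS, CIFS) can serve efficiently on cold reads.
    with open(path, "rb") as f:
        return f.read()

def read_mapped(path: str) -> bytes:
    # Memory-mapped access: pages are faulted in on first touch. Fast when
    # the kernel page cache is warm; slow when every fault becomes a round
    # trip to a remote file server.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return bytes(mm)

# Both strategies yield identical bytes; only the I/O pattern differs.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "wb") as f:
        f.write(os.urandom(1 << 20))  # 1 MiB of stand-in "weights"
    assert read_eager(path) == read_mapped(path)
finally:
    os.remove(path)
```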

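The auto-registration described in the diff relies on vLLM's plugin mechanism, which discovers plugins through Python entry points in the `vllm.general_plugins` group. A sketch of what the packaging metadata might look like; the group name follows vLLM's plugin documentation, but the `zerostart.vllm_plugin:register` module path is a hypothetical illustration, not the package's actual layout:

```toml
[project.entry-points."vllm.general_plugins"]
zerostart_loader = "zerostart.vllm_plugin:register"
```

With an entry point like this, vLLM calls the referenced function at startup, so merely installing the package is enough to make the custom `--load-format` value available.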