
Commit 339f8ae

README: clarify cold start speedup is network-dependent, document auto-detection
Cold start parallel downloads only help on bandwidth-constrained pods. On fast-network pods (~1 Gbps), uv is just as fast because a single connection already saturates the link. Warm starts (4-7x) are consistent regardless of network speed.

Also document the new dependency auto-detection (PEP 723, pyproject.toml, requirements.txt) and update Quick Start examples.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 78b4e73 commit 339f8ae

1 file changed: README.md (33 additions, 37 deletions)
@@ -3,40 +3,36 @@
 Parallel streaming wheel extraction for installing large Python packages on remote GPUs.
 
 ```bash
-zerostart run -p torch serve.py
+zerostart run serve.py
 ```
 
-Works on any container GPU provider — RunPod, Vast.ai, Lambda, etc.
+Auto-detects dependencies from PEP 723 inline metadata, `pyproject.toml`, or `requirements.txt`. Works on any container GPU provider — RunPod, Vast.ai, Lambda, etc.
 
 ## Benchmarks
 
-Measured on RTX 4090 pods (RunPod). Results vary with network speed — slower pod networks show a larger advantage for zerostart because there's more room for parallel downloads to help.
-
 ### Cold Start (first run, empty cache)
 
-| Workload | zerostart | uv | Speedup |
-|----------|-----------|-----|---------|
-| torch + CUDA (6.8 GB) | 33s | 98s | 3x |
-| vllm (9.4 GB) | 60s | 58s | ~1x |
-| triton (638 MB) | 3.4s | 1.0s | uv faster |
+Cold start speedup depends on pod network bandwidth. zerostart opens multiple parallel HTTP connections per wheel — this helps when a single connection can't saturate the link, but doesn't help when one connection already maxes out the pipe.
 
-zerostart's cold start advantage comes from parallel HTTP Range requests. For large packages like torch on a bandwidth-limited connection, this matters. For small packages like triton, the overhead isn't worth it — just use uv.
+| Pod network | Workload | zerostart | uv | Speedup |
+|-------------|----------|-----------|-----|---------|
+| Moderate (~200 Mbps) | torch (6.8 GB) | 33s | 98s | 3x |
+| Moderate (~200 Mbps) | triton (638 MB) | 3.4s | 1.0s | uv faster |
+| Fast (~1 Gbps) | diffusers+torch (7 GB) | 57s | 57s | ~1x |
 
-vllm cold starts are roughly comparable. The package set is large (177 wheels) but many are small, so uv's single-connection approach keeps up.
+On bandwidth-constrained pods (common with cheaper providers), parallel Range requests download large wheels 3x faster. On fast-network pods, a single connection already saturates the link and both tools finish in about the same time. For small packages, zerostart's startup overhead makes uv faster — just use uv directly.
 
 ### Warm Start (cached environment)
 
+Warm starts are where zerostart consistently wins regardless of network speed. uv re-resolves dependencies and rebuilds the environment on every invocation. zerostart checks a cache marker and exec's Python directly.
+
 | Workload | zerostart | uv | Speedup |
 |----------|-----------|-----|---------|
 | torch | 1.8s | 13.2s | 7x |
 | vllm | 3.3s | 14.5s | 4x |
 | triton | 0.2s | 1.0s | 5x |
 
-Warm starts are where zerostart consistently wins. uv re-resolves dependencies and rebuilds the environment on every run. zerostart checks a cache marker and exec's Python directly — no resolution, no environment setup.
-
-### Network speed matters
-
-On pods with slower network (common with cheaper providers), the cold start advantage grows because parallel Range requests can saturate the link where a single connection can't. On fast-network pods (1Gbps+), uv downloads quickly enough that the parallel approach helps less.
+All measured on RunPod (RTX 4090 / A6000).
 
 ## How It Works
 
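The parallel cold-start mechanism the new README text describes (multiple HTTP Range requests per wheel, reassembled in order) could be sketched roughly as follows. This is an illustrative sketch, not zerostart's actual code; `byte_ranges`, `fetch_range`, and `parallel_download` are hypothetical names.

```python
# Sketch: split one large wheel into byte ranges and fetch them on
# parallel connections, which helps when a single connection cannot
# saturate the pod's link. Not zerostart's actual implementation.
from concurrent.futures import ThreadPoolExecutor
import urllib.request


def byte_ranges(size: int, n: int) -> list[tuple[int, int]]:
    """Split [0, size) into up to n contiguous (start, end) ranges, end inclusive."""
    chunk = -(-size // n)  # ceiling division
    return [(start, min(start + chunk, size) - 1) for start in range(0, size, chunk)]


def fetch_range(url: str, start: int, end: int) -> bytes:
    """Fetch one byte range on its own connection (expects 206 Partial Content)."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def parallel_download(url: str, size: int, connections: int = 8) -> bytes:
    """Download all ranges concurrently and reassemble the wheel in order."""
    ranges = byte_ranges(size, connections)
    with ThreadPoolExecutor(max_workers=connections) as pool:
        parts = pool.map(lambda r: fetch_range(url, r[0], r[1]), ranges)
    return b"".join(parts)
```

This also makes the benchmark asymmetry concrete: on a fast link one `fetch_range` alone fills the pipe, so the extra connections add nothing.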

@@ -92,38 +88,39 @@ Requires Linux + Python 3.10+ + `uv` (pre-installed on most GPU containers).
 ## Quick Start
 
 ```bash
-# Run a script with dependencies
-zerostart run -p torch -p transformers serve.py
+# Auto-detect deps from PEP 723 metadata, pyproject.toml, or requirements.txt
+zerostart run serve.py
 
-# Run inline
-zerostart run torch -- -c "import torch; print(torch.cuda.is_available())"
+# Add extra packages on top of auto-detected deps
+zerostart run -p torch serve.py
 
-# With a requirements file
+# Explicit requirements file
 zerostart run -r requirements.txt serve.py
 
+# Run a package directly
+zerostart run torch -- -c "import torch; print(torch.cuda.is_available())"
+
 # Pass args to your script
 zerostart run serve.py -- --port 8000
 ```
 
-### PEP 723 Inline Script Metadata
+### Dependency Detection
 
-Embed dependencies directly in your script — no `requirements.txt` needed:
+zerostart automatically finds dependencies — no flags needed:
 
+1. **PEP 723 inline metadata** (checked first):
 ```python
 # /// script
 # dependencies = ["torch>=2.0", "transformers", "safetensors"]
 # ///
-
 import torch
-from transformers import AutoModel
-
-model = AutoModel.from_pretrained("bert-base-uncased")
-print(f"Loaded on {model.device}")
 ```
 
-```bash
-zerostart run serve.py # deps auto-detected from script
-```
+2. **pyproject.toml** `[project.dependencies]` in the script's directory or parents
+
+3. **requirements.txt** in the script's directory or parents
+
+`-p` and `-r` flags add packages on top of whatever is auto-detected.
 
 ## Model Loading Acceleration
 
@@ -212,14 +209,13 @@ Key design decisions:
 ## When to Use It
 
 **Good fit:**
-- Large GPU packages (torch, vllm, diffusers) on container providers with moderate network
-- Repeated runs where warm start time matters
-- Spot instances, CI/CD, autoscaling where cold starts add up
+- Repeated runs on the same pod — warm starts are 4-7x faster than uv
+- Large GPU packages on bandwidth-constrained pods — parallel downloads help when a single connection is slow
+- Spot instances, CI/CD, autoscaling where you restart often and warm cache pays off
 
 **Not worth it:**
-- Small packages — uv is already fast, zerostart adds overhead
-- One-off scripts that don't repeat
-- Pods with very fast network (1Gbps+) where uv cold starts are already quick
+- One-off cold starts on fast-network pods — uv is just as fast
+- Small packages — uv is faster, zerostart adds startup overhead
 - Local NVMe with models in page cache
 
 ## Requirements
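The warm-start path the diff describes (check a cache marker, then exec Python directly with no resolution or environment setup) could be sketched as below. A minimal sketch with hypothetical names (`env_key`, `is_warm`, `build_environment`), not zerostart's actual implementation.

```python
# Sketch of a marker-based warm start: a ready marker means the cached
# environment can be exec'd immediately, skipping resolve/install entirely.
# Illustrative only.
import hashlib
import os
from pathlib import Path


def env_key(deps: list[str]) -> str:
    """Stable cache key for a dependency set (order-insensitive)."""
    return hashlib.sha256("\n".join(sorted(deps)).encode()).hexdigest()[:16]


def is_warm(env_dir: Path) -> bool:
    """A marker file signals a fully built environment."""
    return (env_dir / ".ready").exists()


def build_environment(env_dir: Path, deps: list[str]) -> None:
    """Placeholder for the cold path: resolve + parallel download + extract."""
    env_dir.mkdir(parents=True, exist_ok=True)


def run(deps: list[str], script: str, cache_root: Path) -> None:
    env_dir = cache_root / env_key(deps)
    if is_warm(env_dir):
        # Warm path: no resolution, no install -- replace this process
        # with the cached environment's interpreter.
        python = str(env_dir / "bin" / "python")
        os.execv(python, [python, script])
    # Cold path: build once, then mark ready for next time.
    build_environment(env_dir, deps)
    (env_dir / ".ready").touch()
```

This is also why the warm-start numbers are network-independent: the warm path never touches the network at all.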
