Skip to content

Commit 65cfb13

Browse files
committed
docs: update performance numbers and architecture for fasthttp migration
- Update benchmark numbers: ~141k req/sec (55% faster than Bun) - Update architecture diagram: post-processing compress, ctx.SetBody() fast path - Update key design decisions: fasthttp engine, tcp4 listener, custom Range impl - Update DoS mitigations: MaxRequestBodySize replaces MaxHeaderBytes - Update landing page (docs/index.html): hero stats, meta tags, JSON-LD, perf cards - Update USER_GUIDE.md: preload section, GC tuning notes - Add CHANGELOG.md v1.3.0 entry with full migration details
1 parent 9596966 commit 65cfb13

4 files changed

Lines changed: 80 additions & 60 deletions

File tree

CHANGELOG.md

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,21 @@
1+
## v1.3.0 (2026-03-08)
2+
3+
### Perf
4+
5+
- **server**: migrate HTTP layer from net/http to fasthttp — ~141k req/sec (55% faster than Bun)
6+
- **server**: use `tcp4` listener to eliminate dual-stack overhead (2x throughput gain on macOS)
7+
8+
### Refactor
9+
10+
- **handler**: replace `http.ServeContent` with custom `parseRange()`/`serveRange()` for byte-range requests
11+
- **compress**: convert gzip middleware from wrapping `ResponseWriter` to post-processing response body
12+
- **security**: use `ctx.SetStatusCode()`+`ctx.SetBodyString()` instead of `ctx.Error()` to preserve headers
13+
- **cache**: change `CachedFile` header fields from `[]string` to `string`
14+
15+
### Build
16+
17+
- **benchmark**: add fasthttp/net-http hello world baselines and update baremetal script
18+
119
## v1.2.0 (2026-03-07)
220

321
### Feat

README.md

Lines changed: 29 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
# static-web
22

3-
A production-grade, high-performance static web file server written in Go. Zero external runtime dependencies beyond `BurntSushi/toml` and `hashicorp/golang-lru/v2`.
3+
A production-grade, high-performance static web file server written in Go. Built on [fasthttp](https://github.com/valyala/fasthttp) for maximum throughput — **~141k req/sec**, 55% faster than Bun's native static server.
44

55
## Table of Contents
66

@@ -57,7 +57,7 @@ static-web --help
5757
| **gzip compression** | On-the-fly via pooled `gzip.Writer`; pre-compressed `.gz`/`.br` sidecar support |
5858
| **HTTP/2** | Automatic ALPN negotiation when TLS is configured |
5959
| **Conditional requests** | ETag, `304 Not Modified`, `If-Modified-Since`, `If-None-Match` |
60-
| **Range requests** | Byte ranges via `http.ServeContent` for video and large files |
60+
| **Range requests** | Byte ranges via custom `parseRange`/`serveRange` implementation for video and large files |
6161
| **TLS 1.2 / 1.3** | Modern cipher suites; configurable cert/key paths |
6262
| **Security headers** | `X-Content-Type-Options`, `X-Frame-Options`, `Content-Security-Policy`, `Referrer-Policy`, `Permissions-Policy` |
6363
| **HSTS** | `Strict-Transport-Security` on all HTTPS responses; configurable max-age |
@@ -83,7 +83,7 @@ HTTP request
8383
└────────┬────────┘
8484
8585
┌────────▼────────┐
86-
│ loggingMiddleware │ ← pooled statusResponseWriter; logs method/path/status/duration
86+
│ loggingMiddleware │ ← logs method/path/status/duration
8787
└────────┬────────┘
8888
8989
┌────────▼────────────────────────────────────────┐
@@ -94,25 +94,25 @@ HTTP request
9494
│ • Path-safety cache (sync.Map, pre-warmed) │
9595
│ • Dotfile blocking │
9696
│ • CORS (preflight + per-origin or wildcard *) │
97-
│ • Injects validated path into context │
98-
└────────┬────────────────────────────────────────┘
99-
100-
┌────────▼────────────────────────────────────────┐
101-
│ compress.Middleware │
102-
│ • lazyGzipWriter: decides at first Write() │
103-
│ • Skips 1xx/204/304, non-compressible types │
104-
│ • Respects q=0 explicit denial │
97+
│ • Injects validated path into ctx.SetUserValue │
10598
└────────┬────────────────────────────────────────┘
10699
107100
┌────────▼────────────────────────────────────────┐
108101
│ handler.FileHandler │
109-
│ • Cache hit → direct w.Write() fast path
110-
│ • Range/conditional → http.ServeContent
102+
│ • Cache hit → direct ctx.SetBody() fast path │
103+
│ • Range/conditional → custom serveRange()
111104
│ • Cache miss → os.Stat → disk read → cache put │
112105
│ • Large files (> max_file_size) bypass cache │
113106
│ • Encoding negotiation: brotli > gzip > plain │
114107
│ • Preloaded files served instantly on startup │
115108
│ • Custom 404 page (path-validated) │
109+
└─────────────────────────────────────────────────┘
110+
111+
┌────────▼────────────────────────────────────────┐
112+
│ compress.Middleware (post-processing) │
113+
│ • Compresses response body after handler runs │
114+
│ • Skips 1xx/204/304, non-compressible types │
115+
│ • Respects q=0 explicit denial │
116116
└─────────────────────────────────────────────────┘
117117
```
118118

@@ -122,7 +122,7 @@ HTTP request
122122
GET /app.js
123123
124124
├─ cache.Get("/app.js") hit?
125-
│ YES → serveFromCache (direct w.Write, no syscall) → done
125+
│ YES → serveFromCache (direct ctx.SetBody, no syscall) → done
126126
127127
└─ NO → resolveIndexPath → cache.Get(canonicalURL) hit?
128128
YES → serveFromCache → done
@@ -137,15 +137,15 @@ When `preload = true`, every eligible file is loaded into cache at startup. The
137137

138138
### End-to-end HTTP benchmarks
139139

140-
Measured on Apple M-series, localhost (no Docker), serving 3 small static files via `bombardier -c 50 -n 100000`:
140+
Measured on Apple M-series, localhost (no Docker), serving 3 small static files via `bombardier -c 100 -n 100000`:
141141

142-
| Server | Avg Req/sec | p50 Latency | p99 Latency |
143-
|--------|-------------|-------------|-------------|
144-
| Bun (native static serve) | **~90,000** | **508 µs** | **1.10 ms** |
145-
| **static-web** (preload + GC 400) | ~76,000 | 630 µs | 1.40 ms |
146-
| **static-web** (default config) | ~50,000 | 920 µs | 3.20 ms |
142+
| Server | Avg Req/sec | p50 Latency | p99 Latency | Throughput |
143+
|--------|-------------|-------------|-------------|------------|
144+
| **static-web** (fasthttp + preload) | **~141,000** | **619 µs** | **2.46 ms** | **469 MB/s** |
145+
| Bun (native static serve) | ~90,000 | 1.05 ms | 2.33 ms | 306 MB/s |
146+
| static-web (old net/http) | ~76,000 | 1.25 ms | 3.15 ms | |
147147

148-
With `preload = true` and `gc_percent = 400`, static-web delivers ~76k req/sec — within 20% of Bun's native static serving, while offering full security headers, TLS, and compression out of the box.
148+
With `preload = true` and the fasthttp engine, static-web delivers **~141k req/sec****55% faster than Bun's native static serving**, while offering full security headers, TLS, and compression out of the box.
149149

150150
### Micro-benchmarks
151151

@@ -158,13 +158,16 @@ Measured on Apple M2 Pro (`go test -bench=. -benchtime=5s`):
158158

159159
### Key design decisions
160160

161+
- **fasthttp engine**: Built on [fasthttp](https://github.com/valyala/fasthttp) — zero-alloc request handling with pre-allocated per-connection buffers. No per-request allocations on the hot path.
162+
- **`tcp4` listener**: IPv4-only listener eliminates dual-stack overhead on macOS/Linux — a 2× throughput difference vs `"tcp"`.
161163
- **Preload at startup**: `preload = true` reads all eligible files into RAM before the first request — eliminating cold-miss latency.
162-
- **Direct `w.Write()` fast path**: cache hits bypass `http.ServeContent` entirely; pre-formatted `Content-Type` and `Content-Length` headers are assigned directly.
164+
- **Direct `ctx.SetBody()` fast path**: cache hits bypass range/conditional logic entirely; pre-formatted `Content-Type` and `Content-Length` headers are assigned directly.
165+
- **Custom Range implementation**: `parseRange()`/`serveRange()` handle byte-range requests without `http.ServeContent`.
166+
- **Post-processing compression**: compress middleware runs after the handler, compressing the response body in a single pass.
163167
- **Path-safety cache**: `sync.Map`-based cache eliminates per-request `filepath.EvalSymlinks` syscalls. Pre-warmed from preload.
164-
- **GC tuning**: `gc_percent = 400` reduces garbage collection frequency — the hot path is allocation-free, but `net/http` internals allocate per-request.
168+
- **GC tuning**: `gc_percent = 400` reduces garbage collection frequency — the hot path is allocation-free.
165169
- **Cache-before-stat**: `os.Stat` is never called on a cache hit — the hot path is pure memory.
166170
- **Zero-alloc `AcceptsEncoding`**: walks the `Accept-Encoding` header byte-by-byte without `strings.Split`.
167-
- **Pooled `sync.Pool`**: both `gzip.Writer` and `statusResponseWriter` are pooled.
168171
- **Pre-computed `ETagFull`**: the `W/"..."` string is built when the file is cached.
169172

170173
---
@@ -208,10 +211,10 @@ Only `GET`, `HEAD`, and `OPTIONS` are accepted. All other methods (including `TR
208211

209212
| Mitigation | Value |
210213
|------------|-------|
211-
| `ReadHeaderTimeout` | 5 s (Slowloris) |
212214
| `ReadTimeout` | 10 s |
213215
| `WriteTimeout` | 10 s |
214-
| `MaxHeaderBytes` | 8 KiB |
216+
| `IdleTimeout` | 75 s (keep-alive) |
217+
| `MaxRequestBodySize` | 0 (no body accepted — static server) |
215218

216219
---
217220

USER_GUIDE.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -602,7 +602,7 @@ docker kill --signal=HUP <container_id>
602602
603603
## Preloading for Maximum Performance
604604
605-
Enable `preload` to read every eligible file into the in-memory cache at startup. Combined with GC tuning, this yields the highest possible throughput — up to **~76,000 req/sec** on Apple M-series (within 20% of Bun's native static serve, while including full security headers, TLS, and compression).
605+
Enable `preload` to read every eligible file into the in-memory cache at startup. Combined with the fasthttp engine, this yields the highest possible throughput — up to **~141,000 req/sec** on Apple M-series (**55% faster than Bun's native static serve**, while including full security headers, TLS, and compression).
606606
607607
### Configuration
608608
@@ -640,7 +640,7 @@ STATIC_CACHE_PRELOAD=true STATIC_CACHE_GC_PERCENT=400 ./bin/static-web
640640
641641
### GC tuning
642642
643-
`gc_percent` sets the Go runtime `GOGC` target. A higher value means the GC runs less often, trading memory for throughput. The handler's hot path is allocation-free, but `net/http` internals allocate per-request. Recommended values:
643+
`gc_percent` sets the Go runtime `GOGC` target. A higher value means the GC runs less often, trading memory for throughput. The handler's hot path is allocation-free, and fasthttp reuses per-connection buffers (unlike net/http which allocates per-request). Recommended values:
644644
645645
| `gc_percent` | Behaviour |
646646
|---|---|

0 commit comments

Comments
 (0)