All notable changes to Headroom will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- `headroom unwrap codex` now actually undoes `headroom wrap codex` — previously there was no `unwrap codex` subcommand at all, so the injected `model_provider = "headroom"` / `[model_providers.headroom]` block stayed in `~/.codex/config.toml` forever and Codex continued routing through the (potentially stopped) proxy, surfacing as `Missing environment variable: OPENAI_API_KEY`. `wrap codex` now snapshots the pre-wrap `config.toml` to `config.toml.headroom-backup` before its first injection, and `unwrap codex` restores that snapshot byte-for-byte (or, if the backup is missing, strips only the Headroom-managed block and leaves surrounding user content intact). Safe no-op when run without a prior wrap. Reported by @raenaryl in Discord.
- `headroom learn` no longer clobbers prior recommendations on re-run — the marker block in `CLAUDE.md`/`MEMORY.md` is now merged with the prior block instead of wholesale-replaced. Sections re-surfaced by the new run win; sections not re-surfaced are carried forward so learnings accumulate across runs instead of disappearing. To fully rebuild the block, delete it manually and re-run. (#231)
- `headroom learn` no longer emits dangling cross-references when a section is re-surfaced — the analyzer now includes the project's current `<!-- headroom:learn -->` block (from `CLAUDE.md` and `MEMORY.md`) in the LLM digest as a "Prior Learned Patterns" section, and the system prompt instructs the LLM that re-emitting a section replaces the prior one wholesale. Prevents bullets like "`X` is also large — same rule as `Y`, `Z`" from appearing after `Y` and `Z` got dropped during per-section replacement. The writer's section-level carry-forward from #231 remains in place as a safety net for sections the LLM omits entirely. New helper `extract_marker_block` added to `headroom.learn.writer`.
- `turn_id` linking agent-loop API calls to a single user prompt — a new `compute_turn_id(model, system, messages)` helper in `headroom/proxy/helpers.py` hashes the message prefix up to and including the last user-text message, yielding an id that is stable across every agent-loop iteration of one prompt but rolls over when the user sends a new prompt (or runs `/compact`, `/clear`). `RequestLog` gained a `turn_id: str | None` field, which is stamped at every log site (anthropic handler bedrock + direct branches, and the streaming handler) and surfaced as `turn_id` in `/transformations/feed`. Lets downstream consumers (e.g. the Headroom Desktop Activity tab) aggregate savings per user prompt rather than per API call.
- Live flush of traffic-learned patterns to CLAUDE.md / MEMORY.md — the `TrafficLearner` now writes to agent-native context files continuously during proxy operation, not just at shutdown. A new dirty-flag debounced `_flush_worker` (10s window, `FLUSH_DEBOUNCE_SECONDS`) calls `flush_to_file()` whenever `_accumulate()` marks the learner dirty, so patterns surface in `CLAUDE.md`/`MEMORY.md` in near real-time. Flushes read both persisted rows (via `_load_persisted_patterns_from_sqlite`) and the in-memory accumulator, bucket patterns by project via the learn plugin registry (`plugin.discover_projects()` + longest-path anchoring in `_project_for_pattern`), and route by `PatternCategory` to the correct file (`_patterns_to_recommendations` + `_CATEGORY_TO_TARGET`). Live flushes require `evidence_count >= 2`; the shutdown flush accepts single-evidence rows.
- Traffic-learner evidence count stuck at 1; duplicate DB rows across restarts — `_accumulate` queued patterns with the default `ExtractedPattern.evidence_count = 1` regardless of how many times the pattern was actually seen, so every persisted row landed at `1` and never crossed the live-flush gate (`evidence_count >= 2`). Worse, once a pattern was in `_saved_hashes` it was early-returned on every re-sighting, and `_saved_hashes` reset on process restart — so a second sighting in a later session inserted a duplicate row rather than bumping the existing one. Now: `_accumulate` writes the real accumulated count at save time, `start()` hydrates `_saved_hashes` + a new `_persisted_ids` map from the DB, and re-sightings bump the persisted row's `metadata.evidence_count` via an atomic `json_set` UPDATE (`_bump_persisted_evidence`). `_load_persisted_patterns_from_sqlite` now filters via `json_extract(metadata, '$.source')` instead of a LIKE on the raw JSON string, so rows survive metadata rewrites.
- Telemetry stack & install-mode identity fields — anonymous beacon now reports `headroom_stack` (how Headroom is invoked: `proxy`, `wrap_claude`, `adapter_ts_openai`, ...) and `install_mode` (`wrapped`/`persistent`/`on_demand`), plus `requests_by_stack` for proxies that serve multiple integrations. Proxy exposes a `by_stack` bucket alongside `by_provider`/`by_model` on `/stats`, a matching `headroom_requests_by_stack` Prometheus counter, and an `X-Headroom-Stack` header honored by the FastAPI middleware. `headroom wrap <tool>` sets `HEADROOM_STACK=wrap_<agent>`; the TS SDK and all four adapters (`openai`, `anthropic`, `gemini`, `vercel-ai`) tag their compress calls. Schema migration: `sql/upgrade_telemetry_stack_context.sql`.
- Canonical filesystem contract (issue #175) — new `HEADROOM_CONFIG_DIR` (default `~/.headroom/config`, read-mostly) and `HEADROOM_WORKSPACE_DIR` (default `~/.headroom`, read-write state) env vars recognized by the Python proxy/CLI and the npm SDK. Additive; all existing per-resource env vars (`HEADROOM_SAVINGS_PATH`, `HEADROOM_TOIN_PATH`, `HEADROOM_SUBSCRIPTION_STATE_PATH`, `HEADROOM_MODEL_LIMITS`) continue to work with identical semantics. Docker install scripts and `docker-compose.native.yml` forward the new vars into containers so savings, logs, and telemetry resolve to the bind-mounted `.headroom` path. See `wiki/filesystem-contract.md`.
- `/stats-history` now returns compact checkpoint history by default — the JSON response keeps recent checkpoints dense while evenly sampling older checkpoints so long-running installs do not return ever-growing payloads. Add `history_mode=full` to fetch the full retained checkpoint list, or `history_mode=none` to skip it entirely while still receiving the derived hourly/daily/weekly/monthly rollups. Responses now include a `history_summary` block describing stored versus returned points.
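The dense-recent / sampled-older compaction can be sketched as a pure function. The window sizes below are illustrative assumptions, not Headroom's actual defaults:

```python
def compact_history(checkpoints: list[dict], keep_recent: int = 48, max_older: int = 100) -> list[dict]:
    """Keep the most recent checkpoints dense and evenly sample the rest.

    Sketch of the default /stats-history behavior: the payload is bounded at
    keep_recent + max_older points no matter how long the install has run,
    while the oldest and newest checkpoints are always retained.
    """
    if len(checkpoints) <= keep_recent + max_older:
        return list(checkpoints)
    older, recent = checkpoints[:-keep_recent], checkpoints[-keep_recent:]
    step = len(older) / max_older  # even stride across the older span
    sampled = [older[int(i * step)] for i in range(max_older)]
    return sampled + recent
```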
- Streaming Anthropic requests are now visible to `/stats.recent_requests` and `/transformations/feed` — `_finalize_stream_response` did not call `self.logger.log(...)`, so the entire streaming Anthropic code path (the one Claude Code uses) silently bypassed the request logger. Only the non-streaming Anthropic path and the Bedrock streaming path were logged. As a consequence, `--log-messages` had no observable effect on the live transformations feed for typical traffic. The streaming finalizer now emits the same `RequestLog` shape the other paths do, including `request_messages` when `log_full_messages` is enabled.
- Cross-agent memory — Claude saves a fact, Codex reads it back. All agents sharing one proxy share one memory store. Project-scoped DB at `.headroom/memory.db`, auto user_id from `$USER`.
- Agent provenance tracking — every memory records which agent saved it (`source_agent`, `source_provider`, `created_via`), with edit history on updates.
- LLM-mediated dedup — on `memory_save`, enriched response hints similar existing memories to the LLM. Background async dedup auto-removes >92% cosine duplicates. Zero extra LLM calls.
- Memory for OpenAI and Gemini handlers — context injection + tool handling wired into all three provider handlers (Anthropic, OpenAI, Gemini).
- Plugin architecture for `headroom learn` — each agent (Claude, Codex, Gemini) is a self-contained plugin. External plugins register via `headroom.learn_plugin` entry points. `--agent` flag for CLI.
- GeminiScanner for `headroom learn` — reads `~/.gemini/tmp/*/chats/session-*.json` and `.jsonl`.
- Code graph integration — `headroom wrap claude --code-graph` auto-indexes the project via codebase-memory-mcp for call-chain traversal, impact analysis, and architectural queries. Opt-in, ~200 token overhead with Claude Code's MCP Tool Search.
- OpenAI embedder auto-detection — memory backend uses OpenAI embeddings when `sentence-transformers` is unavailable (no torch/2GB dependency needed).
- Live traffic learning flush — `headroom wrap <agent> --learn` flushes learned patterns to the correct agent-native file (MEMORY.md / AGENTS.md / GEMINI.md) at proxy shutdown.
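The ">92% cosine duplicates" rule from the dedup entry above reduces to a similarity threshold over memory embeddings. A minimal sketch (Headroom's actual dedup is asynchronous and embedding-backend-driven, not this brute-force scan):

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def find_duplicates(embeddings: dict[str, list[float]], threshold: float = 0.92) -> list[tuple[str, str]]:
    """Return memory-id pairs whose cosine similarity exceeds the threshold.

    Sketch of the >0.92 dedup gate; pairs returned here would be candidates
    for auto-removal.
    """
    ids = list(embeddings)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) > threshold:
                pairs.append((a, b))
    return pairs
```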
- CodeCompressor disabled by default — AST-based code compression produced invalid syntax on 40% of real files. Code now passes through uncompressed. Use `--code-graph` for code intelligence instead, or re-enable with `--code-aware`.
- Shared tool name map — consolidated tool normalization across all learn plugins into `_shared.py`.
- Dynamic CLI agent detection — `headroom learn` discovers agents via plugin registry, no hardcoded choices.
- CodeCompressor statement-based truncation — body truncation now walks AST statements (not lines), never cuts mid-expression. Fixes syntax errors on multi-line dict literals and function calls.
- Docstring FIRST_LINE mode — uses source lines directly instead of reconstructing from byte offsets. Properly handles all quote styles.
- Memory shutdown queue drain — patterns in the save queue were lost on proxy shutdown. Now drained before exit.
- Codex-proxy resilience hardening — reduces event-loop starvation under cold-start reconnect storms
- Stage-timing instrumentation — per-stage durations for both Codex WS accept and Anthropic `/v1/messages` pre-upstream phases emitted as a single `STAGE_TIMINGS` structured log line per request plus Prometheus histograms
- Per-pipeline shared warmup — Anthropic + OpenAI pipelines eagerly load compressors/parsers once at startup; status merged into `WarmupRegistry` for `/debug/warmup` and `/readyz`
- WS session registry — first-class tracking of active Codex WS sessions with deterministic relay-task cancellation and termination-cause classification (`client_disconnect`, `upstream_error`, `client_timeout`, etc.)
- Bounded pre-upstream Anthropic concurrency — `--anthropic-pre-upstream-concurrency` / `HEADROOM_ANTHROPIC_PRE_UPSTREAM_CONCURRENCY` caps simultaneous `/v1/messages` pre-upstream work (body read, deep copy, first compression stage, memory-context lookup, upstream connect) so replay storms cannot starve `/livez`, `/readyz`, and new Codex WS opens. Default: auto `max(2, min(8, cpu_count))`; `0` or negative disables (unbounded)
- Loopback-only debug endpoints — `/debug/tasks`, `/debug/ws-sessions`, `/debug/warmup` return `404` (not `403`) to non-loopback callers so external scanners cannot enumerate them
- Reconnect-storm repro harness — `scripts/repro_codex_replay.py` drives concurrent WS + HTTP replay traffic against a local proxy and asserts `/livez` p99 under threshold; `--json` output routes JSON to stdout and the human summary to stderr
- Proxy liveness and readiness health checks
- Adds `GET /livez` for process liveness and `GET /readyz` for traffic readiness
- Keeps `GET /health` backward compatible while expanding it with readiness details and subsystem checks
- Eagerly initializes configured memory backends during proxy startup so readiness reflects real serving capability
- Wires `/readyz` into the Docker image `HEALTHCHECK` and the example `docker-compose.yml`
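Wiring `/readyz` into a container healthcheck might look like the fragment below. This is an illustrative sketch — the intervals, port, and curl invocation are assumptions, not necessarily what Headroom's shipped Dockerfile uses:

```dockerfile
# Mark the container unhealthy until the proxy reports traffic readiness.
# Interval/timeout/port values here are illustrative assumptions.
HEALTHCHECK --interval=15s --timeout=3s --start-period=30s --retries=3 \
  CMD curl -fsS http://localhost:8787/readyz || exit 1
```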
- Durable proxy savings history
- Persists proxy compression savings history locally at `~/.headroom/proxy_savings.json`
- Supports `HEADROOM_SAVINGS_PATH` to override the storage location
- Adds `/stats-history` with lifetime totals plus hourly/daily/weekly/monthly rollups
- Supports JSON and CSV export from `/stats-history`
- Extends `/stats` with a `persistent_savings` block while keeping `savings_history` backward compatible
- Adds a historical mode to `/dashboard` backed by `/stats-history`, including export actions
- Proxy telemetry SDK override via `HEADROOM_SDK`
- Downstream apps can override the anonymous telemetry `sdk` field without patching installed files
- Blank values fall back to the default `proxy` label
- `headroom learn` — Offline failure learning for coding agents
- Analyzes past conversation history (Claude Code, extensible to Cursor/Codex)
- Success correlation: for each failure, finds what succeeded after and extracts the specific correction
- 5 analyzers: Environment, Structure, Command Patterns, Retry Prevention, Cross-Session
- Writes specific learnings to CLAUDE.md (stable project facts) and MEMORY.md (session patterns)
- Generic architecture: tool-agnostic `ToolCall` model, pluggable Scanner/Writer adapters
- Dry-run by default, `--apply` to write, `--all` for all projects
- Example output: "FirstClassEntity.java is not at axion-formats/ — actually at axion-scala-common/"
- Read Lifecycle Management — Event-driven compression of stale/superseded Read outputs
- Detects when a Read output becomes stale (file was edited after) or superseded (file was re-read)
- Replaces stale/superseded content with compact CCR markers, stores originals for retrieval
- 75% of Read output bytes are provably stale or redundant (from real-world analysis of 66K tool calls)
- Fresh Reads (latest read, no subsequent edit) are never touched — Edit safety preserved
- Opt-in via `ReadLifecycleConfig(enabled=True)`, disabled by default
- Handles both OpenAI and Anthropic message formats
- any-llm backend — Route requests through 38+ LLM providers (OpenAI, Mistral, Groq, Ollama, etc.) via any-llm
- Enable with `--backend anyllm --anyllm-provider <provider>`
- Install with: `pip install 'headroom-ai[anyllm]'`
- Production-ready proxy server with caching, rate limiting, and metrics
- CLI command `headroom proxy` to start the proxy server
- IntelligentContextManager (semantic-aware context management)
- Multi-factor importance scoring: recency, semantic similarity, TOIN importance, error indicators, forward references, token density
- No hardcoded patterns - all importance signals learned from TOIN or computed from metrics
- TOIN integration for retrieval_rate and field_semantics-based scoring
- Strategy selection: NONE, COMPRESS_FIRST, DROP_BY_SCORE based on budget overage
- Atomic tool unit handling (call + response dropped together)
- Configurable scoring weights via `ScoringWeights` dataclass
- `IntelligentContextConfig` for full configuration control
- Backwards compatible with `RollingWindowConfig`
- LLMLingua-2 Integration (opt-in ML-based compression)
- `LLMLinguaCompressor` transform using Microsoft's LLMLingua-2 model
- Content-aware compression rates (code: 0.4, JSON: 0.35, text: 0.3)
- Memory management utilities: `unload_llmlingua_model()`, `is_llmlingua_model_loaded()`
- Proxy integration via `--llmlingua` flag
- Device selection: `--llmlingua-device` (auto/cuda/cpu/mps)
- Custom compression rate: `--llmlingua-rate`
- Helpful startup hints when llmlingua is available but not enabled
- Install with: `pip install headroom-ai[llmlingua]`
- Code-Aware Compression (AST-based, syntax-preserving)
- `CodeAwareCompressor` transform using tree-sitter for AST parsing
- Supports Python, JavaScript, TypeScript, Go, Rust, Java, C, C++
- Preserves imports, function signatures, type annotations, error handlers
- Compresses function bodies while maintaining structural integrity
- Guarantees syntactically valid output (no broken code)
- Automatic language detection from code patterns
- Memory management: `is_tree_sitter_available()`, `unload_tree_sitter()`
- Uses `tree-sitter-language-pack` for broad language support
- Install with: `pip install headroom-ai[code]`
- ContentRouter (intelligent compression orchestrator)
- Auto-routes content to optimal compressor based on type detection
- Source hint support for high-confidence routing (file paths, tool names)
- Handles mixed content (e.g., markdown with code blocks)
- Strategies: CODE_AWARE, SMART_CRUSHER, SEARCH, LOG, TEXT, LLMLINGUA
- Configurable strategy preferences and fallbacks
- Routing decision log for transparency and debugging
- Custom Model Configuration
- Support for new models: Claude 4.5 (Opus), Claude 4 (Sonnet, Haiku), o3, o3-mini
- Pattern-based inference for unknown models (opus/sonnet/haiku tiers)
- Custom model config via `HEADROOM_MODEL_LIMITS` environment variable
- Config file support: `~/.headroom/models.json`
- Graceful fallback for unknown models (no crashes)
- Updated pricing data for all current models
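A custom model entry in `~/.headroom/models.json` might look like the fragment below. The schema is not documented in this changelog, so every field name here is an assumption for illustration only:

```json
{
  "my-org/custom-model": {
    "context_window": 200000,
    "max_output_tokens": 8192
  }
}
```

The same JSON object could presumably be supplied inline through the `HEADROOM_MODEL_LIMITS` environment variable; consult the project documentation for the actual keys.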
- Event.wait task leak in subscription trackers — `asyncio.shield` pattern prevents cancellation of the outer `wait_for` from leaking the inner `Event.wait` task
- Python 3.10 compatibility for memory-context fail-open — catches `asyncio.TimeoutError` (the 3.10-compatible alias) rather than `TimeoutError` to preserve behaviour on older runtimes
- uvicorn `proxy_headers=False` — refuses `Forwarded`/`X-Forwarded-For` rewrites so the loopback guard on `/debug/*` cannot be spoofed by a misconfigured reverse proxy
- First-frame timeout for Codex WS accepts — guards against a client that opens a handshake and never sends the first frame; relays cancel deterministically with `client_timeout`
- Semaphore leak on unexpected exception in Anthropic pre-upstream path — the finalizer now releases the pre-upstream semaphore on every exit path (early 4xx, cache hit, upstream error, streaming handoff)
- `active_relay_tasks` gauge double-decrement — `deregister_and_count` returns `(handle, released_task_count)` atomically so the handler decrements the Prometheus gauge by the exact number it registered, eliminating drift
- IPv6-mapped loopback recognition — the loopback guard parses `::ffff:127.0.0.1` and other dual-stack literals through `ipaddress.ip_address(...).is_loopback`
- Lock-free stage-timing accumulators — `record_stage_timings` writes to per-path counters that do not contend with `/metrics` export or `record_request`
- Narrow `contextlib.suppress` in relay classification — only `CancelledError` is suppressed where we reclassify it; other exceptions propagate so termination cause stays truthful
- `jitter_delay_ms` helper — shared exponential-backoff + 50-150% jitter formula in `headroom/proxy/helpers.py`; used by three proxy retry sites and mirrored inline in the repro harness
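The "exponential-backoff + 50-150% jitter" formula behind `jitter_delay_ms` can be sketched as below. The base and cap parameters are assumptions; the real helper lives in `headroom/proxy/helpers.py`:

```python
import random


def jitter_delay_ms(attempt: int, base_ms: float = 100.0, cap_ms: float = 30_000.0) -> float:
    """Exponential backoff with 50-150% jitter.

    Sketch: the raw delay doubles per attempt up to a cap, then is scaled by
    a uniform factor in [0.5, 1.5] so retrying clients spread out instead of
    retrying in lockstep.
    """
    raw = min(cap_ms, base_ms * (2 ** attempt))
    return raw * random.uniform(0.5, 1.5)
```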
## 0.2.0 - 2025-01-07
- SmartCrusher: Statistical compression for tool outputs
- Keeps first/last K items, errors, anomalies, and relevance matches
- Variance-based change point detection
- Pattern detection (time series, logs, search results)
- Relevance Scoring Engine: ML-powered item relevance
- `BM25Scorer`: Fast keyword matching (zero dependencies)
- `EmbeddingScorer`: Semantic similarity with sentence-transformers
- `HybridScorer`: Adaptive combination of both methods
- CacheAligner: Prefix stabilization for better cache hits
- Dynamic date extraction
- Whitespace normalization
- Stable prefix hashing
- RollingWindow: Context management within token limits
- Drops oldest tool units first
- Never orphans tool results
- Preserves recent turns
- Multi-Provider Support:
- Anthropic with official `count_tokens` API
- Google with official `countTokens` API
- Cohere with official `tokenize` API
- Mistral with official tokenizer
- LiteLLM for unified interface
- Integrations:
- LangChain callback handler (`HeadroomOptimizer`)
- MCP (Model Context Protocol) utilities
- Proxy Server (`headroom.proxy`):
- Semantic caching with LRU eviction
- Token bucket rate limiting
- Retry with exponential backoff
- Cost tracking with budget enforcement
- Prometheus metrics endpoint
- Request logging (JSONL)
- Pricing Registry: Centralized model pricing with staleness tracking
- Benchmarks: Performance benchmarks for transforms and relevance scoring
- Improved token counting accuracy across all providers
- Enhanced tool output compression with relevance-aware selection
- Mistral tokenizer API compatibility
- Google token counting for multi-turn conversations
## 0.1.0 - 2025-01-05
- Initial release
- `HeadroomClient`: OpenAI-compatible client wrapper
- `ToolCrusher`: Basic tool output compression
- Audit mode for observation without modification
- Optimize mode for applying transforms
- Simulate mode for previewing changes
- SQLite and JSONL storage backends
- HTML report generation
- Streaming support
- Never removes human content
- Never breaks tool ordering
- Parse failures are no-ops
- Preserves recency (last N turns)
The 0.2.0 release is backward compatible. New features are opt-in:
```python
# Old code still works
from headroom import HeadroomClient, OpenAIProvider

# New SmartCrusher (replaces ToolCrusher for better compression)
from headroom import SmartCrusher, SmartCrusherConfig

config = SmartCrusherConfig(
    min_tokens_to_crush=200,
    max_items_after_crush=50,
)
crusher = SmartCrusher(config)

# New relevance scoring
from headroom import create_scorer

scorer = create_scorer("hybrid")  # or "bm25" for zero deps
```

New in 0.2.0 - run Headroom as a proxy server:
```bash
# Start the proxy
python -m headroom.proxy.server --port 8787

# Use with Claude Code
ANTHROPIC_BASE_URL=http://localhost:8787 claude
```