HyperFlow is a framework for building agents that rewrite and test their own improvements.
Instead of manually retuning prompts and logic after every failure, HyperFlow runs a self-improvement loop where an agent evaluates what happened, edits its own code, tools, and prompts, then tests the new version in a sandbox.
The core idea is simple: do not just rerun the same workflow. Learn from execution and get better over time.
Built on LangChain and LangGraph. Inspired by HyperAgents (Meta Research, 2026).
⚠️ EXPERIMENTAL: This project is currently in an experimental phase and is not recommended for production use.
HyperFlow runs an evolutionary self-improvement loop with two roles:
- TaskAgent solves the domain problem
- MetaAgent studies evaluation results and improves the system
Each generation:
- Select a parent generation from the archive
- MetaAgent reads past evaluation scores and edits the source code
- Evaluation scripts run in a sandbox to score the new agent
- Better agents are added back to the archive for future generations
It is self-referential: the mechanism that improves the agent is itself part of the editable code.
Important: This framework is currently in an experimental state. See Limitations for more information.
The TaskAgent gets better over generations without manual intervention.
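The generation loop described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not HyperFlow's implementation: `Generation`, `evaluate`, `mutate`, `select_parent`, and `evolve` are hypothetical names, the "agent" is just a string of tokens, and a random edit stands in for the MetaAgent's LLM-driven code changes.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Generation:
    gen_id: int
    parent_id: Optional[int]
    source: str   # stand-in for the agent's editable code
    score: float


def evaluate(source: str) -> float:
    """Toy evaluator (assumption): fraction of target tokens present."""
    target = {"plan", "act", "verify"}
    return len(target & set(source.split())) / len(target)


def mutate(source: str, rng: random.Random) -> str:
    """Toy stand-in for the MetaAgent: append one candidate token."""
    return source + " " + rng.choice(["plan", "act", "verify", "noop"])


def select_parent(archive: list, rng: random.Random) -> Generation:
    """Score-weighted choice; the 0.1 floor keeps weak parents reachable."""
    weights = [g.score + 0.1 for g in archive]
    return rng.choices(archive, weights=weights, k=1)[0]


def evolve(generations: int, seed: int = 0) -> list:
    rng = random.Random(seed)
    archive = [Generation(0, None, "act", evaluate("act"))]
    for gen_id in range(1, generations + 1):
        parent = select_parent(archive, rng)          # 1. select a parent
        child_source = mutate(parent.source, rng)     # 2. edit the source
        child_score = evaluate(child_source)          # 3. score the child
        if child_score > parent.score:                # 4. archive only improvements
            archive.append(
                Generation(gen_id, parent.gen_id, child_source, child_score)
            )
    return archive


if __name__ == "__main__":
    for g in evolve(20):
        print(g.gen_id, g.parent_id, round(g.score, 2), repr(g.source))
```

Keeping every improved generation in the archive, rather than only the current best, is what lets later generations branch from earlier ancestors.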
New here? Read docs/concepts.md for a detailed explanation of every concept with examples.
# Install from PyPI
pip install hyperflow-ai
# Or install from source for development
pip install -e .

Requirements:
- Python 3.11+
- At least one LLM provider API key (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY)
# Set your API key
export OPENAI_API_KEY="sk-..."
# Run the bash example (single eval)
cd examples/bash
python run.py
# Run with evolutionary loop
python run.py evolve

hyperflow/
  __init__.py            # Public API re-exports
  agent/
    base_agent.py        # Abstract AgentSystem base class
    llm.py               # Multi-provider LLM factory
    llm_with_tools.py    # LangGraph ReAct chat loop
    meta_agent.py        # MetaAgent (mutation operator)
    task_agent.py        # TaskAgent (task solver)
    tool_registry.py     # Tool registration
  core/
    ensemble.py          # Best-of-archive ensemble
    generate_loop.py     # Main evolutionary loop
    select_parent.py     # Parent selection strategies
  domains/
    base.py              # Domain/DomainTask/EvalResult interfaces
    evaluators.py        # Static, LLM judge, human evaluators
    harness.py           # Evaluation harness
    report.py            # Report generation
  prompts/
    llm_judge.py         # LLM judge prompt template
    meta_agent.py        # MetaAgent prompt template
    task_agent.py        # TaskAgent prompt template
  tools/
    __init__.py          # get_framework_tools()
    bash.py              # Bash shell tool
    editor.py            # File editor tool
  utils/
    archive.py           # JSONL archive CRUD
    common.py            # JSON extraction, file helpers
    constants.py         # Shared constants
    docker.py            # Docker container management
    executor.py          # Local/Docker execution
    git.py               # Git operations
examples/
  bash/                  # Bash command generation
  calculator/            # Buggy tool fix demo
  factcheck/             # True/false classification
  git_evolution/         # Git-based evolution with patches
  paper_review/          # Paper accept/reject prediction
  scoring/               # Math grading self-improvement
from hyperflow import MODELS
# Available model presets
MODELS["OPENAI_GPT4O"] # "openai/gpt-4o"
MODELS["OPENAI_GPT4O_MINI"] # "openai/gpt-4o-mini"
MODELS["OPENAI_O3"] # "openai/o3"
MODELS["OPENAI_O4_MINI"] # "openai/o4-mini"
MODELS["CLAUDE_SONNET"] # "anthropic/claude-sonnet-4-5-20250929"
MODELS["GEMINI_PRO"] # "gemini/gemini-2.5-pro"
MODELS["OLLAMA_LLAMA3"] # "ollama/llama3"

Or use any "provider/model-name" string.
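To illustrate the "provider/model-name" convention, here is a minimal sketch of how such a string could be split and validated. The helper `split_model_id` is hypothetical and is not part of HyperFlow's API; how the library itself parses these strings is not shown here.

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Split 'provider/model-name' at the first slash and validate both parts."""
    provider, sep, name = model_id.partition("/")
    if not sep or not provider or not name:
        raise ValueError(
            f"expected 'provider/model-name', got {model_id!r}"
        )
    return provider, name


if __name__ == "__main__":
    print(split_model_id("openai/gpt-4o"))
    print(split_model_id("ollama/llama3"))
```

Splitting only at the first slash leaves any later slashes in the model name untouched.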
| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | OpenAI API key |
| `ANTHROPIC_API_KEY` | Anthropic API key |
| `GOOGLE_API_KEY` | Google Gemini API key |
| `OLLAMA_BASE_URL` | Ollama server URL (default: http://localhost:11434) |
| `HYPERFLOW_MODEL` | Default model for examples (e.g. openai/gpt-4o) |
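As a rough sketch of how a script might read these variables, the snippet below collects them into a settings dictionary. The function `resolve_settings` and the `PROVIDER_KEYS` mapping are assumptions made for illustration (in particular, pairing the `gemini` prefix with `GOOGLE_API_KEY`); HyperFlow's own lookup logic may differ.

```python
import os

# Assumed mapping from model-string prefix to the variable holding its API key.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GOOGLE_API_KEY",
}


def resolve_settings(env=None) -> dict:
    """Read the documented variables, applying the documented Ollama default."""
    env = os.environ if env is None else env
    return {
        "model": env.get("HYPERFLOW_MODEL"),
        "ollama_base_url": env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        # Providers whose key variable is set to a non-empty value.
        "providers_with_keys": sorted(
            p for p, var in PROVIDER_KEYS.items() if env.get(var)
        ),
    }


if __name__ == "__main__":
    print(resolve_settings())
```

Passing a plain dictionary as `env` makes the function easy to exercise without touching the real environment.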
cd examples/bash && python run.py
cd examples/factcheck && python run.py
cd examples/paper_review && python run.py

cd examples/bash && python run.py evolve
cd examples/factcheck && python run.py evolve

cd examples/scoring && python run.py
cd examples/calculator && python run.py
cd examples/git_evolution && python run.py

cd examples/git_evolution && python run.py # 2 generations
cd examples/git_evolution && python run.py 5 # 5 generations
cd examples/git_evolution && python run.py --reset # start over

License: MIT

If you use this framework in your research, please cite the original HyperAgents paper:
@misc{zhang2026hyperagents,
title={Hyperagents},
author={Jenny Zhang and Bingchen Zhao and Wannan Yang and Jakob Foerster and Jeff Clune and Minqi Jiang and Sam Devlin and Tatiana Shavrina},
year={2026},
eprint={2603.19461},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2603.19461},
}