Skip to content

Commit 65de11a

Browse files
update readme
1 parent a233399 commit 65de11a

1 file changed

Lines changed: 30 additions & 36 deletions

File tree

README.md

Lines changed: 30 additions & 36 deletions
Original file line numberDiff line numberDiff line change
@@ -85,7 +85,21 @@ export API_KEY=your_api_key_here
8585
export API_BASE=your_api_base_url
8686
```
8787

88-
## Reproducing Paper Results
88+
## Paper Results
89+
90+
### Available Benchmark Problems
91+
92+
| Problem Category | Problem | Dimensions | Description |
93+
|-----------------|---------|------------|-------------|
94+
| **Autocorrelation** | First Autocorr Ineq | - | First autocorrelation inequality |
95+
| | Second Autocorr Ineq | - | Second autocorrelation inequality |
96+
| **Heilbronn** | Triangle | - | Heilbronn triangle problem |
97+
| | Convex | 13, 14 | Heilbronn convex hull problem |
98+
| **Max-Min Distance** | Dimension 2 | 2D | Maximize minimum distance |
99+
| | Dimension 3 | 3D | Maximize minimum distance |
100+
| **Packing** | Circle in Rectangle | - | Pack circles in rectangle |
101+
| | Circle in Square | 26, 32 | Pack N circles in unit square |
102+
| | Hexagon Packing | 11, 12 | Pack N hexagons in larger hexagon |
89103

90104
### Running a Benchmark Problem
91105

@@ -94,7 +108,7 @@ Each problem has configuration files for different LLM providers (Gemini, Qwen,
94108
```bash
95109
# Example: Circle packing in a square (26 circles) with Qwen
96110
codeevolve \
97-
--inpt_dir=problems/alphaevolve_math_problems/packing_problems/circle_packing_square/26 \
111+
--inpt_dir=problems/alphaevolve_math_problems/packing_problems/circle_packing_square/26/input \
98112
--cfg_path=problems/alphaevolve_math_problems/packing_problems/circle_packing_square/26/configs/qwen_config.yaml \
99113
--out_dir=results/circle_packing_26_qwen \
100114
--terminal_logging
@@ -107,27 +121,13 @@ codeevolve \
107121
--terminal_logging
108122
```
109123

110-
### Available Benchmark Problems
111-
112-
| Problem Category | Problem | Dimensions | Description |
113-
|-----------------|---------|------------|-------------|
114-
| **Autocorrelation** | First Autocorr Ineq | - | First autocorrelation inequality |
115-
| | Second Autocorr Ineq | - | Second autocorrelation inequality |
116-
| **Heilbronn** | Triangle | - | Heilbronn triangle problem |
117-
| | Convex | 13, 14 | Heilbronn convex hull problem |
118-
| **Max-Min Distance** | Dimension 2 | 2D | Maximize minimum distance |
119-
| | Dimension 3 | 3D | Maximize minimum distance |
120-
| **Packing** | Circle in Rectangle | - | Pack circles in rectangle |
121-
| | Circle in Square | 26, 32 | Pack N circles in unit square |
122-
| | Hexagon Packing | 11, 12 | Pack N hexagons in larger hexagon |
123-
124124
### Resuming from Checkpoints
125125

126126
To resume an interrupted run:
127127

128128
```bash
129129
codeevolve \
130-
--inpt_dir=problems/alphaevolve_math_problems/packing_problems/circle_packing_square/26 \
130+
--inpt_dir=problems/alphaevolve_math_problems/packing_problems/circle_packing_square/26/input \
131131
--out_dir=results/circle_packing_26_qwen \
132132
--load_ckpt=-1 # Load latest checkpoint
133133
```
@@ -136,32 +136,26 @@ Or load a specific checkpoint epoch:
136136

137137
```bash
138138
codeevolve \
139-
--inpt_dir=problems/alphaevolve_math_problems/packing_problems/circle_packing_square/26 \
139+
--inpt_dir=problems/alphaevolve_math_problems/packing_problems/circle_packing_square/26/input \
140140
--out_dir=results/circle_packing_26_qwen \
141141
--load_ckpt=100 # Load checkpoint from epoch 100
142142
```
143143

144-
### Exact Reproducibility
144+
### Reproducibility
145145

146-
Our experimental results were obtained using Qwen and Gemini models as the backbone for our LLM ensembles. Both models were accessed via an internal API system at Inter that routed requests to the respective LLM provider. Many commercial LLM providers do not guarantee deterministic outputs even when random seeds are provided. As a result, **exact numerical reproduction of our paper results is not guaranteed**, even when using the same configuration files and seeds. Despite these limitations, our ablation studies demonstrate that CodeEvolve consistently achieves **state-of-the-art results across multiple seeds and experimental runs** on all considered benchmarks. The core algorithmic contributions remain robust to LLM stochasticity.
146+
This repository supports two distinct notions of reproducibility:
147147

148-
## Analyzing Results
148+
#### 1) Reproducing the paper analysis (deterministic, using included artifacts)
149+
The folder `experiments/` contains the raw artifacts used in the paper (checkpoints, histories, logs). The notebook(s) in `notebooks/` analyze those artifacts to generate the plots and comparisons. Re-running the analysis should reproduce the reported figures/tables as long as your analysis environment is compatible.
149150

150-
### Using the Analysis Notebook
151+
#### 2) Re-running the full search (best-effort; exact replay depends on the LLM provider)
152+
**Exact numerical reproduction of a full evolutionary run is not guaranteed** when using hosted LLM APIs.
151153

152-
```bash
153-
# Make sure jupyter is installed
154-
conda activate codeevolve
155-
pip install jupyter matplotlib pandas
156-
157-
# Launch notebook
158-
jupyter notebook notebooks/experiment_analysis.ipynb
159-
```
154+
Why:
155+
- Many commercial LLM providers **do not support deterministic sampling** or **do not honor `seed`**.
156+
- Even when a provider accepts `seed`, outputs can vary due to backend nondeterminism (load balancing, infrastructure-level randomness, model version rollouts).
160157

161-
The notebook provides:
162-
- Solution quality over time plots
163-
- Comparison with AlphaEvolve baselines
164-
- Ablation study analysis
158+
This is not a limitation of CodeEvolve’s evolutionary framework: CodeEvolve is **seedable for its internal stochastic decisions**, and it forwards model `seed` to OpenAI-compatible endpoints when supported. The remaining nondeterminism comes from the LLM backbone/provider.
165159

166160
## Citation
167161

@@ -183,8 +177,8 @@ If you use CodeEvolve or these benchmarks in your research, please cite:
183177

184178
Experiments are versioned to match the main repository:
185179

186-
- **v0.1.0**: Initial release, corresponds to v1 of technical report
187-
- **v0.2.0**: Current release, corresponds to v3 of technical report
180+
- **v0.1.0**: Initial release, corresponds to v1 of CodeEvolve's [paper](https://arxiv.org/abs/2510.14150).
181+
- **v0.2.0**: Current release, corresponds to v3 of CodeEvolve's [paper](https://arxiv.org/abs/2510.14150).
188182

189183
## Acknowledgements
190184

0 commit comments

Comments
 (0)