Commit 12a3cd5

Document Reckless testing workflow
1 parent 01b3fe0 commit 12a3cd5

3 files changed

Lines changed: 336 additions & 0 deletions

README.md

Lines changed: 4 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -43,6 +43,10 @@ Reckless is an open source competitive chess engine, consistently performing amo
4343

4444
## Getting started
4545

46+
Additional contributor docs:
47+
48+
- [Testing Reckless Changes](docs/testing.md)
49+
4650
### Precompiled binaries
4751

4852
You can download precompiled builds from the [GitHub Releases page](https://github.com/codedeliveryservice/Reckless/releases).

docs/README.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
# Docs
2+
3+
- [Testing Reckless Changes](testing.md)

docs/testing.md

Lines changed: 329 additions & 0 deletions
@@ -0,0 +1,329 @@
1+
# Testing Reckless Changes
2+
3+
This guide explains the layers of testing used in Reckless development,
4+
what the `Bench:` value means, and how to set up local and OpenBench
5+
tests without assuming prior knowledge of chess-engine testing workflows.
6+
7+
## Overview
8+
9+
Reckless uses four different kinds of validation:
10+
11+
1. Local correctness checks
12+
2. Local benchmarking
13+
3. Local smoke games
14+
4. OpenBench strength testing
15+
16+
These layers answer different questions:
17+
18+
- `cargo test`, `cargo fmt`, and `cargo clippy` answer "did I break the
19+
build or obvious correctness?"
20+
- `bench` answers "did I change the engine's search behavior or
21+
throughput on the standard bench positions?"
22+
- local `fastchess` smoke tests answer "does the engine stay stable in
23+
games?"
24+
- OpenBench answers "does this change improve strength?"
25+
26+
Do not treat a single layer as a replacement for the others.
27+
28+
## Local Correctness Checks
29+
30+
Run the same checks that CI runs:
31+
32+
```bash
33+
cargo test --verbose
34+
cargo fmt -- --check
35+
cargo clippy -- -D warnings
36+
```
37+
38+
CI also runs `cargo run --verbose -- bench`, so it is worth running
39+
`bench` locally before opening a PR.
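
The checks above can be chained so that the first failure stops the run. The following is a minimal sketch, not a script from the repository: `run_checks` and the `DRY_RUN` switch are hypothetical names introduced here, and the command list simply repeats the commands quoted in this section.

```bash
# Run the CI-equivalent checks in order and stop at the first failure.
# DRY_RUN is a hypothetical switch for this sketch: when set to 1, the
# commands are printed but not executed.
run_checks() {
    local cmd
    for cmd in \
        "cargo test --verbose" \
        "cargo fmt -- --check" \
        "cargo clippy -- -D warnings" \
        "cargo run --verbose -- bench"; do
        echo "+ $cmd"
        if [ "${DRY_RUN:-0}" != "1" ]; then
            # Unquoted on purpose so the string splits into command + args.
            $cmd || return 1
        fi
    done
}
```

Calling `DRY_RUN=1 run_checks` lists the four commands without running them; calling `run_checks` from the repository root runs them for real.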
40+
41+
Relevant workflows:
42+
43+
- [Reckless CI](../.github/workflows/rust.yml)
44+
- [Games](../.github/workflows/games.yml)
45+
- [PGO](../.github/workflows/pgo.yml)
46+
47+
## What `bench` Does
48+
49+
The built-in `bench` command searches a fixed set of positions from
50+
[`src/tools/bench.rs`](../src/tools/bench.rs)
51+
and prints:
52+
53+
```text
54+
Bench: <nodes> nodes <nps> nps
55+
```
56+
57+
The important value for commit messages and OpenBench is the first
58+
number:
59+
60+
- `Bench: <nodes>`
61+
62+
That number is the total number of nodes searched over the built-in
63+
bench suite at the configured depth. In practice, contributors use it
64+
as a compact fingerprint for the engine's current search behavior.
65+
66+
The second number:
67+
68+
- `<nps>`
69+
70+
is still useful, but it is not the canonical `Bench:` value used in
71+
commit messages or OpenBench forms.
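
To avoid copying the wrong number by hand, the two fields can be pulled out of the bench output. This is a small sketch that assumes the summary line has exactly the `Bench: <nodes> nodes <nps> nps` shape shown above and that it is written to standard output; `bench_nodes` and `bench_nps` are hypothetical helper names.

```bash
# Read bench output on stdin and print only the node count
# (second field of the "Bench:" summary line).
bench_nodes() {
    awk '$1 == "Bench:" { nodes = $2 } END { if (nodes != "") print nodes }'
}

# Same idea for the NPS figure (fourth field of that line).
bench_nps() {
    awk '$1 == "Bench:" { nps = $4 } END { if (nps != "") print nps }'
}
```

For example, `./target/release/reckless bench | bench_nodes` would print just the value that belongs after `Bench:` in a commit message.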
72+
73+
### Default Bench Settings
74+
75+
From [`src/tools/bench.rs`](../src/tools/bench.rs):
76+
77+
- hash: `16`
78+
- threads: `1`
79+
- depth: `12`
80+
81+
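So these commands all run bench with the same settings and should report the same node count (`cargo run` without `--release` builds the dev profile, so its NPS is normally much lower):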
82+
83+
```bash
84+
cargo run -- bench
85+
./target/release/reckless bench
86+
./target/release/reckless 'bench 16 1 12'
87+
```
88+
89+
The parameter meanings are:
90+
91+
- first argument: transposition-table hash size in MB
92+
- second argument: number of search threads
93+
- third argument: search depth
94+
95+
For example:
96+
97+
```bash
98+
./target/release/reckless 'bench 16 1 12'
99+
```
100+
101+
means "run bench with `Hash=16`, `Threads=1`, `Depth=12`".
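
The argument order is easy to mix up, so a tiny wrapper can make it explicit. This sketch uses a hypothetical helper, `bench_command`, whose defaults mirror the values listed above; it only builds the argument string and does not run the engine.

```bash
# Build the bench argument string in the order: hash (MB), threads, depth.
# Missing arguments fall back to the defaults documented above.
bench_command() {
    local hash="${1:-16}" threads="${2:-1}" depth="${3:-12}"
    printf 'bench %s %s %s\n' "$hash" "$threads" "$depth"
}
```

A call such as `./target/release/reckless "$(bench_command 64)"` would then run bench with a 64 MB hash and the default thread count and depth.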
102+
103+
## What to Put in the Commit Message
104+
105+
When maintainers ask for `Bench: ...` in the commit message, they mean
106+
the full commit message or description should contain the node count
107+
from `bench`, for example:
108+
109+
```text
110+
Bench: 3140512
111+
```
112+
113+
For Reckless, OpenBench uses this to autofill the bench field for a
114+
test.
115+
116+
The usual flow is:
117+
118+
1. make the change
119+
2. run `bench`
120+
3. include `Bench: <nodes>` in the commit message
121+
4. push your branch
122+
5. submit OpenBench tests
123+
6. open or update the PR once the test passes
124+
125+
If your change is intended to be non-functional, the bench node count
126+
should usually stay the same. If it changes, treat that as a sign that
127+
the patch changed engine behavior, even if the edit looked like a
128+
micro-optimization.
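
Before pushing, it can help to confirm that the commit message really carries a bench line. The sketch below assumes the line stands alone in the form `Bench: <digits>`; the exact pattern OpenBench matches is not stated in this guide, so treat that as an assumption. `has_bench_line` and `commit_bench` are hypothetical names.

```bash
# Succeed if the commit message on stdin has a line "Bench: <digits>".
has_bench_line() {
    grep -Eq '^Bench: [0-9]+$'
}

# Print the node count from the last such line, if there is one.
commit_bench() {
    sed -n 's/^Bench: \([0-9][0-9]*\)$/\1/p' | tail -n 1
}
```

For the most recent commit, `git log -1 --format=%B | has_bench_line` returns success only when the line is present.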
129+
130+
## Architecture Caveat
131+
132+
Bench values are not always identical across architectures. In
133+
practice, Apple Silicon and x86 can disagree on the `Bench:` node
134+
count, likely because of architecture-specific NNUE inference details.
135+
136+
If your local `Bench:` value does not match what other contributors
137+
expect:
138+
139+
1. run `bench` on `main`
140+
2. run `bench` on your branch
141+
3. ask in Discord or check recent Reckless OpenBench tests before
142+
submitting
143+
144+
Do not assume your local Apple Silicon number is the number the
145+
Reckless OpenBench instance expects.
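
Steps 1 and 2 above amount to comparing two node counts taken on the same machine. A minimal sketch, with `compare_bench` as a hypothetical helper that takes the `main` count first and the branch count second:

```bash
# Compare two bench node counts measured on the same machine.
# $1: node count from main, $2: node count from your branch.
compare_bench() {
    local base="$1" dev="$2"
    if [ "$base" = "$dev" ]; then
        echo "unchanged: $dev"
    else
        echo "changed: $base -> $dev"
    fi
}
```

An "unchanged" result only says the branch matches `main` locally; it does not say either number matches what the OpenBench workers will compute.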
146+
147+
## Local Smoke Games
148+
149+
The repo's game workflow uses `fastchess` as a stability smoke test, not
150+
as final Elo proof. It builds a pinned `fastchess` revision and checks
151+
for:
152+
153+
- `illegal move`
154+
- `disconnect`
155+
- `stall`
156+
157+
CI pins a specific `fastchess` revision in the
158+
[`Games` workflow](../.github/workflows/games.yml) to keep smoke-test
159+
infrastructure reproducible.
160+
161+
The local workflow can be much smaller than a real OpenBench test. Its
162+
goal is simply to catch obvious instability before burning remote worker
163+
time.
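
A local run can be checked for the same three markers. This sketch only scans text that you have already captured from a local game run; it does not show how to invoke `fastchess`, and the way CI performs its own check may differ. `scan_smoke_log` is a hypothetical name.

```bash
# Read a game log on stdin, print any lines that mention a failure
# marker (with line numbers), and return non-zero if one was found.
# The match is a case-insensitive substring search, so unrelated words
# that contain "stall" (for example "install") would also be flagged.
scan_smoke_log() {
    if grep -inE 'illegal move|disconnect|stall'; then
        return 1
    fi
    return 0
}
```

Used as `scan_smoke_log < games.log`, where `games.log` is a placeholder for wherever you saved the output.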
164+
165+
## PGO Testing
166+
167+
PGO stands for profile-guided optimization. Reckless uses it in CI and
168+
in release workflows:
169+
170+
```bash
171+
cargo pgo instrument
172+
cargo pgo run -- bench
173+
cargo pgo optimize
174+
```
175+
176+
That process:
177+
178+
1. builds an instrumented binary
179+
2. runs `bench` to collect profile data
180+
3. rebuilds using the recorded profile
181+
182+
Small hot-path changes can disappear or reverse under PGO, so do not
183+
rely only on plain release builds for performance claims.
184+
185+
If you want the exact repo-style optimized build:
186+
187+
```bash
188+
make pgo
189+
```
190+
191+
## OpenBench Basics
192+
193+
OpenBench is the main strength-testing framework for Reckless. The
194+
upstream project describes it as a distributed framework for running
195+
fixed-game and SPRT engine tests:
196+
197+
- <https://github.com/AndyGrant/OpenBench>
198+
199+
Reckless uses its own OpenBench instance:
200+
201+
- <https://recklesschess.space/>
202+
203+
### Important OpenBench Fields
204+
205+
For a normal branch-vs-main test, the key fields are:
206+
207+
- `Dev Source`: the repository that contains your test branch
208+
- `Dev Sha`: the commit you want to test
209+
- `Dev Branch`: your test branch
210+
- `Dev Bench`: the `Bench:` node count for your dev build
211+
- `Base Source`: the repository that contains the base branch
212+
- `Base Sha`: the commit you want as the baseline
213+
- `Base Branch`: usually `main`
214+
- `Base Bench`: the `Bench:` node count for the base build
215+
- `Dev Options` and `Base Options`: UCI options passed to the engine
216+
during games
217+
218+
### Which Repository to Use
219+
220+
If your development branch only exists in your fork, use your fork as
221+
the source repository for both sides of the test.
222+
223+
Example:
224+
225+
- `Dev Source`: `https://github.com/<you>/Reckless`
226+
- `Dev Branch`: `your-branch`
227+
- `Base Source`: `https://github.com/<you>/Reckless`
228+
- `Base Branch`: `main`
229+
230+
This works as long as your fork's `main` matches upstream `main`.
231+
232+
Using the upstream repo for the base side and your fork for the dev side
233+
can be confusing if the instance expects both refs to come from the same
234+
source repository. If in doubt, copy a recent working Reckless test and
235+
only change the branch, SHA, and bench fields.
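
Whether your fork's `main` matches upstream can be checked by comparing the commit each one points at. The sketch below relies on `git ls-remote`, which prints `<sha><TAB><ref>` per line; `ls_remote_sha` and `same_sha` are hypothetical helpers, and `<you>` is a placeholder for your GitHub account.

```bash
# Print the SHA from the first line of `git ls-remote` output.
ls_remote_sha() {
    head -n 1 | cut -f1
}

# Succeed only if both SHAs are non-empty and identical.
same_sha() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Usage (needs network access):
#   upstream=$(git ls-remote https://github.com/codedeliveryservice/Reckless refs/heads/main | ls_remote_sha)
#   fork=$(git ls-remote "https://github.com/<you>/Reckless" refs/heads/main | ls_remote_sha)
#   same_sha "$upstream" "$fork" && echo "fork main matches upstream"
```

If the two SHAs differ, sync the fork's `main` before using it as the base side of a test.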
236+
237+
### What the Bench Fields Mean in OpenBench
238+
239+
The `Dev Bench` and `Base Bench` fields should contain the bench node
240+
counts, not the NPS.
241+
242+
Example:
243+
244+
- correct: `3140512`
245+
- wrong: `1133878` (an NPS figure)
246+
247+
### What the Engine Options Mean
248+
249+
OpenBench options such as:
250+
251+
```text
252+
Threads=1 Hash=16 Minimal=true MoveOverhead=0
253+
```
254+
255+
map to normal UCI engine options:
256+
257+
- `Threads=1`: use one search thread
258+
- `Hash=16`: use a 16 MB transposition table
259+
- `Minimal=true`: reduce UCI output noise
260+
- `MoveOverhead=0`: reserve zero milliseconds per move for
261+
GUI/network overhead
262+
263+
This `Hash=16` is the same concept as the first argument to the local
264+
`bench` command.
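
To make the mapping concrete, each `Name=value` pair corresponds to a standard UCI `setoption name <Name> value <value>` command. The sketch below prints those equivalent commands; it illustrates the mapping only and is not a description of how OpenBench itself passes options to the engine. `options_to_uci` is a hypothetical name.

```bash
# Turn a space-separated "Name=value" option string into the
# equivalent UCI setoption commands, one per line.
options_to_uci() {
    local pair
    # $1 is left unquoted on purpose so it splits on whitespace.
    for pair in $1; do
        printf 'setoption name %s value %s\n' "${pair%%=*}" "${pair#*=}"
    done
}
```

For example, `options_to_uci 'Threads=1 Hash=16'` prints one `setoption` line for `Threads` and one for `Hash`.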
265+
266+
### A Good Reckless Example
267+
268+
This is a representative Reckless OpenBench test layout:
269+
270+
- dev and base both use your fork as `Source`
271+
- dev branch points at your testing bookmark or branch
272+
- base branch points at `main`
273+
- both sides use the same network
274+
- both sides use `Threads=1 Hash=16 Minimal=true MoveOverhead=0`
275+
276+
At the time this guide was written, a working example looked like:
277+
278+
```text
279+
Dev Source https://github.com/joshka/Reckless
280+
Dev Branch joshka/optimize-quiet-move-scoring
281+
Dev Bench 2786596
282+
Base Source https://github.com/joshka/Reckless
283+
Base Branch main
284+
Base Bench 2786596
285+
Dev/Base Options Threads=1 Hash=16 Minimal=true MoveOverhead=0
286+
```
287+
288+
Treat that as a template for field placement, not as a permanent
289+
universal config. Copy a recent passing Reckless test when possible.
290+
291+
### Approval and Pending Tests
292+
293+
Some OpenBench instances auto-approve tests. Reckless does not appear to
294+
do that for every registered user.
295+
296+
If a test lands in a pending state, that usually means the instance
297+
requires an approver to accept it before workers will run it.
298+
299+
## Recommended Reckless Workflow
300+
301+
For a normal search or evaluation patch:
302+
303+
1. make the change
304+
2. run `cargo test --verbose`
305+
3. run `cargo fmt -- --check`
306+
4. run `cargo clippy -- -D warnings`
307+
5. run `bench`
308+
6. include `Bench: <nodes>` in the commit message
309+
7. push the branch to your fork
310+
8. create an OpenBench test using your fork for both `Dev Source` and
311+
`Base Source`
312+
9. open or update the PR
313+
314+
If the change is specifically about performance:
315+
316+
1. compare release builds locally
317+
2. compare PGO builds locally
318+
3. only then rely on OpenBench to answer the Elo question
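
When comparing builds locally, the NPS figures from two bench runs can be turned into a relative change. This is a rough sketch: `nps_change_percent` is a hypothetical helper, and a single pair of runs is noisy, so repeat the measurement before drawing conclusions.

```bash
# Print the relative NPS change of a dev build versus a base build,
# in percent with two decimals. $1: base NPS, $2: dev NPS.
nps_change_percent() {
    awk -v base="$1" -v dev="$2" \
        'BEGIN { printf "%.2f\n", (dev - base) / base * 100 }'
}
```

A positive result means the dev build searched more nodes per second than the base build on that run.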
319+
320+
## When to Ask for Help
321+
322+
Ask in Discord before spending a lot of worker time if:
323+
324+
- your local `Bench:` value differs from what maintainers expect
325+
- OpenBench cannot find your branch or SHA
326+
- you are not sure whether `Base Source` should point at upstream or
327+
your fork
328+
- you see a pending test and do not know whether it needs approval
329+
- your patch changes `Bench:` when you thought it was non-functional
