
Commit 9cab94e

Document Reckless testing workflow
1 parent 01b3fe0 commit 9cab94e

3 files changed

Lines changed: 342 additions & 0 deletions

README.md

Lines changed: 4 additions & 0 deletions
@@ -43,6 +43,10 @@ Reckless is an open source competitive chess engine, consistently performing amo

## Getting started

Additional contributor docs:

- [Testing Reckless Changes](docs/testing.md)

### Precompiled binaries

You can download precompiled builds from the [GitHub Releases page](https://github.com/codedeliveryservice/Reckless/releases).

docs/README.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
# Docs

- [Testing Reckless Changes](testing.md)

docs/testing.md

Lines changed: 335 additions & 0 deletions
@@ -0,0 +1,335 @@
# Testing Reckless Changes

This guide explains the layers of testing used in Reckless development,
what the `Bench:` value means, and how to set up local and OpenBench
tests without assuming prior experience with chess-engine workflows.

## Overview

Reckless uses four different kinds of validation:

1. Local correctness checks
2. Local benchmarking
3. Local smoke games
4. OpenBench strength testing

These layers answer different questions:

- `cargo test`, `cargo fmt`, and `cargo clippy` answer "did I break the
  build or obvious correctness?"
- `bench` answers "did I change the engine's search behavior or
  throughput on the standard bench positions?"
- local `fastchess` smoke tests answer "does the engine stay stable in
  games?"
- OpenBench answers "does this change improve strength?"

Do not treat a single layer as a replacement for the others.

## Local Correctness Checks

Run the same checks that CI runs:

```bash
cargo test --verbose
cargo fmt -- --check
cargo clippy -- -D warnings
```

CI also runs `cargo run --verbose -- bench`, so it is worth running
`bench` locally before opening a PR.

Relevant workflows:

- [Reckless CI](../.github/workflows/rust.yml)
- [Games](../.github/workflows/games.yml)
- [PGO](../.github/workflows/pgo.yml)
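
A minimal sketch of a local helper that runs these checks in the order
listed above, plus the `bench` run that CI also performs, stopping at
the first failure. It assumes a POSIX shell started from the repository
root; `run_ci_checks` and the `DRY_RUN` switch are hypothetical, not
part of the repository.

```bash
# Hypothetical helper: run the CI checks locally, stopping at the first
# failure. Set DRY_RUN=1 to print the commands instead of running them.
run_ci_checks() {
  for cmd in \
    "cargo test --verbose" \
    "cargo fmt -- --check" \
    "cargo clippy -- -D warnings" \
    "cargo run --verbose -- bench"
  do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "$cmd"
    else
      echo "==> $cmd"
      # Unquoted $cmd is split into words on purpose; none of these
      # arguments contain spaces or glob characters.
      $cmd || return 1
    fi
  done
}
```

Running `DRY_RUN=1 run_ci_checks` prints the four commands without
executing them, which is a quick way to confirm what the helper will do.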

## What `bench` Does

The built-in `bench` command searches a fixed set of positions from
[`src/tools/bench.rs`](../src/tools/bench.rs)
and prints:

```text
Bench: <nodes> nodes <nps> nps
```

The important value for commit messages and OpenBench is the first
number, `<nodes>`. It is the total number of nodes searched over the
built-in bench suite at the configured depth. In practice, contributors
use it as a compact fingerprint for the engine's current search
behavior.

The second number, `<nps>` (nodes per second), measures search speed.
It is still useful, but it is not the canonical `Bench:` value used in
commit messages or OpenBench forms.
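
Because the node count always sits in the same position on that line,
it can be extracted mechanically. A minimal sketch, assuming the output
format shown above and POSIX `awk`; the names `bench_nodes` and
`bench_nps` are hypothetical.

```bash
# Hypothetical helpers: read bench output on stdin and print one field
# from the line "Bench: <nodes> nodes <nps> nps".
bench_nodes() {
  awk '$1 == "Bench:" { print $2; exit }'
}

bench_nps() {
  awk '$1 == "Bench:" { print $4; exit }'
}

# Usage:
#   ./target/release/reckless bench | bench_nodes
```

Matching on the `Bench:` field rather than taking the last line keeps
the helpers working if the command prints other output first.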

### Default Bench Settings

From [`src/tools/bench.rs`](../src/tools/bench.rs):

- hash: `16`
- threads: `1`
- depth: `12`

So these commands run bench with the same settings:

```bash
cargo run -- bench
./target/release/reckless bench
./target/release/reckless 'bench 16 1 12'
```

`cargo run` without `--release` builds and runs the debug profile, so
use it for the node count, not for nps comparisons. The
`./target/release/reckless` binary is produced by
`cargo build --release`.

The parameter meanings are:

- first argument: transposition-table hash size in MB
- second argument: number of search threads
- third argument: search depth

For example, `./target/release/reckless 'bench 16 1 12'` runs bench
with `Hash=16`, `Threads=1`, and `Depth=12`.
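
A small wrapper can make the three positional settings explicit. This
is a sketch: `run_bench` and the `RECKLESS_BIN` override are
hypothetical, and the defaults simply mirror the values listed above.

```bash
# Hypothetical wrapper: run bench with hash (MB), threads, and depth,
# defaulting to 16, 1, and 12. RECKLESS_BIN (assumed) overrides the
# binary path, which otherwise is the standard cargo release output.
run_bench() {
  local hash="${1:-16}"
  local threads="${2:-1}"
  local depth="${3:-12}"
  "${RECKLESS_BIN:-./target/release/reckless}" "bench $hash $threads $depth"
}

# Usage:
#   run_bench             # same as: reckless 'bench 16 1 12'
#   run_bench 32 1 12     # larger hash, other settings unchanged
```

With no arguments, `run_bench` passes `bench 16 1 12` to the binary, so
it should report the same `Bench:` line as the plain `bench` command.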

## What to Put in the Commit Message

When maintainers ask for `Bench: ...` in the commit message, they mean
that the commit message should contain the node count from `bench`, for
example:

```text
Bench: 3140512
```

For Reckless, OpenBench reads this line to autofill the bench field
for a test.

The usual flow is:

1. make the change
2. run `bench`
3. set the commit message to `Bench: <nodes>`
4. push your branch
5. submit OpenBench tests
6. open the PR once the test passes, or update an already-open PR with
   the result
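
One way to keep the `Bench:` line exact is to generate the message from
the node count. This sketch assumes a message made of a short summary
followed by the `Bench:` line; `bench_commit_message` is a hypothetical
name, and a message containing only `Bench: <nodes>` fits the flow
above equally well.

```bash
# Hypothetical helper: print a commit message whose last line is
# "Bench: <nodes>", suitable for piping into `git commit -F -`.
bench_commit_message() {
  local summary="$1"
  local nodes="$2"
  printf '%s\n\nBench: %s\n' "$summary" "$nodes"
}

# Usage:
#   nodes=$(./target/release/reckless bench | awk '$1 == "Bench:" { print $2 }')
#   bench_commit_message "Tweak quiet move scoring" "$nodes" | git commit -F -
```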

If your change is intended to be non-functional, the bench node count
should usually stay the same. If it changes, treat that as a sign that
the patch changed engine behavior, even if the edit looked like a
micro-optimization.

## Architecture Caveat

Bench values are not always identical across architectures. In
practice, Apple Silicon and x86 can disagree on the `Bench:` node
count, likely because of architecture-specific NNUE inference details.

If your local `Bench:` value does not match what other contributors
expect:

1. run `bench` on `main`
2. run `bench` on your branch
3. ask in Discord or check recent Reckless OpenBench tests before
   submitting

Comparing the first two results on the same machine shows whether your
patch changed the node count, independent of the architecture
difference.

Do not assume your local Apple Silicon number is the number the
Reckless OpenBench instance expects.
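
When asking for help, it is useful to report both numbers from the same
machine together with its architecture. A sketch of a helper that
formats such a report; `bench_compare` and the branch name in the usage
comment are hypothetical, and the architecture comes from `uname -m`
unless one is passed explicitly.

```bash
# Hypothetical helper: summarize main vs. branch node counts measured on
# the same machine, tagged with the CPU architecture.
bench_compare() {
  local base="$1"
  local dev="$2"
  local arch="${3:-$(uname -m)}"
  local status="changed"
  if [ "$base" = "$dev" ]; then
    status="unchanged"
  fi
  echo "arch=$arch main=$base branch=$dev ($status)"
}

# Usage (rebuild after each checkout):
#   git switch main      && cargo build --release
#   main_nodes=$(./target/release/reckless bench | awk '$1 == "Bench:" { print $2 }')
#   git switch my-branch && cargo build --release
#   dev_nodes=$(./target/release/reckless bench | awk '$1 == "Bench:" { print $2 }')
#   bench_compare "$main_nodes" "$dev_nodes"
```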

## Local Smoke Games

The repo's [`Games` workflow](../.github/workflows/games.yml) uses
`fastchess` as a stability smoke test, not as final Elo proof. It
builds a pinned `fastchess` revision, which keeps the smoke-test
infrastructure reproducible, and checks for these failure markers:

- `illegal move`
- `disconnect`
- `stall`

A local run can use far fewer games than a real OpenBench test. Its
goal is simply to catch obvious instability before spending remote
worker time.
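
A local smoke run can reuse the same failure markers. The sketch below
only scans a saved `fastchess` log; the function name, the log file
name, and the case-insensitive match are assumptions, and the
`fastchess` flags in the usage comment are illustrative, so copy the
exact ones from the `Games` workflow.

```bash
# Hypothetical check: fail if a fastchess log mentions any of the
# failure markers the Games workflow looks for (case-insensitive here).
check_smoke_log() {
  local log="$1"
  if grep -Eiq 'illegal move|disconnect|stall' "$log"; then
    echo "smoke test FAILED:"
    grep -Ei 'illegal move|disconnect|stall' "$log"
    return 1
  fi
  echo "smoke test OK"
}

# Usage (binary names are placeholders):
#   fastchess -engine cmd=./reckless-dev name=dev \
#             -engine cmd=./reckless-main name=main \
#             -each tc=10+0.1 -rounds 20 -concurrency 2 \
#             2>&1 | tee fastchess.log
#   check_smoke_log fastchess.log
```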

## PGO Testing

PGO stands for profile-guided optimization. Reckless uses it in CI and
in release workflows, through the `cargo-pgo` subcommand:

```bash
cargo pgo instrument
cargo pgo run -- bench
cargo pgo optimize
```

That process:

1. builds an instrumented binary
2. runs `bench` to collect profile data
3. rebuilds using the recorded profile

A speedup measured on a plain release build can shrink, disappear, or
reverse under PGO, so do not base performance claims on plain release
builds alone.

To produce the same PGO-optimized build the repository uses:

```bash
make pgo
```

## OpenBench Basics

OpenBench is the main strength-testing framework for Reckless. The
upstream project describes it as a distributed framework for running
fixed-game and SPRT (sequential probability ratio test) engine tests:

- <https://github.com/AndyGrant/OpenBench>

Reckless uses its own OpenBench instance:

- <https://recklesschess.space/>

### Important OpenBench Fields

For a normal branch-vs-main test, the key fields are:

- `Dev Source`: the repository that contains your test branch
- `Dev Sha`: the commit you want to test
- `Dev Branch`: your test branch
- `Dev Bench`: the `Bench:` node count for your dev build
- `Base Source`: the repository that contains the base branch
- `Base Sha`: the commit you want as the baseline
- `Base Branch`: usually `main`
- `Base Bench`: the `Bench:` node count for the base build
- `Dev Options` and `Base Options`: UCI options passed to the engine
  during games
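
The Dev-side values can be read from your local checkout with standard
`git` commands. A sketch, assuming your fork is the `origin` remote and
that you pass the node count from `bench` yourself;
`openbench_dev_fields` is a hypothetical name.

```bash
# Hypothetical helper: print Dev-side OpenBench values for the current
# checkout. $1 is the bench node count; $2 optionally names the remote
# that points at your fork (default: origin).
openbench_dev_fields() {
  local bench="$1"
  local remote="${2:-origin}"
  printf 'Dev Source  %s\n' "$(git remote get-url "$remote")"
  printf 'Dev Sha     %s\n' "$(git rev-parse HEAD)"
  printf 'Dev Branch  %s\n' "$(git rev-parse --abbrev-ref HEAD)"
  printf 'Dev Bench   %s\n' "$bench"
}
```

If `origin` uses an SSH URL such as `git@github.com:<you>/Reckless.git`,
enter the `https://github.com/<you>/Reckless` form instead.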

### Which Repository to Use

If your development branch only exists in your fork, use your fork as
the source repository for both sides of the test.

Example:

- `Dev Source`: `https://github.com/<you>/Reckless`
- `Dev Branch`: `your-branch`
- `Base Source`: `https://github.com/<you>/Reckless`
- `Base Branch`: `main`

This works as long as your fork's `main` is up to date with upstream
`main`.

Mixing sources (upstream for the base side, your fork for the dev side)
can be confusing if the instance expects both refs to come from the
same source repository. If in doubt, copy a recent working Reckless
test and change only the branch, SHA, and bench fields.
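
Whether your fork's `main` matches upstream can be checked by comparing
commit hashes. A sketch, assuming your fork is the `origin` remote and
`https://github.com/codedeliveryservice/Reckless` is configured as a
remote named `upstream` (both remote names are assumptions, and
`fork_main_in_sync` is hypothetical). It returns 0 when the refs match,
1 when they differ, and 2 when a ref cannot be resolved.

```bash
# Hypothetical check: succeed only if both refs point at the same
# commit. Run `git fetch --all` first so remote-tracking refs are fresh.
fork_main_in_sync() {
  local fork_ref="${1:-origin/main}"
  local upstream_ref="${2:-upstream/main}"
  local a b
  a=$(git rev-parse --verify --quiet "$fork_ref") || return 2
  b=$(git rev-parse --verify --quiet "$upstream_ref") || return 2
  [ "$a" = "$b" ]
}

# To fast-forward your fork's main when the check fails:
#   git fetch upstream
#   git switch main
#   git merge --ff-only upstream/main
#   git push origin main
```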

### What the Bench Fields Mean in OpenBench

The `Dev Bench` and `Base Bench` fields should contain the bench node
counts, not the NPS.

For example, if `bench` prints `Bench: 3140512 nodes 1133878 nps`:

- correct: `3140512`
- wrong: `1133878`

### What the Engine Options Mean

OpenBench options such as:

```text
Threads=1 Hash=16 Minimal=true MoveOverhead=0
```

map to normal UCI engine options:

- `Threads=1`: use one search thread
- `Hash=16`: use a 16 MB transposition table
- `Minimal=true`: reduce UCI output noise
- `MoveOverhead=0`: reserve zero milliseconds per move for
  GUI/network overhead

This `Hash=16` is the same concept as the first argument to the local
`bench` command.
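
In the UCI protocol, each option reaches the engine as a
`setoption name <id> value <x>` command. As an illustration of that
mapping (not of how the OpenBench client itself is implemented), this
sketch converts an options string into those commands; `uci_setoptions`
is hypothetical and assumes option values contain no spaces.

```bash
# Sketch: expand "Name=Value Name=Value ..." into UCI setoption lines.
uci_setoptions() {
  for opt in $1; do
    name="${opt%%=*}"   # text before the first '='
    value="${opt#*=}"   # text after the first '='
    printf 'setoption name %s value %s\n' "$name" "$value"
  done
}
```

For example, `uci_setoptions 'Threads=1 Hash=16'` prints
`setoption name Threads value 1` followed by
`setoption name Hash value 16`.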

### A Good Reckless Example

This is a representative Reckless OpenBench test layout:

- dev and base both use your fork as `Source`
- dev branch points at your test branch (or bookmark)
- base branch points at `main`
- both sides use the same NNUE network
- both sides use `Threads=1 Hash=16 Minimal=true MoveOverhead=0`

At the time this guide was written, a working example looked like:

```text
Dev Source        https://github.com/joshka/Reckless
Dev Branch        joshka/optimize-quiet-move-scoring
Dev Bench         2786596
Base Source       https://github.com/joshka/Reckless
Base Branch       main
Base Bench        2786596
Dev/Base Options  Threads=1 Hash=16 Minimal=true MoveOverhead=0
```

Treat that as a template for field placement, not as a permanent
universal config. Copy a recent passing Reckless test when possible.

### Approval and Pending Tests

Some OpenBench instances auto-approve tests. The Reckless instance does
not appear to do that for every registered user.

If a test lands in a pending state, that usually means the instance
requires an approver to accept it before workers will run it.

## Recommended Reckless Workflow

For a normal search or evaluation patch:

1. make the change
2. run `cargo test --verbose`
3. run `cargo fmt -- --check`
4. run `cargo clippy -- -D warnings`
5. run `bench`
6. set the commit message to `Bench: <nodes>`
7. push the branch to your fork
8. create an OpenBench test using your fork for both `Dev Source` and
   `Base Source`
9. open the PR after the test passes, or update an existing PR with the
   result
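
Before step 8, a quick local check can confirm that the commit you
pushed actually carries a `Bench:` line. A sketch; `has_bench_line` is
hypothetical, and its pattern (a bare `Bench: <digits>` line) is an
assumption, not OpenBench's own parsing rule.

```bash
# Hypothetical check: succeed only if the commit message on stdin has a
# line consisting of "Bench: " followed by digits. A pasted full bench
# line (with nodes/nps text after the number) is rejected on purpose.
has_bench_line() {
  grep -Eq '^Bench: [0-9]+$'
}

# Usage:
#   git log -1 --format=%B | has_bench_line || echo "missing Bench: line"
```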

This ordering is intentional. In Reckless development, contributors
often run OpenBench first and only open the PR after the test looks
good.

If the change is specifically about performance:

1. compare release builds locally
2. compare PGO builds locally
3. only then rely on OpenBench to answer the Elo question
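
nps varies from run to run, so a single run per build is a weak basis
for comparison. A sketch of one common approach, taking the median nps
of several runs per build and reporting the percentage change; the
helper names and the choice of median are assumptions, not part of the
Reckless tooling.

```bash
# Hypothetical helper: median of numbers read one per line on stdin.
nps_median() {
  sort -n | awk '{ v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      if (NR % 2) print v[(NR + 1) / 2]
      else printf "%.0f\n", (v[NR / 2] + v[NR / 2 + 1]) / 2
    }'
}

# Hypothetical helper: signed percentage change from base nps to dev nps.
nps_delta() {
  awk -v base="$1" -v dev="$2" 'BEGIN { printf "%+.2f%%\n", (dev - base) * 100 / base }'
}

# Usage (binary names are placeholders):
#   base=$(for i in 1 2 3 4 5; do ./reckless-base bench; done | awk '$1 == "Bench:" { print $4 }' | nps_median)
#   dev=$(for i in 1 2 3 4 5; do ./reckless-dev bench; done | awk '$1 == "Bench:" { print $4 }' | nps_median)
#   nps_delta "$base" "$dev"
```

Run the same comparison on both plain release and PGO builds, since the
two can disagree.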

## When to Ask for Help

Ask in Discord before spending a lot of worker time if:

- your local `Bench:` value differs from what maintainers expect
- OpenBench cannot find your branch or SHA
- you are not sure whether `Base Source` should point at upstream or
  your fork
- you see a pending test and do not know whether it needs approval
- your patch changes `Bench:` when you thought it was non-functional
