claude: add generic launch-devnet skill #21024
Pull request overview
This PR replaces the devnet-specific Claude skill for bal-devnet-3 with a reusable, generic launch-devnet skill and a thin bal-devnet-ab-test wrapper to run BAL vs no-BAL throughput comparisons on BAL devnets.
Changes:
- Added `launch-devnet` skill that discovers devnet configuration at runtime (genesis/config/inventory), generates start/stop/clean scripts, and provides monitoring plus cross-client failure-investigation guidance.
- Added `bal-devnet-ab-test` skill that reuses `launch-devnet` and spins up a second instance with `IGNORE_BAL=true` on a separate port offset.
- Removed the legacy `launch-bal-devnet-3` devnet-specific skill.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| .claude/skills/launch-devnet/SKILL.md | New generic devnet launcher skill with runtime discovery, script generation, monitoring, and investigation workflow. |
| .claude/skills/bal-devnet-ab-test/SKILL.md | New wrapper skill for running a second “no-BAL” instance to compare throughput metrics. |
| .claude/skills/launch-bal-devnet-3/SKILL.md | Removed devnet-specific launcher skill (superseded by the generic launcher + wrapper). |
```bash
mkdir -p $WORKDIR/testnet-config
CFG=https://config.${DEVNET}.ethpandaops.io
curl -fsSL ${CFG}/el/genesis.json -o $WORKDIR/genesis.json
curl -fsSL ${CFG}/cl/config.yaml -o $WORKDIR/testnet-config/config.yaml
curl -fsSL ${CFG}/cl/genesis.ssz -o $WORKDIR/testnet-config/genesis.ssz
curl -fsSL ${CFG}/api/v1/nodes/inventory -o $WORKDIR/inventory.json
echo 0 > $WORKDIR/testnet-config/deploy_block.txt
echo 0 > $WORKDIR/testnet-config/deposit_contract_block.txt
```
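The PR summary says `launch-devnet` needs only a landing-page URL. Below is a minimal sketch of that first derivation step. It assumes every landing page has the form `https://<devnet>.ethpandaops.io` and that the config and checkpoint hosts follow the `config.${DEVNET}` / `checkpoint-sync.${DEVNET}` pattern used in this PR. The helper names are hypothetical. Callers should quote every expansion, e.g. `curl -fsSL "${CFG}/el/genesis.json" -o "$WORKDIR/genesis.json"`, so that paths containing spaces still work.

```shell
# Hypothetical helper: derive the devnet name from a landing-page URL such as
# https://bal-devnet-3.ethpandaops.io (assumes the <devnet>.ethpandaops.io layout).
devnet_from_url() {
  local url=$1 host
  host=${url#*://}   # drop the scheme
  host=${host%%/*}   # drop any path or trailing slash
  case $host in
    *.ethpandaops.io) printf '%s\n' "${host%%.*}" ;;  # first DNS label
    *) echo "unexpected landing URL: $url" >&2; return 1 ;;
  esac
}

# Hosts derived from the naming convention quoted in this thread (assumption).
config_url()     { printf 'https://config.%s.ethpandaops.io\n' "$1"; }
checkpoint_url() { printf 'https://checkpoint-sync.%s.ethpandaops.io\n' "$1"; }

# Usage:
#   DEVNET=$(devnet_from_url https://bal-devnet-3.ethpandaops.io)   # bal-devnet-3
#   CFG=$(config_url "$DEVNET")
```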
```bash
for p in <each chosen port>; do
  ss -tlnp 2>/dev/null | awk -v P=":$p" '$4 ~ P {print P}'
done
```
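The `ss` check above does not exist on macOS. Below is a sketch of the `lsof`-based replacement that the review and the follow-up commit describe. It checks TCP listeners and bound UDP sockets, then steps the offset by 100 until every port is free. The `BASE_PORTS` list, the 100-port step, and the 1000 cap are illustrative assumptions, not the skill's actual values. Without root, `lsof` may not see sockets owned by other users, so a clean result is a strong hint rather than a guarantee.

```shell
# Hypothetical base ports (EL p2p, EL http, EL authrpc, CL p2p, CL QUIC, CL http).
BASE_PORTS="30303 8545 8551 9000 9001 5052"

# Succeeds if something listens on TCP <port> or has UDP <port> bound.
# lsof exits 0 only when it finds a matching socket.
port_in_use() {
  lsof -nP -iTCP:"$1" -sTCP:LISTEN >/dev/null 2>&1 && return 0
  lsof -nP -iUDP:"$1" >/dev/null 2>&1
}

# Print the first offset (starting at $1, step 100) where all base ports are free.
pick_offset() {
  local offset=${1:-100} p busy
  while [ "$offset" -le 1000 ]; do
    busy=0
    for p in $BASE_PORTS; do
      if port_in_use $((p + offset)); then busy=1; break; fi
    done
    if [ "$busy" -eq 0 ]; then echo "$offset"; return 0; fi
    offset=$((offset + 100))
  done
  echo "no free offset found up to 1000" >&2
  return 1
}
```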
```bash
JWT=$WORKDIR/erigon-data/jwt.hex
test -f "$JWT"  # MUST exist; start-erigon.sh creates it
```
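Before starting the CL, erigon must actually be up. Below is a minimal polling helper that turns that wait into a real step with a timeout. It assumes one-second granularity is fine. `wait_until` and `AUTHRPC_PORT` are hypothetical names. The `lsof` check in the usage comment only proves that *something* is listening on the authrpc port. Adding `-a -p "$ERIGON_PID"` to that `lsof` call would narrow the check to the erigon PID.

```shell
# Run "$@" once per second until it succeeds; give up after $1 seconds.
wait_until() {
  local timeout=$1 waited=0
  shift
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Usage in start order, aborting if erigon never becomes ready:
#   nohup bash start-erigon.sh > erigon-console.log 2>&1 &
#   wait_until 60 test -f "$WORKDIR/erigon-data/jwt.hex" \
#     || { echo "jwt.hex never appeared" >&2; exit 1; }
#   wait_until 60 lsof -nP -iTCP:"$AUTHRPC_PORT" -sTCP:LISTEN >/dev/null \
#     || { echo "authrpc not listening" >&2; exit 1; }
#   nohup bash start-cl.sh > cl-console.log 2>&1 &
```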
```bash
  --execution-jwt=/jwt.hex \
  --boot-nodes="<comma-joined ENRs>" \
  --port=<CL P2P> --quic-port=<CL QUIC> \
  --http --http-port=<CL HTTP> --http-address=0.0.0.0 \
```
`stop.sh`:
- `docker stop <DEVNET>-cl` (and `docker rm`)
- `pkill -f "datadir.*$WORKDIR/erigon-data"`
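The follow-up commit replaces `pkill -f` with a PID captured at start time. Below is a sketch of the stop side. It assumes the launcher writes `$!` to a pidfile; the `erigon.pid` name is hypothetical. This only targets erigon if `start-erigon.sh` ends with `exec erigon ...`, because `exec` replaces the wrapper shell in place and so keeps the PID that `$!` recorded.

```shell
# Stop the process whose PID is stored in pidfile $1: SIGTERM, wait up to
# $2 seconds (default 30) for it to exit, then SIGKILL.
stop_by_pidfile() {
  local pidfile=$1 timeout=${2:-30} pid i=0
  [ -f "$pidfile" ] || { echo "no pidfile: $pidfile" >&2; return 0; }
  pid=$(cat "$pidfile")
  if kill -0 "$pid" 2>/dev/null; then
    kill "$pid"
    while kill -0 "$pid" 2>/dev/null && [ "$i" -lt "$timeout" ]; do
      sleep 1
      i=$((i + 1))
    done
    kill -0 "$pid" 2>/dev/null && kill -9 "$pid"
  fi
  rm -f "$pidfile"
}

# Start side (start-erigon.sh must end with `exec erigon ...`):
#   nohup bash start-erigon.sh > erigon-console.log 2>&1 &
#   echo $! > "$WORKDIR/erigon.pid"
```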
`clean.sh`:
- runs `stop.sh`
- removes `erigon-data/{chaindata,snapshots,txpool,nodes,temp}` and `cl-data/*`
- re-runs `erigon init`
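`rm -rf cl-data/*` is not a full wipe: the shell glob `*` does not match names beginning with `.` (unless bash's `dotglob` is set), so hidden files survive. Below is a minimal sketch of the `rm -rf cl-data && mkdir cl-data` form the review suggests, with a guard against an empty path. The function name is hypothetical.

```shell
# Recreate the CL data dir from scratch so dotfiles are removed too.
clean_cl_data() {
  local dir=$1
  [ -n "$dir" ] || { echo "clean_cl_data: empty path" >&2; return 1; }
  rm -rf -- "$dir" && mkdir -p -- "$dir"
}
```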
```bash
docker stop "<devnet>-nobal-cl" 2>/dev/null
docker rm "<devnet>-nobal-cl" 2>/dev/null
pkill -f "datadir.*${WORKDIR}-nobal/erigon-data"
# Optional: wipe disk
rm -rf "${WORKDIR}-nobal"
```
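The `rm -rf "${WORKDIR}-nobal"` line above deletes whatever that expansion points to. Below is a sketch of the two guards the follow-up commit mentions: the directory name must end in `-nobal`, and a marker file must be present. The marker name `.nobal-instance` and the function name are hypothetical. The skill would need to create the marker when it sets up Instance B.

```shell
# Hypothetical marker written when Instance B's directory is created.
NOBAL_MARKER=.nobal-instance

# Remove Instance B's directory only if it passes both sanity checks.
wipe_nobal_dir() {
  local dir=$1
  case $dir in
    *-nobal) ;;
    *) echo "refusing: '$dir' does not end in -nobal" >&2; return 1 ;;
  esac
  if [ ! -f "$dir/$NOBAL_MARKER" ]; then
    echo "refusing: $dir/$NOBAL_MARKER missing" >&2
    return 1
  fi
  rm -rf -- "$dir"
}
```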
Significant issues
1. `ss -tlnp` is Linux-only. Both `launch-devnet/SKILL.md` (Step 4) and `bal-devnet-ab-test/SKILL.md` use `ss` for port-conflict detection. The user's primary platform is darwin, where `ss` doesn't exist (`which ss` → not found). The sibling `erigon-ephemeral` skill correctly uses `lsof -nP -iTCP:<port> -sTCP:LISTEN`, and CLAUDE.md asks for cross-platform shell. The old skill had the same bug (there it was only mentioned in troubleshooting), so this isn't a regression, but since you're rewriting the file anyway, this is the moment to fix it. Suggested:

```bash
for p in <each chosen port>; do
  lsof -nP -iTCP:$p -sTCP:LISTEN 2>/dev/null | grep -q LISTEN && echo "$p in use"
done
```

Same for the verification step that says `ss -tlnp | grep <port>`.

2. `bal-devnet-ab-test` Step "Set up Instance B" tells the agent to "Pick a second port offset that doesn't collide with Instance A. If Instance A uses +100, Instance B should use +400", but it never tells the agent how to learn what offset A used. `launch-devnet` records the chosen offset in `devnet-info.txt`, but the wrapper doesn't say "read it from there." Easy fix: add `OFFSET_A=$(grep '^port_offset:' "$WORKDIR/devnet-info.txt" | awk '{print $2}')` (or however you decide to format the field) and derive B's offset from that.

3. The `jwt.hex`/authrpc poll is only a comment. Step 8:

```bash
cd $WORKDIR && nohup bash start-erigon.sh > erigon-console.log 2>&1 &
# Poll until both conditions are true (timeout ~60s):
#   - $WORKDIR/erigon-data/jwt.hex exists
#   - the authrpc port is bound by the erigon PID
cd $WORKDIR && nohup bash start-cl.sh > cl-console.log 2>&1 &
```

An agent following this verbatim will fire both lines back-to-back, and the CL will fail JWT auth on the first `newPayload`. Make it an actual loop, or call it out as a separate step the agent must execute rather than a comment.

4. `docker pull` always runs. The old skill checked `docker image inspect` first and only pulled on a miss. Pull is idempotent, so it's fast on a cache hit, but a pull against an unauthenticated network with intermittent connectivity can stall. Probably fine, but the previous behavior was nicer.
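Issue 2 asks the wrapper to read Instance A's offset instead of assuming it. Below is a sketch that assumes `devnet-info.txt` holds `key: value` lines such as `port_offset: 100`. That is the format the review's `grep` implies; the actual field layout is up to the skill. Instance B is placed 300 ports higher, as in the follow-up commit. The function names are hypothetical.

```shell
# Print the value of "key: value" field $2 from info file $1.
info_field() {
  awk -v k="$2:" '$1 == k { print $2; exit }' "$1"
}

# Instance B sits 300 ports above whatever offset Instance A recorded.
nobal_offset() {
  local a
  a=$(info_field "$1" port_offset)
  case $a in
    ''|*[!0-9]*) echo "port_offset missing or not numeric in $1" >&2; return 1 ;;
  esac
  echo $((a + 300))
}
```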
Nits
- `<DEVNET>` vs `<devnet>` placeholder casing inconsistency. Step 7 uses `<DEVNET>-cl` (uppercase placeholder); the A/B skill uses `<devnet>-nobal-cl` (lowercase). Step 1 explicitly says `DEVNET=bal-devnet-3` (lowercase value). Pick one; they're the same variable. `<devnet>` matches reality.
- Step 7 sells "any CL client", but the actual recipe is Lighthouse-only. The "for non-lighthouse CLs, look up the client's CLI flags" hedge is honest, but the description says "a CL client" rather than "Lighthouse (with hooks for other CLs)". Either narrow the description or add a one-line table mapping client → command name for the common ones.
- `rm -rf cl-data/*` in `clean.sh` misses dotfiles, because the glob doesn't match them. Use `rm -rf cl-data && mkdir cl-data` for cleaner intent.
- The checkpoint-sync URL is assumed to exist (`https://checkpoint-sync.${DEVNET}.ethpandaops.io`), but fresh devnets sometimes don't have one yet. A `curl -fsI` probe before baking it into the script would let the skill fall back to genesis sync gracefully. Not a blocker: Lighthouse will just emit warnings if the URL 404s.
- `python3 -c '...'` for JSON parsing in Step 9: `jq` is already required for inventory parsing in Step 2. Pick one for consistency.
- `devnet-info.txt` would be more useful as a key/value file (or JSON) than free-form prose, since the A/B wrapper needs to read fields from it programmatically (see issue #2).
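The checkpoint-sync nit can be handled by probing the endpoint before writing the flag into `start-cl.sh`. Below is a sketch that assumes a `HEAD` request to the bare `checkpoint-sync.${DEVNET}` URL returns 2xx when the service is up. Some deployments may only answer on specific API paths or reject `HEAD`, so the probe target is an assumption. `--checkpoint-sync-url` is Lighthouse's flag; other CLs may name it differently. The function name is hypothetical.

```shell
# Print the Lighthouse checkpoint-sync flag if the endpoint answers;
# print nothing otherwise, so start-cl.sh falls back to genesis sync.
checkpoint_flag() {
  local url="https://checkpoint-sync.$1.ethpandaops.io"
  if curl -fsI --max-time 10 "$url" >/dev/null 2>&1; then
    printf '%s\n' "--checkpoint-sync-url=$url"
  fi
}

# Usage when generating start-cl.sh:
#   CKPT_FLAG=$(checkpoint_flag "$DEVNET")   # empty means genesis sync
```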
Suggested commit before merge
Fix issues 1–3 (cross-platform port check, A/B offset discovery, jwt.hex poll as a real step). Everything else can land as a follow-up or be ignored.
- Switch port-conflict detection from ss (Linux-only) to lsof (cross-platform)
and check both TCP and UDP families. Matches the pattern in erigon-ephemeral.
- Quote $WORKDIR/$CFG in download/init steps so paths with spaces work.
- Make Step 8's jwt.hex+authrpc check a real polling loop with timeouts and
abort guards instead of a comment, so the CL doesn't race ahead and fail
JWT auth on the first newPayload.
- Promote test -f "$JWT" to a hard abort guard ([ -f ] || { echo; exit 1; }).
- Bind Lighthouse beacon API to 127.0.0.1 by default; document widening to
0.0.0.0 only when remote access is needed.
- Capture erigon's PID at start time and kill it by PID in stop.sh instead
of pkill -f, which can match unrelated processes when $WORKDIR contains
regex metacharacters. Skill now requires start-erigon.sh to end with `exec`
so the captured PID is the erigon PID.
- Replace `rm -rf cl-data/*` with `rm -rf cl-data && mkdir cl-data` so
dotfiles don't survive the glob.
- Probe the checkpoint-sync URL before baking it into start-cl.sh; fall back
to genesis sync if the endpoint isn't provisioned yet.
- Skip docker pull if the image is already cached.
- Switch monitoring from python3 to jq for consistency with the rest of the
skill, which already requires jq.
- Rewrite devnet-info.txt as a key/value file so wrappers like
bal-devnet-ab-test can read fields (notably port_offset) programmatically
instead of guessing.
- bal-devnet-ab-test: read OFFSET_A from devnet-info.txt and derive
OFFSET_B = OFFSET_A + 300 instead of assuming A used +100.
- bal-devnet-ab-test: add sanity guards (NOBAL_DIR ends with -nobal, marker
file present) before rm -rf, and stop Instance B by PID.
- Fix <DEVNET>/<devnet> placeholder casing inconsistency by switching to
the actual ${DEVNET} bash expansion.
- Clarify that the Step 7 recipe is Lighthouse-specific and add a small
client→subcommand mapping table for non-lighthouse CLs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
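The cached-image check from issue 4 and the commit message ("Skip docker pull if the image is already cached") is a single conditional. Below is a sketch that relies on `docker image inspect` exiting non-zero when the image is not present locally. The function name is hypothetical.

```shell
# Pull image $1 only when it is not already in the local image cache.
ensure_image() {
  if docker image inspect "$1" >/dev/null 2>&1; then
    echo "using cached $1"
  else
    docker pull "$1"
  fi
}
```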
Thanks for the careful review @yperbasis @copilot-pull-request-reviewer. I went through every comment and addressed all of them in f189e46. None warranted pushback: all were either correct (e.g.

yperbasis: significant issues
yperbasis: nits
Copilot: inline comments
Summary
Replaces the devnet-specific `launch-bal-devnet-3` Claude skill with two reusable skills:

- `launch-devnet`: generic launcher for any ethpandaops devnet. Takes only a landing-page URL (e.g. `https://bal-devnet-3.ethpandaops.io`) and discovers everything else at runtime: chain id and fork schedule from `el/genesis.json`, CL fork epochs from `cl/config.yaml`, EL/CL bootnodes and client image tags from the inventory API, and public RPC/beacon/checkpoint URLs from the host convention. Auto-detects port conflicts and bumps the offset, generates `start-erigon.sh`/`start-cl.sh`/`stop.sh`/`clean.sh`, starts erigon → waits for `jwt.hex` → starts the CL, then monitors EL head vs network head, peer counts, and CL sync status.
- `bal-devnet-ab-test`: slim wrapper that reuses `launch-devnet` for the primary instance and adds a second instance with `IGNORE_BAL=true` on a `+400` port offset for head-to-head throughput comparison (`gas/s`, `repeat%`, `abort`, `invalid`).

The old `launch-bal-devnet-3` skill is deleted; nothing else in the repo referenced it.

Key design point: failure investigation
`launch-devnet` includes an explicit "finding the absolute truth" section. The default assumption is not that erigon is wrong: on a multi-client devnet, erigon may be spec-correct while another client is buggy, the spec itself may be ambiguous (clients split into factions), or the network/genesis may be broken. The skill instructs the agent to:

A "common false-positive signals" list (optimistic head, first-`newPayload` timeouts, transient `peers: 0`) keeps the agent from escalating noise.

Files
Test plan
- `/launch-devnet https://bal-devnet-3.ethpandaops.io` and confirm it discovers chain id `7098917910`, the Amsterdam fork timestamp, and ≥10 EL/CL bootnodes from the inventory.
- … `+100` ports are already bound.
- `/launch-devnet` against a different devnet (e.g. `fusaka-devnet-N`) and confirm the same flow works without code changes.
- `/bal-devnet-ab-test` after a successful `/launch-devnet` run and confirm Instance B starts on `+400` ports with `IGNORE_BAL=true` exported.
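The monitoring described in the summary (EL head vs network head) and the `jq`-over-`python3` nit both come down to a small JSON-RPC query. Below is a sketch that assumes both endpoints answer the standard `eth_blockNumber` call and that `curl` and `jq` are installed. The function names are hypothetical, and which public RPC URL to compare against is left to the skill's host convention.

```shell
# Query eth_blockNumber at JSON-RPC URL $1 and print the head in decimal.
el_head() {
  local hex
  hex=$(curl -fsS -X POST -H 'Content-Type: application/json' \
    --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
    "$1" | jq -r '.result') || return 1
  case $hex in
    0x|0x*[!0-9a-fA-F]*) echo "bad eth_blockNumber result: $hex" >&2; return 1 ;;
    0x*) echo $((hex)) ;;   # shell arithmetic accepts 0x-prefixed hex
    *) echo "bad eth_blockNumber result: $hex" >&2; return 1 ;;
  esac
}

# Blocks the local EL ($1) is behind the reference RPC ($2); negative means ahead.
head_lag() {
  local local_head remote_head
  local_head=$(el_head "$1") || return 1
  remote_head=$(el_head "$2") || return 1
  echo $((remote_head - local_head))
}

# Usage: head_lag "http://127.0.0.1:<EL HTTP>" "<public RPC URL>"
```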