fix(buildernet): make --rbuilder flag work#420
Conversation
Buildernet removed beacon_healthmon but left beacon's health-check sidecar
label pointing at it, so manifest validation failed when rbuilder declared
DependsOnHealthy("beacon"). Beacon also cannot reach a healthy state in
buildernet until a builder connects (--target-peers=1), so depending on
beacon health would deadlock anyway.
Strip the stale label after removing the sidecar, and downgrade rbuilder's
beacon dependency from healthy to running when used inside buildernet.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Claude finished @SozinM's task in 2m 31s —— View job ReviewThe fix correctly identifies and addresses the two issues described in the PR (dangling sidecar label + deadlock from healthy-dependency on a beacon that can never become healthy). One substantive concern below.
Findings
Non-issues
|
| if rbuilder := component.FindService("rbuilder"); rbuilder != nil { | ||
| for _, dep := range rbuilder.DependsOn { | ||
| if dep.Name == "beacon" && dep.Condition == DependsOnConditionHealthy { | ||
| dep.Condition = DependsOnConditionRunning | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
This patch is keyed off the literal service name rbuilder, but the underlying invariant the comment states ("beacon never reaches healthy state until a builder connects") applies to every consumer of beacon's health, not just rbuilder.
contender (components.go:926) also calls DependsOnHealthy("beacon") and is added through RunContenderIfEnabled, which the L1 recipe runs at recipe_l1.go:144 (and is then called a second time at line 74 of this file — likely a separate bug). With this PR, after delete(beacon.Labels, healthCheckSidecarLabel), running buildernet --contender will fail manifest validation with service contender depends on service beacon, but it does not have a ready check (manifest.go:349) — same class of failure this PR is trying to fix for rbuilder, just with a different failure mode than before.
Suggest generalizing: walk every service in the tree and downgrade any beacon-healthy dependency to running, instead of hardcoding rbuilder. That keeps the recipe correct as new beacon-dependent components are added.
| component.RemoveService("beacon_healthmon") | ||
| if beacon != nil { | ||
| delete(beacon.Labels, healthCheckSidecarLabel) | ||
| } |
There was a problem hiding this comment.
Minor: the label deletion is logically part of the same "remove the sidecar" operation but is split from RemoveService("beacon_healthmon") only by a nil guard that's already handled above. Consider folding both into a single if beacon != nil { ... } block (and ideally co-locating the RemoveService call) so the invariant — sidecar gone ⇔ label gone — is obvious at a glance.
…tender
Address review feedback on the previous fix:
- RemoveService now strips dangling health-check-sidecar label refs across
the component tree, so callers can no longer leave a beacon pointing at a
removed healthmon.
- Add Component.WalkServices and Service.ReplaceDependency as small reusable
primitives.
- Buildernet uses WalkServices to downgrade *any* DependsOnHealthy("beacon")
to running, not just rbuilder. This also covers contender, which has the
same dependency and would have hit the same validation failure.
- Remove the duplicate RunContenderIfEnabled call in buildernet — L1Recipe
already adds contender, so calling it again at the outer level was
registering the service twice.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Claude finished @SozinM's task in 2m 49s —— View job Review (re-review of fbc2a1b)
Both prior review points are addressed.
Two minor follow-ups (left as inline comments):
Neither is blocking. LGTM modulo whichever you want to act on. |
|
Okay, this looks bad to me, closing it |
| // RemoveService removes a service by name from the component tree. | ||
| // It also clears any health-check-sidecar labels that reference the removed | ||
| // service so manifest validation does not fail on a dangling sidecar pointer. | ||
| func (p *Component) RemoveService(name string) { | ||
| p.removeServiceRecursive(name) | ||
| p.clearSidecarLabel(name) | ||
| } | ||
|
|
||
| func (p *Component) removeServiceRecursive(name string) bool { | ||
| for i, svc := range p.Services { | ||
| if svc.Name == name { | ||
| p.Services = append(p.Services[:i], p.Services[i+1:]...) | ||
| return | ||
| return true | ||
| } | ||
| } | ||
| for _, inner := range p.Inner { | ||
| inner.RemoveService(name) | ||
| if inner.removeServiceRecursive(name) { | ||
| return true | ||
| } | ||
| } | ||
| return false | ||
| } | ||
|
|
||
| func (p *Component) clearSidecarLabel(sidecarName string) { | ||
| for _, svc := range p.Services { | ||
| if svc.Labels[healthCheckSidecarLabel] == sidecarName { | ||
| delete(svc.Labels, healthCheckSidecarLabel) | ||
| } | ||
| } | ||
| for _, inner := range p.Inner { | ||
| inner.clearSidecarLabel(sidecarName) | ||
| } | ||
| } |
There was a problem hiding this comment.
The new RemoveService semantics (always cleans up dangling sidecar labels) and the new WalkServices / ReplaceDependency helpers are not covered by manifest_test.go — only the original RemoveService cases are. A few small unit tests would lock in the contract being introduced here (in particular, that RemoveService("X") clears any health-check-sidecar=X label across the tree, including in nested components), so a future caller doesn't accidentally regress this back to the dangling-pointer state this PR is fixing.
Minor: clearSidecarLabel always walks the entire tree even when removeServiceRecursive returned false (i.e. nothing was removed). Harmless, but you could skip it in that case.
| // Beacon never reaches healthy in buildernet, so any consumer that asked | ||
| // for healthy must accept running instead. | ||
| component.WalkServices(func(svc *Service) { | ||
| svc.ReplaceDependency("beacon", DependsOnConditionRunning) | ||
| }) |
There was a problem hiding this comment.
WalkServices visits every service in the tree, but FindService("beacon") above only finds the first match in DFS order. If a future recipe ends up with multiple beacon services in the tree (e.g. a combined L1+L2 setup), WalkServices will downgrade beacon-deps for all of them, while only the first beacon's args/healthmon get touched. Today there's exactly one beacon so this is fine; just worth noting that the two helpers have different scoping semantics in case this recipe is ever extended.
Summary
buildernet --rbuilderfailed manifest validation:service rbuilder depends on service beacon, but its health check sidecar beacon_healthmon is not definedbeacon_healthmon(incompatible with--target-peers=1) but left beacon'shealth-check-sidecarlabel danglingRbuilder.DependsOnHealthy("beacon")would deadlockChanges
healthCheckSidecarLabelafter removingbeacon_healthmonhealthytorunningwhen used inside buildernetTest plan
go build ./...gofmt -d -sclean,go vet ./...cleanbuilder-playground cook buildernet --rbuilder --dry-runsucceeds (previously errored on validation)builder-playground cook buildernet --dry-runstill worksbuilder-playground cook l1 --rbuilder --dry-runstill works🤖 Generated with Claude Code