Skip to content

fix(buildernet): make --rbuilder flag work#420

Closed
SozinM wants to merge 2 commits into
mainfrom
claude/fix-buildernet-rbuilder-dep
Closed

fix(buildernet): make --rbuilder flag work#420
SozinM wants to merge 2 commits into
mainfrom
claude/fix-buildernet-rbuilder-dep

Conversation

@SozinM
Copy link
Copy Markdown
Contributor

@SozinM SozinM commented May 5, 2026

Summary

  • buildernet --rbuilder failed manifest validation: service rbuilder depends on service beacon, but its health check sidecar beacon_healthmon is not defined
  • Buildernet removes beacon_healthmon (incompatible with --target-peers=1) but left beacon's health-check-sidecar label dangling
  • Even with the label fixed, beacon cannot reach healthy until a builder connects, so Rbuilder.DependsOnHealthy("beacon") would deadlock

Changes

  • Strip beacon's healthCheckSidecarLabel after removing beacon_healthmon
  • Downgrade rbuilder's beacon dependency from healthy to running when used inside buildernet

Test plan

  • go build ./...
  • gofmt -d -s clean, go vet ./... clean
  • builder-playground cook buildernet --rbuilder --dry-run succeeds (previously errored on validation)
  • builder-playground cook buildernet --dry-run still works
  • builder-playground cook l1 --rbuilder --dry-run still works

🤖 Generated with Claude Code

Buildernet removed beacon_healthmon but left beacon's health-check sidecar
label pointing at it, so manifest validation failed when rbuilder declared
DependsOnHealthy("beacon"). Beacon also cannot reach a healthy state in
buildernet until a builder connects (--target-peers=1), so depending on
beacon health would deadlock anyway.

Strip the stale label after removing the sidecar, and downgrade rbuilder's
beacon dependency from healthy to running when used inside buildernet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@SozinM SozinM requested a review from canercidam as a code owner May 5, 2026 09:43
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 5, 2026

Claude finished @SozinM's task in 2m 31s —— View job


Review

The fix correctly identifies and addresses the two issues described in the PR (dangling sidecar label + deadlock from healthy-dependency on a beacon that can never become healthy). One substantive concern below.

  • Read PR diff and surrounding code
  • Trace dependency graph for beacon
  • Post inline comments

Findings

  1. Brittle to the literal service name rbuilder (see inline on lines 66–72) — The recipe asserts the invariant "beacon never reaches healthy state until a builder connects," but the patch only downgrades that one service. contender also has DependsOnHealthy("beacon") (components.go:926) and is added via RunContenderIfEnabled from the L1 recipe; with this PR, buildernet --contender will now fail manifest validation at manifest.go:349 ("does not have a ready check") because the sidecar label has been deleted and beacon has no native ReadyCheck. Recommend walking every service and downgrading any beacon-healthy dep, rather than hardcoding rbuilder.

  2. Tangentially: RunContenderIfEnabled is called both inside L1Recipe.Apply (line 144) and again at the end of BuilderNetRecipe.Apply (line 74). If contender is enabled, this looks like it would add the service twice. Pre-existing, but worth flagging while in this code.

  3. Minor structural nit (see inline on lines 59–62) — the new delete(beacon.Labels, …) is logically part of "remove the sidecar" but is split from RemoveService("beacon_healthmon") by a redundant nil guard. Folding into a single if beacon != nil block makes the invariant easier to see.

Non-issues

  • delete on a nil beacon.Labels is a safe no-op in Go.
  • Mutating dep.Condition via the range pointer is correct because DependsOn is []*DependsOn.
    Branch

Comment thread playground/recipe_buildernet.go Outdated
Comment on lines +66 to +72
if rbuilder := component.FindService("rbuilder"); rbuilder != nil {
for _, dep := range rbuilder.DependsOn {
if dep.Name == "beacon" && dep.Condition == DependsOnConditionHealthy {
dep.Condition = DependsOnConditionRunning
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This patch is keyed off the literal service name rbuilder, but the underlying invariant the comment states ("beacon never reaches healthy state until a builder connects") applies to every consumer of beacon's health, not just rbuilder.

contender (components.go:926) also calls DependsOnHealthy("beacon") and is added through RunContenderIfEnabled, which the L1 recipe runs at recipe_l1.go:144 (and is then called a second time at line 74 of this file — likely a separate bug). With this PR, after delete(beacon.Labels, healthCheckSidecarLabel), running buildernet --contender will fail manifest validation with service contender depends on service beacon, but it does not have a ready check (manifest.go:349) — same class of failure this PR is trying to fix for rbuilder, just with a different failure mode than before.

Suggest generalizing: walk every service in the tree and downgrade any beacon-healthy dependency to running, instead of hardcoding rbuilder. That keeps the recipe correct as new beacon-dependent components are added.

Comment thread playground/recipe_buildernet.go Outdated
Comment on lines +59 to +62
component.RemoveService("beacon_healthmon")
if beacon != nil {
delete(beacon.Labels, healthCheckSidecarLabel)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: the label deletion is logically part of the same "remove the sidecar" operation but is split from RemoveService("beacon_healthmon") only by a nil guard that's already handled above. Consider folding both into a single if beacon != nil { ... } block (and ideally co-locating the RemoveService call) so the invariant — sidecar gone ⇔ label gone — is obvious at a glance.

…tender

Address review feedback on the previous fix:

- RemoveService now strips dangling health-check-sidecar label refs across
  the component tree, so callers can no longer leave a beacon pointing at a
  removed healthmon.
- Add Component.WalkServices and Service.ReplaceDependency as small reusable
  primitives.
- Buildernet uses WalkServices to downgrade *any* DependsOnHealthy("beacon")
  to running, not just rbuilder. This also covers contender, which has the
  same dependency and would have hit the same validation failure.
- Remove the duplicate RunContenderIfEnabled call in buildernet — L1Recipe
  already adds contender, so calling it again at the outer level was
  registering the service twice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 5, 2026

Claude finished @SozinM's task in 2m 49s —— View job


Review (re-review of fbc2a1b)

  • Read updated diff
  • Verify generalization covers contender (and any future beacon-healthy consumer)
  • Verify manifest.go RemoveService refactor is sound
  • Post review

Both prior review points are addressed.

  • The hardcoded rbuilder downgrade is gone, replaced by component.WalkServices(...).ReplaceDependency("beacon", DependsOnConditionRunning). This now correctly covers contender (components.go:926) and any future beacon-healthy consumer — closes the regression I flagged for buildernet --contender.
  • The duplicate RunContenderIfEnabled call is removed (it was already added by L1Recipe.Apply at recipe_l1.go:144).
  • The label-deletion concern is also resolved more cleanly than I suggested: RemoveService itself now strips any health-check-sidecar=<removed> label across the tree, so callers can't forget. Worth highlighting that this is a behavioral change to a generic helper — strictly safer (no current caller depended on dangling labels), and the doc comment captures it.

Two minor follow-ups (left as inline comments):

  1. manifest.go:115–147 — new behaviors (WalkServices, ReplaceDependency, label-clearing in RemoveService) are uncovered by manifest_test.go. Small unit tests would lock in the contract. Also: clearSidecarLabel runs even when removeServiceRecursive returned false — harmless but skippable.
  2. recipe_buildernet.go:59–63FindService (single match, DFS) and WalkServices (every match) have different scoping. Today there's exactly one beacon, so it doesn't matter; worth a note for future combined-network recipes.

Neither is blocking. LGTM modulo whichever you want to act on.
Branch

@SozinM
Copy link
Copy Markdown
Contributor Author

SozinM commented May 5, 2026

Okay, this looks bad to me, closing it

@SozinM SozinM closed this May 5, 2026
Comment thread playground/manifest.go
Comment on lines +115 to 147
// RemoveService removes a service by name from the component tree.
// It also clears any health-check-sidecar labels that reference the removed
// service so manifest validation does not fail on a dangling sidecar pointer.
func (p *Component) RemoveService(name string) {
p.removeServiceRecursive(name)
p.clearSidecarLabel(name)
}

func (p *Component) removeServiceRecursive(name string) bool {
for i, svc := range p.Services {
if svc.Name == name {
p.Services = append(p.Services[:i], p.Services[i+1:]...)
return
return true
}
}
for _, inner := range p.Inner {
inner.RemoveService(name)
if inner.removeServiceRecursive(name) {
return true
}
}
return false
}

func (p *Component) clearSidecarLabel(sidecarName string) {
for _, svc := range p.Services {
if svc.Labels[healthCheckSidecarLabel] == sidecarName {
delete(svc.Labels, healthCheckSidecarLabel)
}
}
for _, inner := range p.Inner {
inner.clearSidecarLabel(sidecarName)
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new RemoveService semantics (always cleans up dangling sidecar labels) and the new WalkServices / ReplaceDependency helpers are not covered by manifest_test.go — only the original RemoveService cases are. A few small unit tests would lock in the contract being introduced here (in particular, that RemoveService("X") clears any health-check-sidecar=X label across the tree, including in nested components), so a future caller doesn't accidentally regress this back to the dangling-pointer state this PR is fixing.

Minor: clearSidecarLabel always walks the entire tree even when removeServiceRecursive returned false (i.e. nothing was removed). Harmless, but you could skip it in that case.

Comment on lines +59 to +63
// Beacon never reaches healthy in buildernet, so any consumer that asked
// for healthy must accept running instead.
component.WalkServices(func(svc *Service) {
svc.ReplaceDependency("beacon", DependsOnConditionRunning)
})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WalkServices visits every service in the tree, but FindService("beacon") above only finds the first match in DFS order. If a future recipe ends up with multiple beacon services in the tree (e.g. a combined L1+L2 setup), WalkServices will downgrade beacon-deps for all of them, while only the first beacon's args/healthmon get touched. Today there's exactly one beacon so this is fine; just worth noting that the two helpers have different scoping semantics in case this recipe is ever extended.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant