libstore: send SSH ServerAlive keep-alives to remote stores by default #15620
Open
lovesegfault wants to merge 2 commits into master
When a remote builder reboots or otherwise drops off the network without closing the TCP connection, the local ssh process never sees an EOF and the build hook blocks forever on a half-open pipe. Pass `-o ServerAliveInterval=30 -o ServerAliveCountMax=3` so that ssh detects a dead peer in roughly 90 seconds. The values are exposed as the new `ssh-server-alive-interval` and `ssh-server-alive-count-max` store settings (interval `0` disables them), and `NIX_SSHOPTS` continues to take precedence.
095aa8f to 33e6fa9
Motivation
When a remote builder reboots, has `sshd` restarted, or otherwise drops off the network without the local kernel seeing a `FIN`, the `ssh` process spawned by the build hook blocks forever on a half-open TCP connection. Because the hook is registered with `respectTimeouts = false`, neither `--max-silent-time` nor `--timeout` will ever kill it, so the build slot is occupied indefinitely.

We hit this in practice: local `ssh` processes pointing at a builder that had no matching `sshd-session` on the remote side, with `__build-remote` parked in `read()` on a dead pipe.

Context
`SSHMaster::addCommonSSHOpts()` previously passed no liveness-related options. This change passes `-o ServerAliveInterval=30 -o ServerAliveCountMax=3` by default, so a dead peer is detected in roughly 90 seconds and the build fails cleanly instead of hanging.

The values are exposed as new per-store settings on `ssh://` / `ssh-ng://`:

- `ssh-server-alive-interval` (default `30`, set to `0` to disable and defer to `ssh_config`)
- `ssh-server-alive-count-max` (default `3`)

`NIX_SSHOPTS` is emitted before these defaults, so it continues to take precedence (OpenSSH uses the first-obtained value for `-o` options).

The only existing workaround was setting `NIX_SSHOPTS` in the daemon's environment, which is awkward to discover and configure.
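The option ordering and the "roughly 90 seconds" figure described above can be illustrated with a small standalone sketch. This is not the code in `SSHMaster::addCommonSSHOpts()`: the `KeepAliveSettings` struct and the `splitWords`, `buildSshOpts`, and `detectionSeconds` helpers are hypothetical names introduced here, and the whitespace splitting of `NIX_SSHOPTS` is deliberately simplified. It only assumes what the description states: user options come first, the keep-alive defaults follow, and an interval of `0` omits them.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the two new store settings; the field names are
// illustrative and are not taken from the Nix source.
struct KeepAliveSettings {
    unsigned int interval = 30; // seconds between keep-alive probes; 0 disables
    unsigned int countMax = 3;  // unanswered probes tolerated before ssh gives up
};

// Split a NIX_SSHOPTS-style string on whitespace (simplified: no quoting rules).
std::vector<std::string> splitWords(const std::string & s)
{
    std::istringstream in(s);
    std::vector<std::string> words;
    for (std::string w; in >> w;)
        words.push_back(w);
    return words;
}

// Build the ssh option list: user-supplied options first, keep-alive defaults
// after them. Because OpenSSH keeps the first value it obtains for each
// option, anything set via NIX_SSHOPTS wins over the defaults appended here.
std::vector<std::string> buildSshOpts(const std::string & nixSshOpts, const KeepAliveSettings & s)
{
    std::vector<std::string> args = splitWords(nixSshOpts);
    if (s.interval > 0) {
        args.push_back("-o");
        args.push_back("ServerAliveInterval=" + std::to_string(s.interval));
        args.push_back("-o");
        args.push_back("ServerAliveCountMax=" + std::to_string(s.countMax));
    }
    return args;
}

// Approximate time until a dead peer is detected: interval times count.
unsigned int detectionSeconds(const KeepAliveSettings & s)
{
    return s.interval * s.countMax;
}
```

With the defaults, `detectionSeconds` gives 30 × 3 = 90, matching the estimate in the description. Calling `buildSshOpts("-o ServerAliveInterval=5", KeepAliveSettings{})` places the user's `ServerAliveInterval=5` ahead of the default `ServerAliveInterval=30`, so under the first-obtained-value rule the user's setting is the one in effect.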