Add typed install error classification and exhaustive test matrix for all CSV failure families#10504
Draft
alokedesai wants to merge 2 commits intomasterfrom
Draft
Add typed install error classification and exhaustive test matrix for all CSV failure families#10504alokedesai wants to merge 2 commits intomasterfrom
alokedesai wants to merge 2 commits intomasterfrom
Conversation
… all CSV failure families Introduces InstallFailureCategory enum covering 21 failure categories observed in production remote-server install errors: - Platform: UnsupportedArch, UnsupportedOs, NoHttpClient, MissingTar, MissingBash - Network: DnsFailure, ConnectionRefused, TlsCaFailure, HttpForbidden, HttpServerError, PartialDownload, InstallTimeout - Filesystem: DownloadWriteFailure, InstallDirPermissionDenied, NoSpaceOrQuota, ReadOnlyFilesystem, TarPermissionFailure - SSH/Auth: ExpiredPasswordOrNoTty, StartupFilePermissionDenied, SshDisconnect - Catch-all: Unknown Each category includes: - classify_install_failure() parser from exit code + stderr + timeout flag - is_retriable() to prevent blind retry on permanent host conditions - telemetry_tag() for stable telemetry strings - Display impl for user-facing messages 78 new unit tests covering: - Classifier coverage for every CSV failure family with production stderr samples - Priority ordering (timeout > exit code sentinel > stderr pattern) - No-retry invariants for permission/quota/read-only/expired-password/SSH-disconnect - Telemetry tag uniqueness and stability - Install script structural probes (arch mapping, HTTP client selection order, SCP fallback trigger alignment, tilde expansion, staging tarball path) - Architecture mapping consistency between Rust and shell script Co-Authored-By: Oz <oz-agent@warp.dev>
Renames and additions per error-classification agent coordination: - Timeout (was InstallTimeout), MissingHttpClient (was NoHttpClient) - UnsupportedArchitecture (was UnsupportedArch), ExpiredPassword (was ExpiredPasswordOrNoTty) - NoSpaceLeft (was NoSpaceOrQuota), HttpBadGateway/HttpError (was HttpServerError) - ConnectionRefused + ConnectionUnreachable (was single ConnectionRefused) - New: MissingCurl, MissingWget, TarExtractionFailure, ScriptError API methods aligned: title(), description(), as_str() (was telemetry_tag()) Function signature: classify_install_failure(stderr, exit_code) per coordinated spec Added classify_install_failure_with_timeout() for SSH transport timeout flag ScriptError: non-zero exits with unrecognized stderr now get their own bucket instead of falling through to Unknown. Co-Authored-By: Oz <oz-agent@warp.dev>
4da80f5 to
b263d22
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Remote-server install failures were arriving as raw exit codes and stderr strings. The CSV contained many distinct failure families, but without a typed classifier it was difficult to tell which failures were transient, which should be routed to fallback logic, and which represented permanent host conditions. This also made regressions likely because the failure-family coverage was not encoded in tests.
Solution
InstallFailureCategoryenum for the observed install failure families.Proof / validation
No screenshots are applicable because this is classifier/test infrastructure. Proof is the test matrix itself and command output:
cargo test -p remote_serverpassed in the integrated validation run: 136 tests passed.4, no HTTP client exiting3, and missing tar exiting5.Co-Authored-By: Oz oz-agent@warp.dev