Skip to content

Add typed install error classification and exhaustive test matrix for all CSV failure families#10504

Draft
alokedesai wants to merge 2 commits intomasterfrom
oz/remote-install-test-matrix
Draft

Add typed install error classification and exhaustive test matrix for all CSV failure families#10504
alokedesai wants to merge 2 commits intomasterfrom
oz/remote-install-test-matrix

Conversation

@alokedesai
Copy link
Copy Markdown
Member

@alokedesai alokedesai commented May 8, 2026

Problem

Remote-server install failures were arriving as raw exit codes and stderr strings. The CSV contained many distinct failure families, but without a typed classifier it was difficult to tell which failures were transient, which should be routed to fallback logic, and which represented permanent host conditions. This also made regressions likely because the failure-family coverage was not encoded in tests.

Solution

  • Add a typed InstallFailureCategory enum for the observed install failure families.
  • Add classifier helpers for raw stderr, exit codes, and explicit timeout failures.
  • Encode retryability by category so fallback logic can distinguish retriable network/download failures from permanent host failures.
  • Add an exhaustive unit-test matrix for the CSV-derived families, category metadata, priority ordering, script sentinels, and architecture mapping consistency.

Proof / validation

No screenshots are applicable because this is classifier/test infrastructure. Proof is the test matrix itself and command output:

  • cargo test -p remote_server passed in the integrated validation run: 136 tests passed.
  • The tests cover timeout, missing curl/wget/no HTTP client, missing tar/bash, unsupported OS/arch, DNS/connectivity/TLS/HTTP failures, partial downloads, write failures, permissions, no-space/quota, read-only filesystems, extraction failures, expired password, startup-file permission issues, SSH disconnects, and unknown script errors.
  • The Docker SSH harness produced classifier-aligned sentinels after integration, including DNS/HTTP/TLS/download failures exiting 4, no HTTP client exiting 3, and missing tar exiting 5.

Co-Authored-By: Oz oz-agent@warp.dev

@cla-bot cla-bot Bot added the cla-signed label May 8, 2026
alokedesai and others added 2 commits May 8, 2026 18:25
… all CSV failure families

Introduces InstallFailureCategory enum covering 21 failure categories
observed in production remote-server install errors:

- Platform: UnsupportedArch, UnsupportedOs, NoHttpClient, MissingTar, MissingBash
- Network: DnsFailure, ConnectionRefused, TlsCaFailure, HttpForbidden,
  HttpServerError, PartialDownload, InstallTimeout
- Filesystem: DownloadWriteFailure, InstallDirPermissionDenied, NoSpaceOrQuota,
  ReadOnlyFilesystem, TarPermissionFailure
- SSH/Auth: ExpiredPasswordOrNoTty, StartupFilePermissionDenied, SshDisconnect
- Catch-all: Unknown

Each category includes:
- classify_install_failure() parser from exit code + stderr + timeout flag
- is_retriable() to prevent blind retry on permanent host conditions
- telemetry_tag() for stable telemetry strings
- Display impl for user-facing messages

78 new unit tests covering:
- Classifier coverage for every CSV failure family with production stderr samples
- Priority ordering (timeout > exit code sentinel > stderr pattern)
- No-retry invariants for permission/quota/read-only/expired-password/SSH-disconnect
- Telemetry tag uniqueness and stability
- Install script structural probes (arch mapping, HTTP client selection order,
  SCP fallback trigger alignment, tilde expansion, staging tarball path)
- Architecture mapping consistency between Rust and shell script

Co-Authored-By: Oz <oz-agent@warp.dev>
Renames and additions per error-classification agent coordination:
- Timeout (was InstallTimeout), MissingHttpClient (was NoHttpClient)
- UnsupportedArchitecture (was UnsupportedArch), ExpiredPassword (was ExpiredPasswordOrNoTty)
- NoSpaceLeft (was NoSpaceOrQuota), HttpBadGateway/HttpError (was HttpServerError)
- ConnectionRefused + ConnectionUnreachable (was single ConnectionRefused)
- New: MissingCurl, MissingWget, TarExtractionFailure, ScriptError

API methods aligned: title(), description(), as_str() (was telemetry_tag())
Function signature: classify_install_failure(stderr, exit_code) per coordinated spec
Added classify_install_failure_with_timeout() for SSH transport timeout flag

ScriptError: non-zero exits with unrecognized stderr now get their
own bucket instead of falling through to Unknown.

Co-Authored-By: Oz <oz-agent@warp.dev>
@alokedesai alokedesai force-pushed the oz/remote-install-test-matrix branch from 4da80f5 to b263d22 Compare May 8, 2026 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant