Commit 07b6b98

Add agent loader bootstrap planning docs
1 parent 847452a commit 07b6b98

6 files changed

Lines changed: 270 additions & 0 deletions

planning/README.md

Lines changed: 1 addition & 0 deletions
@@ -207,6 +207,7 @@ Phase 1 must complete before Phase 2 or Phase 3 can begin. Phase 2 and Phase 3 c
207207
| `09-signing.md` | **Entitlements (com.apple.security.virtualization), code signing with Developer ID, notarization, CI signing workflow, distribution (Homebrew, GitHub Releases, cargo install, install script)** |
208208
| `on-demand-elevation/README.md` | One-command UX with stage-gated elevation: request admin privileges only when offline provisioning requires it; includes security, UX, and rollout plan |
209209
| `pinned-ipsw-patches/README.md` | Pinned base matrix + signed file-level patch bundle strategy to keep system reliability while enabling no-local-sudo artifact workflows |
210+
| `agent-loader-bootstrap/README.md` | Stage-0 loader + signed swappable guest-agent artifact plan to avoid frequent image deltas for agent updates |
210211
| `sandbox/README.md` | Agent sandbox platform track: first-class primitives (`Sandbox`, `Session`, `Run`, `Policy`), Rust library surface, runtime integration plan, and OpenAPI contract |
211212
| `docker-in-sandbox/README.md` | Full Docker-inside-sandbox platform track: core primitives (`EngineInstance`, `EndpointLease`, filesystem and policy model) and reusable infrastructure surface across SDK/CLI/OpenAPI |
212213
| `oci-runtime/README.md` | Linux OCI runtime track (current): Linux VM bootstrap, image pulling, container lifecycle, unified API prerequisites |
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
1+
# 01: Stage-0 Loader Contract
2+
3+
## Responsibilities
4+
5+
`vz-agent-loader` is a minimal bootstrap executable with a stable contract:
6+
7+
1. Discover the active guest-agent version.
8+
2. Verify local install metadata before execution.
9+
3. Execute the selected agent binary with expected args/env.
10+
4. Emit clear diagnostics and fallback behavior when no valid agent is available.
11+
12+
The loader should avoid feature creep. Business logic belongs in the main guest agent.
13+
14+
## Proposed guest filesystem layout
15+
16+
- Loader binary: `/usr/local/libexec/vz-agent-loader`
17+
- Launchd plist target:
18+
- system mode: `/Library/LaunchDaemons/com.vz.agent.loader.plist`
19+
- user mode: `/Library/LaunchAgents/com.vz.agent.loader.plist` or a per-user location (`~/Library/LaunchAgents/`)
20+
- Agent store root: `/var/lib/vz/agent`
21+
- Versioned installs: `/var/lib/vz/agent/versions/<version>/vz-guest-agent`
22+
- Active pointer: `/var/lib/vz/agent/current` (symlink to `versions/<version>`)
23+
- Update staging: `/var/lib/vz/agent/staging/<txn-id>`
24+
- State file: `/var/lib/vz/agent/state.json`
25+
26+
## Loader startup sequence
27+
28+
1. Read `state.json` and resolve `current` symlink.
29+
2. Validate agent binary exists and matches recorded digest.
30+
3. `execve()` into the resolved agent binary.
31+
4. If validation fails:
32+
- fall back to the previous known-good version if available,
33+
- otherwise exit with explicit error code and structured log.
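
The `resolve -> verify -> exec` sequence above can be sketched as follows. This is a minimal illustration in Python, not the loader implementation. It assumes a `state.json` shape with a `previous` version and a `digests` map from version to SHA-256, and the exit code `78` and event name are hypothetical.

```python
import hashlib
import json
import os
from pathlib import Path
from typing import Optional

AGENT_BINARY = "vz-guest-agent"
EXIT_NO_VALID_AGENT = 78  # hypothetical exit code, not defined by the plan


def sha256_file(path: Path) -> str:
    """Hex SHA-256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verified_binary(home: Path, version: str, digests: dict) -> Optional[Path]:
    """Return the binary for `version` only if it exists and matches its digest."""
    binary = home / "versions" / version / AGENT_BINARY
    expected = digests.get(version)
    if expected is None or not binary.is_file():
        return None
    return binary if sha256_file(binary) == expected else None


def select_agent(home: Path) -> Optional[Path]:
    """Resolve and verify `current`, then fall back to `previous`."""
    state = json.loads((home / "state.json").read_text())
    digests = state.get("digests", {})  # assumed shape: {version: sha256 hex}
    candidates = []
    current = home / "current"
    if current.is_symlink():  # also true for a dangling symlink
        # The link target is `versions/<version>`; keep the last component.
        candidates.append(Path(os.readlink(current)).name)
    if state.get("previous"):
        candidates.append(state["previous"])
    for version in candidates:
        binary = verified_binary(home, version, digests)
        if binary is not None:
            return binary
    return None


def main(home: Path) -> int:
    binary = select_agent(home)
    if binary is None:
        # Structured log line plus explicit exit code; nothing unverified runs.
        print(json.dumps({"event": "loader.no_valid_agent", "home": str(home)}))
        return EXIT_NO_VALID_AGENT
    os.execve(binary, [str(binary)], dict(os.environ))  # does not return on success
```

The boot path only reads: it never writes to `versions/`, `current`, or `state.json`, which keeps it consistent with the failure rules in the next section.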
34+
35+
## Failure behavior
36+
37+
- Never run an unverified binary.
38+
- Never mutate installed versions during boot path.
39+
- Keep startup deterministic: success path is `resolve -> verify -> exec`.
40+
41+
## Compatibility contract
42+
43+
The loader and agent communicate through a CLI/env contract, not a private ABI.
44+
45+
Required env examples:
46+
47+
- `VZ_AGENT_HOME=/var/lib/vz/agent`
48+
- `VZ_AGENT_CHANNEL=<channel>`
49+
- `VZ_AGENT_LOADER_VERSION=<semver>`
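
As a rough sketch of the loader side of this contract, the environment handed to the agent could be assembled like this. The function name `build_agent_env` and the `LOADER_VERSION` value are hypothetical. Only the three variable names come from the list above.

```python
import os
from typing import Dict, Optional

LOADER_VERSION = "0.1.0"  # hypothetical value for illustration


def build_agent_env(
    channel: str,
    home: str = "/var/lib/vz/agent",
    base: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Copy the inherited environment and overlay the contract variables."""
    env = dict(os.environ if base is None else base)
    env.update(
        {
            "VZ_AGENT_HOME": home,
            "VZ_AGENT_CHANNEL": channel,
            "VZ_AGENT_LOADER_VERSION": LOADER_VERSION,
        }
    )
    return env
```

The result would be passed as the environment argument of the `execve()` call, so the agent needs no knowledge of loader internals.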
50+
51+
## Implementation constraints
52+
53+
1. Keep loader dependency surface minimal.
54+
2. Keep binary size small enough that bootstrap patch churn is rare.
55+
3. Add an integration test that simulates a broken `current` symlink and validates fallback.
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
1+
# 02: Agent Artifact and Verification Model
2+
3+
## Artifact structure
4+
5+
Each agent release artifact contains:
6+
7+
1. `manifest.json`
8+
2. `vz-guest-agent` binary
9+
3. detached signature (`manifest.sig`)
10+
11+
Optional packaging: `tar.zst` with the three files.
12+
13+
## Manifest schema (v1)
14+
15+
Required fields:
16+
17+
- `schema_version`
18+
- `agent_version` (semver)
19+
- `channel` (`stable`, `canary`, etc.)
20+
- `target_os` (`darwin`)
21+
- `target_arch` (`arm64`)
22+
- `binary_sha256`
23+
- `binary_size`
24+
- `created_at`
25+
- `min_loader_version`
26+
- `signing_key_id`
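
A structural check of the v1 manifest might look like the sketch below. The field names come from the list above. The concrete types (integer `schema_version` and `binary_size`, lowercase hex digest) are assumptions, since the plan does not pin them down, and `created_at` is only checked for presence.

```python
import json
import re

REQUIRED_FIELDS = (
    "schema_version", "agent_version", "channel", "target_os", "target_arch",
    "binary_sha256", "binary_size", "created_at", "min_loader_version",
    "signing_key_id",
)
SEMVER = re.compile(r"\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?")
SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def validate_manifest(raw: str) -> dict:
    """Parse manifest JSON and raise ValueError on any structural problem."""
    manifest = json.loads(raw)  # JSONDecodeError is a ValueError subclass
    if not isinstance(manifest, dict):
        raise ValueError("manifest must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in manifest]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    if manifest["schema_version"] != 1:
        raise ValueError("unsupported schema_version")
    for name in ("agent_version", "min_loader_version"):
        if not SEMVER.fullmatch(str(manifest[name])):
            raise ValueError(name + " is not a semver string")
    if (manifest["target_os"], manifest["target_arch"]) != ("darwin", "arm64"):
        raise ValueError("unsupported target")
    if not SHA256_HEX.fullmatch(str(manifest["binary_sha256"])):
        raise ValueError("binary_sha256 must be 64 lowercase hex characters")
    size = manifest["binary_size"]
    if type(size) is not int or size <= 0:
        raise ValueError("binary_size must be a positive integer")
    return manifest
```

This only checks shape. Signature validation and the digest comparison against the actual binary are separate steps in the trust model.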
27+
28+
## Trust model
29+
30+
- Loader trusts a pinned set of public keys shipped in bootstrap.
31+
- Manifest signature must validate against trusted key set.
32+
- Binary digest and size must match manifest.
33+
- Any verification failure aborts install.
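
The digest and size rules can be sketched with the standard library alone. This example deliberately omits the cryptographic check of `manifest.sig`, which needs a signature library and an algorithm the plan has not chosen yet. It assumes that check has already passed. `VerificationError` and `trusted_key_ids` are hypothetical names.

```python
import hashlib
import hmac
from pathlib import Path
from typing import Set


class VerificationError(Exception):
    """Raised on any mismatch so the caller aborts the install."""


def verify_binary(binary: Path, manifest: dict, trusted_key_ids: Set[str]) -> None:
    """Fail closed unless key id, size, and SHA-256 all match the manifest."""
    if manifest["signing_key_id"] not in trusted_key_ids:
        raise VerificationError("signing key is not in the pinned key set")
    data = binary.read_bytes()
    if len(data) != manifest["binary_size"]:
        raise VerificationError("binary size does not match manifest")
    actual = hashlib.sha256(data).hexdigest()
    # Constant-time comparison; not strictly required for public digests.
    if not hmac.compare_digest(actual, manifest["binary_sha256"]):
        raise VerificationError("binary digest does not match manifest")
```

Returning `None` on success and raising on every failure keeps a single abort path for the installer.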
34+
35+
## Anti-rollback policy
36+
37+
Default rule:
38+
39+
- Reject the install if `agent_version` is lower than the highest known-good version recorded in `state.json` for the channel.
40+
41+
Override path:
42+
43+
- Explicit `--allow-downgrade` flag for controlled rollback.
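
One way to express the default rule and its override is shown below. As a simplifying assumption, the comparison uses only the numeric `major.minor.patch` core and ignores pre-release and build suffixes. The function names are hypothetical.

```python
from typing import Optional, Tuple


def version_core(version: str) -> Tuple[int, int, int]:
    """Parse the numeric major.minor.patch part of a semver string."""
    core = version.split("-")[0].split("+")[0]
    parts = core.split(".")
    if len(parts) != 3:
        raise ValueError("not a major.minor.patch version: " + version)
    major, minor, patch = (int(p) for p in parts)
    return major, minor, patch


def may_install(
    candidate: str,
    highest_known_good: Optional[str],
    allow_downgrade: bool = False,
) -> bool:
    """Allow equal or newer versions; allow older ones only with the override."""
    if highest_known_good is None or allow_downgrade:
        return True
    return version_core(candidate) >= version_core(highest_known_good)
```

Comparing integer tuples rather than strings matters here: `1.10.0` is newer than `1.2.0`, which a plain string comparison would get wrong.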
44+
45+
## Atomic install algorithm
46+
47+
1. Verify artifact signature and digest in temp staging dir.
48+
2. Write binary to `staging/<txn-id>/vz-guest-agent`.
49+
3. Set mode/owner.
50+
4. Move staging dir to `versions/<version>` (atomic rename).
51+
5. Swap `current` symlink atomically.
52+
6. Update `state.json` with `current`, `previous`, and timestamps.
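
Steps 2 through 6 can be sketched as below, assuming step 1 (signature and digest verification) has already succeeded and that `staging/` and `versions/` sit on the same filesystem so the rename is atomic. Ownership changes are left out because they need root. The temporary file names are hypothetical.

```python
import json
import os
import time
import uuid
from pathlib import Path
from typing import Optional

AGENT_BINARY = "vz-guest-agent"


def current_version(home: Path) -> Optional[str]:
    """Version that `current` points at, or None if the link is absent."""
    link = home / "current"
    return Path(os.readlink(link)).name if link.is_symlink() else None


def swap_current(home: Path, version: str, txn: str) -> None:
    """Create a temporary symlink, then rename it over `current`."""
    tmp_link = home / (".current." + txn)
    os.symlink(os.path.join("versions", version), tmp_link)
    os.replace(tmp_link, home / "current")  # rename(2) replaces the link itself


def install_agent(home: Path, version: str, verified_binary: bytes) -> dict:
    txn = uuid.uuid4().hex
    staging = home / "staging" / txn
    staging.mkdir(parents=True)
    staged = staging / AGENT_BINARY
    staged.write_bytes(verified_binary)                # step 2
    os.chmod(staged, 0o755)                            # step 3 (mode only)
    (home / "versions").mkdir(exist_ok=True)
    os.rename(staging, home / "versions" / version)    # step 4
    previous = current_version(home)
    swap_current(home, version, txn)                   # step 5
    state = {"current": version, "previous": previous, "updated_at": time.time()}
    tmp_state = home / (".state." + txn + ".json")
    tmp_state.write_text(json.dumps(state))
    os.replace(tmp_state, home / "state.json")         # step 6
    return state
```

Because every visible change is a rename, a crash at any point leaves either the old `current` or the new one, never a half-written link.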
53+
54+
## Recovery model
55+
56+
If install fails before symlink swap, existing `current` remains untouched.
57+
58+
If post-swap startup healthcheck fails, rollback command points `current` to `previous`.
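
A rollback along these lines could be sketched as follows. It assumes `state.json` carries `current` and `previous` keys as in the install step. Clearing `previous` after a rollback and recording `rolled_back_from` are design assumptions, made so the loader never falls back onto the version that just failed.

```python
import json
import os
from pathlib import Path


def rollback(home: Path) -> str:
    """Point `current` back at `previous` and record the change in state.json."""
    state_path = home / "state.json"
    state = json.loads(state_path.read_text())
    previous = state.get("previous")
    if not previous or not (home / "versions" / previous / "vz-guest-agent").is_file():
        raise RuntimeError("no previous version available for rollback")
    tmp_link = home / ".current.rollback"
    if tmp_link.is_symlink():
        tmp_link.unlink()  # clear a leftover from an interrupted rollback
    os.symlink(os.path.join("versions", previous), tmp_link)
    os.replace(tmp_link, home / "current")  # atomic pointer swap
    state.update(
        current=previous,
        previous=None,
        rolled_back_from=state.get("current"),
    )
    tmp_state = home / ".state.rollback.json"
    tmp_state.write_text(json.dumps(state))
    os.replace(tmp_state, state_path)
    return previous
```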
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
1+
# 03: Update Flow and CLI UX
2+
3+
## Primary commands
4+
5+
### One-time bootstrap (per base line)
6+
7+
`vz vm bootstrap-agent --image <base.img> --loader <loader-bin> --trust-key <pubkey>`
8+
9+
What it does:
10+
11+
1. Installs loader + launchd plist.
12+
2. Seeds trust roots and empty agent store layout.
13+
3. Optionally seeds an initial agent version.
14+
15+
### Routine agent update (no image delta)
16+
17+
`vz vm agent install --artifact <agent.tar.zst> [--image <img> | --name <running-vm>]`
18+
19+
What it does:
20+
21+
1. Verifies signature and manifest.
22+
2. Installs version atomically.
23+
3. Flips `current` pointer.
24+
4. Optionally restarts loader/agent service.
25+
26+
## Update modes
27+
28+
1. Offline image mode:
29+
- mounts image root
30+
- updates `/var/lib/vz/agent/*`
31+
- used for baking new base variants
32+
2. Online VM mode:
33+
- sends artifact to running VM via existing control channel
34+
- installs in guest without rebuilding image
35+
36+
## Desired UX simplification
37+
38+
User-facing default should be:
39+
40+
1. Bootstrap once for base image family.
41+
2. Ship frequent small agent artifacts.
42+
3. Run a single install command for updates.
43+
44+
No manual bundle JSON, payload dirs, or frequent image deltas for agent-only changes.
45+
46+
## Observability
47+
48+
Add `vz vm agent status`:
49+
50+
- current version
51+
- previous version
52+
- last update time
53+
- channel
54+
- loader version
55+
56+
Add structured logs/events for install, verify, rollback.
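
To make the status fields and log events concrete, here is a small sketch. The JSON key names, the `agent.<action>` event naming, and the `state.json` keys it reads are all assumptions for illustration. The plan only lists which fields and actions should be covered.

```python
import json
import time

LOGGED_ACTIONS = {"install", "verify", "rollback"}


def status_report(state: dict, loader_version: str) -> dict:
    """Collect the fields that `vz vm agent status` is meant to show."""
    return {
        "current_version": state.get("current"),
        "previous_version": state.get("previous"),
        "last_update_time": state.get("updated_at"),
        "channel": state.get("channel"),
        "loader_version": loader_version,
    }


def log_event(action: str, outcome: str, **fields) -> str:
    """Render one structured log line as JSON."""
    if action not in LOGGED_ACTIONS:
        raise ValueError("unknown action: " + action)
    record = {"ts": time.time(), "event": "agent." + action, "outcome": outcome}
    record.update(fields)
    return json.dumps(record, sort_keys=True)
```

For example, `log_event("verify", "failed", reason="digest_mismatch")` yields a single JSON line that log tooling can filter by `event` and `outcome`.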
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
1+
# 04: Rollout, Validation, and Risks
2+
3+
## Rollout waves
4+
5+
1. Wave 0 (internal):
6+
- ship loader and artifact verifier behind feature flag
7+
- keep legacy direct-agent bootstrap path
8+
2. Wave 1 (opt-in):
9+
- expose `bootstrap-agent` and `agent install`
10+
- document as preferred for fast agent iteration
11+
3. Wave 2 (default):
12+
- default provisioning path installs loader
13+
- legacy direct binary patch remains fallback
14+
4. Wave 3 (cleanup):
15+
- reduce direct-agent-in-image updates to exceptional cases only
16+
17+
## Validation matrix
18+
19+
1. Fresh base image bootstrap and first boot.
20+
2. Online update while the VM is running.
21+
3. Offline update on stopped image.
22+
4. Corrupted artifact (signature fail).
23+
5. Downgrade rejection, plus the explicit `--allow-downgrade` rollback path.
24+
6. Startup recovery from broken `current` symlink.
25+
26+
## Risks
27+
28+
1. Loader bug can block agent startup across fleet.
29+
2. Key rotation mistakes can brick update path.
30+
3. Version-state corruption can cause bad rollback behavior.
31+
4. Divergence between offline and online install paths.
32+
33+
## Mitigations
34+
35+
1. Keep loader minimal and heavily tested.
36+
2. Support multiple trusted keys and overlap rotation windows.
37+
3. Write `state.json` atomically with checksum.
38+
4. Reuse a single install engine for offline and online modes.
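
Mitigation 3 could be implemented roughly as below: the state is wrapped in an envelope with a SHA-256 over its canonical JSON form, written to a temporary file, flushed to disk, and renamed into place. The envelope layout (`state` plus `sha256`) is an assumption, not a format defined by this plan.

```python
import hashlib
import json
import os
from pathlib import Path


def _canonical(state: dict) -> bytes:
    """Stable byte form of the state, used only for checksumming."""
    return json.dumps(state, sort_keys=True, separators=(",", ":")).encode()


def write_state(path: Path, state: dict) -> None:
    """Write state plus checksum to a temp file, fsync, then rename over `path`."""
    envelope = {"state": state, "sha256": hashlib.sha256(_canonical(state)).hexdigest()}
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "w") as fh:
        json.dump(envelope, fh)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, path)  # readers see the old file or the new one, never a mix


def read_state(path: Path) -> dict:
    """Load state and raise ValueError if the checksum does not match."""
    envelope = json.loads(path.read_text())
    state = envelope["state"]
    if hashlib.sha256(_canonical(state)).hexdigest() != envelope["sha256"]:
        raise ValueError("state.json checksum mismatch")
    return state
```

A checksum failure tells the loader that the version state is untrustworthy, which addresses risk 3 by making corruption detectable rather than silently acted on.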
39+
40+
## Open questions
41+
42+
1. Should `bootstrap-agent` always seed an initial agent artifact?
43+
2. Should channel selection live in loader config or artifact manifest only?
44+
3. Do we require healthcheck ack before finalizing `current` switch?
45+
4. How strict should anti-rollback be for local dev workflows?
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
1+
# Agent Loader Bootstrap Plan
2+
3+
## Why this plan exists
4+
5+
Current VM bootstrap bundles include the full `vz-guest-agent` binary. That forces frequent patch/delta regeneration whenever the agent changes.
6+
7+
We want a stable bootstrap layer that changes rarely, and a small swappable agent payload that can be updated independently.
8+
9+
## Goals
10+
11+
1. Install a tiny stage-0 loader once per base image line.
12+
2. Decouple guest-agent updates from image delta distribution.
13+
3. Make agent updates small, signed, atomic, and rollback-safe.
14+
4. Preserve unattended startup reliability (`launchd` + pre-login behavior).
15+
5. Keep `vz vm` UX simple for default users.
16+
17+
## Non-goals
18+
19+
1. Replace all patch infrastructure immediately.
20+
2. Support unsigned or best-effort agent updates.
21+
3. Ship a background privileged host daemon in v1.
22+
23+
## Design summary
24+
25+
- Bootstrap image contains:
26+
- `vz-agent-loader` (small, mostly statically linked binary with a stable interface)
27+
- launchd plist pointing to loader path, not direct guest-agent path
28+
- trust root material for artifact signature verification
29+
- Loader resolves and executes the current agent from a versioned store.
30+
- New agent versions are delivered as signed artifacts and installed atomically.
31+
- No new `.img` delta is required for normal guest-agent releases.
32+
33+
## Document map
34+
35+
- `01-stage0-loader.md` — loader contract, file layout, startup lifecycle.
36+
- `02-agent-artifact-and-verification.md` — artifact format, signature and rollback rules.
37+
- `03-update-and-cli-ux.md` — one-command UX and command surface.
38+
- `04-rollout-risks.md` — rollout waves, validation, risks, and open questions.
39+
40+
## Phase dependency graph
41+
42+
```
43+
Phase 1: stage-0 loader contract + filesystem layout
44+
-> Phase 2: signed agent artifact format + verifier
45+
-> Phase 3: update/install commands (offline + online)
46+
-> Phase 4: rollout and deprecate frequent image-delta agent updates
47+
```
48+
49+
## Acceptance criteria
50+
51+
1. New guest-agent release does not require a new image delta in normal path.
52+
2. Loader starts agent at boot/login according to policy without host-side manual steps.
53+
3. Agent update is atomic (`current` pointer swap) and rollback-capable.
54+
4. Invalid signatures or hash mismatches fail closed.
55+
5. CLI exposes one primary bootstrap command and one primary update command.
