feat(runtime): /runtimes/* HTTP surface + RuntimeStatusBar/ControlPanel UI #971
Dani Akash (DaniAkash) wants to merge 9 commits into feat/openclaw-runtime
Uniform HTTP surface backed by AgentRuntimeRegistry + runtime.executeAction:
- GET /runtimes - list all registered runtimes (descriptor + status + capabilities)
- GET /runtimes/:adapter/status - single status snapshot
- GET /runtimes/:adapter/status/stream - SSE: snapshot on connect + every state transition
- POST /runtimes/:adapter/actions/:action - capability-gated dispatch through executeAction
- GET /runtimes/:adapter/logs - container-runtime logs (405 for host-process)
Routes use zValidator for path/query/body so the typed RPC client picks up the schemas; mounted with the same requireTrustedAppOrigin middleware as /claw/* /terminal /acl-rules /monitoring.
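The capability gate on the actions route can be pictured with a minimal sketch like the one below. It follows the status codes listed in the PR summary (400 on unknown action, 405 when the action is not in the runtime's capabilities, 500 when the action throws). The `RuntimeLike` interface, the `KNOWN_ACTIONS` list, the synchronous `executeAction`, and the 400 for a missing `agentId` are assumptions made for illustration, not the route's actual code.

```typescript
// Hypothetical shapes; the real AgentRuntime interface in this PR may differ
// (for instance, executeAction is most likely async there).
type RuntimeAction = { type: string; agentId?: string };

interface RuntimeLike {
  capabilities: readonly string[];
  executeAction(action: RuntimeAction): void;
}

// Assumed set of action names. Only "start", "restart" and
// "reset-wipe-agent" are named in the PR text.
const KNOWN_ACTIONS: readonly string[] = [
  "install",
  "start",
  "stop",
  "restart",
  "reset-wipe-agent",
];

type DispatchResult = { httpStatus: number; body: Record<string, string> };

function dispatchAction(
  runtime: RuntimeLike,
  actionName: string,
  agentId?: string,
): DispatchResult {
  // 400: the action name is not one the surface knows about at all.
  if (!KNOWN_ACTIONS.includes(actionName)) {
    return { httpStatus: 400, body: { error: `unknown action: ${actionName}` } };
  }
  // 405: the action exists, but this runtime does not advertise it.
  if (!runtime.capabilities.includes(actionName)) {
    return { httpStatus: 405, body: { error: `action not supported: ${actionName}` } };
  }
  // Assumption: a missing agentId is rejected as a validation error (400).
  if (actionName === "reset-wipe-agent" && !agentId) {
    return { httpStatus: 400, body: { error: "agentId is required" } };
  }
  try {
    runtime.executeAction(agentId ? { type: actionName, agentId } : { type: actionName });
    return { httpStatus: 200, body: { status: "ok" } };
  } catch (err) {
    // 500: the runtime accepted the action but it threw while running.
    const message = err instanceof Error ? err.message : String(err);
    return { httpStatus: 500, body: { error: message } };
  }
}
```

Checking the name before the capability keeps "this action does not exist" distinguishable from "this runtime cannot do that", which is what lets the UI hide a CTA rather than show an error.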
Generic React Query hooks backed by the typed RPC client (hc<AppType>), keyed by adapter id. useRuntime polls /runtimes/:adapter/status every 5s by default; useRuntimeAction issues a capability-gated POST to /runtimes/:adapter/actions/:action and invalidates the status query on success; useRuntimeLogs is opt-in (disabled by default) for container runtimes.
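As a rough illustration of "keyed by adapter id" and "invalidates the status query on success", here is a dependency-free sketch. The `["runtime-status", adapter]` key shape comes from the sequence diagram in this PR; the `runtime-logs` key, `DEFAULT_STATUS_POLL_MS`, and the `isPrefixOf` / `keysToInvalidate` helpers are hypothetical and only mimic the prefix matching React Query applies when invalidating.

```typescript
type QueryKey = readonly unknown[];

// Key factory, one entry per hook. Only the status key shape appears in the
// PR; the logs key is an assumed name.
const RUNTIME_QUERY_KEYS = {
  status: (adapter: string) => ["runtime-status", adapter] as const,
  logs: (adapter: string) => ["runtime-logs", adapter] as const,
};

// The 5s default poll mentioned above, expressed in milliseconds.
const DEFAULT_STATUS_POLL_MS = 5000;

// True when every element of `filter` matches the start of `key`.
function isPrefixOf(filter: QueryKey, key: QueryKey): boolean {
  return filter.length <= key.length && filter.every((part, i) => part === key[i]);
}

// Which cached keys a successful action on `adapter` would mark stale.
function keysToInvalidate(cached: readonly QueryKey[], adapter: string): QueryKey[] {
  const filter = RUNTIME_QUERY_KEYS.status(adapter);
  return cached.filter((key) => isPrefixOf(filter, key));
}
```

Scoping the key by adapter means an action on one runtime does not force a refetch of every other runtime's status.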
RuntimeStatusBar: compact one-line bar with adapter name + state pill + optional Restart action. Reads from useRuntime(adapter); the pill covers every container and host-process state. extraPill / extraActions slots let openclaw add its control-plane pill and Open Terminal button without baking gateway specifics into the runtime layer.
RuntimeControlPanel: capability-gated state-appropriate primary CTA: not_installed → Install, stopped → Start, errored → Restart + Reset, installing/starting → spinner, cli_missing/unhealthy → Reinstall CLI, running → optional Stop. extras slot for adapter-specific affordances (e.g. openclaw provider Setup dialog trigger).
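The state-to-CTA mapping can be sketched as a pure function, which also makes the capability gate easy to see. This is an illustrative sketch only: the action identifiers (`install`, `reset`, `reinstall-cli`, and so on) and the `panelView` / `primaryCtas` names are hypothetical, and the `installed` case reflects the later fix commit in this PR that treats it like `stopped`.

```typescript
type RuntimeState =
  | "not_installed"
  | "installing"
  | "installed"
  | "stopped"
  | "starting"
  | "running"
  | "errored"
  | "cli_missing"
  | "unhealthy";

// `action` values are assumed identifiers, not the PR's actual action names.
type Cta = { label: string; action: string };
type PanelView = { spinner: boolean; ctas: Cta[] };

// Candidate CTAs per state, before capability gating.
function primaryCtas(state: RuntimeState): Cta[] {
  switch (state) {
    case "not_installed":
      return [{ label: "Install", action: "install" }];
    case "installed": // image pulled, no container yet
    case "stopped": // container exists but is stopped
      return [{ label: "Start", action: "start" }];
    case "errored":
      return [
        { label: "Restart", action: "restart" },
        { label: "Reset", action: "reset" },
      ];
    case "cli_missing":
    case "unhealthy":
      return [{ label: "Reinstall CLI", action: "reinstall-cli" }];
    case "running":
      return [{ label: "Stop", action: "stop" }];
    case "installing":
    case "starting":
      return []; // transitional: spinner only
  }
}

// Keep only the CTAs the runtime actually advertises.
function panelView(state: RuntimeState, capabilities: readonly string[]): PanelView {
  const spinner = state === "installing" || state === "starting";
  const ctas = primaryCtas(state).filter((cta) => capabilities.includes(cta.action));
  return { spinner, ctas };
}
```

With an exhaustive `switch` over the state union, adding a new state without a case becomes a compile error rather than a blank panel.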
…ge; drop legacy lifecycle UI
AgentsPage now uses the new runtime-control components for OpenClaw lifecycle:
- RuntimeControlPanel replaces GatewayStateCards (state-appropriate CTAs gated on capabilities). Provider config dialog trigger lives in the panel's extras slot.
- RuntimeStatusBar replaces GatewayStatusBar (running pill + Restart). Control-plane pill + Open Terminal live in the bar's extra slots, so gateway specifics stay outside the runtime layer.
GatewayStatusBar.tsx is deleted outright. The 'Unavailable' badge in AgentSummaryChips.tsx is removed: capabilities-driven UI surfaces the same signal more usefully on the new RuntimeControlPanel. The prop stays for upstream callers but is now a no-op.
ControlPlaneAlert / LifecycleAlert / InlineErrorAlert from OpenClawControls remain. They are alerts for control-plane and mid-flight lifecycle states, distinct from the runtime control surface, and cover gateway-specific concerns the runtime layer doesn't model. Cleanup deferred to a follow-up.
✅ Tests passed — 1224/1228
Greptile Summary
This PR lands the user-visible runtime layer: a uniform
Confidence Score: 4/5
Safe to merge after addressing the minor cleanup items; the core server routes, hooks, and UI components are well-structured and covered by tests.
The new routes are capability-gated, validated, and tested. The UI components cleanly replace their predecessors without introducing regressions on the primary openclaw flow. The findings are quality/cleanup items: an unused query key, a dead prop retained for callers that is never read, a label-fidelity regression in the control-plane pill, and a subtle SSE heartbeat leak on silent TCP drops. None affect correctness of the main flow today.
useRuntime.ts (unused RUNTIME_QUERY_KEYS.list), AgentSummaryChips.tsx (dead adapterHealth prop), AgentsPage.tsx (ControlPlanePill label regression), runtimes.ts (SSE heartbeat cleanup)
Important Files Changed
Sequence Diagram
sequenceDiagram
participant UI as AgentsPage (React)
participant Hook as useRuntime / useRuntimeAction
participant RPC as Hono RPC Client
participant Server as /runtimes/* routes
participant Reg as AgentRuntimeRegistry
participant RT as AgentRuntime (openclaw)
UI->>Hook: useRuntime("openclaw") [5s poll]
Hook->>RPC: GET /runtimes/:adapter/status
RPC->>Server: GET /runtimes/openclaw/status
Server->>Reg: registry.get("openclaw")
Reg-->>Server: runtime instance
Server->>RT: runtime.getStatusSnapshot()
RT-->>Server: RuntimeStatusSnapshot
Server-->>RPC: "{ descriptor, status, capabilities }"
RPC-->>Hook: RuntimeView
Hook-->>UI: "{ data, isLoading }"
UI->>Hook: useRuntimeAction("openclaw")
UI->>Hook: "action.mutate({ action: "restart" })"
Hook->>RPC: POST /runtimes/openclaw/actions/restart
RPC->>Server: POST /runtimes/:adapter/actions/:action
Server->>RT: capabilities.includes("restart")?
RT-->>Server: true
Server->>RT: "runtime.executeAction({ type: "restart" })"
RT-->>Server: void
Server-->>RPC: "{ status: "ok", state: "starting" }"
RPC-->>Hook: success
Hook->>Hook: invalidateQueries(["runtime-status","openclaw"])
UI->>RPC: GET /runtimes/openclaw/status/stream (SSE)
RPC->>Server: SSE connect
Server->>RT: runtime.subscribe(writeSnapshot)
RT-->>Server: unsubscribe fn
loop every state change
RT->>Server: listener(snapshot)
Server-->>UI: "event: snapshot data: {...}"
end
loop every 15s
Server-->>UI: "event: heartbeat data: {ts:...}"
end
UI->>Server: abort
Server->>RT: unsubscribe()
Server->>Server: clearInterval(heartbeat)
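The SSE leg of the diagram (snapshot on connect, one frame per state change, a 15s heartbeat, and cleanup on abort) could look roughly like this. It is a transport-agnostic sketch: `SubscribableRuntime`, `attachStatusStream`, and the `write` callback are hypothetical names standing in for the real route and stream writer, and only the `event:` / `data:` framing is standard SSE.

```typescript
type Snapshot = { state: string };
type Listener = (snapshot: Snapshot) => void;

// Assumed minimal runtime surface, mirroring getStatusSnapshot/subscribe
// from the diagram.
interface SubscribableRuntime {
  getStatusSnapshot(): Snapshot;
  subscribe(listener: Listener): () => void;
}

// One SSE frame: named event, JSON payload, blank-line terminator.
function formatSseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Wires a runtime to a frame writer and returns a cleanup function that the
// caller should invoke when the client aborts.
function attachStatusStream(
  runtime: SubscribableRuntime,
  write: (frame: string) => void,
  heartbeatMs = 15000,
): () => void {
  // 1. Snapshot on connect, so the client never starts empty.
  write(formatSseEvent("snapshot", runtime.getStatusSnapshot()));

  // 2. One frame per state transition.
  const unsubscribe = runtime.subscribe((snapshot) => {
    write(formatSseEvent("snapshot", snapshot));
  });

  // 3. Periodic heartbeat to keep intermediaries from closing the stream.
  const heartbeat = setInterval(() => {
    write(formatSseEvent("heartbeat", { ts: Date.now() }));
  }, heartbeatMs);

  // 4. Idempotent cleanup: drop the listener and stop the timer.
  let closed = false;
  return () => {
    if (closed) return;
    closed = true;
    unsubscribe();
    clearInterval(heartbeat);
  };
}
```

Returning a single cleanup function keeps the unsubscribe and the timer teardown together, so any exit path (abort, write failure) only has one thing to call.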
…nder Start CTA for installed state
Two stuck-state bugs in the new RuntimeControlPanel:
1. The runtime's state machine started fresh at not_installed on every server boot. tryAutoStart's short-circuit branches (gateway already running, auth pass) never drove the state transitions, so the UI saw not_installed for a gateway that was actually running. Add a syncState() method on OpenClawContainerRuntime that probes the actual container via cli.inspectContainer + /readyz and sets state accordingly. Wire it into tryAutoStart as the first step so it runs regardless of which branch the rest takes.
2. RuntimeControlPanel had no case for state === 'installed', so after a successful Install the panel went blank instead of offering the next step. Treat installed the same as stopped: show the Start CTA with copy that reflects the difference (image is pulled vs container exists but stopped).
Optional-chained the syncState call so existing tests with partial runtime mocks don't crash on the missing method.
When a previous server boot wrote runtime-state.json after the gateway container had already been created with a different hostPort (e.g. 18789 held at allocate-time → container started on 18790), the persisted port disagrees with the live mapping. The runtime then probes the persisted port forever and the UI sticks at `starting`.
`syncState` now reads `NetworkSettings.Ports` from inspect-container and adopts the actual host port for the gateway container's published port when it differs. The service then re-syncs `hostPort`/`httpClient` and rewrites runtime-state.json so the next boot starts from a clean slate.
- ContainerInfo gains a flat `ports` array (parsed from `NetworkSettings.Ports`)
- OpenClawContainerRuntime.syncState: reconcile hostPort from live mapping before probing /readyz
- OpenClawService.tryAutoStart: adopt the runtime's reconciled port and persist it via writePersistedGatewayPort
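A minimal sketch of the two steps described here, flattening `NetworkSettings.Ports` and adopting the live host port when it disagrees with the persisted one. It assumes the inspect output follows the Docker-compatible shape (keys like `"18789/tcp"`, values that are either `null` or a list of `HostIp`/`HostPort` bindings); `flattenPorts`, `reconcileHostPort`, and the `PortMapping` type are hypothetical names, not the PR's code.

```typescript
// Docker-compatible shape of NetworkSettings.Ports (assumed for nerdctl too).
type InspectPorts = Record<string, Array<{ HostIp: string; HostPort: string }> | null>;

// Flat form, analogous to the `ports` array ContainerInfo gains in this commit.
type PortMapping = { containerPort: number; protocol: string; hostPort: number };

function flattenPorts(ports: InspectPorts | undefined): PortMapping[] {
  const flat: PortMapping[] = [];
  for (const [key, bindings] of Object.entries(ports ?? {})) {
    // Keys look like "18789/tcp"; default to tcp if the suffix is absent.
    const [portText, protocol = "tcp"] = key.split("/");
    // Exposed-but-unpublished ports have null bindings and are skipped.
    for (const binding of bindings ?? []) {
      flat.push({
        containerPort: Number(portText),
        protocol,
        hostPort: Number(binding.HostPort),
      });
    }
  }
  return flat;
}

// Returns the port the runtime should probe: the live mapping wins over the
// persisted value; with no live mapping the persisted value is kept.
function reconcileHostPort(
  persistedHostPort: number,
  containerPort: number,
  ports: readonly PortMapping[],
): number {
  const live = ports.find(
    (p) => p.containerPort === containerPort && p.protocol === "tcp",
  );
  return live && live.hostPort !== persistedHostPort ? live.hostPort : persistedHostPort;
}
```

For the scenario in the commit message, a persisted 18789 with a live binding on 18790 would reconcile to 18790, after which the caller persists the new value.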
…ismatch
When a previous boot leaves a gateway running with a stale token, the realloc-on-auth-mismatch branch was bumping the persisted port without actually freeing the old container: ManagedContainer.start() no-ops when state==='running', so the next start cycle never recreated the container on the new port. The result: persisted/service/runtime drift back into mismatch, and history requests 500 with "gateway is not ready" even while the (stale) gateway keeps serving chat from the old port.
Stop the gateway explicitly when we decide to bump off the port, so the upcoming start cycle goes through the full remove + create + start path on the freshly-allocated port. The token-mismatch test still passes; adds a new test pinning the stop-before-realloc behaviour.
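The failure mode is easiest to see with a toy model. The class below is a hypothetical stand-in for ManagedContainer (only the "start() no-ops when running" behaviour is taken from the commit message); it shows why bumping the port without a stop leaves the old binding in place, and why stop-then-start fixes it.

```typescript
// Toy stand-in for ManagedContainer; names and fields are illustrative only.
class FakeManagedContainer {
  private state: "stopped" | "running" = "stopped";
  private boundPort: number | null = null;

  start(hostPort: number): void {
    // Mirrors the behaviour described above: already running means no-op,
    // so a new port is silently ignored.
    if (this.state === "running") return;
    this.boundPort = hostPort;
    this.state = "running";
  }

  stop(): void {
    this.state = "stopped";
    this.boundPort = null;
  }

  port(): number | null {
    return this.boundPort;
  }
}

// Buggy path: only the desired port changes; the container keeps the old one.
function reallocWithoutStop(container: FakeManagedContainer, newPort: number): void {
  container.start(newPort);
}

// Fixed path: free the old container first so start() takes the full path.
function reallocWithStop(container: FakeManagedContainer, newPort: number): void {
  container.stop();
  container.start(newPort);
}
```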
…fresh install
Starting the gateway via the new RuntimeControlPanel "Start" CTA goes
through runtime.executeAction({type:'start'}) directly, bypassing
OpenClawService.tryAutoStart and its ensureStateEnvFile() seeding step.
On a freshly-wiped .browseros-dev that left nerdctl create failing with
"failed to open env file .../.openclaw/.env: no such file or directory".
Seed the file (empty, mode 0600) inside buildContainerSpec so the
runtime is self-sufficient. Service callers continue to work — their
ensureStateEnvFile is now an idempotent no-op once the file exists.
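An idempotent seed of this kind can be written with Node's `fs` module alone, as in the sketch below. `ensureEnvFile` is a hypothetical helper name (the PR's own functions are `buildContainerSpec` and `ensureStateEnvFile`), and the demo paths live under a throwaway temp directory. Opening in append mode creates the file when missing and never truncates an existing one.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Create an empty env file (mode 0600) if it does not exist yet.
// Safe to call repeatedly: existing content is left untouched.
function ensureEnvFile(envPath: string): void {
  fs.mkdirSync(path.dirname(envPath), { recursive: true });
  // "a" = open for appending; creates the file if missing, no truncation.
  // The mode only applies when the file is actually created.
  const fd = fs.openSync(envPath, "a", 0o600);
  fs.closeSync(fd);
}

// Usage against a throwaway directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "openclaw-env-"));
const demoEnvPath = path.join(demoDir, ".openclaw", ".env");
ensureEnvFile(demoEnvPath); // creates .openclaw/.env
ensureEnvFile(demoEnvPath); // second call is a no-op
```

Doing the seed where the container spec is built means every start path gets it, not only the ones that go through the service's auto-start.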
Summary
Stacked on #970 (feat/openclaw-runtime). Lands the user-visible piece of the AgentRuntime architecture: a uniform `/runtimes/<adapter>/*` HTTP surface backed by `runtime.executeAction(...)` through `AgentRuntimeRegistry`, plus capability-gated UI components that consume it.
Server:
- `GET /runtimes`: list all registered runtimes with descriptor + status snapshot + capabilities
- `GET /runtimes/:adapter/status`: single runtime status
- `GET /runtimes/:adapter/status/stream`: SSE with snapshot on connect + every state transition + 15s heartbeat
- `POST /runtimes/:adapter/actions/:action`: capability-gated dispatch through `executeAction`. Body schema picks up `agentId` for `reset-wipe-agent`. 405 if action not in capabilities; 400 on unknown action; 500 on action throw.
- `GET /runtimes/:adapter/logs`: container-runtime logs (405 for host-process)
- Routes use `zValidator` for path/query/body so the typed RPC client (`hc<AppType>`) picks up the schemas.
UI:
- `useRuntime(adapter)` / `useRuntimeAction(adapter)` / `useRuntimeLogs(adapter)`: generic React Query hooks backed by the typed RPC client. 5s default poll; mutations invalidate the status query on success.
- `<RuntimeStatusBar adapter='…'>` replaces `GatewayStatusBar`. Compact one-line bar with state pill + optional Restart. `extraPill` and `extraActions` slots let openclaw add its control-plane pill and Open Terminal button without baking gateway specifics into the runtime layer.
- `<RuntimeControlPanel adapter='…'>` replaces `GatewayStateCards` from OpenClawControls. Capability-gated state-appropriate primary CTA: `not_installed → Install`, `stopped → Start`, `errored → Restart + Reset`, `installing/starting → spinner`, `cli_missing/unhealthy → Reinstall CLI`, `running → optional Stop`. `extras` slot for adapter-specific affordances (e.g. openclaw's provider Setup dialog trigger).
- The 'Unavailable' badge in `AgentSummaryChips.tsx` is removed (capabilities-driven UI surfaces the signal more usefully on the new RuntimeControlPanel).
- `GatewayStatusBar.tsx` is deleted outright.
- `ControlPlaneAlert` / `LifecycleAlert` / `InlineErrorAlert` from OpenClawControls remain; they cover gateway-specific concerns the runtime layer doesn't model.
Out of scope (deferred follow-ups):
- `/claw/{status,start,stop,restart,logs}` lifecycle routes: UI still polls `/claw/status` for control-plane info that lives outside the runtime registry. Will land once the control-plane surface is moved to the runtime layer (Phase 7+).
- `useOpenClaw.ts`'s lifecycle mutations: they're now a fallback, replaced by the new hooks at the call sites that matter.
Test plan
- `bun run typecheck` clean across server + UI (pre-existing missing-generated-graphql errors aside)
- `biome check` clean on touched files
- `tests/api/routes/runtimes.test.ts` covering list/status/actions (capability gate, unknown action, agentId requirement, throw → 500) / logs (container vs host-process)
- … `ContainerCli` flake also reproduces on plain `origin/dev`)