
feat(runtime): /runtimes/* HTTP surface + RuntimeStatusBar/ControlPanel UI #971

Open

Dani Akash (DaniAkash) wants to merge 9 commits into feat/openclaw-runtime from feat/runtime-control-ui

Conversation

@DaniAkash (Contributor)

Summary

Stacked on #970 (feat/openclaw-runtime). Lands the user-visible piece of the AgentRuntime architecture: a uniform /runtimes/<adapter>/* HTTP surface backed by runtime.executeAction(...) through AgentRuntimeRegistry, plus capability-gated UI components that consume it.

Server:

  • GET /runtimes — list all registered runtimes with descriptor + status snapshot + capabilities
  • GET /runtimes/:adapter/status — single runtime status
  • GET /runtimes/:adapter/status/stream — SSE: snapshot on connect + every state transition + 15s heartbeat
  • POST /runtimes/:adapter/actions/:action — capability-gated dispatch through executeAction. Body schema picks up agentId for reset-wipe-agent. 405 if action not in capabilities; 400 on unknown action; 500 on action throw.
  • GET /runtimes/:adapter/logs — container-runtime logs (405 for host-process)
  • All routes use zValidator for path/query/body so the typed RPC client (hc<AppType>) picks up the schemas.
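The dispatch semantics of the actions route can be sketched framework-free. This is a hypothetical reduction of the route's branching, not the actual runtimes.ts code — `RuntimeLike`, `dispatchAction`, and `KNOWN_ACTIONS` are illustrative names:

```typescript
// Illustrative sketch of the capability gate behind
// POST /runtimes/:adapter/actions/:action (names are assumptions).
type RuntimeAction = "start" | "stop" | "restart" | "reset-wipe-agent";

interface RuntimeLike {
  capabilities: RuntimeAction[];
  executeAction(input: { type: RuntimeAction; agentId?: string }): Promise<void>;
}

const KNOWN_ACTIONS: RuntimeAction[] = ["start", "stop", "restart", "reset-wipe-agent"];

// Returns the HTTP status the route would respond with.
async function dispatchAction(
  runtime: RuntimeLike,
  action: string,
  body: { agentId?: string },
): Promise<number> {
  if (!KNOWN_ACTIONS.includes(action as RuntimeAction)) return 400; // unknown action
  if (!runtime.capabilities.includes(action as RuntimeAction)) return 405; // not in capabilities
  if (action === "reset-wipe-agent" && !body.agentId) return 400; // agentId required
  try {
    await runtime.executeAction({ type: action as RuntimeAction, agentId: body.agentId });
    return 200;
  } catch {
    return 500; // action threw
  }
}
```

In the real route the same checks run after zValidator has already validated the path, query, and body against the shared schemas.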

UI:

  • useRuntime(adapter) / useRuntimeAction(adapter) / useRuntimeLogs(adapter) — generic React Query hooks backed by the typed RPC client. 5s default poll; mutations invalidate the status query on success.
  • <RuntimeStatusBar adapter='…'> replaces GatewayStatusBar. Compact one-line bar with state pill + optional Restart. extraPill and extraActions slots let openclaw add its control-plane pill and Open Terminal button without baking gateway specifics into the runtime layer.
  • <RuntimeControlPanel adapter='…'> replaces GatewayStateCards from OpenClawControls. Capability-gated state-appropriate primary CTA: not_installed → Install, stopped → Start, errored → Restart + Reset, installing/starting → spinner, cli_missing/unhealthy → Reinstall CLI, running → optional Stop. extras slot for adapter-specific affordances (e.g. openclaw's provider Setup dialog trigger).
  • AgentsPage rewired to mount the new components. The 'Unavailable' badge in AgentSummaryChips.tsx is deleted (the capabilities-driven UI surfaces the signal more usefully on the new RuntimeControlPanel).
  • GatewayStatusBar.tsx is deleted outright.
  • ControlPlaneAlert / LifecycleAlert / InlineErrorAlert from OpenClawControls remain — they cover gateway-specific concerns the runtime layer doesn't model.

Out of scope (deferred follow-ups):

  • Deleting the legacy /claw/{status,start,stop,restart,logs} lifecycle routes — UI still polls /claw/status for control-plane info that lives outside the runtime registry. Will land once the control-plane surface is moved to the runtime layer (Phase 7+).
  • Slimming useOpenClaw.ts's lifecycle mutations — they're now a fallback, replaced by the new hooks at the call sites that matter.

Test plan

  • bun run typecheck clean across server + UI (pre-existing missing-generated-graphql errors aside)
  • biome check clean on touched files
  • 11 new server-side tests in tests/api/routes/runtimes.test.ts covering list/status/actions (capability gate, unknown action, agentId requirement, throw → 500) / logs (container vs host-process)
  • Full server test sweep — 1042 pass, 0 fail (one pre-existing ContainerCli flake also reproduces on plain origin/dev)
  • End-to-end UI verification by Dani — full openclaw lifecycle via the new RuntimeStatusBar + RuntimeControlPanel before merging this stack

Uniform HTTP surface backed by AgentRuntimeRegistry + runtime.executeAction:
- GET /runtimes — list all registered runtimes (descriptor + status + capabilities)
- GET /runtimes/:adapter/status — single status snapshot
- GET /runtimes/:adapter/status/stream — SSE: snapshot on connect + every state transition
- POST /runtimes/:adapter/actions/:action — capability-gated dispatch through executeAction
- GET /runtimes/:adapter/logs — container-runtime logs (405 for host-process)

Routes use zValidator for path/query/body so the typed RPC client picks
up the schemas; mounted with the same requireTrustedAppOrigin
middleware as /claw/* /terminal /acl-rules /monitoring.
Generic React Query hooks backed by the typed RPC client (hc<AppType>),
keyed by adapter id. useRuntime polls /runtimes/:adapter/status every
5s by default; useRuntimeAction issues a capability-gated POST to
/runtimes/:adapter/actions/:action and invalidates the status query
on success; useRuntimeLogs is opt-in (disabled by default) for
container runtimes.
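The contract those hooks implement — poll status, and re-fetch status immediately after a successful action — can be sketched without React Query. This is a framework-free illustration under assumed names (`RuntimeClient`, `fetchJson`); the real hooks express the same contract via useQuery/useMutation over the typed RPC client:

```typescript
// Framework-free sketch of the useRuntime/useRuntimeAction contract.
// `fetchJson` stands in for the typed Hono RPC client (hc<AppType>).
type Json = unknown;

class RuntimeClient {
  constructor(
    private adapter: string,
    private fetchJson: (method: string, path: string, body?: Json) => Promise<Json>,
  ) {}

  // Shape of the React Query key the real hooks invalidate.
  statusKey(): readonly [string, string] {
    return ["runtime-status", this.adapter] as const;
  }

  // Polled every 5s by useRuntime.
  status(): Promise<Json> {
    return this.fetchJson("GET", `/runtimes/${this.adapter}/status`);
  }

  // useRuntimeAction: POST the action, then invalidate (re-fetch) status.
  async act(action: string, agentId?: string): Promise<Json> {
    const result = await this.fetchJson(
      "POST",
      `/runtimes/${this.adapter}/actions/${action}`,
      agentId ? { agentId } : {},
    );
    // In the real hook: queryClient.invalidateQueries({ queryKey: this.statusKey() })
    await this.status();
    return result;
  }
}
```

The invalidation step is what makes the UI snap to the new state right after a Start/Restart instead of waiting out the 5s poll interval.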
RuntimeStatusBar — compact one-line bar with adapter name + state pill
+ optional Restart action. Reads from useRuntime(adapter); the pill
covers every container and host-process state. extraPill / extraActions
slots let openclaw add its control-plane pill and Open Terminal
button without baking gateway specifics into the runtime layer.

RuntimeControlPanel — capability-gated state-appropriate primary CTA:
not_installed → Install, stopped → Start, errored → Restart + Reset,
installing/starting → spinner, cli_missing/unhealthy → Reinstall CLI,
running → optional Stop. extras slot for adapter-specific affordances
(e.g. openclaw provider Setup dialog trigger).
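The state → CTA mapping above can be sketched as a pure function. State and label names follow the description; `canStop` is an illustrative stand-in for the capability gate, and the `installed` case reflects the follow-up commit that treats it like `stopped`:

```typescript
// Illustrative sketch of RuntimeControlPanel's primary-CTA selection.
type RuntimeState =
  | "not_installed" | "installed" | "stopped" | "errored"
  | "installing" | "starting" | "cli_missing" | "unhealthy" | "running";

type Cta = "Install" | "Start" | "Restart + Reset" | "Spinner" | "Reinstall CLI" | "Stop" | "None";

function primaryCta(state: RuntimeState, canStop = false): Cta {
  switch (state) {
    case "not_installed": return "Install";
    case "installed": // treated like stopped (image pulled / container exists but stopped)
    case "stopped": return "Start";
    case "errored": return "Restart + Reset";
    case "installing":
    case "starting": return "Spinner";
    case "cli_missing":
    case "unhealthy": return "Reinstall CLI";
    case "running": return canStop ? "Stop" : "None"; // Stop only if capability-gated in
  }
}
```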
…ge; drop legacy lifecycle UI

AgentsPage now uses the new runtime-control components for OpenClaw
lifecycle:
- RuntimeControlPanel replaces GatewayStateCards (state-appropriate
  CTAs gated on capabilities). Provider config dialog trigger lives
  in the panel's extras slot.
- RuntimeStatusBar replaces GatewayStatusBar (running pill +
  Restart). Control-plane pill + Open Terminal live in the bar's
  extra slots — gateway specifics stay outside the runtime layer.

GatewayStatusBar.tsx is deleted outright. The 'Unavailable' badge in
AgentSummaryChips.tsx is deleted — capabilities-driven UI surfaces the
same signal more usefully on the new RuntimeControlPanel; the prop
stays for upstream callers but is now a no-op.

ControlPlaneAlert / LifecycleAlert / InlineErrorAlert from
OpenClawControls remain — they're alerts for control-plane and
mid-flight lifecycle states, distinct from the runtime control
surface. They cover gateway-specific concerns the runtime layer
doesn't model. Cleanup deferred to a follow-up.

github-actions Bot commented May 8, 2026

✅ Tests passed — 1224/1228

| Suite | Passed | Failed | Skipped |
| --- | --- | --- | --- |
| agent | 76/76 | 0 | 0 |
| build | 9/9 | 0 | 0 |
| eval | 93/93 | 0 | 0 |
| server-agent | 261/261 | 0 | 0 |
| server-api | 197/197 | 0 | 0 |
| server-browser | 4/4 | 0 | 0 |
| server-integration | 9/10 | 0 | 1 |
| server-lib | 253/253 | 0 | 0 |
| server-root | 60/63 | 0 | 3 |
| server-skills | 31/31 | 0 | 0 |
| server-tools | 231/231 | 0 | 0 |

View workflow run


greptile-apps Bot commented May 8, 2026

Greptile Summary

This PR lands the user-visible runtime layer: a uniform /runtimes/<adapter>/* HTTP surface backed by AgentRuntimeRegistry, plus generic React Query hooks and two new UI components (RuntimeStatusBar, RuntimeControlPanel) that replace the openclaw-specific GatewayStatusBar and GatewayStateCards. The AgentsPage is rewired to consume the new components, and the per-row Unavailable badge is removed in favour of the capability-driven control panel.

  • Server (runtimes.ts): five new routes (list, status, SSE stream, action dispatch, logs) all validated with zValidator; action dispatch is capability-gated with correct 405/400/500 error handling; 11 new unit tests cover the key branches.
  • Client hooks (useRuntime.ts): typed RPC-backed useRuntime / useRuntimeAction / useRuntimeLogs with 5 s default poll and post-action query invalidation.
  • UI components: RuntimeControlPanel maps runtime state to capability-gated CTAs; RuntimeStatusBar renders a compact pill bar — both accept adapter-specific slots (extras, extraPill, extraActions) so openclaw-specific concerns stay out of the generic layer.

Confidence Score: 4/5

Safe to merge after addressing the minor cleanup items — the core server routes, hooks, and UI components are well-structured and covered by tests.

The new routes are capability-gated, validated, and tested. The UI components cleanly replace their predecessors without introducing regressions on the primary openclaw flow. The findings are quality/cleanup items: an unused query key, a dead prop retained for callers that is never read, a label-fidelity regression in the control-plane pill, and a subtle SSE heartbeat leak on silent TCP drops. None affect correctness of the main flow today.

Findings: useRuntime.ts (unused RUNTIME_QUERY_KEYS.list), AgentSummaryChips.tsx (dead adapterHealth prop), AgentsPage.tsx (ControlPlanePill label regression), runtimes.ts (SSE heartbeat cleanup).

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/browseros-agent/apps/server/src/api/routes/runtimes.ts | New /runtimes/* HTTP surface: list, status, SSE stream, action dispatch, logs. Route logic is well-structured and capability-gated. Minor: SSE heartbeat write errors are silently swallowed and won't trigger early cleanup on a dead connection. |
| packages/browseros-agent/apps/agent/entrypoints/app/agents/useRuntime.ts | New React Query hooks for runtime status, action dispatch, and logs. RUNTIME_QUERY_KEYS.list is exported but never consumed — dead code that should be removed per project rules. |
| packages/browseros-agent/apps/agent/entrypoints/app/agents/runtime-controls/RuntimeControlPanel.tsx | New generic capability-gated control panel. State-to-CTA mapping is clear and exhaustive. extras slot correctly threads adapter-specific affordances without leaking openclaw specifics into the base component. |
| packages/browseros-agent/apps/agent/entrypoints/app/agents/runtime-controls/RuntimeStatusBar.tsx | New compact status bar with extensible pill + action slots. State pill mapping is thorough. Separator rendering logic is correct. |
| packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentsPage.tsx | Rewires AgentsPage to use new runtime components; removes GatewayStatusBar/GatewayStateCards. New ControlPlanePill merges 'reconnecting'/'recovering' into a single "Connecting" label, losing the granularity the old component provided. |
| packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentSummaryChips.tsx | Removes the 'Unavailable' badge from per-row chips. adapterHealth prop is retained as optional but never used inside the component — dead interface surface. |
| packages/browseros-agent/apps/server/src/api/server.ts | Mounts the new /runtimes router behind requireTrustedAppOrigin(). No issues. |
| packages/browseros-agent/apps/server/tests/api/routes/runtimes.test.ts | 11 focused tests covering the new routes: capability gate, unknown action (400), agentId requirement, action throw (500), container vs host-process logs. Coverage is comprehensive for the happy path and key error branches. |
| packages/browseros-agent/apps/agent/entrypoints/app/agents/GatewayStatusBar.tsx | Deleted entirely — replaced by the generic RuntimeStatusBar. Clean removal. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant UI as AgentsPage (React)
    participant Hook as useRuntime / useRuntimeAction
    participant RPC as Hono RPC Client
    participant Server as /runtimes/* routes
    participant Reg as AgentRuntimeRegistry
    participant RT as AgentRuntime (openclaw)

    UI->>Hook: useRuntime("openclaw") [5s poll]
    Hook->>RPC: GET /runtimes/:adapter/status
    RPC->>Server: GET /runtimes/openclaw/status
    Server->>Reg: registry.get("openclaw")
    Reg-->>Server: runtime instance
    Server->>RT: runtime.getStatusSnapshot()
    RT-->>Server: RuntimeStatusSnapshot
    Server-->>RPC: "{ descriptor, status, capabilities }"
    RPC-->>Hook: RuntimeView
    Hook-->>UI: "{ data, isLoading }"

    UI->>Hook: useRuntimeAction("openclaw")
    UI->>Hook: "action.mutate({ action: "restart" })"
    Hook->>RPC: POST /runtimes/openclaw/actions/restart
    RPC->>Server: POST /runtimes/:adapter/actions/:action
    Server->>RT: capabilities.includes("restart")?
    RT-->>Server: true
    Server->>RT: "runtime.executeAction({ type: "restart" })"
    RT-->>Server: void
    Server-->>RPC: "{ status: "ok", state: "starting" }"
    RPC-->>Hook: success
    Hook->>Hook: invalidateQueries(["runtime-status","openclaw"])

    UI->>RPC: GET /runtimes/openclaw/status/stream (SSE)
    RPC->>Server: SSE connect
    Server->>RT: runtime.subscribe(writeSnapshot)
    RT-->>Server: unsubscribe fn
    loop every state change
        RT->>Server: listener(snapshot)
        Server-->>UI: "event: snapshot data: {...}"
    end
    loop every 15s
        Server-->>UI: "event: heartbeat data: {ts:...}"
    end
    UI->>Server: abort
    Server->>RT: unsubscribe()
    Server->>Server: clearInterval(heartbeat)
```

Comments Outside Diff (2)

  1. packages/browseros-agent/apps/agent/entrypoints/app/agents/AgentsPage.tsx, line 131-138 (link)

    P2 pillForControlPlane loses distinct "Reconnecting" / "Recovering" labels

    The old GatewayStatusBar.tsx mapped 'reconnecting' → "Reconnecting" and 'recovering' → "Recovering" as separate cases with separate labels. The new implementation folds both under a single "Connecting" label. A user whose gateway is in a slow recovery loop now sees the same text as a fresh connect attempt, making it harder to tell that the situation is degraded. Consider preserving the individual labels to match the previous UX fidelity.


  2. packages/browseros-agent/apps/server/src/api/routes/runtimes.ts, line 1067-1103 (link)

    P2 SSE stream: heartbeat write errors are silently swallowed after abort

    The heartbeat setInterval callback calls s.write(...).catch(() => {}), which suppresses every write error including those that occur while the stream is still considered alive but the underlying connection has silently dropped (e.g., TCP RST before the Hono abort handler fires). In that window, the interval continues firing and accumulating silently-failing writes. The pattern is fine for the snapshot writes (fire-and-forget after subscribe), but the heartbeat would benefit from detecting write failure and resolving the abort promise early to trigger cleanup. Minimal fix: track a closed flag and clearInterval on the first failed heartbeat write.

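The minimal fix the comment suggests could look like the following sketch. Names here are assumptions: `write` stands in for the Hono SSE stream's write method, and `onDead` for resolving the route's abort promise to trigger cleanup:

```typescript
// Illustrative sketch: stop the 15s heartbeat on the first failed write
// instead of swallowing errors forever.
function startHeartbeat(
  write: (chunk: string) => Promise<void>,
  onDead: () => void,
  intervalMs = 15_000,
) {
  let closed = false;
  const timer = setInterval(() => {
    write(`event: heartbeat\ndata: {"ts":${Date.now()}}\n\n`).catch(() => {
      if (closed) return;
      closed = true;
      clearInterval(timer); // stop accumulating silently-failing writes
      onDead(); // resolve the abort promise early → route cleanup runs
    });
  }, intervalMs);
  // Normal cleanup path (Hono abort handler).
  return () => {
    closed = true;
    clearInterval(timer);
  };
}
```

With this shape, a silent TCP drop is detected at the next heartbeat tick rather than leaking the interval and subscription until the process notices on its own.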
Consolidated review issues (4 total; issues 3 and 4 restate the two review comments above):

### Issue 1 of 4
packages/browseros-agent/apps/agent/entrypoints/app/agents/useRuntime.ts:914-918
**Unused `list` key in `RUNTIME_QUERY_KEYS`**

`RUNTIME_QUERY_KEYS.list` is exported but never consumed — no `useRuntimeList` hook exists and the key isn't referenced anywhere in this PR. Leaving it creates a false signal that a list-level invalidation pattern is in use. Per the project's cleanup guidelines, dead code should be removed rather than retained for hypothetical future use.

### Issue 2 of 4
packages/browseros-agent/apps/agent/entrypoints/app/agents/agent-row/AgentSummaryChips.tsx:7-15
**`adapterHealth` prop declared but never used**

`adapterHealth` is kept as an optional prop "for upstream callers" but is never destructured, read, or acted on inside the component. Any caller that still passes it has the value silently discarded. The prop should be removed entirely — callers can be updated in the same pass since the change is mechanical (optional prop → removed). Keeping it as dead interface surface contradicts the project's remove-dead-code rule.


…nder Start CTA for installed state

Two stuck-state bugs in the new RuntimeControlPanel:

1. The runtime's state machine started fresh at not_installed on every
   server boot. tryAutoStart's short-circuit branches (gateway already
   running, auth pass) never drove the state transitions, so the UI
   saw not_installed for a gateway that was actually running. Add a
   syncState() method on OpenClawContainerRuntime that probes the
   actual container via cli.inspectContainer + /readyz and sets state
   accordingly. Wire it into tryAutoStart as the first step so it
   runs regardless of which branch the rest takes.

2. RuntimeControlPanel had no case for state === 'installed', so after
   a successful Install the panel went blank instead of offering the
   next step. Treat installed the same as stopped — show the Start
   CTA with copy that reflects the difference (image is pulled vs
   container exists but stopped).

Optional-chained the syncState call so existing tests with partial
runtime mocks don't crash on the missing method.
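The probe described in point 1 can be sketched as a small pure function. `inspect` and `probeReadyz` are illustrative stand-ins for cli.inspectContainer and an HTTP check against the gateway's /readyz endpoint — not the actual OpenClawContainerRuntime method:

```typescript
// Illustrative sketch of syncState: derive runtime state from the live
// container instead of trusting the persisted state machine.
type SyncedState = "not_installed" | "stopped" | "starting" | "running";

async function syncState(
  inspect: () => Promise<{ running: boolean } | null>,
  probeReadyz: () => Promise<boolean>,
): Promise<SyncedState> {
  const info = await inspect();
  if (!info) return "not_installed"; // no container exists at all
  if (!info.running) return "stopped"; // container exists but is down
  // Container is up — gateway only counts as running once /readyz passes.
  return (await probeReadyz()) ? "running" : "starting";
}
```

Running this as the first step of tryAutoStart means every short-circuit branch afterwards sees a state that matches reality, which is exactly the stuck-state fix the commit describes.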
When a previous server boot wrote runtime-state.json after the gateway
container had already been created with a different hostPort (e.g. 18789
held at allocate-time → container started on 18790), the persisted port
disagrees with the live mapping. The runtime then probes the persisted
port forever and the UI sticks at `starting`.

`syncState` now reads `NetworkSettings.Ports` from inspect-container and
adopts the actual host port for the gateway container's published port
when it differs. The service then re-syncs `hostPort`/`httpClient` and
rewrites runtime-state.json so the next boot starts from a clean slate.

- ContainerInfo gains a flat `ports` array (parsed from
  `NetworkSettings.Ports`)
- OpenClawContainerRuntime.syncState: reconcile hostPort from live
  mapping before probing /readyz
- OpenClawService.tryAutoStart: adopt the runtime's reconciled port and
  persist it via writePersistedGatewayPort
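The flattening and reconciliation steps above can be sketched as follows. The `NetworkSettings.Ports` shape shown matches Docker/nerdctl inspect conventions; the `PortMapping` field names and both function names are illustrative, not the actual ContainerInfo code:

```typescript
// Illustrative sketch: flatten NetworkSettings.Ports from `nerdctl inspect`
// and adopt the live host port when it disagrees with the persisted one.
interface PortMapping { containerPort: number; protocol: string; hostPort: number }

// e.g. { "18789/tcp": [{ HostIp: "0.0.0.0", HostPort: "18790" }] }
type InspectPorts = Record<string, Array<{ HostIp: string; HostPort: string }> | null>;

function flattenPorts(ports: InspectPorts): PortMapping[] {
  const out: PortMapping[] = [];
  for (const [key, bindings] of Object.entries(ports)) {
    const [port, protocol] = key.split("/"); // key is "18789/tcp"
    for (const b of bindings ?? []) {
      out.push({ containerPort: Number(port), protocol, hostPort: Number(b.HostPort) });
    }
  }
  return out;
}

function reconcileHostPort(persisted: number, containerPort: number, ports: PortMapping[]): number {
  const live = ports.find((p) => p.containerPort === containerPort);
  // Adopt the live mapping when it exists and differs; otherwise keep persisted.
  return live && live.hostPort !== persisted ? live.hostPort : persisted;
}
```

In the mismatch scenario from the commit message (18789 persisted, container actually published on 18790), reconciliation returns 18790, so the runtime probes the port the gateway is really listening on.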
…ismatch

When a previous boot leaves a gateway running with a stale token, the
realloc-on-auth-mismatch branch was bumping the persisted port without
actually freeing the old container — ManagedContainer.start() no-ops
when state==='running', so the next start cycle never recreated the
container on the new port. The result: persisted/service/runtime drift
back into mismatch, and history requests 500 with "gateway is not ready"
even while the (stale) gateway keeps serving chat from the old port.

Stop the gateway explicitly when we decide to bump off the port, so the
upcoming start cycle goes through the full remove + create + start path
on the freshly-allocated port. The token-mismatch test still passes;
adds a new test pinning the stop-before-realloc behaviour.
…fresh install

Starting the gateway via the new RuntimeControlPanel "Start" CTA goes
through runtime.executeAction({type:'start'}) directly, bypassing
OpenClawService.tryAutoStart and its ensureStateEnvFile() seeding step.
On a freshly-wiped .browseros-dev, that left nerdctl create failing with
"failed to open env file .../.openclaw/.env: no such file or directory".

Seed the file (empty, mode 0600) inside buildContainerSpec so the
runtime is self-sufficient. Service callers continue to work — their
ensureStateEnvFile is now an idempotent no-op once the file exists.
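The seeding step can be sketched in a few lines, assuming a Node environment; the actual path and the spec-builder it lives in (buildContainerSpec) differ from this standalone helper:

```typescript
// Illustrative sketch of idempotent env-file seeding: create an empty
// file with owner-only permissions if and only if it doesn't exist yet.
import { existsSync, mkdirSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

function ensureStateEnvFile(envPath: string): void {
  if (existsSync(envPath)) return; // idempotent no-op once the file exists
  mkdirSync(dirname(envPath), { recursive: true }); // parent dir may be missing on fresh installs
  writeFileSync(envPath, "", { mode: 0o600 }); // empty file, mode 0600
}
```

Because the check-then-create is a no-op on subsequent calls, both the runtime path and the existing service callers can run it unconditionally.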