feat: add GUI click-model browser mode by neel04 · Pull Request #910 · browseros-ai/BrowserOS

Neel Gupta (neel04) · 2026-05-01T16:39:46Z

Summary

Add GUI click-only browser mode backed by Molmo point prediction for click/hover coordinates.
Restrict agent browser tools to GUI-safe actions, page navigation, screenshots, scroll, and focused text entry.
Add GUI click logging, coordinate scaling, ACL checks, and configurable Molmo endpoint settings.
Expose eval one-off task overrides and set AGI SDK suite workers to 20.

Run AGI SDK Eval

Use Fireworks for Kimi:

cd packages/browseros-agent/apps/eval

FIREWORKS_API_KEY=... \
BROWSEROS_EVAL_PYTHON=.venv/bin/python \
bun run eval run --config configs/legacy/agisdk-real.json

This runs the full agisdk-real suite with 20 workers using accounts/fireworks/models/kimi-k2p5 on Fireworks.

github-actions · 2026-05-01T16:42:44Z

❌ Tests failed — 3/1159 failed

Suite	Passed	Failed	Skipped
✅ `agent`	76/76	0	0
✅ `build`	9/9	0	0
❌ `eval`	101/103	2	0
❌ `server-agent`	263/264	1	0
✅ `server-api`	202/202	0	0
✅ `server-browser`	4/4	0	0
✅ `server-integration`	9/10	0	1
✅ `server-lib`	161/161	0	0
✅ `server-root`	60/63	0	3
✅ `server-skills`	31/31	0	0
✅ `server-tools`	236/236	0	0

Failed tests

eval — adaptEvalConfigFile > adapts BrowserOS AGI SDK comparison configs
eval — EvalSuiteSchema > validates the daily AGISDK 10-task suite
server-agent — mode-aware framing > GUI click-only mode exposes only GUI click and page-opening guidance

greptile-apps · 2026-05-01T16:44:42Z

Greptile Summary

This PR replaces the element-ID-based click/hover tools with a Molmo visual-model backend that resolves coordinates from a natural-language prompt, adds a type_text tool for typing into the focused element, and gates the whole flow behind a GUI_CLICK_ONLY_MODE flag that is currently hardcoded to true. The eval CLI gains --query/--start-url/--output-dir pass-through flags and the AGI SDK worker count is bumped to 20.

P1 — hardcoded ephemeral endpoint: MOLMO_POINT_ENDPOINT is baked to a specific RunPod proxy URL; when the pod restarts every click/hover will hang for 60 s before throwing. It must be read from an env var.
P1 — no kill switch: GUI_CLICK_ONLY_MODE = true is a compile-time constant with no env-var override; it silently makes the chatMode tool-filter branch dead code and removes all element-based input from every agent session.
P2 — excessive logging: every click/hover emits three logger.info calls with large payloads, violating the project's debug-logging rule.

Confidence Score: 3/5

Not safe to merge as-is — the hardcoded ephemeral RunPod URL will break production click/hover when the pod restarts, and the always-on mode flag silently disables chatMode restrictions.

Two independent P1 issues: an ephemeral infrastructure endpoint with no env-var escape hatch, and a compile-time constant that globally overrides all agent modes with no kill switch.

molmo-point-config.ts (hardcoded endpoint), gui-click-only.ts (hardcoded mode flag), ai-sdk-agent.ts (dead chatMode branch)

Important Files Changed

Filename	Overview
packages/browseros-agent/apps/server/src/tools/molmo-point-config.ts	New config file with hardcoded ephemeral RunPod endpoint and no env-var override — will break when the pod restarts
packages/browseros-agent/apps/server/src/agent/gui-click-only.ts	New mode file with GUI_CLICK_ONLY_MODE hardcoded to true — no env/config kill switch, affects all agent sessions globally
packages/browseros-agent/apps/server/src/agent/ai-sdk-agent.ts	Wires GUI click-only mode into agent setup; chatMode tool-filter branch is now unreachable dead code because GUI_CLICK_ONLY_MODE is always true
packages/browseros-agent/apps/server/src/tools/input.ts	click/hover globally replaced with GUI-prompt versions; type_text added; scroll element param removed; excessive per-action debug logging
packages/browseros-agent/apps/server/src/tools/molmo-point-client.ts	New Molmo HTTP client with good error handling, response truncation, and PNG dimension parsing; verbose info-level logging on every request/response cycle
packages/browseros-agent/apps/server/src/tools/gui-click-resolver.ts	New coordinate resolver: takes screenshot, queries Molmo, scales point from image to viewport space; scaling logic is correct and validated by tests
packages/browseros-agent/apps/server/src/tools/acl/acl-guard.ts	Adds type_text to guarded tools and resolves focused element for ACL checks; integrates cleanly with framework-level checkAcl
packages/browseros-agent/apps/server/src/browser/browser.ts	Adds resolveFocusedElement, viewportSize, and typeText helpers; fixes scroll center calculation to use cssVisualViewport
packages/browseros-agent/apps/server/src/agent/prompt.ts	Adds guiClickOnly prompt sections for all prompt builders; cleanly isolated with early-return guards
packages/browseros-agent/apps/eval/src/cli/args.ts	Adds --query, --start-url, --output-dir CLI args to thread through to existing RunEvalOptions fields
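
The image-to-viewport scaling described for gui-click-resolver.ts can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `Size` and `Point` types and the `scalePointToViewport` name are hypothetical, and it assumes the model returns a point in screenshot pixel coordinates (if the model returns percentages instead, the point would need converting first).

```typescript
// Hypothetical types; the PR's actual resolver may use different names.
interface Size {
  width: number
  height: number
}

interface Point {
  x: number
  y: number
}

// Map a point predicted in screenshot pixel space into CSS viewport space.
function scalePointToViewport(point: Point, image: Size, viewport: Size): Point {
  if (image.width <= 0 || image.height <= 0) {
    throw new Error('Screenshot dimensions must be positive')
  }
  const x = (point.x / image.width) * viewport.width
  const y = (point.y / image.height) * viewport.height
  // Clamp so an out-of-range prediction cannot land outside the viewport.
  return {
    x: Math.min(Math.max(x, 0), viewport.width - 1),
    y: Math.min(Math.max(y, 0), viewport.height - 1),
  }
}
```

With a 2560x1440 screenshot of a 1280x720 viewport (device pixel ratio 2), a predicted point at the image centre maps to the viewport centre.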

Prompt To Fix All With AI

Fix the following 5 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 5
packages/browseros-agent/apps/server/src/tools/molmo-point-config.ts:1-5
**Hardcoded ephemeral RunPod endpoint**

`MOLMO_POINT_ENDPOINT` is set to a specific RunPod proxy URL. RunPod proxy hostnames are ephemeral — they become invalid as soon as the pod is stopped or restarted. When that happens every `click` and `hover` call will hang for the full `MOLMO_POINT_TIMEOUT_MS` (60 s) before throwing, making the browser completely unusable. There is no environment variable or config override to change it at runtime.

### Issue 2 of 5
packages/browseros-agent/apps/server/src/tools/molmo-point-config.ts:1-2
Read the endpoint from an environment variable so the RunPod pod can be rotated without a code deploy. The hardcoded URL will become invalid the moment the pod restarts.

```suggestion
export const MOLMO_POINT_ENDPOINT =
  process.env.MOLMO_POINT_ENDPOINT ??
  'https://gseb9k0a2n2vhl-8000.proxy.runpod.net/'
```

### Issue 3 of 5
packages/browseros-agent/apps/server/src/agent/gui-click-only.ts:1
**`GUI_CLICK_ONLY_MODE` is hardcoded `true` with no runtime kill switch**

`GUI_CLICK_ONLY_MODE = true` unconditionally puts every agent session into GUI-click-only mode. There is no environment variable, config flag, or per-session toggle to disable it. The consequence is that the existing chat-mode tool restriction in `ai-sdk-agent.ts` is now dead code (the `else if (config.resolvedConfig.chatMode)` branch can never execute). The old element-based `click` and `hover` schemas are also permanently gone from the registry, so any caller that depended on `element` IDs will silently get wrong behaviour. A guard like `process.env.GUI_CLICK_ONLY_MODE === 'true'` would give an operational kill switch without a code deploy.

### Issue 4 of 5
packages/browseros-agent/apps/server/src/agent/ai-sdk-agent.ts:115-133
**Dead `chatMode` tool-filter branch**

Because `GUI_CLICK_ONLY_MODE` is always `true`, the `else if (config.resolvedConfig.chatMode)` branch filtering tools by `CHAT_MODE_ALLOWED_TOOLS` can never be reached. The chat-mode tool restriction silently no longer applies, which may expose write tools in chat sessions. This violates the project rule to remove dead code.

### Issue 5 of 5
packages/browseros-agent/apps/server/src/tools/input.ts:68-115
**Excessive debug logging per project rule**

Every `click` and `hover` call emits three `logger.info` calls (`'GUI click dispatching'`, `'GUI click dispatched'`, and the same for hover), each carrying large structured payloads including screenshot dimensions, hit-element properties, and model responses. The project rule explicitly asks to remove excessive logging statements after debugging. This pattern also appears in `molmo-point-client.ts` (`'Molmo point request started'`, `'Molmo point response received'`). The request-started and dispatched/dispatching pairs should be removed or gated behind a `DEBUG`-level guard.

Reviews (1): Last reviewed commit: "fix: evals & timeoue"
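
For Issue 5, one way to gate the per-action logs behind a debug level is sketched below. The project's real `logger` API is not shown in this review, so `createLeveledLogger`, `LogLevel`, and the thunk-style payload are assumptions for illustration only.

```typescript
// Hypothetical minimal logger; the project's real `logger` API may differ.
type LogLevel = 'debug' | 'info' | 'warn' | 'error'

const LEVEL_ORDER: Record<LogLevel, number> = { debug: 0, info: 1, warn: 2, error: 3 }

function createLeveledLogger(minLevel: LogLevel, sink: (line: string) => void) {
  function log(level: LogLevel, message: string, payload?: () => unknown): void {
    // Drop the line before building the payload if the level is too low.
    if (LEVEL_ORDER[level] < LEVEL_ORDER[minLevel]) return
    const details = payload ? ` ${JSON.stringify(payload())}` : ''
    sink(`[${level}] ${message}${details}`)
  }
  return {
    debug: (message: string, payload?: () => unknown) => log('debug', message, payload),
    info: (message: string, payload?: () => unknown) => log('info', message, payload),
  }
}
```

Passing the payload as a function means the large structures (hit-element properties, model responses) are only assembled when the line is actually emitted.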

greptile-apps · 2026-05-01T16:44:46Z

+export const MOLMO_POINT_ENDPOINT =
+  'https://gseb9k0a2n2vhl-8000.proxy.runpod.net/'
+
+export const MOLMO_POINT_MAX_NEW_TOKENS = 64
+export const MOLMO_POINT_TIMEOUT_MS = 60_000


Hardcoded ephemeral RunPod endpoint

MOLMO_POINT_ENDPOINT is set to a specific RunPod proxy URL. RunPod proxy hostnames are ephemeral — they become invalid as soon as the pod is stopped or restarted. When that happens every click and hover call will hang for the full MOLMO_POINT_TIMEOUT_MS (60 s) before throwing, making the browser completely unusable. There is no environment variable or config override to change it at runtime.

greptile-apps · 2026-05-01T16:44:47Z

+export const MOLMO_POINT_ENDPOINT =
+  'https://gseb9k0a2n2vhl-8000.proxy.runpod.net/'


Read the endpoint from an environment variable so the RunPod pod can be rotated without a code deploy. The hardcoded URL will become invalid the moment the pod restarts.

Suggested change

-export const MOLMO_POINT_ENDPOINT =
-  'https://gseb9k0a2n2vhl-8000.proxy.runpod.net/'
+export const MOLMO_POINT_ENDPOINT =
+  process.env.MOLMO_POINT_ENDPOINT ??
+  'https://gseb9k0a2n2vhl-8000.proxy.runpod.net/'
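
A slightly more defensive variant of the suggested change might validate the override as well, so a malformed value fails at startup rather than on the first click. This is only a sketch: `resolveMolmoEndpoint` and `DEFAULT_MOLMO_POINT_ENDPOINT` are hypothetical names, and the environment is passed in as a plain object (in the server it would presumably be `process.env`).

```typescript
// Hypothetical helper; not part of the PR.
const DEFAULT_MOLMO_POINT_ENDPOINT = 'https://gseb9k0a2n2vhl-8000.proxy.runpod.net/'

function resolveMolmoEndpoint(
  env: Record<string, string | undefined>,
  fallback: string = DEFAULT_MOLMO_POINT_ENDPOINT,
): string {
  const raw = env.MOLMO_POINT_ENDPOINT?.trim()
  // Unset or blank values fall back to the default endpoint.
  if (!raw) return fallback
  // `new URL` throws a TypeError on malformed input.
  const url = new URL(raw)
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error(`Unsupported protocol for MOLMO_POINT_ENDPOINT: ${url.protocol}`)
  }
  return url.toString()
}
```

Usage would be along the lines of `export const MOLMO_POINT_ENDPOINT = resolveMolmoEndpoint(process.env)`.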

greptile-apps · 2026-05-01T16:44:48Z

@@ -0,0 +1,18 @@
+export const GUI_CLICK_ONLY_MODE = true


GUI_CLICK_ONLY_MODE is hardcoded true with no runtime kill switch

GUI_CLICK_ONLY_MODE = true unconditionally puts every agent session into GUI-click-only mode. There is no environment variable, config flag, or per-session toggle to disable it. The consequence is that the existing chat-mode tool restriction in ai-sdk-agent.ts is now dead code (the else if (config.resolvedConfig.chatMode) branch can never execute). The old element-based click and hover schemas are also permanently gone from the registry, so any caller that depended on element IDs will silently get wrong behaviour. A guard like process.env.GUI_CLICK_ONLY_MODE === 'true' would give an operational kill switch without a code deploy.
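
The env-var guard suggested above could be sketched like this. `parseBooleanFlag` is a hypothetical helper and the accepted spellings are an assumption; the strict `=== 'true'` comparison from the comment is the simpler alternative.

```typescript
// Hypothetical parser for a boolean feature flag read from the environment.
function parseBooleanFlag(value: string | undefined, defaultValue: boolean): boolean {
  switch (value?.trim().toLowerCase()) {
    case '1':
    case 'true':
    case 'yes':
    case 'on':
      return true
    case '0':
    case 'false':
    case 'no':
    case 'off':
      return false
    default:
      // Unset or unrecognised values keep the default.
      return defaultValue
  }
}
```

The flag would then be defined as something like `export const GUI_CLICK_ONLY_MODE = parseBooleanFlag(process.env.GUI_CLICK_ONLY_MODE, false)`, which keeps the mode off unless an operator enables it.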

greptile-apps · 2026-05-01T16:44:49Z

      toolContext,
      config.resolvedConfig.toolApprovalConfig,
    )
-    const browserTools = config.resolvedConfig.chatMode
-      ? Object.fromEntries(
-          Object.entries(allBrowserTools).filter(([name]) =>
-            CHAT_MODE_ALLOWED_TOOLS.has(name),
-          ),
-        )
-      : allBrowserTools
+    let browserTools = allBrowserTools
+    if (GUI_CLICK_ONLY_MODE) {
+      browserTools = Object.fromEntries(
+        Object.entries(allBrowserTools).filter(([name]) =>
+          isGuiClickOnlyBrowserToolAllowed(name),
+        ),
+      )
+    } else if (config.resolvedConfig.chatMode) {
+      browserTools = Object.fromEntries(
+        Object.entries(allBrowserTools).filter(([name]) =>
+          CHAT_MODE_ALLOWED_TOOLS.has(name),
+        ),
+      )
+    }
    if (config.resolvedConfig.chatMode) {
      logger.info('Chat mode enabled, restricting to read-only browser tools', {


Dead chatMode tool-filter branch

Because GUI_CLICK_ONLY_MODE is always true, the else if (config.resolvedConfig.chatMode) branch filtering tools by CHAT_MODE_ALLOWED_TOOLS can never be reached. The chat-mode tool restriction silently no longer applies, which may expose write tools in chat sessions. This violates the project rule to remove dead code.

Rule Used: Remove unused/dead code rather than leaving it in ... (source)

Learned From
browseros-ai/BrowserOS-agent#126
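
One possible way to keep the chat-mode restriction reachable is to apply both filters independently instead of chaining them with `else if`. The sketch below is an assumption about how that could look, not the PR's code: `selectBrowserTools` and `ModeFlags` are hypothetical, and whether the two restrictions should intersect is a design decision for the authors.

```typescript
type ToolMap<T> = Record<string, T>

interface ModeFlags {
  guiClickOnly: boolean
  chatMode: boolean
}

// Keep only tools that pass every active restriction, so enabling
// GUI click-only mode cannot bypass the chat-mode allowlist.
function selectBrowserTools<T>(
  allTools: ToolMap<T>,
  flags: ModeFlags,
  isGuiAllowed: (name: string) => boolean,
  isChatAllowed: (name: string) => boolean,
): ToolMap<T> {
  const selected: ToolMap<T> = {}
  for (const name of Object.keys(allTools)) {
    if (flags.guiClickOnly && !isGuiAllowed(name)) continue
    if (flags.chatMode && !isChatAllowed(name)) continue
    selected[name] = allTools[name]
  }
  return selected
}
```

In `ai-sdk-agent.ts` the two predicates would presumably be `isGuiClickOnlyBrowserToolAllowed` and `(name) => CHAT_MODE_ALLOWED_TOOLS.has(name)`.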

github-actions Bot added the Feature label May 1, 2026

greptile-apps Bot reviewed May 1, 2026

Neel Gupta (neel04) force-pushed the feat/click-model branch 4 times, most recently from 8a21b97 to 4955b48, on May 5, 2026 14:52

Neel Gupta (neel04) added 12 commits May 6, 2026 13:59

feat(agent): Initial click model integration

3bc399f

feat: added screenshot for agent perception

940ef6d

fix: evals & timeout

3c66cba

fix: extend molmo point timeout feat: modal endpoint

feat: enabled reasoning and added configs

b444afa

feat(eval): explain AGISDK state-diff failures

daeb33f

feat(eval): retry provider errors before failing

fc2f669

feat(server): log agent finish reasons

0816e21

feat: return GUI click hit element feedback

a62f977

fix: tune opus eval provider configs

0eddab8

fix(eval): route opus via openrouter bedrock

9031062

fix: avoid wheel dispatch for page scroll

758445f

fix(eval): continue after empty tool-result stop

283f76e

Neel Gupta (neel04) force-pushed the feat/click-model branch from 6dcd310 to 283f76e on May 7, 2026 11:35

fix(agent): improve gui eval reliability

0f7e209

Neel Gupta (neel04) wants to merge 13 commits into dev from feat/click-model
