
[fix]: make file upload elements more explicit in page snapshot #1975

Open
seanmcguire12 wants to merge 4 commits into main from seanmcguire/stg-934-include-file-upload-elements-in-llm-context



seanmcguire12 commented Apr 8, 2026

why

  • file upload elements (input type="file") sometimes get a 'button' AX (accessibility) role
  • this causes issues when users prompt .observe() to find "file upload elements", since the model sees each one as a generic 'button'
  • addresses bug: can't observe input type file #972

what changed

  • this PR preserves the semantics of file upload elements by labeling them in the snapshot with their actual DOM tag name and 'type' attribute, rather than their AX role
  • e.g., instead of [0-12] button: Choose File, the snapshot now shows [0-12] input, file: Choose File
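
As a rough illustration of the tag enrichment described above (not the PR's actual code), the sketch below derives a label such as input, file from a DOM node. The `DomNodeLike` interface is a hypothetical, trimmed-down stand-in for a CDP DOM node, whose attributes arrive as a flat [name, value, name, value, ...] array. The function name follows the `enrichedTagName` step named in the review diagram, but its body is an assumption, including how inputs without a 'type' attribute are handled.

```typescript
// Hypothetical minimal subset of a CDP DOM node (assumption for illustration).
interface DomNodeLike {
  nodeName: string;
  // CDP reports attributes as a flat array: [name1, value1, name2, value2, ...]
  attributes?: string[];
}

// Returns the lowercase tag name, extended with the input type when present,
// e.g. "input, file". Non-input tags are returned unchanged.
function enrichedTagName(node: DomNodeLike): string {
  const tag = node.nodeName.toLowerCase();
  if (tag !== "input") return tag;

  const attrs = node.attributes ?? [];
  for (let i = 0; i + 1 < attrs.length; i += 2) {
    const name = attrs[i] ?? "";
    const value = attrs[i + 1] ?? "";
    if (name.toLowerCase() === "type") {
      const type = value.trim().toLowerCase();
      // Assumption: an empty type falls back to the bare tag name.
      return type ? `${tag}, ${type}` : tag;
    }
  }
  // Assumption: an input with no explicit type keeps the bare tag name.
  return tag;
}

const fileInput: DomNodeLike = {
  nodeName: "INPUT",
  attributes: ["id", "resume", "type", "file"],
};
console.log(enrichedTagName(fileInput)); // "input, file"
```

Keeping the tag and type in a single string means downstream code that already maps node ids to tag names would not need a second lookup table for input types.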

test plan

  • added a unit test for the replacement logic
  • added an observe() eval which checks that hidden file upload elements are correctly found by the LLM

Summary by cubic

Make file upload inputs explicit in the page snapshot so .observe() can reliably find them. Addresses Linear STG-934 by preventing file inputs from being misclassified as buttons.

  • Bug Fixes
    • Enrich DOM tag names to include input type (e.g., input, file) and use them in DOM maps.
    • Force file inputs to appear as input, file in the a11y outline instead of AX button.
    • Add unit test and fix/register the observe_file_uploads eval in evals.config.json.
    • Publish patch for @browserbasehq/stagehand.

Written for commit 2fad86f. Summary will update on new commits.


changeset-bot bot commented Apr 8, 2026

🦋 Changeset detected

Latest commit: 2fad86f

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 4 packages

| Name | Type |
| --- | --- |
| @browserbasehq/stagehand | Patch |
| @browserbasehq/stagehand-evals | Patch |
| @browserbasehq/stagehand-server-v3 | Patch |
| @browserbasehq/stagehand-server-v4 | Patch |


seanmcguire12 added the observe label (These changes pertain to the observe function) Apr 8, 2026

cubic-dev-ai bot left a comment


No issues found across 6 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.
Architecture diagram
sequenceDiagram
    participant Client as Eval / User Script
    participant Core as Stagehand Core
    participant CDP as Browser (CDP)
    participant DOM as domTree.ts
    participant A11y as a11yTree.ts
    participant LLM as AI Model

    Note over Client, LLM: Request Flow for .observe()

    Client->>Core: observe("find file upload")
    Core->>CDP: DOM.getFlattenedDocument
    CDP-->>Core: Raw DOM Nodes (attributes array)
    Core->>CDP: Accessibility.getFullAXTree
    CDP-->>Core: Raw AX Nodes (roles, names)

    rect rgb(240, 240, 240)
    Note over Core, DOM: DOM Processing
    Core->>DOM: domMapsForSession(nodes)
    loop For each Node
        DOM->>DOM: NEW: enrichedTagName(node)
        opt tag is "input"
            DOM->>DOM: NEW: Extract "type" from attributes array
            Note right of DOM: e.g., result is "input, file"
        end
        DOM->>DOM: Map backendNodeId to enriched tag
    end
    DOM-->>Core: tagNameMap
    end

    rect rgb(240, 240, 240)
    Note over Core, A11y: Accessibility Tree Decoration
    Core->>A11y: decorateRoles(axNodes, tagNameMap)
    loop For each AX Node
        A11y->>A11y: Look up tag in tagNameMap
        alt CHANGED: Tag is "input, file"
            A11y->>A11y: CHANGED: Override AX role to "input, file"
            Note right of A11y: Prevents Chrome's default "button" role
        else Standard behavior
            A11y->>A11y: Use standard AX role
        end
    end
    A11y-->>Core: Enriched A11y Snapshot
    end

    Core->>LLM: Send Snapshot (includes "input, file")
    Note over LLM: Model identifies element correctly<br/>instead of seeing a generic button
    LLM-->>Core: Observations (selector for file input)
    Core-->>Client: Return observations
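
The role-override step in the diagram could look roughly like the sketch below. It is a simplified assumption, not the code in a11yTree.ts: `AxNodeLike` flattens the CDP AX node down to a string role and name, and `decorateRole` and `outlineLine` are hypothetical helpers. The only behavior taken from the PR is that a node whose enriched tag is input, file is shown with that label instead of its AX role.

```typescript
// Hypothetical, simplified AX node (the real CDP AXNode has richer fields).
interface AxNodeLike {
  backendDOMNodeId?: number;
  role: string;
  name?: string;
}

const FILE_INPUT_TAG = "input, file";

// Picks the label shown in the snapshot: the enriched tag for file inputs,
// otherwise the role reported by the accessibility tree.
function decorateRole(node: AxNodeLike, tagNameMap: Map<number, string>): string {
  const tag =
    node.backendDOMNodeId !== undefined
      ? tagNameMap.get(node.backendDOMNodeId)
      : undefined;
  return tag === FILE_INPUT_TAG ? FILE_INPUT_TAG : node.role;
}

// Formats one outline line in the "[id] role: name" shape used in the PR description.
function outlineLine(
  encodedId: string,
  node: AxNodeLike,
  tagNameMap: Map<number, string>,
): string {
  const role = decorateRole(node, tagNameMap);
  return node.name ? `[${encodedId}] ${role}: ${node.name}` : `[${encodedId}] ${role}`;
}

// backendNodeId 12 is a file input that the AX tree reports as a button.
const tagNameMap = new Map<number, string>([
  [12, "input, file"],
  [13, "button"],
]);

console.log(outlineLine("0-12", { backendDOMNodeId: 12, role: "button", name: "Choose File" }, tagNameMap));
// "[0-12] input, file: Choose File"
console.log(outlineLine("0-13", { backendDOMNodeId: 13, role: "button", name: "Submit" }, tagNameMap));
// "[0-13] button: Submit"
```

Limiting the override to the exact input, file tag leaves every other role untouched, which matches the "Standard behavior" branch in the diagram.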

