Skip to content

[codex] add native agent file uploads#1927

Draft
shrey150 wants to merge 2 commits intomainfrom
codex/native-agent-upload
Draft

[codex] add native agent file uploads#1927
shrey150 wants to merge 2 commits intomainfrom
codex/native-agent-upload

Conversation

@shrey150
Copy link
Copy Markdown
Contributor

@shrey150 shrey150 commented Mar 31, 2026

Linear: STG-1732

Summary

  • add a first-class upload tool for Stagehand agents
  • enable native file uploads in DOM, hybrid, and CUA agent modes
  • record upload steps in agent replay/action mapping without exposing full file paths
  • document native upload usage across the agent docs and README

Why

Users can already upload files through low-level Playwright APIs, but agent mode could not handle file uploads natively in the middle of a multi-step workflow. This change closes that gap by teaching the agent to target the real file input and route uploads through the existing Stagehand upload path.

Testing

  • pnpm --dir /Users/shrey/Developer/stagehand-native-agent-upload --filter @browserbasehq/stagehand run gen-version
  • pnpm --dir /Users/shrey/Developer/stagehand-native-agent-upload --filter @browserbasehq/stagehand run build-dom-scripts
  • pnpm --dir /Users/shrey/Developer/stagehand-native-agent-upload --filter @browserbasehq/stagehand run typecheck
  • pnpm --dir /Users/shrey/Developer/stagehand-native-agent-upload --filter @browserbasehq/stagehand run build:esm
  • pnpm --dir /Users/shrey/Developer/stagehand-native-agent-upload --filter @browserbasehq/stagehand run test:core -- packages/core/dist/esm/tests/unit/agent-upload-tools.test.js packages/core/dist/esm/tests/unit/agent-upload-tool-execute.test.js packages/core/dist/esm/tests/unit/agent-system-prompt-variables.test.js

Notes

  • existing agent execution integration tests still need provider API keys in this environment
  • there are separate local template-repo updates that were not included in this Stagehand PR

Summary by cubic

Adds native file uploads to Stagehand agents with a dedicated upload tool across DOM, hybrid, and CUA, and adds a guard that forces uploads to use this tool instead of click, type, act, or form tools for better reliability and safety.

  • New Features
    • New upload tool targets the real <input type="file"> via observe; supports %variables%, per-tool timeouts, and returns file basenames; records an upload step for agent replay and maps results to actions.
    • Tool selection guard detects upload intent or local file paths and blocks click, type, act, keys(type), fillForm, and fillFormVision with a clear error to use upload instead.
    • System prompt updates prefer upload for local paths and explicitly forbid other tools for file inputs; variables section includes upload; CUA system prompt includes the same note.
    • CUA: adds a built-in upload tool (user tools can override) and wires it into CUA handlers.
    • Types, tool registration, and docs updated to include upload across modes; tests added for guard behavior, upload execution, hybrid prompt content, and an end-to-end upload flow.

Written for commit 5d7ab5f. Summary will update on new commits. Review in cubic

@changeset-bot
Copy link
Copy Markdown

changeset-bot bot commented Mar 31, 2026

⚠️ No Changeset found

Latest commit: 5d7ab5f

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@mintlify
Copy link
Copy Markdown
Contributor

mintlify bot commented Mar 31, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

Project Status Preview Updated (UTC)
stagehand 🟢 Ready View Preview Mar 31, 2026, 1:50 AM

@shrey150 shrey150 marked this pull request as ready for review March 31, 2026 03:42
@shrey150 shrey150 requested review from tkattkat and removed request for tkattkat March 31, 2026 03:42
@shrey150 shrey150 marked this pull request as draft March 31, 2026 03:42
name: "click",
description:
"Click on an element (PREFERRED - more reliable when element is visible in viewport)",
"Click on an element (PREFERRED - more reliable when element is visible in viewport). Never use this for file upload buttons or file inputs.",
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These "never use this for file upload" annotations to tool calls seems overly prescriptive and not the right pattern to add

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 issues found across 25 files

Confidence score: 2/5

  • There is a high-confidence, high-severity issue in packages/core/lib/v3/agent/tools/upload.ts: raw exception messages are returned without sanitization, which can expose resolved variables or local file paths in tool output.
  • packages/core/lib/v3/agent/tools/upload.ts also uses path.basename in an OS-dependent way, creating a concrete risk of leaking full Windows paths when executed on non-Windows runtimes.
  • Both packages/core/lib/v3/agent/tools/type.ts and packages/core/lib/v3/agent/tools/fillFormVision.ts validate only raw values, so %variable% substitutions can bypass upload/file-path guards and allow local file paths to be typed.
  • Pay close attention to packages/core/lib/v3/agent/tools/upload.ts, packages/core/lib/v3/agent/tools/type.ts, and packages/core/lib/v3/agent/tools/fillFormVision.ts - these paths contain the main leakage and guard-bypass risks that should be fixed before merge.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/core/lib/v3/agent/tools/upload.ts">

<violation number="1" location="packages/core/lib/v3/agent/tools/upload.ts:13">
P2: `path.basename` is OS-dependent here and can leak full Windows paths on non-Windows runtimes.</violation>

<violation number="2" location="packages/core/lib/v3/agent/tools/upload.ts:116">
P1: Custom agent: **Exception and error message sanitization**

Do not expose raw exception messages in tool output; return a sanitized error string to avoid leaking resolved variables or local file paths.</violation>
</file>

<file name="packages/core/lib/v3/agent/tools/fillFormVision.ts">

<violation number="1" location="packages/core/lib/v3/agent/tools/fillFormVision.ts:66">
P2: The upload guard only inspects the raw field values; any `%variable%` that expands to a local file path will bypass the guard even though the substituted value is what gets typed. Run the guard against the substituted value so file uploads can’t slip through fillFormVision.</violation>
</file>

<file name="packages/core/lib/v3/agent/tools/type.ts">

<violation number="1" location="packages/core/lib/v3/agent/tools/type.ts:42">
P2: Run the file-upload guard on the substituted text; `%variable%` file paths currently bypass the guard and allow the type tool to enter local file paths.</violation>
</file>
Architecture diagram
sequenceDiagram
    participant LLM as AI Model / LLM
    participant Agent as Agent Handler (DOM/Hybrid/CUA)
    participant Tools as Standard Tools (click, type, act, etc.)
    participant Guard as NEW: File Upload Guard
    participant Upload as NEW: Upload Tool
    participant V3 as V3 Service (observe)
    participant Page as Browser Page (Playwright)
    participant Cache as Agent Cache / Replay

    Note over LLM, Cache: Agent Execution Flow

    LLM->>Agent: Call tool (e.g., click, type, fillForm)
    Agent->>Tools: execute(params)
    Tools->>Guard: NEW: getFileUploadGuardError(text/action)
    
    alt NEW: Intent or local path detected in standard tool
        Guard-->>Tools: Return "Use upload tool" error
        Tools-->>Agent: Fail with Guard Error
        Agent-->>LLM: Error: Request 'upload' tool instead
    else No upload intent
        Note right of Guard: Normal tool execution
    end

    Note over LLM, Cache: Native Upload Flow

    LLM->>Agent: NEW: Call 'upload' tool
    Agent->>Upload: execute(target, paths)
    
    Upload->>Upload: Substitute %variables% in paths
    
    Upload->>V3: NEW: observe("find actual <input type=file> for target")
    V3-->>Upload: Return selector for real input element
    
    alt Selector found
        Upload->>Page: NEW: setInputFiles(selector, paths)
        
        opt Replay Active
            Upload->>Cache: NEW: recordAgentReplayStep(type: "upload")
        end
        
        Upload-->>Agent: Success (summarized basenames)
    else Selector not found
        Upload-->>Agent: Fail (helpful instruction error)
    end
    
    Agent-->>LLM: Tool Result

    Note over Cache, Page: Replay Flow (Later)
    Cache->>Page: NEW: replayAgentUploadStep()
    Page->>Page: setInputFiles(cached selector, paths)
Loading

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

}
return {
success: false,
error: error instanceof Error ? error.message : String(error),
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Custom agent: Exception and error message sanitization

Do not expose raw exception messages in tool output; return a sanitized error string to avoid leaking resolved variables or local file paths.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/lib/v3/agent/tools/upload.ts, line 116:

<comment>Do not expose raw exception messages in tool output; return a sanitized error string to avoid leaking resolved variables or local file paths.</comment>

<file context>
@@ -0,0 +1,121 @@
+        }
+        return {
+          success: false,
+          error: error instanceof Error ? error.message : String(error),
+        };
+      }
</file context>
Fix with Cubic

Comment on lines +66 to +68
const fileUploadGuardError = getFileUploadGuardError(
...fields.flatMap((field) => [field.action, field.value]),
);
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: The upload guard only inspects the raw field values; any %variable% that expands to a local file path will bypass the guard even though the substituted value is what gets typed. Run the guard against the substituted value so file uploads can’t slip through fillFormVision.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/lib/v3/agent/tools/fillFormVision.ts, line 66:

<comment>The upload guard only inspects the raw field values; any `%variable%` that expands to a local file path will bypass the guard even though the substituted value is what gets typed. Run the guard against the substituted value so file uploads can’t slip through fillFormVision.</comment>

<file context>
@@ -62,6 +63,16 @@ MANDATORY USE CASES (always use fillFormVision for these):
     }),
     execute: async ({ fields }): Promise<FillFormVisionToolResult> => {
       try {
+        const fileUploadGuardError = getFileUploadGuardError(
+          ...fields.flatMap((field) => [field.action, field.value]),
+        );
</file context>
Suggested change
const fileUploadGuardError = getFileUploadGuardError(
...fields.flatMap((field) => [field.action, field.value]),
);
const fileUploadGuardError = getFileUploadGuardError(
...fields.flatMap((field) => [
field.action,
substituteVariables(field.value, variables),
]),
);
Fix with Cubic

`Find the actual <input type="file"> element for ${target}. Return the real upload input element itself, not a visible button, wrapper, label, or drag-and-drop container.`;

function summarizePaths(paths: string[]): string[] {
return paths.map((filePath) => path.basename(filePath));
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: path.basename is OS-dependent here and can leak full Windows paths on non-Windows runtimes.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/lib/v3/agent/tools/upload.ts, line 13:

<comment>`path.basename` is OS-dependent here and can leak full Windows paths on non-Windows runtimes.</comment>

<file context>
@@ -0,0 +1,121 @@
+  `Find the actual <input type="file"> element for ${target}. Return the real upload input element itself, not a visible button, wrapper, label, or drag-and-drop container.`;
+
+function summarizePaths(paths: string[]): string[] {
+  return paths.map((filePath) => path.basename(filePath));
+}
+
</file context>
Fix with Cubic

text,
}): Promise<TypeToolResult> => {
try {
const fileUploadGuardError = getFileUploadGuardError(describe, text);
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Run the file-upload guard on the substituted text; %variable% file paths currently bypass the guard and allow the type tool to enter local file paths.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/lib/v3/agent/tools/type.ts, line 42:

<comment>Run the file-upload guard on the substituted text; `%variable%` file paths currently bypass the guard and allow the type tool to enter local file paths.</comment>

<file context>
@@ -38,6 +39,14 @@ export const typeTool = (v3: V3, provider?: string, variables?: Variables) => {
       text,
     }): Promise<TypeToolResult> => {
       try {
+        const fileUploadGuardError = getFileUploadGuardError(describe, text);
+        if (fileUploadGuardError) {
+          return {
</file context>
Suggested change
const fileUploadGuardError = getFileUploadGuardError(describe, text);
const fileUploadGuardError = getFileUploadGuardError(
describe,
substituteVariables(text, variables),
);
Fix with Cubic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant