fix: improve browser tool instructions and element resolution #351
Closed
Changes from all commits (2 commits):
Hunk `@@ -1 +1 @@`:

Removed:

> Browser automation tool. Launch a headless Chrome browser, navigate pages, interact with elements, take screenshots, and extract page content. Workflow: launch → navigate → snapshot (get element refs) → act (click/type by ref) → screenshot. Element refs like "e1", "e2" are assigned during snapshot and used in act calls.

Added:

> Browser automation tool. Workflow: launch → navigate → snapshot → act → close. The `act` action REQUIRES `act_kind` (click, type, press_key, hover, scroll_into_view, focus) and `element_ref` from the last snapshot. Example: {"action": "act", "act_kind": "click", "element_ref": "e3"}. Always snapshot before acting — refs reset on navigation. Use `navigate` to go to URLs (not `open`, which creates new tabs).
Hunk `@@ -93,24 +93,60 @@ Run a subprocess with specific arguments. Use this for programs that need struct`:

Context:

> Automate a headless Chrome browser. Use this for web scraping, testing web interfaces, filling out forms, or any task requiring browser interaction.
Removed:

> **Workflow:**
>
> 1. `launch` — Start the browser
> 2. `navigate` — Go to a URL
> 3. `snapshot` — Get the page's accessibility tree with element refs (e1, e2, e3...)
> 4. `act` — Interact with elements by ref: `click`, `type`, `press_key`, `hover`, `scroll_into_view`, `focus`
> 5. `screenshot` — Capture the page or a specific element

Added:

> **Workflow:** launch → navigate → snapshot → act → (repeat snapshot/act as needed) → close
>
> **Actions:**
>
> | Action | Required params | Description |
> |--------|----------------|-------------|
> | `launch` | — | Start or reconnect to the browser. Always call first. |
> | `navigate` | `url` | Go to a URL in the active tab. |
> | `open` | `url` (optional) | Open a **new** tab. Don't use this to navigate — use `navigate` instead. |
> | `tabs` | — | List all open tabs. |
> | `focus` | `target_id` | Switch to a tab by target ID. |
> | `close_tab` | `target_id` (optional) | Close a tab (active tab if omitted). |
> | `snapshot` | — | Get the accessibility tree with interactive element refs. |
> | `act` | `act_kind`, `element_ref` | Interact with an element. **`act_kind` is mandatory.** |
> | `screenshot` | `full_page` (optional) | Capture the viewport (or full page). |
Review comment on lines +109 to +110 (Contributor, with a suggested change):

> Minor nit: the quick-reference table makes
Added (continued):

> | `evaluate` | `script` | Run JavaScript in the page. Last resort — prefer snapshot+act. |
> | `content` | — | Get page HTML (large, use sparingly). |

Context:

> {%- if browser_persist_session %}

Removed:

> 6. `close` — Detach from the browser when done (tabs and session are preserved for the next worker)

Added:

> | `close` | — | Detach from the browser. Tabs and session are preserved for the next worker. |

Context:

> {%- else %}

Removed:

> 6. `close` — Shut down the browser when done

Added:

> | `close` | — | Shut down the browser when done. |

Context:

> {%- endif %}
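The Actions table can be read as a small schema. As an illustration only, the Python sketch below encodes the table's "Required params" column as a dict and reports which required keys a call is missing. `REQUIRED_PARAMS` and `missing_params` are hypothetical names; the PR does not show how the tool itself validates calls.

```python
# Hypothetical encoding of the "Actions" table: action -> required params.
# Optional params (`url` for `open`, `target_id` for `close_tab`,
# `full_page` for `screenshot`) are deliberately left out.
REQUIRED_PARAMS: dict[str, tuple[str, ...]] = {
    "launch": (),
    "navigate": ("url",),
    "open": (),
    "tabs": (),
    "focus": ("target_id",),
    "close_tab": (),
    "snapshot": (),
    "act": ("act_kind", "element_ref"),
    "screenshot": (),
    "evaluate": ("script",),
    "content": (),
    "close": (),
}


def missing_params(call: dict) -> list[str]:
    """Return the table's required params that are absent from `call`.

    Raises ValueError for an action the table does not list.
    """
    action = call.get("action")
    if action not in REQUIRED_PARAMS:
        raise ValueError(f"unknown action: {action!r}")
    return [p for p in REQUIRED_PARAMS[action] if p not in call]


print(missing_params({"action": "navigate"}))  # ['url']
print(missing_params({"action": "act", "act_kind": "press_key", "key": "Enter"}))  # ['element_ref']
```

Read literally, the table lists `element_ref` as required for every `act` call, so this sketch flags a `press_key` call without one, although the `act` section marks `element_ref` as optional for `press_key`. That is the inconsistency raised in the review comment about `press_key`.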
Removed:

> **Multi-tab support:** Use `open` to create new tabs, `tabs` to list them, `focus` to switch between them, `close_tab` to close one.

Added:

> **The `act` action — IMPORTANT:**
>
> The `act_kind` parameter is **always required** when `action` is `act`. Valid values:
>
> - `click` — Click the element. Requires: `element_ref`.
> - `type` — Type text into the element. Requires: `element_ref`, `text`.
> - `press_key` — Press a key (e.g., "Enter", "Tab", "Escape"). Requires: `key`. Optional: `element_ref`.
> - `hover` — Hover over the element. Requires: `element_ref`.
> - `scroll_into_view` — Scroll element into viewport. Requires: `element_ref`.
> - `focus` — Focus the element. Requires: `element_ref`.
>
> Examples:
> ```json
> {"action": "act", "act_kind": "click", "element_ref": "e3"}
> {"action": "act", "act_kind": "type", "element_ref": "e5", "text": "hello@example.com"}
> {"action": "act", "act_kind": "press_key", "key": "Enter"}
> ```
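The per-kind rules for `act` can be checked mechanically. The sketch below is a hypothetical checker (`ACT_KIND_REQUIRED` and `check_act_call` are illustrative names, not the tool's code) that follows the listed requirements, including `element_ref` being optional for `press_key`, and runs it over the three JSON examples.

```python
import json

# Per-kind requirements as listed in the `act` section (illustrative copy).
ACT_KIND_REQUIRED: dict[str, tuple[str, ...]] = {
    "click": ("element_ref",),
    "type": ("element_ref", "text"),
    "press_key": ("key",),  # element_ref is optional for press_key
    "hover": ("element_ref",),
    "scroll_into_view": ("element_ref",),
    "focus": ("element_ref",),
}


def check_act_call(call: dict) -> list[str]:
    """Return problems with an `act` call; an empty list means it looks valid."""
    if call.get("action") != "act":
        return ["not an act call"]
    kind = call.get("act_kind")
    if kind is None:
        return ["missing act_kind"]  # the section says this always errors
    if kind not in ACT_KIND_REQUIRED:
        return [f"unknown act_kind: {kind!r}"]
    return [f"missing {p}" for p in ACT_KIND_REQUIRED[kind] if p not in call]


# The three examples from the section, parsed as JSON.
examples = [
    '{"action": "act", "act_kind": "click", "element_ref": "e3"}',
    '{"action": "act", "act_kind": "type", "element_ref": "e5", "text": "hello@example.com"}',
    '{"action": "act", "act_kind": "press_key", "key": "Enter"}',
]
for line in examples:
    print(check_act_call(json.loads(line)))  # [] for each example
```

A call such as `{"action": "act", "element_ref": "e3"}` yields `["missing act_kind"]`, which matches the first item under the common mistakes.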
Added (continued):

> **Element refs:**
>
> - Refs like `e0`, `e1`, `e2` are assigned by `snapshot` and reset on each snapshot or navigation.
> - Always run `snapshot` before using `act` — stale refs will fail.
> - If an `act` call fails with "Could not find node", run `snapshot` again to get fresh refs, then retry with the new ref.
> - Don't pass `url` or `text` when you mean `act_kind` — these are different parameters.
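The ref rules can be illustrated with a toy model. In the sketch below, `FakeBrowser`, `StaleRefError`, and `click_by_name` are hypothetical stand-ins, not the tool's API: refs are issued by `snapshot`, cleared by `navigate`, and a stale ref raises an error standing in for "Could not find node". The retry helper re-snapshots and finds the element again by role and accessible name; matching on role and name is an assumption, since the section only says to retry with the new ref.

```python
class StaleRefError(Exception):
    """Stand-in for the tool's "Could not find node" failure (hypothetical)."""


class FakeBrowser:
    """Toy model of the ref rules: snapshot issues refs, navigate clears them."""

    def __init__(self, elements: list[tuple[str, str]]) -> None:
        self.elements = elements  # (role, accessible name) pairs on the page
        self.refs: dict[str, tuple[str, str]] = {}

    def navigate(self, elements: list[tuple[str, str]]) -> None:
        self.elements = elements
        self.refs = {}  # navigation invalidates every ref

    def snapshot(self) -> dict[str, tuple[str, str]]:
        # Refs restart at e0 on every snapshot.
        self.refs = {f"e{i}": el for i, el in enumerate(self.elements)}
        return dict(self.refs)

    def click(self, ref: str) -> tuple[str, str]:
        if ref not in self.refs:
            raise StaleRefError(f"Could not find node for ref {ref}")
        return self.refs[ref]


def click_by_name(browser: FakeBrowser, ref: str, role: str, name: str) -> tuple[str, str]:
    """Click `ref`; on a stale ref, re-snapshot and retry once with the fresh ref."""
    try:
        return browser.click(ref)
    except StaleRefError:
        for new_ref, element in browser.snapshot().items():
            if element == (role, name):
                return browser.click(new_ref)
        raise  # element is gone from the page: surface the original error


browser = FakeBrowser([("textbox", "Email"), ("button", "Sign in")])
browser.snapshot()  # "Sign in" is e1 here
browser.navigate([("button", "Sign in"), ("link", "Home")])
print(click_by_name(browser, "e1", "button", "Sign in"))  # ('button', 'Sign in'), found at e0
```

The helper retries only once, after a fresh snapshot, rather than resending the same stale ref.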
Removed:

> **Element refs** are assigned during `snapshot` and look like "e1", "e2". Always snapshot before interacting — refs reset on each snapshot or navigation.

Added:

> **Common mistakes to avoid:**

Removed:

> **Additional actions:** `content` (get page HTML), `evaluate` (run JavaScript, if enabled in config).

Added:

> - Calling `act` without `act_kind` — this will always error. Every `act` call needs an `act_kind`.
> - Using `open` to navigate — `open` creates a new tab. Use `navigate` to go to a URL in the current tab.
> - Using `evaluate` to click buttons — use `snapshot` + `act` with `act_kind: "click"` instead. Only use `evaluate` when the accessibility tree doesn't expose what you need.
> - Retrying the exact same failed call — read the error, fix the parameters, then retry.
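Two of the listed mistakes can be detected from a call log. The sketch below is a hypothetical helper (`lint_transcript` is an illustrative name, not part of the tool) that takes (call, succeeded) pairs and flags `act` calls without `act_kind` and calls resent unchanged right after they failed. Calls are compared as canonical JSON so that key order does not matter.

```python
import json
from typing import Optional


def lint_transcript(steps: list[tuple[dict, bool]]) -> list[str]:
    """Flag common mistakes in a log of (call, succeeded) pairs (illustrative)."""
    warnings: list[str] = []
    last_failed: Optional[str] = None  # canonical JSON of the previous call, if it failed
    for i, (call, ok) in enumerate(steps):
        key = json.dumps(call, sort_keys=True)  # key order does not matter
        if key == last_failed:
            warnings.append(f"step {i}: identical retry of a failed call")
        if call.get("action") == "act" and "act_kind" not in call:
            warnings.append(f"step {i}: act without act_kind")
        last_failed = None if ok else key
    return warnings


log = [
    ({"action": "navigate", "url": "https://example.com"}, True),
    ({"action": "act", "element_ref": "e3"}, False),
    ({"element_ref": "e3", "action": "act"}, False),  # same call, keys reordered
]
print(lint_transcript(log))
# ['step 1: act without act_kind', 'step 2: identical retry of a failed call', 'step 2: act without act_kind']
```

The other two mistakes (using `open` to navigate, using `evaluate` to click) depend on intent, so a log alone cannot reliably flag them.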
Context:

> ### secret_set
Review comment:

> Small consistency thing with the worker prompt: `press_key` doesn't require an `element_ref`, so saying `act` requires `element_ref` can push workers into always inventing one.