Browser Agent

A natural language browser automation agent powered by ConnectOnion and Playwright.

Quick Start

The fastest way to use the browser agent is via the ConnectOnion CLI:

pip install connectonion
co browser

For Developers

If you want to customize the browser agent (modify tools, prompts, or add new capabilities), clone this repo as a starting point:

git clone https://github.com/openonion/browser-agent.git
cd browser-agent
./setup.sh

Customizing Browser Tools

The browser tools live in the ConnectOnion SDK. Instead of modifying them directly, use co copy to copy them into your project for customization:

# Copy browser tools to your local ./tools/ directory
co copy browser_tools

This gives you a local copy you can modify freely. The default tools/browser.py is a thin re-export from the SDK:

from connectonion.useful_tools.browser_tools import BrowserAutomation
web = BrowserAutomation()

After running co copy, you'll have the full source locally and can customize element finding, scrolling, keyboard handling, etc.

Manual Setup

pip install -r requirements.txt
co init
playwright install
co auth

Usage

# Single task
python cli.py run "Go to news.ycombinator.com and find the top story"

# Interactive mode
python cli.py interactive

from agent import create_agent

agent = create_agent()
agent.input("Go to google.com and search for AI news")

Features

Natural language browser control
Automatic screenshots with vision support (LLM sees the page)
Smart element finding (no CSS selectors needed)
Form automation
Persistent Chrome profile (cookies, sessions survive restarts)
Deep Research Mode (spawns sub-agents for multi-source research)
Platform-aware keyboard shortcuts (auto-detects macOS/Windows/Linux)

Project Structure

browser-agent/
├── cli.py                   # CLI entry point
├── agent.py                 # Agent configuration
├── main.py                  # HTTP/WebSocket host
├── tools/
│   ├── browser.py           # Re-exports BrowserAutomation from SDK
│   ├── file_tools.py        # File operations for research
│   └── deep_research.py     # Deep research tool
├── agents/
│   └── deep_research.py     # Deep research sub-agent
├── prompts/
│   ├── agent.md             # Main agent prompt
│   ├── deep_research.md     # Research sub-agent prompt
│   ├── element_matcher.md   # Element finding strategy
│   ├── form_filler.md       # Form handling strategy
│   └── scroll_strategy.md   # Scrolling strategy
├── tests/
└── requirements.txt

How It Works

You describe what you want in plain English
The agent plans browser actions via LLM
Playwright executes the browser control
Screenshots feed back to the LLM for visual verification
Agent reports results at each step

Run Tests

python tests/test_all.py

Name		Name	Last commit message	Last commit date
Latest commit History 78 Commits
.co/docs		.co/docs
.github/workflows		.github/workflows
agents		agents
docs		docs
examples		examples
prompts		prompts
tests		tests
tools		tools
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
GEMINI.md		GEMINI.md
README.md		README.md
agent.py		agent.py
cli.py		cli.py
main.py		main.py
pytest.ini		pytest.ini
requirements.txt		requirements.txt
setup.sh		setup.sh
test_deep_research_direct.py		test_deep_research_direct.py
tox.ini		tox.ini

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Browser Agent

Quick Start

For Developers

Customizing Browser Tools

Manual Setup

Usage

Features

Project Structure

How It Works

Run Tests

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Browser Agent

Quick Start

For Developers

Customizing Browser Tools

Manual Setup

Usage

Features

Project Structure

How It Works

Run Tests

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages