Generated: 2026-02-20 Sources: 32 resources analyzed Depth: deep
- Basic familiarity with Node.js (`npx`) and/or Python (`pip`/`uv`)
- Understanding of what cookies and browser sessions are
- Awareness of Chrome DevTools Protocol (CDP) at a conceptual level
- A working Node.js 18+ or Python 3.11+ environment
- Playwright CLI (`npx playwright`) is the lowest-friction entry point: the `codegen`, `screenshot`, `pdf`, `open`, and `show-trace` commands require zero scripting. The `--save-storage`/`--load-storage` flags give you a complete auth-handoff pattern in two commands.
- Playwright MCP exposes 30+ named tools (`browser_navigate`, `browser_click`, `browser_take_screenshot`, etc.) to any MCP-capable agent host without writing a single line of Playwright API code.
- browser-use is the highest-level Python option: it ships a CLI (`browser-use open`, `browser-use click N`, `browser-use screenshot`) and a Python agent API. It is designed specifically for LLM agents.
- Steel Browser and Browserless expose a REST API (`POST /v1/screenshot`, `/v1/scrape`, `/v1/pdf`) so agents can drive a browser with plain `curl` or `fetch`.
- For the auth handoff, the canonical pattern is: headed `npx playwright codegen --save-storage=auth.json <login-url>` → user logs in → agent uses `--load-storage=auth.json` on all subsequent headless commands.
Browser automation tools fall into three layers. Understanding which layer you are at tells you how much boilerplate you need.
Layer 1 - CLI/REST verbs (zero boilerplate)
You call a binary or HTTP endpoint. No session object, no page object, no awaiting. Examples: `npx playwright screenshot`, `curl http://localhost:3000/v1/screenshot`, `browser-use click 3`.
Layer 2 - MCP tools (agent-native, zero boilerplate)
The browser is a running server exposing named tools. Your agent calls browser_navigate({url}) as a tool call, same as any other MCP tool. The session is persistent across calls. No JS or Python required in the agent.
Layer 3 - Library API (full power, more boilerplate)
You write Playwright/Puppeteer/Rod scripts. Full control of every event, but you must manage the async lifecycle yourself.
For AI agents, Layer 1 and Layer 2 are almost always preferable. Layer 3 is the implementation layer for building Layer 1/2 wrappers.
- Headed: real browser window appears. Required for user-interactive auth flows.
- Headless: no window. Faster, suitable for automated runs after auth is established.
All major tools default to headless or can be switched with a flag.
Most interesting pages require login. The canonical pattern for CLI agents:
1. Trigger a headed browser so the user can see and interact with a real login form.
2. Capture the resulting session (cookies + localStorage) into a file.
3. Inject that file into all subsequent headless requests.
This is a one-time human action. The agent then operates fully autonomously from step 3 onward.
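As a sketch, the three steps above can be wrapped in a few lines of Python around the Playwright CLI. The helper names and paths here are illustrative, not part of any tool's API:

```python
import subprocess
from pathlib import Path

def codegen_cmd(auth_file: str, login_url: str) -> list[str]:
    # Steps 1-2: headed login; the session is written to auth_file on close
    return ["npx", "playwright", "codegen", f"--save-storage={auth_file}", login_url]

def screenshot_cmd(auth_file: str, url: str, out: str) -> list[str]:
    # Step 3: headless request with the captured session injected
    return ["npx", "playwright", "screenshot", f"--load-storage={auth_file}", url, out]

def ensure_auth(auth_file: str, login_url: str) -> None:
    # Re-run the headed login only when no capture exists yet
    if not Path(auth_file).exists():
        subprocess.run(codegen_cmd(auth_file, login_url), check=True)

# ensure_auth("auth.json", "https://example.com/login")
# subprocess.run(screenshot_cmd("auth.json", "https://example.com/dashboard",
#                               "dash.png"), check=True)
```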
`npx playwright codegen --save-storage=auth.json` writes a JSON file with this structure:
{
"cookies": [
{
"name": "session",
"value": "abc123...",
"domain": ".example.com",
"path": "/",
"expires": 1771234567.0,
"httpOnly": true,
"secure": true,
"sameSite": "Lax"
}
],
"origins": [
{
"origin": "https://example.com",
"localStorage": [
{ "name": "auth_token", "value": "eyJ..." }
]
}
]
}

This file is directly understood by `--load-storage`, by Playwright MCP's `--storage-state`, and can be converted to the Netscape cookies.txt format for `curl`/`wget`/`yt-dlp`.
The Netscape cookie file format is a 7-field tab-separated text file:
# Netscape HTTP Cookie File
# Generated by browser-automation tool
.example.com TRUE / TRUE 1771234567 session abc123...
example.com FALSE /api FALSE 0 csrf_token xyz789
Fields: domain, include_subdomains (TRUE/FALSE), path, https_only (TRUE/FALSE), expires_unix_epoch (0 = session cookie), name, value.
Lines starting with # are comments. Lines starting with #HttpOnly_ indicate HttpOnly cookies.
Used by: curl -b cookies.txt, wget --load-cookies, yt-dlp --cookies, httpx.
Installation: `npm install -D playwright` or `npm install -g playwright`
Core commands:
| Command | What it does | Key flags |
|---|---|---|
| `npx playwright codegen [url]` | Opens headed browser, records interactions to a test script | `--save-storage=auth.json`, `-o out.js`, `--target python` |
| `npx playwright screenshot [url] [file]` | Headless screenshot | `--full-page`, `--load-storage=auth.json`, `-b` (`chromium`, `firefox`, `webkit`) |
| `npx playwright pdf [url] [file]` | Save page as PDF (Chromium only) | `--paper-format=A4`, `--load-storage=auth.json` |
| `npx playwright open [url]` | Open headed browser interactively | `--load-storage=auth.json`, `--save-storage=auth.json` |
| `npx playwright show-trace [file]` | View recorded trace | `--port 9323` |
The auth handoff in two commands:
# Step 1: User logs in (headed browser opens, user sees real page)
npx playwright codegen --save-storage=auth.json https://example.com/login
# (user logs in manually, closes browser, auth.json now has cookies)
# Step 2: Agent uses saved session for headless work
npx playwright screenshot --load-storage=auth.json \
https://example.com/dashboard dashboard.png
npx playwright pdf --load-storage=auth.json \
  https://example.com/report report.pdf

Note on interactivity: `npx playwright codegen` opens a visible browser and a side panel with generated code. The user can navigate, log in, and then close the window. The `--save-storage` flag captures state at close. This is the cleanest agent-triggered human-auth pattern available.
Standard options (shared across all commands):
- `--browser`/`-b`: `cr` (chromium), `ff` (firefox), `wk` (webkit), `msedge`, `chrome`
- `--device`: emulate a device (`"iPhone 13"`, `"Pixel 5"`)
- `--viewport-size`: `"1280,720"`
- `--user-agent`, `--lang`, `--timezone`, `--geolocation`
- `--proxy-server`
- `--ignore-https-errors`
- `--user-data-dir`: use a persistent Chrome profile with existing logins
- `--channel`: `chrome`, `msedge`, `chrome-beta`
Screenshot example with wait:
npx playwright screenshot \
--full-page \
--wait-for-selector=".dashboard-loaded" \
--load-storage=auth.json \
https://example.com/dashboard \
  out.png

What it is: A Model Context Protocol server that exposes browser automation as ~30 named tools. Any MCP-capable agent host (Claude Desktop, VS Code Copilot, Cursor, Cline, Windsurf) can call these tools without writing any Playwright code.
Installation (add to MCP client config):
{
"mcpServers": {
"playwright": {
"command": "npx",
"args": ["@playwright/mcp@latest"]
}
}
}

With options (headed browser + persistent auth):
{
"mcpServers": {
"playwright": {
"command": "npx",
"args": [
"@playwright/mcp@latest",
"--browser", "chrome",
"--user-data-dir", "/home/user/.playwright-agent-profile"
]
}
}
}

Available MCP tools (complete list):
Core navigation & interaction:
| Tool | Description |
|---|---|
| `browser_navigate` | Navigate to a URL |
| `browser_navigate_back` | Go back in history |
| `browser_click` | Click an element (by accessibility label/text/role) |
| `browser_type` | Type text into a focused field |
| `browser_fill_form` | Fill multiple form fields at once |
| `browser_select_option` | Choose from a dropdown |
| `browser_hover` | Hover over an element |
| `browser_drag` | Drag and drop between elements |
| `browser_press_key` | Send keyboard input |
| `browser_handle_dialog` | Respond to alert/confirm/prompt dialogs |
| `browser_file_upload` | Upload a file |
Page inspection:
| Tool | Description |
|---|---|
| `browser_snapshot` | Get the accessibility tree of the current page (preferred over screenshots for LLMs) |
| `browser_take_screenshot` | Capture a PNG screenshot |
| `browser_evaluate` | Run JavaScript and return the result |
| `browser_console_messages` | Get browser console logs |
| `browser_network_requests` | List all network requests since load |
| `browser_wait_for` | Wait for text to appear/disappear or a timeout |
Tab & session management:
| Tool | Description |
|---|---|
| `browser_tabs` | List, create, close, or switch tabs |
| `browser_resize` | Resize the browser window |
| `browser_close` | Close the current page |
| `browser_install` | Install browser binaries |
Vision mode (requires --caps vision):
| Tool | Description |
|---|---|
| `browser_mouse_click_xy` | Click at pixel coordinates |
| `browser_mouse_move_xy` | Move the mouse to coordinates |
| `browser_mouse_drag_xy` | Drag using pixel coordinates |
| `browser_mouse_wheel` | Scroll |
PDF (requires --caps pdf):
| Tool | Description |
|---|---|
| `browser_pdf_save` | Save the current page as PDF |
Testing assertions (requires --caps testing):
| Tool | Description |
|---|---|
| `browser_verify_text_visible` | Assert text is present |
| `browser_verify_element_visible` | Assert an element exists |
| `browser_generate_locator` | Generate a stable CSS/ARIA selector |
How agents use Playwright MCP:
The server runs a persistent headed or headless browser. The agent calls tools sequentially:
agent → browser_navigate({url: "https://example.com/login"})
agent → browser_snapshot() # read page structure
agent → browser_type({element: "Email field", text: "user@example.com"})
agent → browser_type({element: "Password field", text: "..."})
agent → browser_click({element: "Sign in button"})
agent → browser_snapshot() # verify login succeeded
agent → browser_navigate({url: "https://example.com/dashboard"})
agent → browser_take_screenshot({filename: "dashboard.png"})
Auth handoff with Playwright MCP:
Option A - Persistent profile (simplest):
"args": ["@playwright/mcp@latest", "--user-data-dir", "/path/to/profile"]

The user logs into a normal Chrome window using that profile once. The agent uses that profile from then on.
Option B - Storage state file:
"args": ["@playwright/mcp@latest", "--storage-state", "/path/to/auth.json"]

Auth was captured separately (e.g., with `npx playwright codegen --save-storage`).
Option C - Chrome extension bridge:
"args": ["@playwright/mcp@latest", "--extension"]

The agent connects to your currently running Chrome browser tab and uses whatever session is already active.
Option D - CDP endpoint (connect to running Chrome):
"args": [
"@playwright/mcp@latest",
"--cdp-endpoint", "http://localhost:9222"
]

The user launches Chrome with `--remote-debugging-port=9222` and logs in; the agent connects to that live session.
Key insight: browser_snapshot returns the accessibility tree as structured text, not a screenshot. This is far more token-efficient for LLM consumption and does not require a vision model.
What it is: A Python library with a CLI and agent API designed specifically for LLM-driven browser automation. The agent receives a high-level task description and plans/executes browser interactions autonomously.
Installation:
pip install browser-use
# or
uv add browser-use
uvx browser-use install   # downloads Chromium

CLI interface (stateful session persists between commands):
browser-use open https://example.com # navigate
browser-use state # list clickable elements by index
browser-use click 5 # click element #5
browser-use type "search query" # type text
browser-use screenshot page.png # capture screen
browser-use close                     # end session

Agent API (LLM controls the browser autonomously):
from browser_use import Agent, Browser, ChatBrowserUse
import asyncio

async def run():
    browser = Browser()
    llm = ChatBrowserUse()  # or use OpenAI, Anthropic, etc.
    agent = Agent(
        task="Log into GitHub, go to my notifications, summarize the top 3",
        llm=llm,
        browser=browser,
    )
    result = await agent.run()
    print(result)

asyncio.run(run())

Auth with real Chrome profile:
from browser_use import Browser, BrowserConfig

browser = Browser(config=BrowserConfig(
    chrome_instance_path="/usr/bin/google-chrome",
    # Uses default Chrome profile with existing logins
))

Custom tools extension:
from browser_use import Controller
from browser_use.browser.context import BrowserContext

controller = Controller()

@controller.action("Read the current page URL and return it")
async def get_current_url(browser: BrowserContext) -> str:
    page = await browser.get_current_page()
    return page.url

# Register by passing the controller to the agent:
# Agent(task=..., llm=..., controller=controller)

Comparison to Playwright MCP: browser-use is more autonomous - you give it a task and it figures out the steps. Playwright MCP gives you individual tool calls (more control, less autonomy). browser-use requires Python; Playwright MCP is language-agnostic.
What it is: Puppeteer with a plugin system. The key plugin is puppeteer-extra-plugin-stealth which patches ~20 bot-detection signals.
Installation:
npm install puppeteer-extra puppeteer-extra-plugin-stealth

Basic usage (still requires scripting, no CLI wrapper):
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());

(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.screenshot({ path: 'screenshot.png' });
  await browser.close();
})();

Key note: There is no standalone puppeteer CLI tool for agents. Puppeteer is a library only. For CLI-driven use, Playwright CLI is the better choice. puppeteer-extra's main value is stealth for avoiding bot detection.
Comparison to Playwright: Playwright is now generally preferred. Playwright has a built-in CLI, supports 3 browser engines natively, and has a richer ecosystem including MCP. Puppeteer supports Chrome/Firefox only and has no CLI.
What it is: CDP is the underlying wire protocol that Playwright, Puppeteer, and all Chromium-based automation tools use. You can drive Chrome directly via HTTP and WebSocket without any framework.
Launching Chrome with debugging port:
# Headed (user-visible) - good for auth
google-chrome \
--remote-debugging-port=9222 \
--user-data-dir=/tmp/chrome-agent \
https://example.com/login
# Or headless
google-chrome \
--headless=new \
--remote-debugging-port=9222 \
  --user-data-dir=/tmp/chrome-agent

HTTP API endpoints (no WebSocket needed for these):
# List tabs
curl http://localhost:9222/json/list
# Create new tab
curl "http://localhost:9222/json/new?https://example.com"
# Close tab
curl "http://localhost:9222/json/close/{targetId}"
# Get browser version
curl http://localhost:9222/json/version

WebSocket CDP commands (for actual page control):
const CDP = require('chrome-remote-interface');

async function captureAuth() {
  const client = await CDP();
  const { Network, Page } = client;
  await Network.enable();
  await Page.enable();
  await Page.navigate({ url: 'https://example.com/login' });
  await Page.loadEventFired();
  // After user logs in (poll or wait), capture cookies
  const { cookies } = await Network.getAllCookies();
  console.log(JSON.stringify(cookies));
  await client.close();
}

captureAuth().catch(console.error);

CLI REPL with chrome-remote-interface:
npm install -g chrome-remote-interface
# List targets
chrome-remote-interface list
# Open a URL in new tab
chrome-remote-interface new 'https://example.com'
# Interactive REPL (send CDP commands interactively)
chrome-remote-interface inspect
# Then inside REPL:
# > Page.navigate({url: 'https://example.com'})
# > Network.getAllCookies()

Getting cookies via CDP:
# Using websocat + jq (pure CLI, no Node.js needed after browser launch)
WS=$(curl -s http://localhost:9222/json/list | jq -r '.[0].webSocketDebuggerUrl')
echo '{"id":1,"method":"Network.getAllCookies"}' \
| websocat "$WS" \
  | jq '.result.cookies[]'

CDP verdict for agents: CDP is powerful but verbose, and best used as a foundation layer. The chrome-remote-interface REPL is useful for exploration. For production agent use, Playwright MCP or the Playwright CLI are cleaner because they handle the WebSocket protocol, target management, and element selection automatically.
What it is: A Docker service that wraps headless Chrome and exposes a REST API. Agents call HTTP endpoints without managing any browser process.
Run locally:
docker run -p 3000:3000 ghcr.io/browserless/chrome

REST endpoints (all POST with JSON body):
# Screenshot
curl -X POST http://localhost:3000/screenshot \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "fullPage": true}' \
--output out.png
# PDF
curl -X POST http://localhost:3000/pdf \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}' \
--output out.pdf
# HTML content
curl -X POST http://localhost:3000/content \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
# Execute Puppeteer script
curl -X POST http://localhost:3000/function \
-H "Content-Type: application/json" \
-d '{
"code": "module.exports = async ({page}) => { await page.goto(args.url); return await page.title(); }",
"context": {"url": "https://example.com"}
  }'

Passing cookies to Browserless:
# Inject cookies in the request body
curl -X POST http://localhost:3000/screenshot \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/dashboard",
"cookies": [
{"name": "session", "value": "abc123", "domain": "example.com"}
]
  }' --output dashboard.png

Trade-off: Requires Docker. But once running, agents just need curl. No Node.js, no Python. Good for polyglot agents.
What it is: An open-source browser API service similar to Browserless but with a session-oriented architecture. Good for multi-step authenticated workflows.
Run locally:
# Via npm
npx @steel-dev/steel start
# Or Docker
docker run -p 3000:3000 ghcr.io/steel-dev/steel

REST endpoints:
# Create a session (returns sessionId)
SESSION=$(curl -s -X POST http://localhost:3000/v1/sessions \
-H "Content-Type: application/json" \
-d '{"blockAds": true}' | jq -r '.id')
# Screenshot a URL (stateless quick action)
curl -X POST http://localhost:3000/v1/screenshot \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "fullPage": true}' \
--output out.png
# Scrape page content
curl -X POST http://localhost:3000/v1/scrape \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
# PDF
curl -X POST http://localhost:3000/v1/pdf \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}' \
  --output out.pdf

Sessions persist cookies across requests - once you log into a page within a session, all subsequent requests in that session are authenticated.
Connect Playwright to Steel session:
const { chromium } = require('playwright');
const browser = await chromium.connectOverCDP(
`ws://localhost:3000?sessionId=${sessionId}`
);

When to use: Your agent runs from a shell and you want zero framework knowledge required.
# ---- Human does this once ----
# Open headed browser for user login
npx playwright codegen \
--save-storage=~/.agent/auth/example-auth.json \
https://example.com/login
# [Browser opens, user logs in, browser closes, auth.json written]
# ---- Agent does this autonomously ----
npx playwright screenshot \
--load-storage=~/.agent/auth/example-auth.json \
https://example.com/dashboard \
/tmp/dashboard.png
# Agent can also generate a full-page PDF
npx playwright pdf \
--load-storage=~/.agent/auth/example-auth.json \
https://example.com/report \
  /tmp/report.pdf

No Playwright code written. No async/await. Just CLI commands.
When to use: Your agent is running inside an MCP host and you want to connect to the user's real logged-in browser.
{
"mcpServers": {
"playwright": {
"command": "npx",
"args": ["@playwright/mcp@latest", "--extension"]
}
}
}

- User installs the "Playwright MCP Bridge" Chrome extension.
- User is already logged into sites in their normal Chrome.
- Agent calls `browser_navigate`/`browser_snapshot`/`browser_click` directly on those tabs.
- No auth file needed - the user's live session is used.
When to use: You want the most direct control, or you're already running Chrome elsewhere.
# User launches Chrome with debugging enabled
google-chrome \
--remote-debugging-port=9222 \
--user-data-dir=$HOME/.agent-chrome-profile \
https://example.com/login
# User logs in normally.
# Agent now connects and captures cookies
node -e "
const CDP = require('chrome-remote-interface');
CDP(async (client) => {
await client.Network.enable();
const {cookies} = await client.Network.getAllCookies();
const fs = require('fs');
// Convert to Playwright storageState format
fs.writeFileSync('auth.json', JSON.stringify({cookies, origins: []}, null, 2));
await client.close();
});
"

Then use auth.json with `npx playwright screenshot --load-storage=auth.json ...`.
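One caveat: CDP cookie objects carry extra fields (e.g. `size`, `session`, `priority`) beyond those shown in the storage-state structure earlier. A defensive variant that keeps only the shared fields might look like this (the whitelist and function name are our assumptions):

```python
import json

# Fields common to Network.getAllCookies output and Playwright storageState
_KEEP = ("name", "value", "domain", "path", "expires",
         "httpOnly", "secure", "sameSite")

def cdp_to_storage_state(cdp_cookies: list[dict]) -> dict:
    cookies = [{k: c[k] for k in _KEEP if k in c} for c in cdp_cookies]
    return {"cookies": cookies, "origins": []}

# raw = json.load(open("cdp-cookies.json"))
# json.dump(cdp_to_storage_state(raw), open("auth.json", "w"), indent=2)
```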
Useful when you want to use the captured session with curl, wget, or yt-dlp.
import json

auth = json.load(open('auth.json'))
print("# Netscape HTTP Cookie File")
for c in auth.get('cookies', []):
    domain = c['domain']
    include_subdomains = 'TRUE' if domain.startswith('.') else 'FALSE'
    path = c.get('path', '/')
    https_only = 'TRUE' if c.get('secure', False) else 'FALSE'
    expires = int(c.get('expires', 0)) if c.get('expires', -1) != -1 else 0
    name = c['name']
    value = c['value']
    print(f"{domain}\t{include_subdomains}\t{path}\t{https_only}\t{expires}\t{name}\t{value}")

python3 convert.py > cookies.txt
curl -b cookies.txt https://example.com/api/data
wget --load-cookies=cookies.txt https://example.com/api/data
yt-dlp --cookies cookies.txt https://example.com/video

import json
def netscape_to_playwright(cookies_file):
    cookies = []
    with open(cookies_file) as f:
        for raw in f:
            line = raw.strip()
            http_only = False
            # Netscape files mark HttpOnly cookies with a #HttpOnly_ prefix
            if line.startswith('#HttpOnly_'):
                line = line[len('#HttpOnly_'):]
                http_only = True
            elif not line or line.startswith('#'):
                continue
            parts = line.split('\t')
            if len(parts) != 7:
                continue
            domain, incl_sub, path, https_only, expires, name, value = parts
            cookies.append({
                'name': name,
                'value': value,
                'domain': domain,
                'path': path,
                'expires': float(expires) if expires and expires != '0' else -1,
                'httpOnly': http_only,
                'secure': https_only == 'TRUE',
                'sameSite': 'None',
            })
    return {'cookies': cookies, 'origins': []}

state = netscape_to_playwright('cookies.txt')
json.dump(state, open('auth.json', 'w'), indent=2)

| Tool | Interface | Auth Handoff | Boilerplate | Best For |
|---|---|---|---|---|
| Playwright CLI | Shell commands | `--save-storage` / `--load-storage` | Zero | CLI agents, shell scripts |
| Playwright MCP | MCP tool calls | `--storage-state`, `--extension`, `--cdp-endpoint` | Zero | MCP agent hosts (Claude, Cursor, etc.) |
| browser-use | Python + CLI | Chrome profile reuse | Low (Python) | Autonomous task agents (Python) |
| Chrome CDP direct | WebSocket + HTTP | Manual cookie capture | Medium (JS) | Fine-grained control, low-level |
| chrome-remote-interface | CLI REPL + JS | `Network.getAllCookies()` | Low-medium | Exploration, scripting |
| Browserless | REST API (curl) | Cookie injection in JSON body | Zero (needs Docker) | Polyglot agents, Docker-friendly |
| Steel Browser | REST API (curl) | Session-scoped cookie persistence | Zero (needs Docker/npx) | Multi-step auth workflows |
| puppeteer-extra | JS library | Manual scripting | High | Bot-detection avoidance |
| Pitfall | Why It Happens | How to Avoid |
|---|---|---|
| Capturing auth.json but cookies expire | Session cookies have short TTL | Check expires field; re-capture if expired. Use --user-data-dir for persistent profile instead. |
| Playwright PDF not working | PDF command only works with Chromium | Always pass -b chromium or --channel chrome for PDF |
| Screenshot captures login page, not dashboard | Session not loaded | Always pass --load-storage=auth.json |
| Browser bot-detection blocking | Playwright leaves fingerprints | Use --channel chrome (real Chrome binary) instead of Chromium. Or use puppeteer-extra-stealth. |
| MCP tools using accessibility tree but page has poor ARIA | Site has no semantic markup | Fall back to browser_take_screenshot + vision, or use browser_evaluate for DOM queries |
| CDP WebSocket closes on page navigation | WebSocket is per-target | Re-attach after navigation using Target events |
| Netscape cookies.txt parse error | Wrong line endings (CRLF vs LF) | Normalize to LF on Unix: sed -i 's/\r//' cookies.txt |
| `browser-use` agent gets stuck in a loop | LLM hallucinating element states | Set a `max_steps` limit; use `browser-use state` to inspect actual element indices |
| auth.json committed to git | Forgot to gitignore | Add *.auth.json, auth/, .auth/ to .gitignore |
- Store auth files outside the repo — use `~/.agent/auth/{service}-auth.json` or environment-relative paths. Never commit session files. (Multiple sources)
- Prefer `--user-data-dir` over `--save-storage` for long-running agents — user data directories persist across browser restarts, handle refresh tokens, and work for sites that rotate session cookies. (Playwright MCP docs)
- Use `browser_snapshot` over screenshots for text extraction — the accessibility tree is ~10x more token-efficient than describing a screenshot and does not require a vision model. (Playwright MCP README)
- Use `--channel chrome` (real Chrome) when bot detection is an issue — websites fingerprint Chrome vs Chromium, and the real Chrome binary passes more checks. (Playwright docs, chrome-for-testing)
- Separate the headed auth step from the headless work step — document these as two distinct phases in your agent code. This makes re-authentication easy when sessions expire. (browser-use docs)
- For multi-step workflows, use session-based tools — Steel Browser sessions and Playwright MCP's persistent browser maintain cookie state across page navigations automatically. One-shot REST calls lose state. (Steel Browser docs)
- Test for element visibility before interaction — use `--wait-for-selector` (CLI) or `browser_wait_for` (MCP) to avoid flaky automation on dynamic pages. (Playwright CLI docs)
- Validate the captured auth immediately — after `--save-storage`, run one screenshot with `--load-storage` and check that it shows the logged-in state before using the auth file in production. (Playwright docs)
#!/bin/bash
# auth-handoff.sh - Agent auth handoff using only Playwright CLI
AUTH_FILE="$HOME/.agent/auth/myapp-auth.json"
BASE_URL="https://myapp.example.com"
# Phase 1: Human auth (run once, or when session expires)
capture_auth() {
mkdir -p "$(dirname "$AUTH_FILE")"
echo "Opening browser for login..."
npx playwright codegen \
--save-storage="$AUTH_FILE" \
"$BASE_URL/login"
echo "Auth captured: $AUTH_FILE"
}
# Phase 2: Agent uses auth headlessly
take_screenshot() {
local url="$1"
local out="$2"
npx playwright screenshot \
--load-storage="$AUTH_FILE" \
--full-page \
"$url" "$out"
}
save_pdf() {
local url="$1"
local out="$2"
npx playwright pdf \
--load-storage="$AUTH_FILE" \
-b chromium \
"$url" "$out"
}
# If auth file is missing or stale, capture it
if [ ! -f "$AUTH_FILE" ]; then
capture_auth
fi
# Agent work
take_screenshot "$BASE_URL/dashboard" /tmp/dashboard.png
save_pdf "$BASE_URL/report/monthly" /tmp/monthly-report.pdf

When an MCP agent wants to do browser work:
# Agent internal monologue:
# 1. Check if page is accessible
tool_call: browser_navigate({url: "https://app.example.com/dashboard"})
tool_call: browser_snapshot()
# → Returns accessibility tree; if login wall detected, trigger auth flow
# 2. If login needed (persistent profile approach):
# Agent tells user: "Please log into the browser window that just opened"
# (Browser was started with --user-data-dir, user's existing login may already work)
# 3. Once authenticated, proceed
tool_call: browser_snapshot() # verify dashboard loaded
tool_call: browser_evaluate({expression: "document.title"}) # extract data
tool_call: browser_take_screenshot({filename: "/tmp/dashboard.png"})
import asyncio
from browser_use import Agent, Browser, BrowserConfig, ChatBrowserUse

async def authenticated_scrape():
    # Option A: Use existing Chrome profile (simplest for auth)
    browser = Browser(config=BrowserConfig(
        chrome_instance_path="/usr/bin/google-chrome",
        headless=False,
    ))

    # Option B: Use previously saved Playwright storageState
    # browser = Browser(config=BrowserConfig(storage_state="auth.json"))

    llm = ChatBrowserUse()
    agent = Agent(
        task="""
        Go to https://app.example.com/reports.
        Find the most recent report dated this month.
        Download it or return its URL.
        """,
        llm=llm,
        browser=browser,
    )
    result = await agent.run(max_steps=20)  # cap steps to avoid loops
    print(result)
    await browser.close()

asyncio.run(authenticated_scrape())

# After capturing auth.json with playwright codegen --save-storage
# Quick Python converter (inline)
python3 -c "
import json
data = json.load(open('auth.json'))
print('# Netscape HTTP Cookie File')
for c in data.get('cookies', []):
    dom = c['domain']
    sub = 'TRUE' if dom.startswith('.') else 'FALSE'
    sec = 'TRUE' if c.get('secure') else 'FALSE'
    exp = int(c.get('expires', 0)) if c.get('expires', -1) > 0 else 0
    print(f\"{dom}\t{sub}\t{c['path']}\t{sec}\t{exp}\t{c['name']}\t{c['value']}\")
" > cookies.txt
# Use with curl
curl -b cookies.txt https://app.example.com/api/data | jq .
# Use with wget
wget --load-cookies=cookies.txt -O data.json https://app.example.com/api/data
# Use with yt-dlp
yt-dlp --cookies cookies.txt https://app.example.com/video/123

| Resource | Type | Why Recommended |
|---|---|---|
| Playwright CLI docs | Official Docs | Authoritative reference for all CLI commands and flags |
| Playwright Auth docs | Official Docs | Comprehensive guide to storageState, setup projects, session reuse |
| Playwright MCP on GitHub | Official Repo | Complete tool list, config options, Chrome extension setup |
| browser-use on GitHub | Official Repo | Agent API, CLI reference, custom tools, production deployment |
| Chrome DevTools Protocol | Official Spec | Complete CDP domain/method reference |
| chrome-remote-interface | Library | Node.js CDP wrapper with CLI REPL |
| Steel Browser | Open Source | REST API browser service, session management |
| Browserless | Open Source | Docker REST browser service |
| yt-dlp cookies guide | Guide | Netscape cookie format, browser extension recommendations |
| puppeteer-extra-stealth | Plugin | 20+ bot-detection patches for Puppeteer |
Generated by /learn from 32 sources.
See resources/cli-browser-automation-agents-sources.json for full source metadata.