
Learning Guide: CLI-First Browser Automation for AI Agents

Generated: 2026-02-20 · Sources: 32 resources analyzed · Depth: deep


Prerequisites

  • Basic familiarity with Node.js npx and/or Python pip/uv
  • Understanding of what cookies and browser sessions are
  • Awareness of Chrome DevTools Protocol (CDP) at a conceptual level
  • A working Node.js 18+ or Python 3.11+ environment

TL;DR

  • Playwright CLI (npx playwright) is the lowest-friction entry point: codegen, screenshot, pdf, open, and show-trace commands require zero scripting. The --save-storage / --load-storage flags give you a complete auth-handoff pattern in two commands.
  • Playwright MCP exposes 30+ named tools (browser_navigate, browser_click, browser_take_screenshot, etc.) to any MCP-capable agent host without writing a single line of Playwright API code.
  • browser-use is the highest-level Python option: it ships a CLI (browser-use open, browser-use click N, browser-use screenshot) and a Python agent API. Designed specifically for LLM agents.
  • Steel Browser and Browserless expose REST APIs (Steel: POST /v1/screenshot, /v1/scrape, /v1/pdf; Browserless: POST /screenshot, /pdf, /content) so agents can drive a browser with plain curl or fetch.
  • For the auth handoff, the canonical pattern is: headed npx playwright codegen --save-storage=auth.json <login-url> → user logs in → agent uses --load-storage=auth.json on all subsequent headless commands.

Core Concepts

1. The Three Automation Layers

Browser automation tools fall into three layers. Understanding which layer you are at tells you how much boilerplate you need.

Layer 1 - CLI/REST verbs (zero boilerplate): you call a binary or HTTP endpoint. No session object, no page object, no awaiting. Examples: npx playwright screenshot, curl http://localhost:3000/v1/screenshot, browser-use click 3.

Layer 2 - MCP tools (agent-native, zero boilerplate): the browser is a running server exposing named tools. Your agent calls browser_navigate({url}) as a tool call, the same as any other MCP tool. The session persists across calls. No JS or Python required in the agent.

Layer 3 - Library API (full power, more boilerplate): you write Playwright/Puppeteer/Rod scripts. Full control of every event, but you must manage the async lifecycle yourself.

For AI agents, Layer 1 and Layer 2 are almost always preferable. Layer 3 is the implementation layer for building Layer 1/2 wrappers.

2. Headed vs Headless

  • Headed: real browser window appears. Required for user-interactive auth flows.
  • Headless: no window. Faster, suitable for automated runs after auth is established.

All major tools can switch between the two modes with a flag; most default to headless.

3. The Auth Handoff Problem

Most interesting pages require login. The canonical pattern for CLI agents:

  1. Trigger a headed browser so the user can see and interact with a real login form.
  2. Capture the resulting session (cookies + localStorage) into a file.
  3. Inject that file into all subsequent headless requests.

This is a one-time human action. The agent then operates fully autonomously from step 3 onward.

4. Playwright storageState JSON Format

npx playwright codegen --save-storage=auth.json writes a JSON file with this structure:

{
  "cookies": [
    {
      "name": "session",
      "value": "abc123...",
      "domain": ".example.com",
      "path": "/",
      "expires": 1771234567.0,
      "httpOnly": true,
      "secure": true,
      "sameSite": "Lax"
    }
  ],
  "origins": [
    {
      "origin": "https://example.com",
      "localStorage": [
        { "name": "auth_token", "value": "eyJ..." }
      ]
    }
  ]
}

This file is understood directly by --load-storage and by Playwright MCP's --storage-state, and it can be converted to Netscape cookies.txt for curl/wget/yt-dlp.
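Because the format is plain JSON, small transformations are easy. For one-off HTTP requests you can flatten the cookies array into a single Cookie request header instead of a cookies.txt file; a minimal Python sketch (the domain matching here is deliberately simplified and ignores path, expiry, and secure flags):

```python
import json

def cookie_header(state: dict, host: str) -> str:
    """Build a Cookie: header value from a Playwright storageState dict,
    keeping only cookies whose domain matches the given host."""
    pairs = []
    for c in state.get("cookies", []):
        dom = c["domain"].lstrip(".")
        if host == dom or host.endswith("." + dom):
            pairs.append(f"{c['name']}={c['value']}")
    return "; ".join(pairs)

# Example input in the storageState shape shown above
state = {
    "cookies": [
        {"name": "session", "value": "abc123", "domain": ".example.com"},
        {"name": "other", "value": "x", "domain": "other.com"},
    ],
    "origins": [],
}
print(cookie_header(state, "app.example.com"))  # session=abc123
```

The resulting string drops straight into a request, e.g. curl -H "Cookie: session=abc123" on the matching host.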

5. Netscape cookies.txt Format

The Netscape cookie file format is a 7-field tab-separated text file:

# Netscape HTTP Cookie File
# Generated by browser-automation tool

.example.com	TRUE	/	TRUE	1771234567	session	abc123...
example.com	FALSE	/api	FALSE	0	csrf_token	xyz789

Fields: domain, include_subdomains (TRUE/FALSE), path, https_only (TRUE/FALSE), expires_unix_epoch (0 = session cookie), name, value.

Lines starting with # are comments. Lines starting with #HttpOnly_ indicate HttpOnly cookies.

Used by: curl -b cookies.txt, wget --load-cookies, yt-dlp --cookies, httpx.


Tools Reference

Playwright CLI (npx playwright)

Installation: npm install -D playwright or npm install -g playwright

Core commands:

| Command | What it does | Key flags |
| --- | --- | --- |
| npx playwright codegen [url] | Opens headed browser, records interactions to a test script | --save-storage=auth.json, -o out.js, --target python |
| npx playwright screenshot [url] [file] | Headless screenshot | --full-page, --load-storage=auth.json, -b chromium\|firefox\|webkit |
| npx playwright pdf [url] [file] | Save page as PDF (Chromium only) | --paper-format=A4, --load-storage=auth.json |
| npx playwright open [url] | Open headed browser interactively | --load-storage=auth.json, --save-storage=auth.json |
| npx playwright show-trace [file] | View recorded trace | --port 9323 |

The auth handoff in two commands:

# Step 1: User logs in (headed browser opens, user sees real page)
npx playwright codegen --save-storage=auth.json https://example.com/login
# (user logs in manually, closes browser, auth.json now has cookies)

# Step 2: Agent uses saved session for headless work
npx playwright screenshot --load-storage=auth.json \
  https://example.com/dashboard dashboard.png

npx playwright pdf --load-storage=auth.json \
  https://example.com/report report.pdf

Note on interactivity: npx playwright codegen opens a visible browser and a side panel with generated code. The user can navigate, log in, and then close the window. The --save-storage flag captures state at close. This is the cleanest agent-triggered human-auth pattern available.

Standard options (shared across all commands):

  • --browser / -b: cr (chromium), ff (firefox), wk (webkit), msedge, chrome
  • --device: emulate device ("iPhone 13", "Pixel 5")
  • --viewport-size: "1280,720"
  • --user-agent, --lang, --timezone, --geolocation
  • --proxy-server
  • --ignore-https-errors
  • --user-data-dir: use persistent Chrome profile with existing logins
  • --channel: chrome, msedge, chrome-beta

Screenshot example with wait:

npx playwright screenshot \
  --full-page \
  --wait-for-selector=".dashboard-loaded" \
  --load-storage=auth.json \
  https://example.com/dashboard \
  out.png

Playwright MCP (@playwright/mcp)

What it is: A Model Context Protocol server that exposes browser automation as ~30 named tools. Any MCP-capable agent host (Claude Desktop, VS Code Copilot, Cursor, Cline, Windsurf) can call these tools without any Playwright code.

Installation (add to MCP client config):

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}

With options (headed browser + persistent auth):

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": [
        "@playwright/mcp@latest",
        "--browser", "chrome",
        "--user-data-dir", "/home/user/.playwright-agent-profile"
      ]
    }
  }
}

Available MCP tools (complete list):

Core navigation & interaction:

| Tool | Description |
| --- | --- |
| browser_navigate | Navigate to a URL |
| browser_navigate_back | Go back in history |
| browser_click | Click an element (by accessibility label/text/role) |
| browser_type | Type text into a focused field |
| browser_fill_form | Fill multiple form fields at once |
| browser_select_option | Choose from a dropdown |
| browser_hover | Hover over an element |
| browser_drag | Drag and drop between elements |
| browser_press_key | Send keyboard input |
| browser_handle_dialog | Respond to alert/confirm/prompt dialogs |
| browser_file_upload | Upload a file |

Page inspection:

| Tool | Description |
| --- | --- |
| browser_snapshot | Get the accessibility tree of the current page (preferred over screenshots for LLMs) |
| browser_take_screenshot | Capture a PNG screenshot |
| browser_evaluate | Run JavaScript and return the result |
| browser_console_messages | Get browser console logs |
| browser_network_requests | List all network requests since load |
| browser_wait_for | Wait for text to appear/disappear, or for a timeout |

Tab & session management:

| Tool | Description |
| --- | --- |
| browser_tabs | List, create, close, or switch tabs |
| browser_resize | Resize the browser window |
| browser_close | Close the current page |
| browser_install | Install browser binaries |

Vision mode (requires --caps vision):

| Tool | Description |
| --- | --- |
| browser_mouse_click_xy | Click at pixel coordinates |
| browser_mouse_move_xy | Move the mouse to coordinates |
| browser_mouse_drag_xy | Drag using pixel coordinates |
| browser_mouse_wheel | Scroll |

PDF (requires --caps pdf):

| Tool | Description |
| --- | --- |
| browser_pdf_save | Save the current page as PDF |

Testing assertions (requires --caps testing):

| Tool | Description |
| --- | --- |
| browser_verify_text_visible | Assert text is present |
| browser_verify_element_visible | Assert an element exists |
| browser_generate_locator | Generate a stable CSS/ARIA selector |

How agents use Playwright MCP:

The server runs a persistent headed or headless browser. The agent calls tools sequentially:

agent → browser_navigate({url: "https://example.com/login"})
agent → browser_snapshot()  # read page structure
agent → browser_type({element: "Email field", text: "user@example.com"})
agent → browser_type({element: "Password field", text: "..."})
agent → browser_click({element: "Sign in button"})
agent → browser_snapshot()  # verify login succeeded
agent → browser_navigate({url: "https://example.com/dashboard"})
agent → browser_take_screenshot({filename: "dashboard.png"})

Auth handoff with Playwright MCP:

Option A - Persistent profile (simplest):

"args": ["@playwright/mcp@latest", "--user-data-dir", "/path/to/profile"]

User logs into a normal Chrome window using that profile once. The agent uses that profile forever.

Option B - Storage state file:

"args": ["@playwright/mcp@latest", "--storage-state", "/path/to/auth.json"]

Auth was captured separately (e.g., with npx playwright codegen --save-storage).

Option C - Chrome extension bridge:

"args": ["@playwright/mcp@latest", "--extension"]

The agent connects to your currently running Chrome browser tab. Uses whatever session is already active.

Option D - CDP endpoint (connect to running Chrome):

"args": [
  "@playwright/mcp@latest",
  "--cdp-endpoint", "http://localhost:9222"
]

User launches Chrome with --remote-debugging-port=9222, logs in, agent connects to that live session.

Key insight: browser_snapshot returns the accessibility tree as structured text, not a screenshot. This is far more token-efficient for LLM consumption and does not require a vision model.


browser-use (Python)

What it is: A Python library with a CLI and agent API designed specifically for LLM-driven browser automation. The agent receives a high-level task description and plans/executes browser interactions autonomously.

Installation:

pip install browser-use
# or
uv add browser-use
uvx browser-use install  # downloads Chromium

CLI interface (stateful session persists between commands):

browser-use open https://example.com   # navigate
browser-use state                       # list clickable elements by index
browser-use click 5                     # click element #5
browser-use type "search query"         # type text
browser-use screenshot page.png         # capture screen
browser-use close                       # end session

Agent API (LLM controls browser autonomously):

from browser_use import Agent, Browser, ChatBrowserUse
import asyncio

async def run():
    browser = Browser()
    llm = ChatBrowserUse()  # or use OpenAI, Anthropic, etc.
    agent = Agent(
        task="Log into GitHub, go to my notifications, summarize the top 3",
        llm=llm,
        browser=browser,
    )
    result = await agent.run()
    print(result)

asyncio.run(run())

Auth with real Chrome profile:

from browser_use import Browser, BrowserConfig

browser = Browser(config=BrowserConfig(
    chrome_instance_path="/usr/bin/google-chrome",
    # Uses default Chrome profile with existing logins
))

Custom tools extension:

from browser_use import Agent, Controller
from browser_use.browser.context import BrowserContext

# Custom actions are registered on a Controller, then passed to the Agent
controller = Controller()

@controller.action("Read the current page URL and return it")
async def get_current_url(browser: BrowserContext) -> str:
    page = await browser.get_current_page()
    return page.url

# agent = Agent(task=..., llm=llm, browser=browser, controller=controller)

Comparison to Playwright MCP: browser-use is more autonomous - you give it a task and it figures out the steps. Playwright MCP gives you individual tool calls (more control, less autonomy). browser-use requires Python; Playwright MCP is language-agnostic.


puppeteer-extra

What it is: Puppeteer with a plugin system. The key plugin is puppeteer-extra-plugin-stealth which patches ~20 bot-detection signals.

Installation:

npm install puppeteer-extra puppeteer-extra-plugin-stealth

Basic usage (still requires scripting, no CLI wrapper):

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());

(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.screenshot({ path: 'screenshot.png' });
  await browser.close();
})();

Key note: There is no standalone puppeteer CLI tool for agents. Puppeteer is a library only. For CLI-driven use, Playwright CLI is the better choice. Puppeteer-extra's main value is stealth for avoiding bot detection.

Comparison to Playwright: Playwright is now generally preferred. Playwright has a built-in CLI, supports 3 browser engines natively, and has a richer ecosystem including MCP. Puppeteer supports Chrome/Firefox only and has no CLI.


Chrome DevTools Protocol (CDP) Direct

What it is: CDP is the underlying wire protocol that Playwright, Puppeteer, and all Chromium-based automation tools use. You can drive Chrome directly via HTTP and WebSocket without any framework.

Launching Chrome with debugging port:

# Headed (user-visible) - good for auth
google-chrome \
  --remote-debugging-port=9222 \
  --user-data-dir=/tmp/chrome-agent \
  https://example.com/login

# Or headless
google-chrome \
  --headless=new \
  --remote-debugging-port=9222 \
  --user-data-dir=/tmp/chrome-agent

HTTP API endpoints (no WebSocket needed for these):

# List tabs
curl http://localhost:9222/json/list

# Create new tab (newer Chrome versions require PUT here)
curl -X PUT "http://localhost:9222/json/new?https://example.com"

# Close tab
curl "http://localhost:9222/json/close/{targetId}"

# Get browser version
curl http://localhost:9222/json/version

WebSocket CDP commands (for actual page control):

const CDP = require('chrome-remote-interface');

async function captureAuth() {
  const client = await CDP();
  const { Network, Page } = client;

  await Network.enable();
  await Page.enable();
  await Page.navigate({ url: 'https://example.com/login' });
  await Page.loadEventFired();

  // After user logs in (poll or wait), capture cookies
  const { cookies } = await Network.getAllCookies();
  console.log(JSON.stringify(cookies));

  await client.close();
}

CLI REPL with chrome-remote-interface:

npm install -g chrome-remote-interface

# List targets
chrome-remote-interface list

# Open a URL in new tab
chrome-remote-interface new 'https://example.com'

# Interactive REPL (send CDP commands interactively)
chrome-remote-interface inspect
# Then inside REPL:
# > Page.navigate({url: 'https://example.com'})
# > Network.getAllCookies()

Getting cookies via CDP:

# Using websocat + jq (pure CLI, no Node.js needed after browser launch)
WS=$(curl -s http://localhost:9222/json/list | jq -r '.[0].webSocketDebuggerUrl')
echo '{"id":1,"method":"Network.getAllCookies"}' \
  | websocat "$WS" \
  | jq '.result.cookies[]'

CDP verdict for agents: CDP is powerful but verbose. Best used as a foundation layer. The chrome-remote-interface REPL is useful for exploration. For production agent use, Playwright MCP or Playwright CLI are cleaner because they handle the WebSocket protocol, target management, and element selectors automatically.


Browserless (Self-Hosted REST API)

What it is: A Docker service that wraps headless Chrome and exposes a REST API. Agents call HTTP endpoints without managing any browser process.

Run locally:

docker run -p 3000:3000 ghcr.io/browserless/chrome

REST endpoints (all POST with JSON body):

# Screenshot
curl -X POST http://localhost:3000/screenshot \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "fullPage": true}' \
  --output out.png

# PDF
curl -X POST http://localhost:3000/pdf \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}' \
  --output out.pdf

# HTML content
curl -X POST http://localhost:3000/content \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

# Execute Puppeteer script
curl -X POST http://localhost:3000/function \
  -H "Content-Type: application/json" \
  -d '{
    "code": "module.exports = async ({page}) => { await page.goto(args.url); return await page.title(); }",
    "context": {"url": "https://example.com"}
  }'

Passing cookies to Browserless:

# Inject cookies in the request body
curl -X POST http://localhost:3000/screenshot \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/dashboard",
    "cookies": [
      {"name": "session", "value": "abc123", "domain": "example.com"}
    ]
  }' --output dashboard.png

Trade-off: Requires Docker. But once running, agents just need curl. No Node.js, no Python. Good for polyglot agents.


Steel Browser (Self-Hosted REST API)

What it is: An open-source browser API service similar to Browserless but with a session-oriented architecture. Good for multi-step authenticated workflows.

Run locally:

# Via npm
npx @steel-dev/steel start
# Or Docker
docker run -p 3000:3000 ghcr.io/steel-dev/steel

REST endpoints:

# Create a session (returns sessionId)
SESSION=$(curl -s -X POST http://localhost:3000/v1/sessions \
  -H "Content-Type: application/json" \
  -d '{"blockAds": true}' | jq -r '.id')

# Screenshot a URL (stateless quick action)
curl -X POST http://localhost:3000/v1/screenshot \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "fullPage": true}' \
  --output out.png

# Scrape page content
curl -X POST http://localhost:3000/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

# PDF
curl -X POST http://localhost:3000/v1/pdf \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}' \
  --output out.pdf

Sessions persist cookies across requests - once you log into a page within a session, all subsequent requests in that session are authenticated.

Connect Playwright to Steel session:

const { chromium } = require('playwright');
const browser = await chromium.connectOverCDP(
  `ws://localhost:3000?sessionId=${sessionId}`
);

The Auth Handoff: Three Patterns

Pattern 1: Playwright CLI (Recommended for CLI Agents)

When to use: Your agent runs from a shell, you want zero framework knowledge required.

# ---- Human does this once ----
# Open headed browser for user login
npx playwright codegen \
  --save-storage=~/.agent/auth/example-auth.json \
  https://example.com/login
# [Browser opens, user logs in, browser closes, auth.json written]

# ---- Agent does this autonomously ----
npx playwright screenshot \
  --load-storage=~/.agent/auth/example-auth.json \
  https://example.com/dashboard \
  /tmp/dashboard.png

# Agent can also generate a full-page PDF
npx playwright pdf \
  --load-storage=~/.agent/auth/example-auth.json \
  https://example.com/report \
  /tmp/report.pdf

No Playwright code written. No async/await. Just CLI commands.

Pattern 2: Playwright MCP with Chrome Extension (Recommended for MCP Agents)

When to use: Your agent is running inside an MCP host and you want to connect to the user's real logged-in browser.

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest", "--extension"]
    }
  }
}
  1. User installs "Playwright MCP Bridge" Chrome extension.
  2. User is already logged into sites in their normal Chrome.
  3. Agent calls browser_navigate / browser_snapshot / browser_click directly on those tabs.
  4. No auth file needed - the user's live session is used.

Pattern 3: CDP + Chrome --remote-debugging-port

When to use: You want the most direct control, or you're already running Chrome elsewhere.

# User launches Chrome with debugging enabled
google-chrome \
  --remote-debugging-port=9222 \
  --user-data-dir=$HOME/.agent-chrome-profile \
  https://example.com/login

# User logs in normally.

# Agent now connects and captures cookies
node -e "
const CDP = require('chrome-remote-interface');
CDP(async (client) => {
  await client.Network.enable();
  const {cookies} = await client.Network.getAllCookies();
  const fs = require('fs');
  // Convert to Playwright storageState format
  fs.writeFileSync('auth.json', JSON.stringify({cookies, origins: []}, null, 2));
  await client.close();
});
"

Then use auth.json with npx playwright screenshot --load-storage=auth.json ....
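One caveat: Network.getAllCookies returns cookies with CDP-only fields (size, session, priority) and may omit sameSite entirely, while the storageState format shown earlier has a fixed cookie shape. The field handling below is an assumption based on the two formats described above; a minimal Python normalization sketch:

```python
import json

def cdp_to_storage_state(cdp_cookies: list) -> dict:
    """Strip CDP-only fields and fill defaults so raw CDP cookies
    match the Playwright storageState cookie shape."""
    out = []
    for c in cdp_cookies:
        out.append({
            "name": c["name"],
            "value": c["value"],
            "domain": c["domain"],
            "path": c.get("path", "/"),
            # CDP and Playwright both use -1 for session cookies
            "expires": c.get("expires", -1),
            "httpOnly": c.get("httpOnly", False),
            "secure": c.get("secure", False),
            # CDP may omit sameSite; "Lax" is the usual browser default
            "sameSite": c.get("sameSite", "Lax"),
        })
    return {"cookies": out, "origins": []}

# Example raw CDP cookie, including fields storageState does not want
raw = [{"name": "session", "value": "abc", "domain": ".example.com",
        "path": "/", "expires": -1, "size": 10, "httpOnly": True,
        "secure": True, "session": True, "priority": "Medium"}]
print(json.dumps(cdp_to_storage_state(raw), indent=2))
```

Writing the normalized dict to auth.json gives --load-storage a file in the expected shape.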


Converting Between Cookie Formats

Playwright storageState → Netscape cookies.txt

Useful when you want to use the captured session with curl, wget, or yt-dlp.

import json, sys
from datetime import datetime

auth = json.load(open('auth.json'))
print("# Netscape HTTP Cookie File")
for c in auth.get('cookies', []):
    domain = c['domain']
    include_subdomains = 'TRUE' if domain.startswith('.') else 'FALSE'
    path = c.get('path', '/')
    https_only = 'TRUE' if c.get('secure', False) else 'FALSE'
    expires = int(c.get('expires', 0)) if c.get('expires', -1) != -1 else 0
    name = c['name']
    value = c['value']
    print(f"{domain}\t{include_subdomains}\t{path}\t{https_only}\t{expires}\t{name}\t{value}")
Save the script as convert.py and run:

python3 convert.py > cookies.txt
curl -b cookies.txt https://example.com/api/data
wget --load-cookies=cookies.txt https://example.com/api/data
yt-dlp --cookies cookies.txt https://example.com/video

Netscape cookies.txt → Playwright storageState

import json, time

def netscape_to_playwright(cookies_file):
    cookies = []
    with open(cookies_file) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#'):
                continue
            parts = line.split('\t')
            if len(parts) != 7:
                continue
            domain, incl_sub, path, https_only, expires, name, value = parts
            cookies.append({
                'name': name,
                'value': value,
                'domain': domain,
                'path': path,
                'expires': float(expires) if expires and expires != '0' else -1,
                'httpOnly': False,
                'secure': https_only == 'TRUE',
                'sameSite': 'None'
            })
    return {'cookies': cookies, 'origins': []}

state = netscape_to_playwright('cookies.txt')
json.dump(state, open('auth.json', 'w'), indent=2)

Comparison Table

| Tool | Interface | Auth handoff | Boilerplate | Best for |
| --- | --- | --- | --- | --- |
| Playwright CLI | Shell commands | --save-storage / --load-storage | Zero | CLI agents, shell scripts |
| Playwright MCP | MCP tool calls | --storage-state, --extension, --cdp-endpoint | Zero | MCP agent hosts (Claude, Cursor, etc.) |
| browser-use | Python + CLI | Chrome profile reuse | Low (Python) | Autonomous task agents (Python) |
| Chrome CDP direct | WebSocket + HTTP | Manual cookie capture | Medium (JS) | Fine-grained, low-level control |
| chrome-remote-interface | CLI REPL + JS | Network.getAllCookies() | Low-medium | Exploration, scripting |
| Browserless | REST API (curl) | Cookie injection in JSON body | Zero (needs Docker) | Polyglot agents, Docker-friendly |
| Steel Browser | REST API (curl) | Session-scoped cookie persistence | Zero (needs Docker/npx) | Multi-step auth workflows |
| puppeteer-extra | JS library | Manual scripting | High | Bot-detection avoidance |

Common Pitfalls

| Pitfall | Why it happens | How to avoid |
| --- | --- | --- |
| auth.json captured but cookies expire | Session cookies have short TTLs | Check the expires field; re-capture when expired. Or use --user-data-dir for a persistent profile. |
| Playwright PDF not working | The pdf command is Chromium-only | Always pass -b chromium or --channel chrome for PDF |
| Screenshot captures login page, not dashboard | Session not loaded | Always pass --load-storage=auth.json |
| Bot detection blocks the browser | Playwright leaves fingerprints | Use --channel chrome (real Chrome binary) instead of Chromium, or puppeteer-extra-plugin-stealth |
| MCP accessibility tools fail on a page with poor ARIA | Site lacks semantic markup | Fall back to browser_take_screenshot + vision, or use browser_evaluate for DOM queries |
| CDP WebSocket closes on page navigation | The WebSocket is per-target | Re-attach after navigation using Target events |
| Netscape cookies.txt parse error | Wrong line endings (CRLF vs LF) | Normalize to LF on Unix: sed -i 's/\r//' cookies.txt |
| browser-use agent gets stuck in a loop | LLM hallucinates element states | Set a max_steps limit; use browser-use state to inspect actual element indices |
| auth.json committed to git | Forgot to gitignore it | Add *.auth.json, auth/, .auth/ to .gitignore |
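The first pitfall can be caught mechanically before an agent run. A minimal sketch, assuming the storageState layout shown earlier (the margin parameter and the function itself are illustrative, not part of any tool above):

```python
import json
import time

def expired_cookies(state, now=None, margin=0.0):
    """Return names of cookies that are expired (or expire within
    `margin` seconds). expires == -1 marks a session cookie, which
    only lived as long as the capturing browser did."""
    now = time.time() if now is None else now
    stale = []
    for c in state.get("cookies", []):
        exp = c.get("expires", -1)
        if exp != -1 and exp <= now + margin:
            stale.append(c["name"])
    return stale

# Example: one fresh cookie, one long-expired, one session-scoped
state = {"cookies": [
    {"name": "fresh", "expires": 4102444800.0},   # year 2100
    {"name": "stale", "expires": 946684800.0},    # year 2000
    {"name": "session_only", "expires": -1},
]}
print(expired_cookies(state))  # ['stale']
```

If the list is non-empty, re-run the headed capture step (npx playwright codegen --save-storage=...) before doing headless work.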

Best Practices

  1. Store auth files outside the repo — use ~/.agent/auth/{service}-auth.json or environment-relative paths. Never commit session files. (Multiple sources)

  2. Prefer --user-data-dir over --save-storage for long-running agents — user data directories persist across browser restarts, handle refresh tokens, and work for sites that rotate session cookies. (Playwright MCP docs)

  3. Use browser_snapshot over screenshots for text extraction — the accessibility tree is ~10x more token-efficient than describing a screenshot and does not require a vision model. (Playwright MCP README)

  4. Use --channel chrome (real Chrome) when bot detection is an issue — websites fingerprint Chrome vs Chromium. The real Chrome binary passes more checks. (Playwright docs, chrome-for-testing)

  5. Separate the headed auth step from the headless work step — document these as two distinct phases in your agent code. This makes re-authentication easy when sessions expire. (browser-use docs)

  6. For multi-step workflows, use session-based tools — Steel Browser sessions and Playwright MCP's persistent browser maintain cookie state across page navigations automatically. One-shot REST calls lose state. (Steel Browser docs)

  7. Test for element visibility before interaction — use --wait-for-selector (CLI) or browser_wait_for (MCP) to avoid flaky automation on dynamic pages. (Playwright CLI docs)

  8. Validate the captured auth immediately — after --save-storage, run one screenshot with --load-storage and check it shows the logged-in state before using the auth file in production. (Playwright docs)


Code Examples

Complete Shell-Only Auth Handoff

#!/bin/bash
# auth-handoff.sh - Agent auth handoff using only Playwright CLI

AUTH_FILE="$HOME/.agent/auth/myapp-auth.json"
BASE_URL="https://myapp.example.com"

# Phase 1: Human auth (run once, or when session expires)
capture_auth() {
  mkdir -p "$(dirname "$AUTH_FILE")"
  echo "Opening browser for login..."
  npx playwright codegen \
    --save-storage="$AUTH_FILE" \
    "$BASE_URL/login"
  echo "Auth captured: $AUTH_FILE"
}

# Phase 2: Agent uses auth headlessly
take_screenshot() {
  local url="$1"
  local out="$2"
  npx playwright screenshot \
    --load-storage="$AUTH_FILE" \
    --full-page \
    "$url" "$out"
}

save_pdf() {
  local url="$1"
  local out="$2"
  npx playwright pdf \
    --load-storage="$AUTH_FILE" \
    -b chromium \
    "$url" "$out"
}

# If the auth file is missing, capture it (add a staleness/expiry check as needed)
if [ ! -f "$AUTH_FILE" ]; then
  capture_auth
fi

# Agent work
take_screenshot "$BASE_URL/dashboard" /tmp/dashboard.png
save_pdf "$BASE_URL/report/monthly" /tmp/monthly-report.pdf

Playwright MCP Agent Workflow (Conceptual)

When an MCP agent wants to do browser work:

# Agent internal monologue:
# 1. Check if page is accessible
tool_call: browser_navigate({url: "https://app.example.com/dashboard"})
tool_call: browser_snapshot()
# → Returns accessibility tree; if login wall detected, trigger auth flow

# 2. If login needed (persistent profile approach):
# Agent tells user: "Please log into the browser window that just opened"
# (Browser was started with --user-data-dir, user's existing login may already work)

# 3. Once authenticated, proceed
tool_call: browser_snapshot()  # verify dashboard loaded
tool_call: browser_evaluate({expression: "document.title"})  # extract data
tool_call: browser_take_screenshot({filename: "/tmp/dashboard.png"})

Python Agent with browser-use + Cookie Export

import asyncio, json
from browser_use import Agent, Browser, BrowserConfig, ChatBrowserUse

async def authenticated_scrape():
    # Option A: Use existing Chrome profile (simplest for auth)
    browser = Browser(config=BrowserConfig(
        chrome_instance_path="/usr/bin/google-chrome",
        headless=False,
    ))

    # Option B: Use previously saved Playwright storageState
    # browser = Browser(config=BrowserConfig(storage_state="auth.json"))

    llm = ChatBrowserUse()
    agent = Agent(
        task="""
        Go to https://app.example.com/reports.
        Find the most recent report dated this month.
        Download it or return its URL.
        """,
        llm=llm,
        browser=browser,
        max_steps=20,
    )

    result = await agent.run()
    print(result)
    await browser.close()

asyncio.run(authenticated_scrape())

curl with Cookies from Playwright Auth

# After capturing auth.json with playwright codegen --save-storage

# Quick Python converter (inline)
python3 -c "
import json, sys
data = json.load(open('auth.json'))
print('# Netscape HTTP Cookie File')
for c in data.get('cookies', []):
    dom = c['domain']
    sub = 'TRUE' if dom.startswith('.') else 'FALSE'
    sec = 'TRUE' if c.get('secure') else 'FALSE'
    exp = int(c.get('expires', 0)) if c.get('expires', -1) > 0 else 0
    print(f\"{dom}\t{sub}\t{c['path']}\t{sec}\t{exp}\t{c['name']}\t{c['value']}\")
" > cookies.txt

# Use with curl
curl -b cookies.txt https://app.example.com/api/data | jq .

# Use with wget
wget --load-cookies=cookies.txt -O data.json https://app.example.com/api/data

# Use with yt-dlp
yt-dlp --cookies cookies.txt https://app.example.com/video/123

Further Reading

| Resource | Type | Why recommended |
| --- | --- | --- |
| Playwright CLI docs | Official docs | Authoritative reference for all CLI commands and flags |
| Playwright Auth docs | Official docs | Comprehensive guide to storageState, setup projects, session reuse |
| Playwright MCP on GitHub | Official repo | Complete tool list, config options, Chrome extension setup |
| browser-use on GitHub | Official repo | Agent API, CLI reference, custom tools, production deployment |
| Chrome DevTools Protocol | Official spec | Complete CDP domain/method reference |
| chrome-remote-interface | Library | Node.js CDP wrapper with CLI REPL |
| Steel Browser | Open source | REST API browser service, session management |
| Browserless | Open source | Docker REST browser service |
| yt-dlp cookies guide | Guide | Netscape cookie format, browser extension recommendations |
| puppeteer-extra-stealth | Plugin | 20+ bot-detection patches for Puppeteer |

Generated by /learn from 32 sources. See resources/cli-browser-automation-agents-sources.json for full source metadata.