
Learning Guide: CLI-First Browser Automation for AI Agents

Generated: 2026-02-20 · Sources: 32 resources analyzed · Depth: deep


Prerequisites

  • Basic familiarity with Node.js npx and/or Python pip/uv
  • Understanding of what cookies and browser sessions are
  • Awareness of Chrome DevTools Protocol (CDP) at a conceptual level
  • A working Node.js 18+ or Python 3.11+ environment

TL;DR

  • Playwright CLI (npx playwright) is the lowest-friction entry point: codegen, screenshot, pdf, open, and show-trace commands require zero scripting. The --save-storage / --load-storage flags give you a complete auth-handoff pattern in two commands.
  • Playwright MCP exposes 30+ named tools (browser_navigate, browser_click, browser_take_screenshot, etc.) to any MCP-capable agent host without writing a single line of Playwright API code.
  • browser-use is the highest-level Python option: it ships a CLI (browser-use open, browser-use click N, browser-use screenshot) and a Python agent API. Designed specifically for LLM agents.
  • Steel Browser and Browserless expose REST APIs (Steel: POST /v1/screenshot, /v1/scrape, /v1/pdf; Browserless: POST /screenshot, /pdf, /content) so agents can drive a browser with plain curl or fetch.
  • For the auth handoff, the canonical pattern is: headed npx playwright codegen --save-storage=auth.json <login-url> → user logs in → agent uses --load-storage=auth.json on all subsequent headless commands.

Core Concepts

1. The Three Automation Layers

Browser automation tools fall into three layers. Understanding which layer you are at tells you how much boilerplate you need.

Layer 1 - CLI/REST verbs (zero boilerplate): you call a binary or HTTP endpoint. No session object, no page object, no awaiting. Examples: npx playwright screenshot, curl http://localhost:3000/v1/screenshot, browser-use click 3.

Layer 2 - MCP tools (agent-native, zero boilerplate): the browser is a running server exposing named tools. Your agent calls browser_navigate({url}) as a tool call, the same as any other MCP tool. The session persists across calls. No JS or Python required in the agent.

Layer 3 - Library API (full power, more boilerplate): you write Playwright/Puppeteer/Rod scripts. Full control of every event, but you must manage the async lifecycle yourself.

For AI agents, Layer 1 and Layer 2 are almost always preferable. Layer 3 is the implementation layer for building Layer 1/2 wrappers.

2. Headed vs Headless

  • Headed: real browser window appears. Required for user-interactive auth flows.
  • Headless: no window. Faster, suitable for automated runs after auth is established.

All major tools can switch between the two modes with a flag; most default to headless.

3. The Auth Handoff Problem

Most interesting pages require login. The canonical pattern for CLI agents:

  1. Trigger a headed browser so the user can see and interact with a real login form.
  2. Capture the resulting session (cookies + localStorage) into a file.
  3. Inject that file into all subsequent headless requests.

This is a one-time human action. The agent then operates fully autonomously from step 3 onward.

4. Playwright storageState JSON Format

npx playwright codegen --save-storage=auth.json writes a JSON file with this structure:

{
  "cookies": [
    {
      "name": "session",
      "value": "abc123...",
      "domain": ".example.com",
      "path": "/",
      "expires": 1771234567.0,
      "httpOnly": true,
      "secure": true,
      "sameSite": "Lax"
    }
  ],
  "origins": [
    {
      "origin": "https://example.com",
      "localStorage": [
        { "name": "auth_token", "value": "eyJ..." }
      ]
    }
  ]
}

This file is understood directly by --load-storage and by Playwright MCP's --storage-state, and it can be converted to Netscape cookies.txt for curl/wget/yt-dlp.
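Because the format is plain JSON, small transformations are easy. For one-off HTTP requests you can flatten the cookies array into a single Cookie request header instead of a cookies.txt file; a minimal Python sketch (the domain matching here is deliberately simplified and ignores path, expiry, and secure flags):

```python
import json

def cookie_header(state: dict, host: str) -> str:
    """Build a Cookie: header value from a Playwright storageState dict,
    keeping only cookies whose domain matches the given host."""
    pairs = []
    for c in state.get("cookies", []):
        dom = c["domain"].lstrip(".")
        if host == dom or host.endswith("." + dom):
            pairs.append(f"{c['name']}={c['value']}")
    return "; ".join(pairs)

# Example input in the storageState shape shown above
state = {
    "cookies": [
        {"name": "session", "value": "abc123", "domain": ".example.com"},
        {"name": "other", "value": "x", "domain": "other.com"},
    ],
    "origins": [],
}
print(cookie_header(state, "app.example.com"))  # session=abc123
```

The resulting string drops straight into a request, e.g. curl -H "Cookie: session=abc123" on the matching host.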

5. Netscape cookies.txt Format

The Netscape cookie file format is a 7-field tab-separated text file:

# Netscape HTTP Cookie File
# Generated by browser-automation tool

.example.com	TRUE	/	TRUE	1771234567	session	abc123...
example.com	FALSE	/api	FALSE	0	csrf_token	xyz789

Fields: domain, include_subdomains (TRUE/FALSE), path, https_only (TRUE/FALSE), expires_unix_epoch (0 = session cookie), name, value.

Lines starting with # are comments. Lines starting with #HttpOnly_ indicate HttpOnly cookies.

Used by: curl -b cookies.txt, wget --load-cookies, yt-dlp --cookies, httpx.


Tools Reference

Playwright CLI (npx playwright)

Installation: npm install -D playwright or npm install -g playwright

Core commands:

| Command | What it does | Key flags |
| --- | --- | --- |
| npx playwright codegen [url] | Opens headed browser, records interactions to a test script | --save-storage=auth.json, -o out.js, --target python |
| npx playwright screenshot [url] [file] | Headless screenshot | --full-page, --load-storage=auth.json, -b chromium\|firefox\|webkit |
| npx playwright pdf [url] [file] | Save page as PDF (Chromium only) | --paper-format=A4, --load-storage=auth.json |
| npx playwright open [url] | Open headed browser interactively | --load-storage=auth.json, --save-storage=auth.json |
| npx playwright show-trace [file] | View recorded trace | --port 9323 |

The auth handoff in two commands:

# Step 1: User logs in (headed browser opens, user sees real page)
npx playwright codegen --save-storage=auth.json https://example.com/login
# (user logs in manually, closes browser, auth.json now has cookies)

# Step 2: Agent uses saved session for headless work
npx playwright screenshot --load-storage=auth.json \
  https://example.com/dashboard dashboard.png

npx playwright pdf --load-storage=auth.json \
  https://example.com/report report.pdf

Note on interactivity: npx playwright codegen opens a visible browser and a side panel with generated code. The user can navigate, log in, and then close the window. The --save-storage flag captures state at close. This is the cleanest agent-triggered human-auth pattern available.

Standard options (shared across all commands):

  • --browser / -b: cr (chromium), ff (firefox), wk (webkit), msedge, chrome
  • --device: emulate device ("iPhone 13", "Pixel 5")
  • --viewport-size: "1280,720"
  • --user-agent, --lang, --timezone, --geolocation
  • --proxy-server
  • --ignore-https-errors
  • --user-data-dir: use persistent Chrome profile with existing logins
  • --channel: chrome, msedge, chrome-beta

Screenshot example with wait:

npx playwright screenshot \
  --full-page \
  --wait-for-selector=".dashboard-loaded" \
  --load-storage=auth.json \
  https://example.com/dashboard \
  out.png

Playwright MCP (@playwright/mcp)

What it is: A Model Context Protocol server that exposes browser automation as ~30 named tools. Any MCP-capable agent host (Claude Desktop, VS Code Copilot, Cursor, Cline, Windsurf) can call these tools without any Playwright code.

Installation (add to MCP client config):

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}

With options (headed browser + persistent auth):

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": [
        "@playwright/mcp@latest",
        "--browser", "chrome",
        "--user-data-dir", "/home/user/.playwright-agent-profile"
      ]
    }
  }
}

Available MCP tools (complete list):

Core navigation & interaction:

| Tool | Description |
| --- | --- |
| browser_navigate | Navigate to a URL |
| browser_navigate_back | Go back in history |
| browser_click | Click an element (by accessibility label/text/role) |
| browser_type | Type text into a focused field |
| browser_fill_form | Fill multiple form fields at once |
| browser_select_option | Choose from a dropdown |
| browser_hover | Hover over an element |
| browser_drag | Drag and drop between elements |
| browser_press_key | Send keyboard input |
| browser_handle_dialog | Respond to alert/confirm/prompt dialogs |
| browser_file_upload | Upload a file |

Page inspection:

| Tool | Description |
| --- | --- |
| browser_snapshot | Get the accessibility tree of the current page (preferred over screenshots for LLMs) |
| browser_take_screenshot | Capture a PNG screenshot |
| browser_evaluate | Run JavaScript and return the result |
| browser_console_messages | Get browser console logs |
| browser_network_requests | List all network requests since load |
| browser_wait_for | Wait for text to appear/disappear, or for a timeout |

Tab & session management:

| Tool | Description |
| --- | --- |
| browser_tabs | List, create, close, or switch tabs |
| browser_resize | Resize the browser window |
| browser_close | Close the current page |
| browser_install | Install browser binaries |

Vision mode (requires --caps vision):

| Tool | Description |
| --- | --- |
| browser_mouse_click_xy | Click at pixel coordinates |
| browser_mouse_move_xy | Move the mouse to coordinates |
| browser_mouse_drag_xy | Drag using pixel coordinates |
| browser_mouse_wheel | Scroll |

PDF (requires --caps pdf):

| Tool | Description |
| --- | --- |
| browser_pdf_save | Save the current page as PDF |

Testing assertions (requires --caps testing):

| Tool | Description |
| --- | --- |
| browser_verify_text_visible | Assert text is present |
| browser_verify_element_visible | Assert an element exists |
| browser_generate_locator | Generate a stable CSS/ARIA selector |

How agents use Playwright MCP:

The server runs a persistent headed or headless browser. The agent calls tools sequentially:

agent → browser_navigate({url: "https://example.com/login"})
agent → browser_snapshot()  # read page structure
agent → browser_type({element: "Email field", text: "user@example.com"})
agent → browser_type({element: "Password field", text: "..."})
agent → browser_click({element: "Sign in button"})
agent → browser_snapshot()  # verify login succeeded
agent → browser_navigate({url: "https://example.com/dashboard"})
agent → browser_take_screenshot({filename: "dashboard.png"})

Auth handoff with Playwright MCP:

Option A - Persistent profile (simplest):

"args": ["@playwright/mcp@latest", "--user-data-dir", "/path/to/profile"]

User logs into a normal Chrome window using that profile once. The agent uses that profile forever.

Option B - Storage state file:

"args": ["@playwright/mcp@latest", "--storage-state", "/path/to/auth.json"]

Auth was captured separately (e.g., with npx playwright codegen --save-storage).

Option C - Chrome extension bridge:

"args": ["@playwright/mcp@latest", "--extension"]

The agent connects to your currently running Chrome browser tab. Uses whatever session is already active.

Option D - CDP endpoint (connect to running Chrome):

"args": [
  "@playwright/mcp@latest",
  "--cdp-endpoint", "http://localhost:9222"
]

User launches Chrome with --remote-debugging-port=9222, logs in, agent connects to that live session.

Key insight: browser_snapshot returns the accessibility tree as structured text, not a screenshot. This is far more token-efficient for LLM consumption and does not require a vision model.


browser-use (Python)

What it is: A Python library with a CLI and agent API designed specifically for LLM-driven browser automation. The agent receives a high-level task description and plans/executes browser interactions autonomously.

Installation:

pip install browser-use
# or
uv add browser-use
uvx browser-use install  # downloads Chromium

CLI interface (stateful session persists between commands):

browser-use open https://example.com   # navigate
browser-use state                       # list clickable elements by index
browser-use click 5                     # click element #5
browser-use type "search query"         # type text
browser-use screenshot page.png         # capture screen
browser-use close                       # end session

Agent API (LLM controls browser autonomously):

from browser_use import Agent, Browser, ChatBrowserUse
import asyncio

async def run():
    browser = Browser()
    llm = ChatBrowserUse()  # or use OpenAI, Anthropic, etc.
    agent = Agent(
        task="Log into GitHub, go to my notifications, summarize the top 3",
        llm=llm,
        browser=browser,
    )
    result = await agent.run()
    print(result)

asyncio.run(run())

Auth with real Chrome profile:

from browser_use import Browser, BrowserConfig

browser = Browser(config=BrowserConfig(
    chrome_instance_path="/usr/bin/google-chrome",
    # Uses default Chrome profile with existing logins
))

Custom tools extension:

from browser_use import Agent, Controller
from browser_use.browser.context import BrowserContext

# Custom actions are registered on a Controller, then passed to the Agent
controller = Controller()

@controller.action("Read the current page URL and return it")
async def get_current_url(browser: BrowserContext) -> str:
    page = await browser.get_current_page()
    return page.url

# agent = Agent(task=..., llm=llm, browser=browser, controller=controller)

Comparison to Playwright MCP: browser-use is more autonomous - you give it a task and it figures out the steps. Playwright MCP gives you individual tool calls (more control, less autonomy). browser-use requires Python; Playwright MCP is language-agnostic.


puppeteer-extra

What it is: Puppeteer with a plugin system. The key plugin is puppeteer-extra-plugin-stealth which patches ~20 bot-detection signals.

Installation:

npm install puppeteer-extra puppeteer-extra-plugin-stealth

Basic usage (still requires scripting, no CLI wrapper):

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());

(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.screenshot({ path: 'screenshot.png' });
  await browser.close();
})();

Key note: There is no standalone puppeteer CLI tool for agents. Puppeteer is a library only. For CLI-driven use, Playwright CLI is the better choice. Puppeteer-extra's main value is stealth for avoiding bot detection.

Comparison to Playwright: Playwright is now generally preferred. Playwright has a built-in CLI, supports 3 browser engines natively, and has a richer ecosystem including MCP. Puppeteer supports Chrome/Firefox only and has no CLI.


Chrome DevTools Protocol (CDP) Direct

What it is: CDP is the underlying wire protocol that Playwright, Puppeteer, and all Chromium-based automation tools use. You can drive Chrome directly via HTTP and WebSocket without any framework.

Launching Chrome with debugging port:

# Headed (user-visible) - good for auth
google-chrome \
  --remote-debugging-port=9222 \
  --user-data-dir=/tmp/chrome-agent \
  https://example.com/login

# Or headless
google-chrome \
  --headless=new \
  --remote-debugging-port=9222 \
  --user-data-dir=/tmp/chrome-agent

HTTP API endpoints (no WebSocket needed for these):

# List tabs
curl http://localhost:9222/json/list

# Create new tab (newer Chrome versions require PUT here)
curl -X PUT "http://localhost:9222/json/new?https://example.com"

# Close tab
curl "http://localhost:9222/json/close/{targetId}"

# Get browser version
curl http://localhost:9222/json/version

WebSocket CDP commands (for actual page control):

const CDP = require('chrome-remote-interface');

async function captureAuth() {
  const client = await CDP();
  const { Network, Page } = client;

  await Network.enable();
  await Page.enable();
  await Page.navigate({ url: 'https://example.com/login' });
  await Page.loadEventFired();

  // After user logs in (poll or wait), capture cookies
  const { cookies } = await Network.getAllCookies();
  console.log(JSON.stringify(cookies));

  await client.close();
}

CLI REPL with chrome-remote-interface:

npm install -g chrome-remote-interface

# List targets
chrome-remote-interface list

# Open a URL in new tab
chrome-remote-interface new 'https://example.com'

# Interactive REPL (send CDP commands interactively)
chrome-remote-interface inspect
# Then inside REPL:
# > Page.navigate({url: 'https://example.com'})
# > Network.getAllCookies()

Getting cookies via CDP:

# Using websocat + jq (pure CLI, no Node.js needed after browser launch)
WS=$(curl -s http://localhost:9222/json/list | jq -r '.[0].webSocketDebuggerUrl')
echo '{"id":1,"method":"Network.getAllCookies"}' \
  | websocat "$WS" \
  | jq '.result.cookies[]'

CDP verdict for agents: CDP is powerful but verbose. Best used as a foundation layer. The chrome-remote-interface REPL is useful for exploration. For production agent use, Playwright MCP or Playwright CLI are cleaner because they handle the WebSocket protocol, target management, and element selectors automatically.


Browserless (Self-Hosted REST API)

What it is: A Docker service that wraps headless Chrome and exposes a REST API. Agents call HTTP endpoints without managing any browser process.

Run locally:

docker run -p 3000:3000 ghcr.io/browserless/chrome

REST endpoints (all POST with JSON body):

# Screenshot
curl -X POST http://localhost:3000/screenshot \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "fullPage": true}' \
  --output out.png

# PDF
curl -X POST http://localhost:3000/pdf \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}' \
  --output out.pdf

# HTML content
curl -X POST http://localhost:3000/content \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

# Execute Puppeteer script
curl -X POST http://localhost:3000/function \
  -H "Content-Type: application/json" \
  -d '{
    "code": "module.exports = async ({page}) => { await page.goto(args.url); return await page.title(); }",
    "context": {"url": "https://example.com"}
  }'

Passing cookies to Browserless:

# Inject cookies in the request body
curl -X POST http://localhost:3000/screenshot \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/dashboard",
    "cookies": [
      {"name": "session", "value": "abc123", "domain": "example.com"}
    ]
  }' --output dashboard.png

Trade-off: Requires Docker. But once running, agents just need curl. No Node.js, no Python. Good for polyglot agents.


Steel Browser (Self-Hosted REST API)

What it is: An open-source browser API service similar to Browserless but with a session-oriented architecture. Good for multi-step authenticated workflows.

Run locally:

# Via npm
npx @steel-dev/steel start
# Or Docker
docker run -p 3000:3000 ghcr.io/steel-dev/steel

REST endpoints:

# Create a session (returns sessionId)
SESSION=$(curl -s -X POST http://localhost:3000/v1/sessions \
  -H "Content-Type: application/json" \
  -d '{"blockAds": true}' | jq -r '.id')

# Screenshot a URL (stateless quick action)
curl -X POST http://localhost:3000/v1/screenshot \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "fullPage": true}' \
  --output out.png

# Scrape page content
curl -X POST http://localhost:3000/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

# PDF
curl -X POST http://localhost:3000/v1/pdf \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}' \
  --output out.pdf

Sessions persist cookies across requests - once you log into a page within a session, all subsequent requests in that session are authenticated.

Connect Playwright to Steel session:

const { chromium } = require('playwright');
const browser = await chromium.connectOverCDP(
  `ws://localhost:3000?sessionId=${sessionId}`
);

The Auth Handoff: Three Patterns

Pattern 1: Playwright CLI (Recommended for CLI Agents)

When to use: Your agent runs from a shell, you want zero framework knowledge required.

# ---- Human does this once ----
# Open headed browser for user login
npx playwright codegen \
  --save-storage=~/.agent/auth/example-auth.json \
  https://example.com/login
# [Browser opens, user logs in, browser closes, auth.json written]

# ---- Agent does this autonomously ----
npx playwright screenshot \
  --load-storage=~/.agent/auth/example-auth.json \
  https://example.com/dashboard \
  /tmp/dashboard.png

# Agent can also generate a full-page PDF
npx playwright pdf \
  --load-storage=~/.agent/auth/example-auth.json \
  https://example.com/report \
  /tmp/report.pdf

No Playwright code written. No async/await. Just CLI commands.

Pattern 2: Playwright MCP with Chrome Extension (Recommended for MCP Agents)

When to use: Your agent is running inside an MCP host and you want to connect to the user's real logged-in browser.

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest", "--extension"]
    }
  }
}
  1. User installs "Playwright MCP Bridge" Chrome extension.
  2. User is already logged into sites in their normal Chrome.
  3. Agent calls browser_navigate / browser_snapshot / browser_click directly on those tabs.
  4. No auth file needed - the user's live session is used.

Pattern 3: CDP + Chrome --remote-debugging-port

When to use: You want the most direct control, or you're already running Chrome elsewhere.

# User launches Chrome with debugging enabled
google-chrome \
  --remote-debugging-port=9222 \
  --user-data-dir=$HOME/.agent-chrome-profile \
  https://example.com/login

# User logs in normally.

# Agent now connects and captures cookies
node -e "
const CDP = require('chrome-remote-interface');
CDP(async (client) => {
  await client.Network.enable();
  const {cookies} = await client.Network.getAllCookies();
  const fs = require('fs');
  // Convert to Playwright storageState format
  fs.writeFileSync('auth.json', JSON.stringify({cookies, origins: []}, null, 2));
  await client.close();
});
"

Then use auth.json with npx playwright screenshot --load-storage=auth.json ....
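One caveat: Network.getAllCookies returns cookies with CDP-only fields (size, session, priority) and may omit sameSite entirely, while the storageState format shown earlier has a fixed cookie shape. The field handling below is an assumption based on the two formats described above; a minimal Python normalization sketch:

```python
import json

def cdp_to_storage_state(cdp_cookies: list) -> dict:
    """Strip CDP-only fields and fill defaults so raw CDP cookies
    match the Playwright storageState cookie shape."""
    out = []
    for c in cdp_cookies:
        out.append({
            "name": c["name"],
            "value": c["value"],
            "domain": c["domain"],
            "path": c.get("path", "/"),
            # CDP and Playwright both use -1 for session cookies
            "expires": c.get("expires", -1),
            "httpOnly": c.get("httpOnly", False),
            "secure": c.get("secure", False),
            # CDP may omit sameSite; "Lax" is the usual browser default
            "sameSite": c.get("sameSite", "Lax"),
        })
    return {"cookies": out, "origins": []}

# Example raw CDP cookie, including fields storageState does not want
raw = [{"name": "session", "value": "abc", "domain": ".example.com",
        "path": "/", "expires": -1, "size": 10, "httpOnly": True,
        "secure": True, "session": True, "priority": "Medium"}]
print(json.dumps(cdp_to_storage_state(raw), indent=2))
```

Writing the normalized dict to auth.json gives --load-storage a file in the expected shape.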


Converting Between Cookie Formats

Playwright storageState → Netscape cookies.txt

Useful when you want to use the captured session with curl, wget, or yt-dlp.

import json, sys
from datetime import datetime

auth = json.load(open('auth.json'))
print("# Netscape HTTP Cookie File")
for c in auth.get('cookies', []):
    domain = c['domain']
    include_subdomains = 'TRUE' if domain.startswith('.') else 'FALSE'
    path = c.get('path', '/')
    https_only = 'TRUE' if c.get('secure', False) else 'FALSE'
    expires = int(c.get('expires', 0)) if c.get('expires', -1) != -1 else 0
    name = c['name']
    value = c['value']
    print(f"{domain}\t{include_subdomains}\t{path}\t{https_only}\t{expires}\t{name}\t{value}")
Save the script as convert.py and run:

python3 convert.py > cookies.txt
curl -b cookies.txt https://example.com/api/data
wget --load-cookies=cookies.txt https://example.com/api/data
yt-dlp --cookies cookies.txt https://example.com/video

Netscape cookies.txt → Playwright storageState

import json, time

def netscape_to_playwright(cookies_file):
    cookies = []
    with open(cookies_file) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#'):
                continue
            parts = line.split('\t')
            if len(parts) != 7:
                continue
            domain, incl_sub, path, https_only, expires, name, value = parts
            cookies.append({
                'name': name,
                'value': value,
                'domain': domain,
                'path': path,
                'expires': float(expires) if expires and expires != '0' else -1,
                'httpOnly': False,
                'secure': https_only == 'TRUE',
                'sameSite': 'None'
            })
    return {'cookies': cookies, 'origins': []}

state = netscape_to_playwright('cookies.txt')
json.dump(state, open('auth.json', 'w'), indent=2)

Comparison Table

| Tool | Interface | Auth handoff | Boilerplate | Best for |
| --- | --- | --- | --- | --- |
| Playwright CLI | Shell commands | --save-storage / --load-storage | Zero | CLI agents, shell scripts |
| Playwright MCP | MCP tool calls | --storage-state, --extension, --cdp-endpoint | Zero | MCP agent hosts (Claude, Cursor, etc.) |
| browser-use | Python + CLI | Chrome profile reuse | Low (Python) | Autonomous task agents (Python) |
| Chrome CDP direct | WebSocket + HTTP | Manual cookie capture | Medium (JS) | Fine-grained, low-level control |
| chrome-remote-interface | CLI REPL + JS | Network.getAllCookies() | Low-medium | Exploration, scripting |
| Browserless | REST API (curl) | Cookie injection in JSON body | Zero (needs Docker) | Polyglot agents, Docker-friendly |
| Steel Browser | REST API (curl) | Session-scoped cookie persistence | Zero (needs Docker/npx) | Multi-step auth workflows |
| puppeteer-extra | JS library | Manual scripting | High | Bot-detection avoidance |

Common Pitfalls

| Pitfall | Why it happens | How to avoid |
| --- | --- | --- |
| auth.json captured but cookies expire | Session cookies have short TTLs | Check the expires field; re-capture when expired. Or use --user-data-dir for a persistent profile. |
| Playwright PDF not working | The pdf command is Chromium-only | Always pass -b chromium or --channel chrome for PDF |
| Screenshot captures login page, not dashboard | Session not loaded | Always pass --load-storage=auth.json |
| Bot detection blocks the browser | Playwright leaves fingerprints | Use --channel chrome (real Chrome binary) instead of Chromium, or puppeteer-extra-plugin-stealth |
| MCP accessibility tools fail on a page with poor ARIA | Site lacks semantic markup | Fall back to browser_take_screenshot + vision, or use browser_evaluate for DOM queries |
| CDP WebSocket closes on page navigation | The WebSocket is per-target | Re-attach after navigation using Target events |
| Netscape cookies.txt parse error | Wrong line endings (CRLF vs LF) | Normalize to LF on Unix: sed -i 's/\r//' cookies.txt |
| browser-use agent gets stuck in a loop | LLM hallucinates element states | Set a max_steps limit; use browser-use state to inspect actual element indices |
| auth.json committed to git | Forgot to gitignore it | Add *.auth.json, auth/, .auth/ to .gitignore |
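The first pitfall can be caught mechanically before an agent run. A minimal sketch, assuming the storageState layout shown earlier (the margin parameter and the function itself are illustrative, not part of any tool above):

```python
import json
import time

def expired_cookies(state, now=None, margin=0.0):
    """Return names of cookies that are expired (or expire within
    `margin` seconds). expires == -1 marks a session cookie, which
    only lived as long as the capturing browser did."""
    now = time.time() if now is None else now
    stale = []
    for c in state.get("cookies", []):
        exp = c.get("expires", -1)
        if exp != -1 and exp <= now + margin:
            stale.append(c["name"])
    return stale

# Example: one fresh cookie, one long-expired, one session-scoped
state = {"cookies": [
    {"name": "fresh", "expires": 4102444800.0},   # year 2100
    {"name": "stale", "expires": 946684800.0},    # year 2000
    {"name": "session_only", "expires": -1},
]}
print(expired_cookies(state))  # ['stale']
```

If the list is non-empty, re-run the headed capture step (npx playwright codegen --save-storage=...) before doing headless work.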

Best Practices

  1. Store auth files outside the repo — use ~/.agent/auth/{service}-auth.json or environment-relative paths. Never commit session files. (Multiple sources)

  2. Prefer --user-data-dir over --save-storage for long-running agents — user data directories persist across browser restarts, handle refresh tokens, and work for sites that rotate session cookies. (Playwright MCP docs)

  3. Use browser_snapshot over screenshots for text extraction — the accessibility tree is ~10x more token-efficient than describing a screenshot and does not require a vision model. (Playwright MCP README)

  4. Use --channel chrome (real Chrome) when bot detection is an issue — websites fingerprint Chrome vs Chromium. The real Chrome binary passes more checks. (Playwright docs, chrome-for-testing)

  5. Separate the headed auth step from the headless work step — document these as two distinct phases in your agent code. This makes re-authentication easy when sessions expire. (browser-use docs)

  6. For multi-step workflows, use session-based tools — Steel Browser sessions and Playwright MCP's persistent browser maintain cookie state across page navigations automatically. One-shot REST calls lose state. (Steel Browser docs)

  7. Test for element visibility before interaction — use --wait-for-selector (CLI) or browser_wait_for (MCP) to avoid flaky automation on dynamic pages. (Playwright CLI docs)

  8. Validate the captured auth immediately — after --save-storage, run one screenshot with --load-storage and check it shows the logged-in state before using the auth file in production. (Playwright docs)


Code Examples

Complete Shell-Only Auth Handoff

#!/bin/bash
# auth-handoff.sh - Agent auth handoff using only Playwright CLI

AUTH_FILE="$HOME/.agent/auth/myapp-auth.json"
BASE_URL="https://myapp.example.com"

# Phase 1: Human auth (run once, or when session expires)
capture_auth() {
  mkdir -p "$(dirname "$AUTH_FILE")"
  echo "Opening browser for login..."
  npx playwright codegen \
    --save-storage="$AUTH_FILE" \
    "$BASE_URL/login"
  echo "Auth captured: $AUTH_FILE"
}

# Phase 2: Agent uses auth headlessly
take_screenshot() {
  local url="$1"
  local out="$2"
  npx playwright screenshot \
    --load-storage="$AUTH_FILE" \
    --full-page \
    "$url" "$out"
}

save_pdf() {
  local url="$1"
  local out="$2"
  npx playwright pdf \
    --load-storage="$AUTH_FILE" \
    -b chromium \
    "$url" "$out"
}

# If the auth file is missing, capture it (add a staleness/expiry check as needed)
if [ ! -f "$AUTH_FILE" ]; then
  capture_auth
fi

# Agent work
take_screenshot "$BASE_URL/dashboard" /tmp/dashboard.png
save_pdf "$BASE_URL/report/monthly" /tmp/monthly-report.pdf

Playwright MCP Agent Workflow (Conceptual)

When an MCP agent wants to do browser work:

# Agent internal monologue:
# 1. Check if page is accessible
tool_call: browser_navigate({url: "https://app.example.com/dashboard"})
tool_call: browser_snapshot()
# → Returns accessibility tree; if login wall detected, trigger auth flow

# 2. If login needed (persistent profile approach):
# Agent tells user: "Please log into the browser window that just opened"
# (Browser was started with --user-data-dir, user's existing login may already work)

# 3. Once authenticated, proceed
tool_call: browser_snapshot()  # verify dashboard loaded
tool_call: browser_evaluate({expression: "document.title"})  # extract data
tool_call: browser_take_screenshot({filename: "/tmp/dashboard.png"})

Python Agent with browser-use + Cookie Export

import asyncio, json
from browser_use import Agent, Browser, BrowserConfig, ChatBrowserUse

async def authenticated_scrape():
    # Option A: Use existing Chrome profile (simplest for auth)
    browser = Browser(config=BrowserConfig(
        chrome_instance_path="/usr/bin/google-chrome",
        headless=False,
    ))

    # Option B: Use previously saved Playwright storageState
    # browser = Browser(config=BrowserConfig(storage_state="auth.json"))

    llm = ChatBrowserUse()
    agent = Agent(
        task="""
        Go to https://app.example.com/reports.
        Find the most recent report dated this month.
        Download it or return its URL.
        """,
        llm=llm,
        browser=browser,
        max_steps=20,
    )

    result = await agent.run()
    print(result)
    await browser.close()

asyncio.run(authenticated_scrape())

curl with Cookies from Playwright Auth

# After capturing auth.json with playwright codegen --save-storage

# Quick Python converter (inline)
python3 -c "
import json, sys
data = json.load(open('auth.json'))
print('# Netscape HTTP Cookie File')
for c in data.get('cookies', []):
    dom = c['domain']
    sub = 'TRUE' if dom.startswith('.') else 'FALSE'
    sec = 'TRUE' if c.get('secure') else 'FALSE'
    exp = int(c.get('expires', 0)) if c.get('expires', -1) > 0 else 0
    print(f\"{dom}\t{sub}\t{c['path']}\t{sec}\t{exp}\t{c['name']}\t{c['value']}\")
" > cookies.txt

# Use with curl
curl -b cookies.txt https://app.example.com/api/data | jq .

# Use with wget
wget --load-cookies=cookies.txt -O data.json https://app.example.com/api/data

# Use with yt-dlp
yt-dlp --cookies cookies.txt https://app.example.com/video/123

Further Reading

| Resource | Type | Why recommended |
| --- | --- | --- |
| Playwright CLI docs | Official docs | Authoritative reference for all CLI commands and flags |
| Playwright Auth docs | Official docs | Comprehensive guide to storageState, setup projects, session reuse |
| Playwright MCP on GitHub | Official repo | Complete tool list, config options, Chrome extension setup |
| browser-use on GitHub | Official repo | Agent API, CLI reference, custom tools, production deployment |
| Chrome DevTools Protocol | Official spec | Complete CDP domain/method reference |
| chrome-remote-interface | Library | Node.js CDP wrapper with CLI REPL |
| Steel Browser | Open source | REST API browser service, session management |
| Browserless | Open source | Docker REST browser service |
| yt-dlp cookies guide | Guide | Netscape cookie format, browser extension recommendations |
| puppeteer-extra-stealth | Plugin | 20+ bot-detection patches for Puppeteer |

Generated by /learn from 32 sources. See resources/cli-browser-automation-agents-sources.json for full source metadata.