Skip to content

Latest commit

 

History

History
1064 lines (855 loc) · 34.2 KB

File metadata and controls

1064 lines (855 loc) · 34.2 KB
layout default
title Browser Use Tutorial - Chapter 8: Production Deployment
nav_order 8
has_children false
parent Browser Use Tutorial

Chapter 8: Production Deployment - Scaling, Reliability, and Best Practices

Welcome to Chapter 8: Production Deployment - Scaling, Reliability, and Best Practices. In this part of Browser Use Tutorial: AI-Powered Web Automation Agents, you will build an intuitive mental model first, then move into concrete implementation details and practical production tradeoffs.

Deploy Browser Use agents to production with enterprise-grade reliability, scaling, and operational best practices.

Overview

Production deployment of browser automation requires careful consideration of scalability, reliability, monitoring, and security. This chapter covers production-ready deployment patterns for Browser Use agents.

Production Architecture

Scalable Deployment Patterns

# docker-compose.prod.yml - Production deployment
version: '3.8'

services:
  browser-use-api:
    build:
      context: .
      dockerfile: Dockerfile.prod
    container_name: browser-use-prod
    restart: unless-stopped
    ports:
      - "8000:8000"
    environment:
      - NODE_ENV=production
      - REDIS_URL=redis://redis:6379
      - DATABASE_URL=postgresql://user:password@postgres:5432/browseruse
      - BROWSER_HEADLESS=true
      - BROWSER_CONCURRENT_LIMIT=5
      - API_RATE_LIMIT=100
      - JWT_SECRET=${JWT_SECRET}
    volumes:
      - browser_data:/app/data
      - ./ssl:/app/ssl:ro
    networks:
      - browser_network
    depends_on:
      - redis
      - postgres
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  redis:
    image: redis:7-alpine
    container_name: redis-prod
    restart: unless-stopped
    volumes:
      - redis_data:/data
    networks:
      - browser_network
    command: redis-server --appendonly yes

  postgres:
    image: postgres:15-alpine
    container_name: postgres-prod
    restart: unless-stopped
    environment:
      POSTGRES_DB: browseruse
      POSTGRES_USER: browseruser
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - browser_network
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U browseruser -d browseruse"]
      interval: 30s
      timeout: 10s
      retries: 3

  nginx:
    image: nginx:alpine
    container_name: nginx-prod
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./ssl:/etc/ssl/certs:ro
    networks:
      - browser_network
    depends_on:
      - browser-use-api

volumes:
  browser_data:
  redis_data:
  postgres_data:

networks:
  browser_network:
    driver: bridge

Horizontal Scaling

# Horizontal scaling with load balancer
services:
  browser-use-api-1:
    # ... configuration
    environment:
      - INSTANCE_ID=1
    deploy:
      replicas: 3

  browser-use-api-2:
    # ... configuration
    environment:
      - INSTANCE_ID=2
    deploy:
      replicas: 3

  browser-use-api-3:
    # ... configuration
    environment:
      - INSTANCE_ID=3
    deploy:
      replicas: 3

  load-balancer:
    image: nginx:alpine
    volumes:
      - ./load-balancer.conf:/etc/nginx/nginx.conf:ro
    ports:
      - "80:80"
    depends_on:
      - browser-use-api-1
      - browser-use-api-2
      - browser-use-api-3

Reliability and Fault Tolerance

Circuit Breaker Pattern

# circuit_breaker.py
import asyncio
import time
from enum import Enum
from typing import Callable, Any

class CircuitBreakerState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=60, success_threshold=3):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.success_threshold = success_threshold

        self.state = CircuitBreakerState.CLOSED
        self.failure_count = 0
        self.success_count = 0
        self.last_failure_time = None

    async def call(self, func: Callable, *args, **kwargs) -> Any:
        if self.state == CircuitBreakerState.OPEN:
            if time.time() - self.last_failure_time > self.recovery_timeout:
                self.state = CircuitBreakerState.HALF_OPEN
                self.success_count = 0
            else:
                raise Exception("Circuit breaker is OPEN")

        try:
            result = await func(*args, **kwargs)

            if self.state == CircuitBreakerState.HALF_OPEN:
                self.success_count += 1
                if self.success_count >= self.success_threshold:
                    self.state = CircuitBreakerState.CLOSED
                    self.failure_count = 0

            return result

        except Exception as e:
            self.failure_count += 1
            self.last_failure_time = time.time()

            if self.failure_count >= self.failure_threshold:
                self.state = CircuitBreakerState.OPEN

            raise e

# Usage in browser automation
circuit_breaker = CircuitBreaker()

async def safe_browser_operation(operation_func, *args, **kwargs):
    """Execute browser operation with circuit breaker protection"""
    return await circuit_breaker.call(operation_func, *args, **kwargs)

Retry and Backoff Strategies

# retry_strategies.py
import asyncio
import random
import time
from typing import Callable, Any, Optional

class RetryStrategy:
    def __init__(self, max_attempts=3, base_delay=1.0, max_delay=60.0, backoff_factor=2.0, jitter=True):
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.backoff_factor = backoff_factor
        self.jitter = jitter

    async def execute(self, func: Callable, *args, **kwargs) -> Any:
        last_exception = None

        for attempt in range(self.max_attempts):
            try:
                return await func(*args, **kwargs)
            except Exception as e:
                last_exception = e

                if attempt < self.max_attempts - 1:  # Not the last attempt
                    delay = self._calculate_delay(attempt)
                    print(f"Attempt {attempt + 1} failed, retrying in {delay:.2f} seconds...")
                    await asyncio.sleep(delay)

        raise last_exception

    def _calculate_delay(self, attempt: int) -> float:
        """Calculate delay with exponential backoff and optional jitter"""
        delay = self.base_delay * (self.backoff_factor ** attempt)
        delay = min(delay, self.max_delay)

        if self.jitter:
            # Add random jitter (±25%)
            jitter_range = delay * 0.25
            delay += random.uniform(-jitter_range, jitter_range)

        return max(0, delay)

# Browser-specific retry strategy
browser_retry = RetryStrategy(
    max_attempts=5,
    base_delay=2.0,
    max_delay=30.0,
    backoff_factor=1.5
)

async def reliable_browser_operation(operation_func, *args, **kwargs):
    """Execute browser operation with intelligent retry"""
    return await browser_retry.execute(operation_func, *args, **kwargs)

Graceful Degradation

# graceful_degradation.py
import asyncio
from typing import Dict, Any, Optional

class DegradationManager:
    def __init__(self):
        self.degradation_levels = {
            "full": {"vision": True, "custom_actions": True, "multi_tab": True},
            "standard": {"vision": False, "custom_actions": True, "multi_tab": True},
            "basic": {"vision": False, "custom_actions": False, "multi_tab": False},
            "minimal": {"vision": False, "custom_actions": False, "multi_tab": False, "llm": "simple"}
        }
        self.current_level = "full"

    def assess_system_health(self) -> str:
        """Assess system health and determine appropriate degradation level"""
        # Check various system metrics
        memory_usage = self._get_memory_usage()
        cpu_usage = self._get_cpu_usage()
        error_rate = self._get_error_rate()

        if error_rate > 0.5 or memory_usage > 0.9:
            return "minimal"
        elif error_rate > 0.2 or cpu_usage > 0.8:
            return "basic"
        elif error_rate > 0.1 or memory_usage > 0.7:
            return "standard"
        else:
            return "full"

    def apply_degradation(self, level: str) -> Dict[str, Any]:
        """Apply degradation settings"""
        if level in self.degradation_levels:
            self.current_level = level
            return self.degradation_levels[level]
        return self.degradation_levels["full"]

    async def execute_with_degradation(self, operation_func, *args, **kwargs):
        """Execute operation with appropriate degradation level"""
        level = self.assess_system_health()
        settings = self.apply_degradation(level)

        # Modify operation based on degradation settings
        if settings.get("vision") == False:
            kwargs["use_vision"] = False

        if settings.get("llm") == "simple":
            kwargs["model"] = "gpt-3.5-turbo"

        try:
            return await operation_func(*args, **kwargs)
        except Exception as e:
            if level != "minimal":
                # Try with more aggressive degradation
                minimal_settings = self.apply_degradation("minimal")
                kwargs.update(minimal_settings)
                return await operation_func(*args, **kwargs)
            else:
                raise e

    def _get_memory_usage(self) -> float:
        """Get current memory usage (0.0-1.0)"""
        # Implementation would check actual system memory
        return 0.6  # Mock value

    def _get_cpu_usage(self) -> float:
        """Get current CPU usage (0.0-1.0)"""
        # Implementation would check actual CPU usage
        return 0.4  # Mock value

    def _get_error_rate(self) -> float:
        """Get current error rate (0.0-1.0)"""
        # Implementation would check recent error metrics
        return 0.05  # Mock value

# Global degradation manager
degradation_manager = DegradationManager()

async def resilient_browser_operation(operation_func, *args, **kwargs):
    """Execute browser operation with graceful degradation"""
    return await degradation_manager.execute_with_degradation(operation_func, *args, **kwargs)

Monitoring and Observability

Comprehensive Metrics

# metrics_collector.py
import time
import psutil
from typing import Dict, Any
from collections import defaultdict

class MetricsCollector:
    def __init__(self):
        self.metrics = defaultdict(list)
        self.start_time = time.time()

    def record_operation(self, operation_name: str, duration: float, success: bool, metadata: Dict[str, Any] = None):
        """Record operation metrics"""
        self.metrics["operations"].append({
            "name": operation_name,
            "duration": duration,
            "success": success,
            "timestamp": time.time(),
            "metadata": metadata or {}
        })

    def record_browser_metrics(self, browser_info: Dict[str, Any]):
        """Record browser-specific metrics"""
        self.metrics["browser"].append({
            "timestamp": time.time(),
            "pages": browser_info.get("pages", 0),
            "memory_mb": browser_info.get("memory_mb", 0),
            "cpu_percent": browser_info.get("cpu_percent", 0)
        })

    def record_error(self, error_type: str, error_message: str, context: Dict[str, Any] = None):
        """Record error metrics"""
        self.metrics["errors"].append({
            "type": error_type,
            "message": error_message,
            "timestamp": time.time(),
            "context": context or {}
        })

    def get_summary(self) -> Dict[str, Any]:
        """Get metrics summary"""
        total_ops = len(self.metrics["operations"])
        successful_ops = len([op for op in self.metrics["operations"] if op["success"]])

        return {
            "uptime_seconds": time.time() - self.start_time,
            "total_operations": total_ops,
            "successful_operations": successful_ops,
            "success_rate": successful_ops / total_ops if total_ops > 0 else 0,
            "error_count": len(self.metrics["errors"]),
            "avg_operation_time": self._calculate_avg_operation_time(),
            "system_resources": self._get_system_resources()
        }

    def _calculate_avg_operation_time(self) -> float:
        """Calculate average operation time"""
        operations = self.metrics["operations"]
        if not operations:
            return 0.0

        total_time = sum(op["duration"] for op in operations)
        return total_time / len(operations)

    def _get_system_resources(self) -> Dict[str, float]:
        """Get current system resource usage"""
        return {
            "cpu_percent": psutil.cpu_percent(interval=1),
            "memory_percent": psutil.virtual_memory().percent,
            "disk_usage_percent": psutil.disk_usage('/').percent
        }

# Global metrics collector
metrics_collector = MetricsCollector()

Health Checks and Alerts

# health_monitor.py
import asyncio
import aiohttp
from typing import Dict, Any, List

class HealthMonitor:
    def __init__(self):
        self.checks = []
        self.alerts = []

    def add_health_check(self, name: str, check_func, interval_seconds: int = 60):
        """Add a health check"""
        self.checks.append({
            "name": name,
            "func": check_func,
            "interval": interval_seconds,
            "last_run": 0,
            "last_result": None
        })

    async def run_health_checks(self):
        """Run all health checks"""
        current_time = time.time()

        for check in self.checks:
            if current_time - check["last_run"] >= check["interval"]:
                try:
                    result = await check["func"]()
                    check["last_result"] = result
                    check["last_run"] = current_time

                    if not result["healthy"]:
                        await self._trigger_alert(check["name"], result)

                except Exception as e:
                    check["last_result"] = {
                        "healthy": False,
                        "message": f"Check failed: {str(e)}"
                    }
                    await self._trigger_alert(check["name"], check["last_result"])

    async def _trigger_alert(self, check_name: str, result: Dict[str, Any]):
        """Trigger alert for failed health check"""
        alert = {
            "check": check_name,
            "severity": "error",
            "message": result.get("message", "Health check failed"),
            "timestamp": time.time()
        }

        self.alerts.append(alert)

        # Send alert (email, Slack, etc.)
        await self._send_alert(alert)

    async def _send_alert(self, alert: Dict[str, Any]):
        """Send alert to monitoring system"""
        # Implementation would send to monitoring service
        print(f"ALERT: {alert['check']} - {alert['message']}")

    def get_health_status(self) -> Dict[str, Any]:
        """Get overall health status"""
        healthy_checks = sum(1 for check in self.checks
                           if check["last_result"] and check["last_result"]["healthy"])

        return {
            "overall_healthy": healthy_checks == len(self.checks),
            "total_checks": len(self.checks),
            "healthy_checks": healthy_checks,
            "failed_checks": len(self.checks) - healthy_checks,
            "checks": [
                {
                    "name": check["name"],
                    "healthy": check["last_result"]["healthy"] if check["last_result"] else False,
                    "message": check["last_result"]["message"] if check["last_result"] else "Not run yet"
                }
                for check in self.checks
            ]
        }

# Health check functions
async def check_database() -> Dict[str, Any]:
    """Check database connectivity"""
    try:
        # Database connection check
        return {"healthy": True, "message": "Database connection OK"}
    except Exception as e:
        return {"healthy": False, "message": f"Database error: {str(e)}"}

async def check_browser() -> Dict[str, Any]:
    """Check browser functionality"""
    try:
        # Browser launch and basic operation check
        return {"healthy": True, "message": "Browser functionality OK"}
    except Exception as e:
        return {"healthy": False, "message": f"Browser error: {str(e)}"}

async def check_external_services() -> Dict[str, Any]:
    """Check external service dependencies"""
    try:
        # Check LLM API, etc.
        return {"healthy": True, "message": "External services OK"}
    except Exception as e:
        return {"healthy": False, "message": f"External service error: {str(e)}"}

# Initialize health monitor
health_monitor = HealthMonitor()
health_monitor.add_health_check("database", check_database, 30)
health_monitor.add_health_check("browser", check_browser, 60)
health_monitor.add_health_check("external_services", check_external_services, 120)

Security Hardening

Access Control

# access_control.py
import jwt
import bcrypt
from datetime import datetime, timedelta
from typing import Dict, Any, Optional

class AccessController:
    def __init__(self, jwt_secret: str):
        self.jwt_secret = jwt_secret
        self.users = {}  # In production, use proper database
        self.roles = {
            "admin": ["read", "write", "delete", "admin"],
            "user": ["read", "write"],
            "viewer": ["read"]
        }

    def hash_password(self, password: str) -> str:
        """Hash password securely"""
        return bcrypt.hashpw(password.encode(), bcrypt.gensalt()).decode()

    def verify_password(self, password: str, hashed: str) -> bool:
        """Verify password against hash"""
        return bcrypt.checkpw(password.encode(), hashed.encode())

    def create_token(self, user_id: str, role: str) -> str:
        """Create JWT token"""
        payload = {
            "user_id": user_id,
            "role": role,
            "permissions": self.roles.get(role, []),
            "exp": datetime.utcnow() + timedelta(hours=24),
            "iat": datetime.utcnow()
        }
        return jwt.encode(payload, self.jwt_secret, algorithm="HS256")

    def verify_token(self, token: str) -> Optional[Dict[str, Any]]:
        """Verify and decode JWT token"""
        try:
            payload = jwt.decode(token, self.jwt_secret, algorithms=["HS256"])
            return payload
        except jwt.ExpiredSignatureError:
            return None
        except jwt.InvalidTokenError:
            return None

    def check_permission(self, token: str, required_permission: str) -> bool:
        """Check if token has required permission"""
        payload = self.verify_token(token)
        if not payload:
            return False

        permissions = payload.get("permissions", [])
        return required_permission in permissions

    def rate_limit_check(self, user_id: str, action: str) -> bool:
        """Check rate limits for user actions"""
        # Implementation would track user actions and enforce limits
        # Return True if within limits, False if exceeded
        return True

# Global access controller
access_controller = AccessController(jwt_secret="your-secure-jwt-secret")

Input Validation and Sanitization

# input_validation.py
import re
from typing import Dict, Any, Optional
from urllib.parse import urlparse

class InputValidator:
    def __init__(self):
        self.url_pattern = re.compile(
            r'^https?://'  # http:// or https://
            r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+[A-Z]{2,6}\.?|'  # domain...
            r'localhost|'  # localhost...
            r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or ip
            r'(?::\d+)?'  # optional port
            r'(?:/?|[/?]\S+)$', re.IGNORECASE)

        self.safe_domains = [
            "google.com", "github.com", "stackoverflow.com",
            "wikipedia.org", "httpbin.org", "example.com"
        ]

    def validate_url(self, url: str) -> bool:
        """Validate URL format and safety"""
        if not self.url_pattern.match(url):
            return False

        parsed = urlparse(url)
        domain = parsed.netloc.lower()

        # Check against safe domains (whitelist approach)
        for safe_domain in self.safe_domains:
            if safe_domain in domain:
                return True

        # Additional checks for suspicious patterns
        if any(suspicious in domain for suspicious in ['localhost', '127.0.0.1', '0.0.0.0']):
            return False

        return True

    def sanitize_text(self, text: str, max_length: int = 10000) -> str:
        """Sanitize text input"""
        if not text:
            return ""

        # Limit length
        text = text[:max_length]

        # Remove potentially dangerous characters
        text = re.sub(r'[<>\"\'`]', '', text)

        # Normalize whitespace
        text = ' '.join(text.split())

        return text.strip()

    def validate_agent_task(self, task: str) -> Optional[str]:
        """Validate agent task for safety"""
        if not task or len(task) > 5000:
            return None

        # Check for potentially dangerous commands
        dangerous_patterns = [
            r'rm\s+-rf\s+/',  # Dangerous file operations
            r'sudo\s+',       # Privilege escalation
            r'curl.*\|.*sh',  # Pipe to shell
            r'wget.*\|.*sh',  # Pipe to shell
        ]

        for pattern in dangerous_patterns:
            if re.search(pattern, task, re.IGNORECASE):
                return None

        return self.sanitize_text(task)

    def validate_custom_action(self, action_data: Dict[str, Any]) -> bool:
        """Validate custom action data"""
        required_fields = ["name", "description"]

        for field in required_fields:
            if field not in action_data:
                return False

            if not isinstance(action_data[field], str) or len(action_data[field]) > 200:
                return False

        # Validate action name (alphanumeric + underscore)
        if not re.match(r'^[a-zA-Z_][a-zA-Z0-9_]*$', action_data["name"]):
            return False

        return True

# Global input validator
input_validator = InputValidator()

Performance Optimization

Resource Pooling

# resource_pool.py
import asyncio
from typing import List, Any, Optional
from contextlib import asynccontextmanager

class ResourcePool:
    def __init__(self, factory, max_size=10, min_size=2):
        self.factory = factory
        self.max_size = max_size
        self.min_size = min_size
        self.pool: List[Any] = []
        self.in_use: List[Any] = []
        self._lock = asyncio.Lock()

    async def initialize(self):
        """Initialize minimum pool size"""
        for _ in range(self.min_size):
            resource = await self.factory()
            self.pool.append(resource)

    async def acquire(self) -> Any:
        """Acquire resource from pool"""
        async with self._lock:
            if self.pool:
                resource = self.pool.pop()
                self.in_use.append(resource)
                return resource
            elif len(self.in_use) < self.max_size:
                resource = await self.factory()
                self.in_use.append(resource)
                return resource
            else:
                # Wait for resource to become available
                raise Exception("Resource pool exhausted")

    async def release(self, resource: Any):
        """Release resource back to pool"""
        async with self._lock:
            if resource in self.in_use:
                self.in_use.remove(resource)
                if len(self.pool) < self.max_size:
                    self.pool.append(resource)
                else:
                    # Destroy resource if pool is full
                    await self._destroy_resource(resource)

    async def _destroy_resource(self, resource: Any):
        """Destroy resource (browser cleanup, etc.)"""
        # Implementation depends on resource type
        pass

    @asynccontextmanager
    async def resource_context(self):
        """Context manager for resource usage"""
        resource = await self.acquire()
        try:
            yield resource
        finally:
            await self.release(resource)

# Browser pool example
async def create_browser():
    """Factory function for browser creation"""
    # Browser creation logic
    return browser_instance

browser_pool = ResourcePool(create_browser, max_size=5, min_size=1)

# Usage
async def perform_browser_task(task):
    async with browser_pool.resource_context() as browser:
        # Use browser for task
        result = await run_agent_with_browser(browser, task)
        return result

Caching Strategies

# caching_layer.py
import asyncio
import json
from typing import Any, Optional, Dict
import hashlib

class CacheLayer:
    def __init__(self, redis_client=None, ttl_seconds=3600):
        self.redis = redis_client
        self.ttl = ttl_seconds
        self.memory_cache: Dict[str, Dict[str, Any]] = {}

    def _generate_key(self, data: Any) -> str:
        """Generate cache key from data"""
        data_str = json.dumps(data, sort_keys=True)
        return hashlib.md5(data_str.encode()).hexdigest()

    async def get(self, key: str) -> Optional[Any]:
        """Get cached value"""
        if self.redis:
            cached = await self.redis.get(key)
            if cached:
                return json.loads(cached)
        else:
            return self.memory_cache.get(key, {}).get('value')

        return None

    async def set(self, key: str, value: Any, custom_ttl: Optional[int] = None):
        """Set cached value"""
        ttl = custom_ttl or self.ttl

        if self.redis:
            await self.redis.setex(key, ttl, json.dumps(value))
        else:
            self.memory_cache[key] = {
                'value': value,
                'expires': asyncio.get_event_loop().time() + ttl
            }

    async def invalidate_pattern(self, pattern: str):
        """Invalidate cache keys matching pattern"""
        if self.redis:
            keys = await self.redis.keys(pattern)
            if keys:
                await self.redis.delete(*keys)
        else:
            # Simple pattern matching for memory cache
            keys_to_remove = [k for k in self.memory_cache.keys() if pattern in k]
            for key in keys_to_remove:
                del self.memory_cache[key]

    async def cleanup_expired(self):
        """Clean up expired memory cache entries"""
        current_time = asyncio.get_event_loop().time()
        expired_keys = [
            key for key, data in self.memory_cache.items()
            if data['expires'] < current_time
        ]
        for key in expired_keys:
            del self.memory_cache[key]

# Global cache layer
cache_layer = CacheLayer()

async def cached_browser_operation(operation_func, *args, **kwargs):
    """Execute operation with caching"""
    cache_key = cache_layer._generate_key({
        'func': operation_func.__name__,
        'args': args,
        'kwargs': kwargs
    })

    # Check cache
    cached_result = await cache_layer.get(cache_key)
    if cached_result is not None:
        return cached_result

    # Execute operation
    result = await operation_func(*args, **kwargs)

    # Cache result
    await cache_layer.set(cache_key, result)

    return result

Backup and Recovery

Configuration Backup

# backup_config.sh
#!/bin/bash
# Configuration and data backup script

BACKUP_DIR="/opt/browser-use/backups"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="$BACKUP_DIR/backup_$TIMESTAMP.tar.gz"

# Create backup directory
mkdir -p "$BACKUP_DIR"

# Backup application data
docker exec browser-use-prod tar czf - \
    /app/data \
    /app/config \
    /app/logs > "$BACKUP_FILE"

# Backup database
docker exec postgres-prod pg_dump -U browseruser browseruse > "$BACKUP_DIR/db_$TIMESTAMP.sql"

# Backup configurations
cp /opt/browser-use/docker-compose.yml "$BACKUP_DIR/"
cp /opt/browser-use/.env "$BACKUP_DIR/env_$TIMESTAMP.bak"

# Clean old backups (keep last 7 days)
find "$BACKUP_DIR" -name "*.tar.gz" -mtime +7 -delete
find "$BACKUP_DIR" -name "*.sql" -mtime +7 -delete
find "$BACKUP_DIR" -name "*.bak" -mtime +7 -delete

echo "Backup completed: $BACKUP_FILE"

Recovery Procedures

# recovery_procedure.sh
#!/bin/bash
# Disaster recovery script

set -e

echo "Starting disaster recovery..."

# Stop services
docker-compose down

# Find latest backup
LATEST_BACKUP=$(ls -t /opt/browser-use/backups/backup_*.tar.gz | head -1)
LATEST_DB=$(ls -t /opt/browser-use/backups/db_*.sql | head -1)

if [ -z "$LATEST_BACKUP" ] || [ -z "$LATEST_DB" ]; then
    echo "No backups found!"
    exit 1
fi

echo "Using backup: $LATEST_BACKUP"
echo "Using database: $LATEST_DB"

# Restore application data
docker run --rm -i \
    -v browser_data:/app/data \
    alpine sh -c "tar xzf - -C /" < "$LATEST_BACKUP"

# Restore database
docker-compose up -d postgres
sleep 10
docker exec -i postgres-prod psql -U browseruser -d browseruse < "$LATEST_DB"

# Start all services
docker-compose up -d

# Health check
sleep 30
curl -f http://localhost:8000/health || echo "Health check failed"

echo "Recovery completed!"

Summary

In this chapter, we've covered:

  • Production Architecture: Scalable Docker deployments with load balancing
  • Reliability: Circuit breakers, retry strategies, and graceful degradation
  • Monitoring: Comprehensive metrics collection and health monitoring
  • Security: Access control, input validation, and secure configurations
  • Performance: Resource pooling, caching, and optimization strategies
  • Backup & Recovery: Automated backup and disaster recovery procedures

Key Takeaways

  1. Scalability First: Design for horizontal scaling from the start
  2. Reliability: Implement comprehensive error handling and recovery
  3. Monitoring: Track performance, health, and usage metrics
  4. Security: Apply defense-in-depth security measures
  5. Performance: Optimize resource usage and implement caching
  6. Backup Strategy: Regular automated backups with tested recovery
  7. Operational Excellence: Implement proper logging, alerting, and runbooks

Conclusion

Deploying Browser Use agents to production requires careful planning and implementation of enterprise-grade practices. The combination of proper containerization, security hardening, monitoring, and scaling strategies ensures your browser automation system is robust, maintainable, and ready for production workloads.

Success in production browser automation depends on:

  • Understanding the unique challenges of browser automation at scale
  • Implementing comprehensive monitoring and observability
  • Building resilient systems with proper error handling and recovery
  • Maintaining security through access controls and input validation
  • Planning for scale with resource optimization and load balancing
  • Establishing operational procedures for backup, recovery, and maintenance

With these practices in place, Browser Use can power reliable, scalable browser automation for enterprise applications.


Congratulations! You've completed the Browser Use Tutorial. You're now ready to deploy production-grade browser automation agents.

Generated for Awesome Code Docs

What Problem Does This Solve?

Most teams struggle here because the hard part is not writing more code, but deciding clear boundaries for self, check, browser so behavior stays predictable as complexity grows.

In practical terms, this chapter helps you avoid three common failures:

  • coupling core logic too tightly to one implementation path
  • missing the handoff boundaries between setup, execution, and validation
  • shipping changes without clear rollback or observability strategy

After working through this chapter, you should be able to reason about Chapter 8: Production Deployment - Scaling, Reliability, and Best Practices as an operating subsystem inside Browser Use Tutorial: AI-Powered Web Automation Agents, with explicit contracts for inputs, state transitions, and outputs.

Use the implementation notes around resource, time, Dict as your checklist when adapting these patterns to your own repository.

How it Works Under the Hood

Under the hood, Chapter 8: Production Deployment - Scaling, Reliability, and Best Practices usually follows a repeatable control path:

  1. Context bootstrap: initialize runtime config and prerequisites for self.
  2. Input normalization: shape incoming data so check receives stable contracts.
  3. Core execution: run the main logic branch and propagate intermediate state through browser.
  4. Policy and safety checks: enforce limits, auth scopes, and failure boundaries.
  5. Output composition: return canonical result payloads for downstream consumers.
  6. Operational telemetry: emit logs/metrics needed for debugging and performance tuning.

When debugging, walk this sequence in order and confirm each stage has explicit success/failure conditions.

Source Walkthrough

Use the following upstream sources to verify implementation details while reading this chapter:

  • Browser Use Repository Why it matters: authoritative reference on Browser Use Repository (github.com).
  • Browser Use Releases Why it matters: authoritative reference on Browser Use Releases (github.com).
  • Browser Use Docs Why it matters: authoritative reference on Browser Use Docs (docs.browser-use.com).
  • Browser Use Cloud Why it matters: authoritative reference on Browser Use Cloud (cloud.browser-use.com).

Suggested trace strategy:

  • search upstream code for self and check to map concrete implementation paths
  • compare docs claims against actual runtime/config code before reusing patterns in production

Chapter Connections