Add visual context to your Claude Code sessions - Capture screenshots and send them to Claude for analysis, debugging, and design feedback.
- Full Screen Capture - The `/vision` command captures your screen and analyzes it
- Area Selection - `/vision.area` for focused region capture with graphical or coordinate input
- Auto-Monitoring - `/vision.auto` for continuous observation
- Privacy Controls - Configure exclusion zones for sensitive information
- AI Provider Choice - Use Claude API or Google Gemini API with automatic fallback
- Multi-Monitor Support - Capture from specific monitors with the `--monitor` flag
- Linux Support - Works on X11 and Wayland desktop environments
- Auto-Detection - Automatically detects and uses available screenshot tools
- Diagnostic Tools - Built-in commands for testing and troubleshooting
- Claude Code installed and OAuth configured
- Linux with X11 or Wayland desktop environment
- Python 3.8 or later
- sudo access (for installing screenshot tools)
For X11 (most common):

```bash
# Ubuntu/Debian
sudo apt install scrot xrectsel -y
# Fedora/RHEL
sudo dnf install scrot xrectsel -y
# Arch
sudo pacman -S scrot xrectsel --noconfirm
```

For Wayland:
```bash
# Ubuntu/Debian
sudo apt install grim slurp -y
# Fedora/RHEL
sudo dnf install grim slurp -y
# Arch
sudo pacman -S grim slurp --noconfirm
```

Install Claude Code Vision:

```bash
# Clone repository
git clone https://github.com/Patrik652/claude-code-vision.git
cd claude-code-vision
# Install with pip (editable mode)
pip install -e .
```

Set up the configuration:

```bash
# Generate default config
claude-vision --init
# Run diagnostic check
claude-vision --doctor
```

Inside a Claude Code session, use the slash commands:

```
# Capture the full screen and analyze it
/vision "What's on my screen?"
# Capture from a specific monitor
/vision --monitor 1 "What's on my second screen?"
# Capture a specific area (graphical selection)
/vision.area "Analyze this error dialog"
# Capture an area by coordinates
/vision.area --coords "100,100,800,600" "What's in this region?"
# Start continuous monitoring
/vision.auto
# Stop monitoring
/vision.stop
```

Command-line utilities:

```bash
# Initialize configuration
claude-vision --init
# Run diagnostics
claude-vision --doctor
# List available monitors
claude-vision --list-monitors
# Validate configuration
claude-vision --validate-config
# Test screenshot capture
claude-vision --test-capture
```

Claude Code Vision supports both Claude API and Google Gemini API.
Configure Gemini API (recommended for speed and cost):
- Get an API key from https://aistudio.google.com/apikey
- Edit `~/.config/claude-code-vision/config.yaml`:

```yaml
ai_provider:
  provider: gemini  # Use 'claude' or 'gemini'
  fallback_to_gemini: true  # Fall back to Gemini automatically
gemini:
  api_key: 'YOUR_GEMINI_API_KEY'
  model: gemini-2.0-flash-exp  # Fast and efficient
```

Or set the key via an environment variable:

```bash
export GEMINI_API_KEY="YOUR_API_KEY"
```

See GEMINI_SETUP.md for detailed setup instructions.
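The `provider` and `fallback_to_gemini` settings imply a simple ordering: try the configured provider first, then Gemini if fallback is enabled. The sketch below illustrates that logic only; `resolve_gemini_key`, `provider_order`, and `analyze_with_fallback` are hypothetical names, not the project's API, and it assumes that `GEMINI_API_KEY` in the environment takes precedence over `api_key` in the config file (the real precedence may differ). Provider clients are passed in as plain callables so the example runs without network access.

```python
import os
from typing import Callable, Dict, List, Mapping, Optional


def resolve_gemini_key(config: Mapping, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the Gemini API key.

    Assumption: GEMINI_API_KEY in the environment wins over the config file.
    """
    env_key = env.get("GEMINI_API_KEY")
    if env_key:
        return env_key
    return config.get("gemini", {}).get("api_key")


def provider_order(config: Mapping) -> List[str]:
    """Order in which providers are tried (hypothetical helper)."""
    ai = config.get("ai_provider", {})
    primary = ai.get("provider", "claude")
    order = [primary]
    # Only add Gemini as a fallback when it is not already the primary provider.
    if primary != "gemini" and ai.get("fallback_to_gemini", False):
        order.append("gemini")
    return order


def analyze_with_fallback(
    config: Mapping,
    image: bytes,
    prompt: str,
    clients: Dict[str, Callable[[bytes, str], str]],
) -> str:
    """Try each provider in order; raise if every one of them fails."""
    last_error: Optional[Exception] = None
    for name in provider_order(config):
        try:
            return clients[name](image, prompt)
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
    raise RuntimeError("all AI providers failed") from last_error


# Usage with stand-in clients: Claude "fails", so the call falls back to Gemini.
def failing_claude(image: bytes, prompt: str) -> str:
    raise ConnectionError("Claude unavailable")


def fake_gemini(image: bytes, prompt: str) -> str:
    return f"gemini saw {len(image)} bytes: {prompt}"


config = {"ai_provider": {"provider": "claude", "fallback_to_gemini": True}}
print(analyze_with_fallback(config, b"png", "What's on my screen?",
                            {"claude": failing_claude, "gemini": fake_gemini}))
# gemini saw 3 bytes: What's on my screen?
```

Keeping the ordering in a separate function makes it easy to test without touching either API.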
Edit `~/.config/claude-code-vision/config.yaml` to configure privacy zones:

```yaml
privacy:
  enabled: true
  zones:
    - name: "password_manager"
      x: 1500
      y: 0
      width: 420
      height: 100
```

Documentation:

- Quickstart Guide - Complete installation and usage guide
- Feature Specification - Detailed feature requirements
- Implementation Plan - Technical architecture
- Data Model - Entity definitions
- API Contracts - Interface specifications
- Changelog - Recent updates and release notes
- Production Smoke Test - Deployment validation and rollback
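Each entry under `privacy.zones` in `config.yaml` describes a rectangle in screen pixels (`x`, `y`, `width`, `height`). The sketch below shows one way such zones could be blacked out with Pillow before a screenshot is sent anywhere. `apply_privacy_zones` is a hypothetical helper, and filling with solid black is an assumption: the tool itself might blur, pixelate, or crop instead. Zones are clipped to the image so a partially off-screen entry does no harm.

```python
from typing import Iterable, Mapping

from PIL import Image, ImageDraw


def apply_privacy_zones(image: Image.Image, zones: Iterable[Mapping]) -> Image.Image:
    """Return a copy of `image` with every privacy zone filled in black."""
    masked = image.copy()  # never modify the caller's screenshot in place
    draw = ImageDraw.Draw(masked)
    width, height = masked.size
    for zone in zones:
        # Clip the zone to the image bounds.
        left = max(0, int(zone["x"]))
        top = max(0, int(zone["y"]))
        right = min(width, int(zone["x"]) + int(zone["width"]))
        bottom = min(height, int(zone["y"]) + int(zone["height"]))
        if right <= left or bottom <= top:
            continue  # zone lies entirely outside the screenshot
        # Pillow's rectangle includes both end coordinates, hence the -1.
        draw.rectangle([left, top, right - 1, bottom - 1], fill="black")
    return masked


# Usage with the password_manager zone from the example config:
screenshot = Image.new("RGB", (1920, 1080), "white")
zones = [{"name": "password_manager", "x": 1500, "y": 0, "width": 420, "height": 100}]
masked = apply_privacy_zones(screenshot, zones)
print(masked.getpixel((1600, 50)), masked.getpixel((100, 100)))
# (0, 0, 0) (255, 255, 255)
```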
- Language: Python 3.8+
- Key Dependencies: Pillow, PyYAML, requests, click, google-generativeai
- Screenshot Tools: scrot (X11), grim (Wayland), ImageMagick (fallback)
- AI Providers: Claude API (via OAuth) or Google Gemini API
- Integration: Exposed to Claude Code as slash commands (`/vision`, `/vision.area`, `/vision.auto`, `/vision.stop`)
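The tool list above implies a selection order: grim on Wayland, scrot on X11, and ImageMagick as a fallback. Below is a minimal sketch of that auto-detection, assuming the session type can be inferred from the standard `WAYLAND_DISPLAY` and `XDG_SESSION_TYPE` environment variables. `pick_screenshot_tool` and `capture_command` are hypothetical helpers, not the project's code. ImageMagick's `import` talks to the X server, so this sketch only tries it on X11. The commands use each tool's basic full-screen form (`grim FILE`, `scrot FILE`, `import -window root FILE`).

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional

# Preference order per display server (assumption based on the tool list).
PREFERENCES = {
    "wayland": ["grim"],
    "x11": ["scrot", "import"],  # "import" is ImageMagick's capture command
}


def session_type(env: Mapping[str, str] = os.environ) -> str:
    """Guess the display server from standard environment variables."""
    if env.get("WAYLAND_DISPLAY") or env.get("XDG_SESSION_TYPE", "").lower() == "wayland":
        return "wayland"
    return "x11"


def pick_screenshot_tool(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return the first preferred tool that is available on PATH."""
    for tool in PREFERENCES[session_type(env)]:
        if which(tool):
            return tool
    raise RuntimeError("no supported screenshot tool found; install grim, scrot, or ImageMagick")


def capture_command(tool: str, output: str) -> List[str]:
    """Build a full-screen capture command line for the chosen tool."""
    if tool == "grim":
        return ["grim", output]
    if tool == "scrot":
        return ["scrot", output]
    if tool == "import":
        return ["import", "-window", "root", output]
    raise ValueError(f"unsupported tool: {tool}")
```

On a real desktop, `subprocess.run(capture_command(pick_screenshot_tool(), "/tmp/screen.png"), check=True)` would then take the screenshot. Injecting `env` and `which` keeps the detection logic testable without any tools installed.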
```bash
# Install development dependencies
pip install -r requirements-dev.txt
# Run tests
pytest
# Format, lint, and type-check
black .
ruff check .
mypy src/
```

```
src/
├── models/       # Data entities (Screenshot, Configuration, etc.)
├── services/     # Core services (capture, processing, API client)
├── cli/          # CLI commands
└── lib/          # Utilities (logging, detection, exceptions)
tests/
├── contract/     # Contract tests for interfaces
├── integration/  # End-to-end workflow tests
└── unit/         # Unit tests for components
```
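As an illustration of the test-first workflow and the `tests/unit/` layout, here is what a unit test for a privacy-zone validator might look like. `validate_zone` and the test file name are hypothetical: the checks behind `claude-vision --validate-config` may differ. The helper is defined inline so the example is self-contained; in the repository it would live under `src/` and the tests under `tests/unit/`. The tests use plain `assert` statements, which pytest collects without extra imports.

```python
from typing import Any, List, Mapping

REQUIRED_KEYS = ("name", "x", "y", "width", "height")


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_zone(zone: Mapping[str, Any]) -> List[str]:
    """Return a list of problems with one privacy zone (empty means valid)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in zone]
    if problems:
        return problems
    for key in ("x", "y"):
        if not _is_int(zone[key]) or zone[key] < 0:
            problems.append(f"{key} must be a non-negative integer")
    for key in ("width", "height"):
        if not _is_int(zone[key]) or zone[key] <= 0:
            problems.append(f"{key} must be a positive integer")
    return problems


# tests/unit/test_privacy_zones.py (hypothetical file name)
def test_valid_zone_has_no_problems():
    zone = {"name": "password_manager", "x": 1500, "y": 0, "width": 420, "height": 100}
    assert validate_zone(zone) == []


def test_missing_key_is_reported():
    zone = {"name": "z", "x": 0, "y": 0, "width": 10}
    assert validate_zone(zone) == ["missing key: height"]


def test_non_positive_or_non_integer_width_is_rejected():
    for width in (0, -5, "wide", True):
        zone = {"name": "z", "x": 0, "y": 0, "width": width, "height": 10}
        assert "width must be a positive integer" in validate_zone(zone)
```

Writing these tests before `validate_zone` exists, watching them fail, and then implementing the function is the cycle the contributing guidelines describe.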
- Fork the repository
- Create a feature branch
- Follow test-first development (write tests before implementation)
- Ensure all tests pass
- Submit a pull request
MIT License - See LICENSE file for details
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Completed:

- Core screenshot capture (X11/Wayland)
- Claude API integration
- Google Gemini API integration
- Privacy zones
- Area selection with graphical and coordinate input
- Auto-monitoring mode
- Multi-monitor support
- Diagnostic and testing utilities
Planned:

- Windows support
- macOS support
- Video recording
- Screenshot history
- Browser extension integration
Built with ❤️ for the Claude Code community. Powered by Claude AI and Anthropic.
Made with Claude Code 🤖