A Gemini CLI extension that lets the AI see your screen. Capture the full screen or a specific application window on demand for visual debugging, UI analysis, and context gathering.
- On-Demand Capture: Screenshots are only taken when you ask (e.g., "Look at my screen")
- Window Locking: Focus captures on a specific application window using fuzzy search
- Smart Countdown: 3-second visual countdown for full-screen captures; instant capture for locked windows
- Optimized Images: Automatically resized and compressed for fast response times
- Session History: View previously captured screenshots from the current session
- Multi-Display Support: Capture from any connected display
- Local Privacy: Images are processed locally and saved to a session folder for your reference
- macOS (this extension uses macOS-specific APIs)
- Node.js v18 or later
- Xcode Command Line Tools (for window enumeration)
xcode-select --install
- Screen Recording Permission (you'll be prompted on first use)
gemini extensions install https://github.com/LyalinDotCom/ScreenshotExtension
The extension builds automatically during installation.
- Clone the repository:
  git clone https://github.com/LyalinDotCom/ScreenshotExtension.git
  cd ScreenshotExtension
- Install dependencies:
  npm install
- Link to Gemini CLI:
  gemini extensions link .
Once installed, Gemini will automatically use the screenshot tool when you ask it to look at your screen.
- "Look at my screen and tell me what error is in the console."
- "Take a screenshot and analyze the layout of this web page."
- "I'm stuck on this screen, what should I click?"
- "What version number is shown in the bottom corner?"
| Command | Description |
|---|---|
| `/shot` | Take a screenshot of the full screen or locked window |
| `/list` | List all open visible windows |
| `/lock <query>` | Lock captures to a window matching the query (e.g., `/lock Chrome`) |
| `/clear` | Clear the window lock and revert to full-screen capture |
- Use `/list` to see all available windows
- Use `/lock VS Code` to focus on a specific window
- Use `/shot` or ask naturally; captures will show only that window
- Use `/clear` to go back to full-screen capture
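The README does not specify how the fuzzy search behind `/lock` ranks windows. As a rough illustration only, the sketch below assumes a simple scheme: a case-insensitive substring match beats an in-order character (subsequence) match. The `WindowInfo` shape and the function names `fuzzyScore` and `findWindow` are hypothetical and are not taken from the extension's source.

```typescript
// Hypothetical shape of a window entry; the real extension's fields may differ.
interface WindowInfo {
  id: number;
  app: string;
  title: string;
}

// Score how well `query` matches `text`: 2 = substring match,
// 1 = all query characters appear in order, 0 = no match.
function fuzzyScore(query: string, text: string): number {
  const q = query.toLowerCase().trim();
  const t = text.toLowerCase();
  if (q.length === 0) return 0;
  if (t.includes(q)) return 2;
  let qi = 0;
  for (let i = 0; i < t.length; i++) {
    if (t.charAt(i) === q.charAt(qi)) qi++;
    if (qi === q.length) return 1;
  }
  return 0;
}

// Return the best-scoring window, or undefined when nothing matches.
// Ties keep the earlier window in the list.
function findWindow(query: string, candidates: WindowInfo[]): WindowInfo | undefined {
  let best: WindowInfo | undefined;
  let bestScore = 0;
  for (const w of candidates) {
    const score = fuzzyScore(query, `${w.app} ${w.title}`);
    if (score > bestScore) {
      best = w;
      bestScore = score;
    }
  }
  return best;
}

// Example data (made up for illustration).
const openWindows: WindowInfo[] = [
  { id: 101, app: "Google Chrome", title: "Gemini CLI docs" },
  { id: 102, app: "Code", title: "README.md - Visual Studio Code" },
];

const locked = findWindow("chrome", openWindows);
console.log(locked ? locked.id : "no match"); // prints 101
```

A real implementation would likely weigh word boundaries or edit distance as well; the two-level score here is only meant to show why a query such as `Chrome` can resolve to a single window.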
This extension provides the following tools to the AI:
| Tool | Description |
|---|---|
| `take_screenshot` | Captures the screen or locked window and returns the image |
| `view_screenshot_history` | Lists or retrieves previously captured screenshots |
| `list_windows` | Lists all visible application windows |
| `focus_window` | Locks capture to a specific window by fuzzy search |
| `clear_lock` | Removes the window lock |
- macOS only: Uses CoreGraphics APIs and the `screencapture` utility
- Screen Recording permission required: Must be granted in System Settings
- Xcode CLI tools required: Needed for Swift-based window enumeration
- 3-second countdown: For full-screen captures only (locked windows capture instantly)
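The countdown rule in the list above can be sketched as a small decision function: a locked window captures immediately, while a full-screen capture waits three seconds. The `LockState` and `CapturePlan` types and the `planCapture` and `countdown` helpers are hypothetical names chosen for this illustration, not the extension's actual code.

```typescript
// Hypothetical lock state: either no lock, or a locked window id.
type LockState = { locked: false } | { locked: true; windowId: number };

interface CapturePlan {
  target: "fullscreen" | "window";
  countdownSeconds: number;
}

// Taken from the README: 3 seconds for full screen, none for locked windows.
const FULL_SCREEN_COUNTDOWN_SECONDS = 3;

function planCapture(lock: LockState): CapturePlan {
  if (lock.locked) {
    return { target: "window", countdownSeconds: 0 };
  }
  return { target: "fullscreen", countdownSeconds: FULL_SCREEN_COUNTDOWN_SECONDS };
}

// Calls `tick` once per remaining second, then resolves.
// With seconds = 0 it resolves without calling `tick`.
async function countdown(
  seconds: number,
  tick: (remaining: number) => void,
  delayMs: number = 1000
): Promise<void> {
  for (let remaining = seconds; remaining > 0; remaining--) {
    tick(remaining);
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  }
}

console.log(planCapture({ locked: false }));
console.log(planCapture({ locked: true, windowId: 101 }));
```

A caller would run `countdown(plan.countdownSeconds, ...)` before capturing; how the extension actually renders its visual countdown is not described here.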
On first run, macOS will prompt you to allow screen recording. If you denied it:
- Go to System Settings > Privacy & Security > Screen Recording
- Enable the toggle for your Terminal app (Terminal, iTerm, VS Code, etc.)
- Restart your terminal and the Gemini CLI
Ensure Xcode Command Line Tools are installed:
xcode-select --install
The extension defaults to the primary display (index 0). To capture another display, mention it in your prompt (e.g., "take a screenshot of my second monitor").
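For readers curious how a display index or window lock might translate into a `screencapture` call, here is a minimal sketch. It assumes the extension's 0-based display index maps onto the utility's 1-based `-D` flag and that a locked window is captured by id with `-l`; neither mapping is confirmed by this README. `CaptureTarget` and `screencaptureArgs` are hypothetical names.

```typescript
// Hypothetical capture target; the extension's real options may differ.
interface CaptureTarget {
  displayIndex?: number; // 0 = primary display (the documented default)
  windowId?: number;     // set when a window lock is active
}

// Build an argument list for macOS `screencapture`.
function screencaptureArgs(target: CaptureTarget, outputPath: string): string[] {
  // -x: do not play the capture sound; -t: output image format
  const args: string[] = ["-x", "-t", "jpg"];
  if (target.windowId !== undefined) {
    // -l<windowid>: capture only the window with this id
    args.push(`-l${target.windowId}`);
  } else {
    // -D<display>: 1 is the main display, so shift the 0-based index (assumption)
    const index = target.displayIndex === undefined ? 0 : target.displayIndex;
    args.push(`-D${index + 1}`);
  }
  args.push(outputPath);
  return args;
}

console.log(screencaptureArgs({}, "shot.jpg").join(" "));
// prints: -x -t jpg -D1 shot.jpg
console.log(screencaptureArgs({ displayIndex: 1 }, "shot.jpg").join(" "));
// prints: -x -t jpg -D2 shot.jpg
```

The resulting array could be handed to Node's `child_process.execFile("screencapture", args)`. Whether the extension invokes the utility this way, or captures in another format before compressing, is not stated in this document.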
Screenshots are saved locally to:
./captures/session-{uuid}/
Each session gets a unique folder. Images are saved as compressed JPEGs (1280px width, 80% quality).
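To make the numbers above concrete, the sketch below computes the output size for a capture scaled to a 1280px width and builds a session folder path in the documented `./captures/session-{uuid}/` form. It assumes the aspect ratio is preserved and that narrower images are not upscaled; the README states neither. `scaledSize` and `sessionFolder` are hypothetical helper names.

```typescript
// Values stated in the README.
const MAX_WIDTH = 1280;
const JPEG_QUALITY = 80;

interface Size {
  width: number;
  height: number;
}

// Scale down to `maxWidth`, preserving aspect ratio.
// Assumption: images already narrower than `maxWidth` are left unchanged.
function scaledSize(original: Size, maxWidth: number = MAX_WIDTH): Size {
  if (original.width <= maxWidth) {
    return { width: original.width, height: original.height };
  }
  const scale = maxWidth / original.width;
  return { width: maxWidth, height: Math.round(original.height * scale) };
}

// Build the per-session folder path from a session id.
function sessionFolder(sessionId: string, baseDir: string = "./captures"): string {
  return `${baseDir}/session-${sessionId}/`;
}

const target = scaledSize({ width: 2560, height: 1440 });
console.log(target.width, target.height, JPEG_QUALITY); // prints 1280 720 80
console.log(sessionFolder("example-id")); // prints ./captures/session-example-id/
```

In Node.js, a session id could come from `randomUUID()` in the `node:crypto` module; the id is passed in as a parameter here to keep the example deterministic.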
MIT