CoreUI is a cross-platform automation service (Windows, macOS & Linux) that enables control of desktop applications through a combination of window management, region-based screenshots, computer vision (template matching & optional YOLOv8) and low-level input injection.
Workflow Example: Focus on a window (
/focus), take a screenshot of it (/screenshot), and click on a UI element based on a template image (/click) - all programmatically and without having to move the physical mouse. CoreUI-MCP provides both an HTTP API (via FastAPI) and a dedicated server for direct integration with environments such as Claude Desktop (via JSON-RPC).
- 🖥️ Window Management - Find, focus and query window information on Windows, macOS & Linux (
pywinctl). - 📸 Async Screencapture - Efficient capture of entire windows or specific regions (
Base64-encoded). - 🖼️ Image Recognition - Enhanced UI recognition using caching, multi-scaling and template matching (
OpenCV). - 🤖 Extended Detection - Expandable YOLOv8 for more robust UI and element detection (requires
ultralytics). - 🖱️ Cursorless Input - Direct simulation of mouse (click, move, scroll, drag) input without blocking real cursor.
- ⌨️ Keyboard Input - Keyboard input (
SendInputon Windows,CGEventon macOS,xdotool/pynputon Linux). - 📼 Macro Recorder - Recording of mouse and keyboard actions in JSON files and their playback.
- 🚀 FastAPI Web-API - Provides core functions via a modern HTTP interface (
/api/v1/mcp/...). - 💬 MCP Server - Direct integration via
stdin/stdoutusing JSON-RPC for specialized use cases. - ⚙️ Configuration - Customization of API behavior, logging, security settings and more via config.json.
- 📝 Logging Feedback - Detailed logging of all important processes for debugging and monitoring.
- 🛡️ Error Handling - More specific exceptions and error messages cleaner.
~2.2s per operating cycle
The project is organized as follows:
src/- Contains all source code for themcppackage.api/- Handles the FastAPI web API.routes.py- Defines API endpoints and request/response logic.__init__.py- Makesapia Python sub-package.
input/- Low-level input backends for different operating systems.win.py- Windows input backend (usingctypesandSendInput).mac.py- macOS input backend (usingQuartz CGEvent).linux.py- Linux input backend (usingxdotoolorpynput).__init__.py- Selects the appropriate platform-specific backend.
recorder/- Macro recording and playback functionality.recorder.py- Implements the macro recorder and player logic.__init__.py- Makesrecordera Python sub-package.
capture.py- Screenshot and region-handling utilities.logger.py- Centralized logging configuration.main.py- FastAPI application entry point and server startup.mcp_server.py- JSON-RPC server for Claude Desktop and direct MCP integration.vision.py- Element detection utilities (template matching, YOLO).window.py- Cross-platform window management.__init__.py- Main package initializer formcp, sets version and platform specifics.
assets/- Directory for template images (examples provided).logs/- Directory where log files are saved.config.json- Configuration file (created bysetup_mcp.py).pyproject.toml- Poetry project definition, dependencies, and tool configurations.README.md- This file.setup_mcp.py- Setup script for creating directories and default configuration.LICENSE- Project license file (MIT).
- Python 3.12
- Poetry
- For Linux (X11 improved input):
xdotool(e.g.,sudo apt install xdotool)
- Clone or download repository
git clone https://github.com/Kirchlive/CoreUI-mcp.git- Admin PowerShell install poetry
(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python -
poetry --version- If poetry is not located, restart admin PowerShell and set path
[Environment]::SetEnvironmentVariable("Path", [Environment]::GetEnvironmentVariable("Path", "User") + ";C:\Users\USERNAME\AppData\Roaming\Python\Scripts", "User")
# Add USERNAME to pathIf poetry still not located, restart computer
- Admin PowerShell and setup poetry
cd /path/to/CoreUI-mcp
PS /path/to/CoreUI-mcp> poetry install --extras yolo
PS /path/to/CoreUI-mcp> poetry lockThis creates
/CoreUI-mcp/config.json: customize config/CoreUI-mcp/assets/: your template images here/CoreUI-mcp/Logs/: log files
- Start server
cd /path/to/CoreUI-mcp
PS /path/to/CoreUI-mcp> poetry run python setup_mcp.py
PS /path/to/CoreUI-mcp> poetry run python -m mcp.mainThe server runs on http://127.0.0.1:8000 by default (configurable in config.json).
- New PowerShell window and start poetry config process
PS /path/to/CoreUI-mcp> poetry run pytest --cov=mcpProcess takes 1 minute. Attention mouse and keyboard test on top left screen
Record:
poetry run mcp-recorder record assets/my_macro.json --duration 30
# Records for 30 seconds or until ESC is pressed. (Assumes mcp-recorder script is not set up in pyproject yet)
# If mcp-recorder script is set up in pyproject.toml:
# poetry run mcp-recorder record assets/my_macro.json --duration 30Play:
poetry run mcp-recorder play assets/my_macro.json --speed 1.0
# If mcp-recorder script is set up in pyproject.toml:
# poetry run mcp-recorder play assets/my_macro.json --speed 1.0| Endpoint | Method | Body (JSON) | Result |
|---|---|---|---|
/focus |
POST | {title?: string, window_id? : int} |
204 No Content (success) / 404 Not Found / 500 Internal Server Error |
/screenshot |
POST | {title?: string, window_id?: int} |
200 OK with {image_base64, width, height, format} / 404 / 500 |
/click |
POST | Window Spec: `{title?: string, window_id?: int | str}<br> Click Spec:{template_path: string, threshold?: float}` |
/click/status/{job_id} |
GET | - | 200 OK with job status details / 404 Not Found |
/windows |
GET | - | 200 OK with list of window objects / 500 Internal Server Error |
*Detailed schemas and examples can be found in the Swagger UI (/docs). Note: The /click endpoint takes two JSON objects in the request body if using tools like Swagger UI; for programmatic requests, these are typically combined or sent appropriately by the client library.
The mcp_server.py implements the following tools, which can be called via JSON-RPC:
list_windows: Lists all available and visible windows.focus_window: Focuses a window based on its title.screenshot_window: Creates a screenshot of a window based on its title.click_template: Searches for a template image in a window and clicks on it.
- API access: By default, the FastAPI server listens on 0.0.0.0, making it accessible on the network. Restrict access via firewalls or reverse proxies and consider implementing authentication (e.g., API keys, OAuth2) if you expose CoreUI-MCP in an untrusted environment.
- Template paths: The
template_pathvalidation inClickRequestis designed to prevent path traversal attacks by restricting paths to configuredassetsdirectories. Check the configuration inconfig.json(e.g.,security.allowed_template_dirsif implemented, currently hardcoded to "assets" inroutes.py). - Execute with caution: CoreUI-mcp can simulate any input on the desktop. Run it with the lowest possible privileges and only in trusted automation scenarios.
- Linux Input Permissions: For the Linux backend,
xdotoolmight require appropriate X11 permissions.pynputmight require the process to be run with certain privileges or specific libraries to be installed for full functionality, especially concerning global event listening/posting. Operations requiring/dev/uinput(not directly used by currentlinux.pybut by tools likeydotool) would need root or special group permissions.
Run all PyTest tests with coverage:
poetry run pytest --cov=src
# Or if 'src' is specifically the source directory in pytest.ini:
# poetry run pytest --cov=src```
## 🛠️ Extend CoreUI-MCP
* **Advanced Input Scenarios**: Expose `drag`, `scroll`, direct `keydown`/`keyup`, and `type_text` through the FastAPI and MCP Server interfaces by defining appropriate request schemas and handlers.
* **Enhance Linux Input Backend**:
* Implement more robust Wayland workarounds or specific compositor integrations (very complex).
* Add an abstraction layer for key codes/keysyms if a platform-agnostic key input API is desired.
* **Other Vision Models**: Integrate other object recognition or OCR models in `vision.py`.
* **Advanced Configuration**: Make more parameters controllable via `config.json`.
* **Web UI**: Develop a simple web interface for interacting with the FastAPI service.MIT – see the LICENSE file.