Kirchlive/CoreUI-mcp

CoreUI-mcp (Model Context Protocol) v0.1.5

CoreUI is a cross-platform automation service (Windows, macOS & Linux) that enables control of desktop applications through a combination of window management, region-based screenshots, computer vision (template matching & optional YOLOv8) and low-level input injection.

Workflow Example: Focus a window (/focus), take a screenshot of it (/screenshot), and click a UI element located via a template image (/click), all programmatically and without moving the physical mouse. CoreUI-MCP provides both an HTTP API (via FastAPI) and a dedicated server for direct integration with environments such as Claude Desktop (via JSON-RPC).

Functions

  • 🖥️ Window Management - Find, focus and query window information on Windows, macOS & Linux (pywinctl).
  • 📸 Async Screencapture - Efficient capture of entire windows or specific regions (Base64-encoded).
  • 🖼️ Image Recognition - Enhanced UI recognition using caching, multi-scaling and template matching (OpenCV).
  • 🤖 Extended Detection - Expandable YOLOv8 for more robust UI and element detection (requires ultralytics).
  • 🖱️ Cursorless Input - Direct simulation of mouse input (click, move, scroll, drag) without blocking the real cursor.
  • ⌨️ Keyboard Input - Keyboard input (SendInput on Windows, CGEvent on macOS, xdotool/pynput on Linux).
  • 📼 Macro Recorder - Recording of mouse and keyboard actions in JSON files and their playback.
  • 🚀 FastAPI Web-API - Provides core functions via a modern HTTP interface (/api/v1/mcp/...).
  • 💬 MCP Server - Direct integration via stdin/stdout using JSON-RPC for specialized use cases.
  • ⚙️ Configuration - Customization of API behavior, logging, security settings and more via config.json.
  • 📝 Logging Feedback - Detailed logging of all important processes for debugging and monitoring.
  • 🛡️ Error Handling - More specific exceptions and cleaner error messages.

~2.2s per operating cycle
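
To make the "template matching with a threshold" idea from the Image Recognition item concrete, here is a minimal, dependency-free sketch. It is not the project's vision.py, which uses OpenCV: the function names `find_template` and `click_point` and the scoring formula (1 minus the mean absolute pixel difference) are illustrative assumptions, and images are plain nested lists of grayscale values.

```python
from typing import Optional, Sequence

Image = Sequence[Sequence[int]]  # grayscale pixels in the range 0..255


def find_template(
    image: Image, template: Image, threshold: float = 0.9
) -> Optional[tuple[int, int, float]]:
    """Return (x, y, score) of the best match, or None if it is below threshold.

    The score is 1 - (mean absolute difference / 255), so 1.0 means every
    pixel is identical. This scoring is an assumption made for illustration.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = None
    # Slide the template over every position where it fits completely.
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            diff = sum(
                abs(image[y + dy][x + dx] - template[dy][dx])
                for dy in range(th)
                for dx in range(tw)
            )
            score = 1.0 - diff / (255 * th * tw)
            if best is None or score > best[2]:
                best = (x, y, score)
    if best is None or best[2] < threshold:
        return None
    return best


def click_point(match: tuple[int, int, float], template: Image) -> tuple[int, int]:
    """Centre of the matched region, a natural target for a click."""
    x, y, _ = match
    return x + len(template[0]) // 2, y + len(template) // 2
```

A lower threshold tolerates more rendering differences but raises the chance of clicking the wrong element, which is presumably why the /click endpoint exposes threshold as an optional parameter.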

📋 Project Structure

The project is organized as follows:

  • src/ - Contains all source code for the mcp package.
    • api/ - Handles the FastAPI web API.
      • routes.py - Defines API endpoints and request/response logic.
      • __init__.py - Makes api a Python sub-package.
    • input/ - Low-level input backends for different operating systems.
      • win.py - Windows input backend (using ctypes and SendInput).
      • mac.py - macOS input backend (using Quartz CGEvent).
      • linux.py - Linux input backend (using xdotool or pynput).
      • __init__.py - Selects the appropriate platform-specific backend.
    • recorder/ - Macro recording and playback functionality.
      • recorder.py - Implements the macro recorder and player logic.
      • __init__.py - Makes recorder a Python sub-package.
    • capture.py - Screenshot and region-handling utilities.
    • logger.py - Centralized logging configuration.
    • main.py - FastAPI application entry point and server startup.
    • mcp_server.py - JSON-RPC server for Claude Desktop and direct MCP integration.
    • vision.py - Element detection utilities (template matching, YOLO).
    • window.py - Cross-platform window management.
    • __init__.py - Main package initializer for mcp, sets version and platform specifics.
  • assets/ - Directory for template images (examples provided).
  • logs/ - Directory where log files are saved.
  • config.json - Configuration file (created by setup_mcp.py).
  • pyproject.toml - Poetry project definition, dependencies, and tool configurations.
  • README.md - This file.
  • setup_mcp.py - Setup script for creating directories and default configuration.
  • LICENSE - Project license file (MIT).
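
The input/__init__.py entry above is described as selecting the platform-specific backend. A minimal sketch of how such a dispatch could look is shown below; the function name `backend_module_name` is hypothetical, and the actual selection logic in the repository may differ.

```python
import sys


def backend_module_name(platform: str = sys.platform) -> str:
    """Map a sys.platform value to one of the backend modules listed above."""
    if platform.startswith("win"):
        return "win"    # SendInput via ctypes
    if platform == "darwin":
        return "mac"    # Quartz CGEvent
    if platform.startswith("linux"):
        return "linux"  # xdotool or pynput
    raise RuntimeError(f"Unsupported platform: {platform!r}")


# Assumption: the backend could then be loaded lazily, for example with
# importlib.import_module(f"mcp.input.{backend_module_name()}"),
# so that platform-only dependencies are never imported on other systems.
```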

🚀 Start Guide

Prerequisites

  • Python 3.12
  • Poetry
  • For Linux (improved input on X11): xdotool (e.g., sudo apt install xdotool)

Installation

  1. Clone or download the repository
git clone https://github.com/Kirchlive/CoreUI-mcp.git
  2. Install Poetry from an admin PowerShell
(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python -
poetry --version
  3. If poetry is not found, restart the admin PowerShell and add it to the user PATH
[Environment]::SetEnvironmentVariable("Path", [Environment]::GetEnvironmentVariable("Path", "User") + ";C:\Users\USERNAME\AppData\Roaming\Python\Scripts", "User")
# Replace USERNAME with your Windows user name

If poetry is still not found, restart the computer.

  4. Install the dependencies from an admin PowerShell
cd /path/to/CoreUI-mcp
PS /path/to/CoreUI-mcp> poetry install --extras yolo
PS /path/to/CoreUI-mcp> poetry lock
  5. Run the setup script and start the server
cd /path/to/CoreUI-mcp
PS /path/to/CoreUI-mcp> poetry run python setup_mcp.py
PS /path/to/CoreUI-mcp> poetry run python -m mcp.main

The setup script creates

  • /CoreUI-mcp/config.json: customize config
  • /CoreUI-mcp/assets/: your template images here
  • /CoreUI-mcp/logs/: log files

The server is reachable at http://127.0.0.1:8000 by default (configurable in config.json).

  6. Open a new PowerShell window and run the test suite
PS /path/to/CoreUI-mcp> poetry run pytest --cov=mcp

The run takes about one minute. Caution: the tests move the mouse and send keyboard input near the top-left corner of the screen.

Macro recorder (optional)

Record:

poetry run mcp-recorder record assets/my_macro.json --duration 30
# Records for 30 seconds or until ESC is pressed.
# Requires the mcp-recorder script entry to be set up in pyproject.toml.

Play:

poetry run mcp-recorder play assets/my_macro.json --speed 1.0
# Requires the mcp-recorder script entry to be set up in pyproject.toml.
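
The README does not document the macro file format, so the following is only a sketch of how a --speed factor could scale playback timing. The event schema (a list of objects with a "t" timestamp in seconds since recording start) and the function name `playback_delays` are hypothetical and chosen purely for illustration.

```python
import json


def playback_delays(events: list[dict], speed: float = 1.0) -> list[float]:
    """Return the pause (in seconds) to wait before each event.

    Assumes every event carries a hypothetical "t" field holding seconds
    since the start of the recording. A speed of 2.0 halves every pause.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    delays = []
    previous = 0.0
    for event in events:
        delays.append(max(0.0, (event["t"] - previous) / speed))
        previous = event["t"]
    return delays


# Hypothetical macro content; the real recorder output may look different.
macro_json = """
[
  {"t": 0.0, "type": "move", "x": 10, "y": 20},
  {"t": 0.5, "type": "click", "button": "left"},
  {"t": 2.0, "type": "key", "key": "a"}
]
"""
events = json.loads(macro_json)
print(playback_delays(events, speed=2.0))  # prints [0.0, 0.25, 0.75]
```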

FastAPI API overview (/api/v1/mcp)

| Endpoint | Method | Body (JSON) | Result |
|---|---|---|---|
| /focus | POST | {title?: string, window_id?: int} | 204 No Content (success) / 404 Not Found / 500 Internal Server Error |
| /screenshot | POST | {title?: string, window_id?: int} | 200 OK with {image_base64, width, height, format} / 404 / 500 |
| /click | POST | Window Spec: {title?: string, window_id?: int or str}; Click Spec: {template_path: string, threshold?: float} | |
| /click/status/{job_id} | GET | - | 200 OK with job status details / 404 Not Found |
| /windows | GET | - | 200 OK with list of window objects / 500 Internal Server Error |

Detailed schemas and examples can be found in the Swagger UI (/docs). Note: the /click endpoint expects two JSON objects in the request body (a window spec and a click spec), which is how Swagger UI presents it; programmatic clients typically send them combined in a single JSON body.
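
As a rough illustration of calling the endpoints above from Python, the sketch below uses only the standard library. The base URL combines the default address and the /api/v1/mcp prefix mentioned in this README; the helper names (`build_post`, `decode_screenshot`, `focus_and_capture`) are hypothetical, and the exact request schemas should be checked against /docs.

```python
import base64
import json
from urllib import request

# Assumption: default host/port from this README plus the documented prefix.
BASE_URL = "http://127.0.0.1:8000/api/v1/mcp"


def build_post(endpoint: str, payload: dict) -> request.Request:
    """Build (but do not send) a JSON POST request for one endpoint."""
    return request.Request(
        url=f"{BASE_URL}{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_screenshot(body: dict) -> bytes:
    """Turn a /screenshot response body into raw image bytes."""
    return base64.b64decode(body["image_base64"])


def focus_and_capture(title: str) -> bytes:
    """Focus a window, then capture it. Needs a running CoreUI-MCP server."""
    with request.urlopen(build_post("/focus", {"title": title})):
        pass  # 204 on success; urlopen raises HTTPError for 404 or 500
    with request.urlopen(build_post("/screenshot", {"title": title})) as resp:
        return decode_screenshot(json.load(resp))
```

With the server running, something like `open("shot.png", "wb").write(focus_and_capture("Notepad"))` would save the capture, assuming the returned format is PNG and a window with that title exists.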

MCP Server (Claude Desktop) Tools

The mcp_server.py implements the following tools, which can be called via JSON-RPC:

  • list_windows: Lists all available and visible windows.
  • focus_window: Focuses a window based on its title.
  • screenshot_window: Creates a screenshot of a window based on its title.
  • click_template: Searches for a template image in a window and clicks on it.
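
For orientation, a tool invocation over stdin/stdout could be framed as below. This sketch assumes that mcp_server.py follows the standard Model Context Protocol framing (JSON-RPC 2.0, method "tools/call", one JSON message per line); the argument name "title" is inferred from the tool descriptions above and is not confirmed by the README.

```python
import itertools
import json

_request_ids = itertools.count(1)  # JSON-RPC requests need unique ids


def tool_call(name: str, arguments: dict) -> str:
    """Serialize one tool invocation as a newline-terminated JSON-RPC message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",  # assumption: standard MCP method name
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message) + "\n"


# Hypothetical argument name; check the tool schema reported by the server.
line = tool_call("focus_window", {"title": "Notepad"})
```

In practice a client such as Claude Desktop generates these messages itself, so hand-written framing like this is mainly useful for debugging the server from a terminal.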

Security notes

  • API access: By default, the FastAPI server listens on 0.0.0.0, making it accessible on the network. Restrict access via firewalls or reverse proxies and consider implementing authentication (e.g., API keys, OAuth2) if you expose CoreUI-MCP in an untrusted environment.
  • Template paths: The template_path validation in ClickRequest is designed to prevent path traversal attacks by restricting paths to configured assets directories. Check the configuration in config.json (e.g., security.allowed_template_dirs if implemented, currently hardcoded to "assets" in routes.py).
  • Execute with caution: CoreUI-mcp can simulate any input on the desktop. Run it with the lowest possible privileges and only in trusted automation scenarios.
  • Linux Input Permissions: For the Linux backend, xdotool might require appropriate X11 permissions. pynput might require the process to be run with certain privileges or specific libraries to be installed for full functionality, especially concerning global event listening/posting. Operations requiring /dev/uinput (not directly used by current linux.py but by tools like ydotool) would need root or special group permissions.
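
The template path restriction described above can be sketched as follows. This is an illustration of the general technique (resolve the path, then check containment), not the actual check in routes.py; the function name `resolve_template` is hypothetical, and treating template_path as relative to the assets directory is an assumption.

```python
from pathlib import Path


def resolve_template(template_path: str, allowed_dir: str = "assets") -> Path:
    """Resolve template_path inside allowed_dir or raise ValueError.

    resolve() collapses ".." segments and follows symlinks, so the
    containment check runs against the real target location.
    """
    base = Path(allowed_dir).resolve()
    candidate = (base / template_path).resolve()
    # Absolute inputs replace base entirely and are rejected here as well,
    # unless they already point inside the allowed directory.
    if not candidate.is_relative_to(base):
        raise ValueError(f"template_path escapes {base}: {template_path!r}")
    return candidate
```

Checking the resolved path rather than searching the raw string for ".." avoids bypasses through symlinks or mixed separators.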

Tests

Run all PyTest tests with coverage:

poetry run pytest --cov=src

🛠️ Extend CoreUI-MCP

  • Advanced Input Scenarios - Expose drag, scroll, direct keydown/keyup, and type_text through the FastAPI and MCP Server interfaces by defining appropriate request schemas and handlers.
  • Enhance Linux Input Backend
    • Implement more robust Wayland workarounds or specific compositor integrations (very complex).
    • Add an abstraction layer for key codes/keysyms if a platform-agnostic key input API is desired.
  • Other Vision Models - Integrate other object recognition or OCR models in vision.py.
  • Advanced Configuration - Make more parameters controllable via config.json.
  • Web UI - Develop a simple web interface for interacting with the FastAPI service.

License

MIT – see the LICENSE file.
