AI Desktop Mentor

AI Desktop Mentor is a Python desktop automation tool that emulates human-like interaction with a computer. It combines YOLOv8 for UI element detection, Vosk for offline speech recognition, and DistilBERT for natural language processing (NLP) to open applications, navigate websites, log in, and process screenshots. A Tkinter GUI exposes voice commands, task scripting, and automated workflows for business automation and personal productivity.

✨ Features

Automation: Open apps (e.g., Chrome, Notepad), type text, navigate URLs, and log in to websites.
Screenshots: Capture manually (Ctrl+Shift+S) or automatically (every 15 minutes).
AI Navigation: YOLOv8 detects UI elements (e.g., login fields); OCR reads screen text.
Task Scripting: Execute sequences defined in tasks.json.
Voice Control: Use offline voice commands via Vosk.
NLP Understanding: Parse natural language with DistilBERT.
Context Awareness: Detect CAPTCHAs/pop-ups with OCR.
Cross-Platform: Works on Windows, macOS, and Linux.
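
The context-awareness feature can be pictured as a keyword scan over OCR output. The sketch below is a minimal illustration under that assumption, not the tool's actual implementation: the function name `detect_interruptions` and the keyword lists are hypothetical, and in practice the text would come from an OCR call such as `pytesseract.image_to_string`.

```python
# Hypothetical keyword lists; the real tool may rely on different cues.
INTERRUPTION_KEYWORDS = {
    "captcha": ("captcha", "i'm not a robot", "verify you are human"),
    "popup": ("accept cookies", "allow notifications", "subscribe"),
}


def detect_interruptions(ocr_text):
    """Return the sorted interruption types whose keywords appear in the text."""
    lowered = ocr_text.lower()
    found = {
        kind
        for kind, keywords in INTERRUPTION_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    }
    return sorted(found)
```

A caller could pause automation whenever the returned list is non-empty and ask the user to resolve the CAPTCHA or dismiss the pop-up.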

📁 Directory Structure


AIDesktopMentor/
├── automation/
│   └── automation_tool.py
├── config/
│   └── tasks.json
├── docs/
│   ├── README.md
│   └── requirements.txt
├── models/
│   ├── yolo_ui_model.pt
│   └── vosk-model-small-en-us/
├── outputs/
│   └── screenshots/
├── dataset/
│   ├── images/
│   │   ├── train/
│   │   └── val/
│   ├── labels/
│   │   ├── train/
│   │   └── val/
│   └── data.yaml

🔁 Technical Workflow

graph TD
    A[User Input] --> B[GUI Tkinter]
    A --> C[Voice Listener Vosk]
    C --> D[NLP Parser DistilBERT]
    D --> E[Command Processor]
    B --> E
    E --> F[Automation Engine PyAutoGUI]
    E --> G[UI Detection YOLOv8]
    E --> H[Screenshot Module]
    E --> I[OCR Pytesseract]
    E --> J[Check Popups]
    F --> K[OS Interaction]
    H --> L[Save to screenshots/]
    I --> M[Context Feedback]
    K --> N[Screen Output]
    M --> N
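
The Command Processor node in the graph above receives commands from both the GUI and the NLP parser and routes them to a module. One way to sketch that routing is a dispatch table keyed by action name. The class and the stand-in handlers below are assumptions for illustration, not code from `automation_tool.py`.

```python
class CommandProcessor:
    """Routes a parsed command to the handler registered for its action."""

    def __init__(self):
        self._handlers = {}

    def register(self, action, handler):
        """Associate an action name with a callable that takes the command dict."""
        self._handlers[action] = handler

    def process(self, command):
        """Look up the handler for command["action"] and call it."""
        action = command.get("action")
        handler = self._handlers.get(action)
        if handler is None:
            raise ValueError(f"No handler registered for action: {action!r}")
        return handler(command)


# Stand-in handlers; real ones would call PyAutoGUI, YOLOv8, or Pytesseract.
processor = CommandProcessor()
processor.register("open", lambda cmd: f"opening {cmd['app']}")
processor.register("screenshot", lambda cmd: "screenshot saved")
```

Because GUI buttons and voice commands both produce the same command dict, they can share one processor instance and one set of handlers.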

📊 Business Workflow

graph TD
    A[Business User] --> B[Define Task]
    B -->|Manual| C[GUI Interaction]
    B -->|Automated| D[Configure tasks.json]
    B -->|Voice| E[Voice Command]
    C --> F[Execute Task]
    D --> F
    E --> F
    F -->|Open App| G[Access System]
    F -->|Login| H[Authenticate]
    F -->|Navigate| I[Access Resource]
    F -->|Screenshot| J[Generate Report]
    H -->|YOLO Detection| I
    I --> K[Perform Business Function]
    J --> L[Save Output]
    K --> M[Business Outcome]
    L --> M

✅ Prerequisites

Python 3.8+
Tesseract OCR
- Windows: Install
- macOS: brew install tesseract
- Linux: sudo apt-get install tesseract-ocr
Vosk Model
- Download vosk-model-small-en-us
- Extract into models/vosk-model-small-en-us/
YOLO Model
- Use yolov8n.pt or a custom-trained model, saved as models/yolo_ui_model.pt
Python dependencies
```
pip install -r requirements.txt
```
Microphone access, plus screen-recording and input-control permissions on macOS/Linux.
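
The file-based prerequisites above can be checked before launching the tool. The following is a minimal sketch, assuming the directory layout shown earlier. The function name `check_prerequisites` and the message strings are hypothetical, and it does not check microphone or OS permissions.

```python
import shutil
import sys
from pathlib import Path


def check_prerequisites(root="."):
    """Return a list of problems found; an empty list means all checks passed."""
    root = Path(root)
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8 or newer is required")
    # Tesseract must be installed separately and visible on PATH.
    if shutil.which("tesseract") is None:
        problems.append("tesseract executable not found on PATH")
    if not (root / "models" / "vosk-model-small-en-us").is_dir():
        problems.append("Vosk model directory missing")
    if not (root / "models" / "yolo_ui_model.pt").is_file():
        problems.append("YOLO weights missing")
    return problems
```

Running `print(check_prerequisites())` from the repository root lists whatever still needs to be installed.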

⚙️ Installation

# Clone the repo
git clone https://github.com/moses000/AIDesktopMentor.git
cd AIDesktopMentor

# Set up folder structure
mkdir -p models outputs/screenshots dataset/images/train dataset/images/val dataset/labels/train dataset/labels/val

# Install dependencies
pip install -r requirements.txt

Tesseract OCR, the Vosk model, and a YOLO model are not installed by pip; set them up separately as described under Prerequisites.

🧠 YOLO Setup

Option 1: Pre-trained

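# Python: loading the checkpoint downloads yolov8n.pt if it is not already present
from ultralytics import YOLO
model = YOLO("yolov8n.pt")

# Shell: move the weights to the path the tool expects
mv yolov8n.pt models/yolo_ui_model.pt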

Accuracy for UI tasks may be limited.

Option 2: Train Your Own

Capture screenshots

import time
import pyautogui

# Capture 100 screenshots, one every 2 seconds, while the login pages to label are on screen
for i in range(100):
    pyautogui.screenshot(f"dataset/images/train/login_{i}.png")
    time.sleep(2)

Label with LabelImg

pip install labelImg
labelImg dataset/images/train dataset/labels/train
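
The capture step writes every image to `dataset/images/train`, while the directory structure also has `val` folders. A possible way to fill them is to move a random share of the labelled images across once labelling is done. This is a sketch under assumptions, not part of the tool: the function name `split_dataset`, the 20% default, and the fixed seed are illustrative choices.

```python
import random
import shutil
from pathlib import Path


def split_dataset(root="dataset", val_fraction=0.2, seed=0):
    """Move a random fraction of labelled training images into the val folders."""
    root = Path(root)
    train_labels = root / "labels" / "train"
    val_images = root / "images" / "val"
    val_labels = root / "labels" / "val"
    val_images.mkdir(parents=True, exist_ok=True)
    val_labels.mkdir(parents=True, exist_ok=True)

    # Only consider images that already have a matching label file.
    images = sorted((root / "images" / "train").glob("*.png"))
    labelled = [img for img in images if (train_labels / f"{img.stem}.txt").exists()]

    random.Random(seed).shuffle(labelled)
    n_val = int(len(labelled) * val_fraction)
    for img in labelled[:n_val]:
        label = train_labels / f"{img.stem}.txt"
        shutil.move(str(img), str(val_images / img.name))
        shutil.move(str(label), str(val_labels / label.name))
    return n_val
```

Calling `split_dataset()` from the repository root returns the number of image and label pairs that were moved.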

Create data.yaml

train: dataset/images/train/
val: dataset/images/val/
nc: 3
names: ['username_field', 'password_field', 'login_button']

Train

from ultralytics import YOLO
model = YOLO("yolov8n.pt")
model.train(data="dataset/data.yaml", epochs=50, imgsz=640, batch=16)

Save model

# Ultralytics saves to runs/detect/train/ by default (train2, train3, ... on later runs)
cp runs/detect/train/weights/best.pt models/yolo_ui_model.pt
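
Once a model is trained, login automation needs a screen coordinate to click for each detected element. The sketch below shows one way to choose it: take the most confident detection of the target class and return the centre of its box. The function name `click_point`, the 0.5 threshold, and the tuple layout `(class_name, confidence, (x1, y1, x2, y2))` are assumptions. With Ultralytics, the equivalent values are exposed on each result as `boxes.xyxy`, `boxes.conf`, and `boxes.cls`, with class names in `model.names`.

```python
def click_point(detections, target, min_confidence=0.5):
    """Return the (x, y) centre of the most confident box for target, or None."""
    candidates = [
        (conf, box)
        for name, conf, box in detections
        if name == target and conf >= min_confidence
    ]
    if not candidates:
        return None
    # Pick the highest-confidence candidate and compute its centre.
    _, (x1, y1, x2, y2) = max(candidates, key=lambda item: item[0])
    return ((x1 + x2) / 2, (y1 + y2) / 2)
```

The returned pair could then be passed to a call such as `pyautogui.click(x, y)`. A `None` result signals that the element was not found with enough confidence.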

🚀 Usage

python automation/automation_tool.py

GUI Tasks:

Open Notepad & type
Execute tasks.json
Take screenshot
OCR read screen
Login via GUI
Enable voice commands

Voice Commands:

"open Chrome"
"go to example.com"
"log in to example.com"
"type hello world"
"take screenshot"
"read text"
"execute tasks"
"stop listening"

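To make the mapping from the phrases above to actions concrete, here is a rule-based sketch. It is a regex stand-in, not the DistilBERT parser the tool uses. The action names `open`, `navigate`, `login`, and `screenshot` follow the sample `tasks.json`, while `type`, `ocr`, `execute_tasks`, and `stop_listening` are assumed names.

```python
import re

# Ordered patterns: the first match wins.
PATTERNS = [
    (re.compile(r"^open (?P<app>.+)$"), "open"),
    (re.compile(r"^go to (?P<url>\S+)$"), "navigate"),
    (re.compile(r"^log in to (?P<url>\S+)$"), "login"),
    (re.compile(r"^type (?P<text>.+)$"), "type"),
    (re.compile(r"^take screenshot$"), "screenshot"),
    (re.compile(r"^read text$"), "ocr"),
    (re.compile(r"^execute tasks$"), "execute_tasks"),
    (re.compile(r"^stop listening$"), "stop_listening"),
]


def parse_command(utterance):
    """Map a transcribed phrase to an action dict, or None if unrecognised."""
    text = utterance.strip().lower()
    for pattern, action in PATTERNS:
        match = pattern.match(text)
        if match:
            return {"action": action, **match.groupdict()}
    return None
```

Exact-match rules like these break on paraphrases ("launch Chrome"), which is the kind of variation a learned NLP model is meant to absorb.
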
🧾 Sample `tasks.json`

[
    {
        "action": "open",
        "app": "chrome"
    },
    {
        "action": "navigate",
        "url": "https://example.com"
    },
    {
        "action": "login",
        "url": "https://example.com"
    },
    {
        "action": "screenshot",
        "prefix": "login_task"
    }
]
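
Before a task file is executed, it helps to check that every entry has the fields its action needs. The sketch below infers the required keys from the sample above. Those requirements, the function names, and the default path `config/tasks.json` (taken from the directory structure) are assumptions, and the real tool may accept more actions or optional fields.

```python
import json

# Required keys per action, inferred from the sample tasks.json.
REQUIRED_KEYS = {
    "open": {"app"},
    "navigate": {"url"},
    "login": {"url"},
    "screenshot": {"prefix"},
}


def validate_tasks(tasks):
    """Return a list of error strings; an empty list means every task is valid."""
    errors = []
    for index, task in enumerate(tasks):
        action = task.get("action")
        if action not in REQUIRED_KEYS:
            errors.append(f"task {index}: unknown action {action!r}")
            continue
        missing = REQUIRED_KEYS[action] - set(task)
        if missing:
            errors.append(f"task {index}: missing {sorted(missing)}")
    return errors


def load_tasks(path="config/tasks.json"):
    """Read the task file and raise ValueError if any task is malformed."""
    with open(path, encoding="utf-8") as handle:
        tasks = json.load(handle)
    errors = validate_tasks(tasks)
    if errors:
        raise ValueError("; ".join(errors))
    return tasks
```

Validating up front means a typo in the last task is reported before the first task has already opened a browser.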

🛠 Notes

Permissions: macOS/Linux may need screen/microphone/input access.
YOLO: Required for login automation.
Vosk: Ensure correct folder structure in models/.
Performance Tip: Keep automation interval ≥ 5s to avoid resource strain.

📦 Deployment

pip install pyinstaller
pyinstaller --onefile automation/automation_tool.py

🧯 Troubleshooting

YOLO Errors: Check yolo_ui_model.pt & class IDs
Vosk Errors: Confirm model directory/mic permissions
GUI Not Working: Verify Python/Tkinter setup

🧠 Future Improvements

Expand YOLO UI detection classes
Add reinforcement learning for adaptive workflows
CAPTCHA solvers
Larger NLP models (e.g., BERT)
GUI task builder for tasks.json

📜 License

MIT License

🤝 Contributing

Open issues or submit pull requests on GitHub.

📬 Contact

For support, create an issue or email im.imoleayomoses@gmail.com

🙏 Acknowledgements

Ultralytics for YOLOv8
Vosk for speech recognition
Hugging Face for transformers
