JarvisAI is designed as a Client-Agent System where the local client handles OS interactions and the cloud agent handles intelligence.
The core of JarvisAI is the Decision-Execution-Correction (DEC) loop:
- **Perception:**
  - Audio: Captured via `speech_recognition`, processed by the Google Speech API.
  - Vision: Screenshots captured via `pyautogui`, encoded as base64, sent to Gemini 2.0.
  - Text: Direct input via the GUI search bar.
- **Decision (Reasoning):**
  - The orchestrator (`ai_engine.py`) maintains a history of the conversation.
  - It sends a prompt to Gemini 2.0 Flash with a defined Tool Schema.
  - Gemini returns a Function Call (JSON) instead of text if an action is required.
- **Execution (Action):**
  - `task_manager.py` maps the function name (e.g., `create_file`) to a Python callable.
  - Arguments are validated against safety policies.
  - The action is executed (e.g., `subprocess.run()`).
- **Feedback (Correction):**
  - The output (stdout/stderr) is captured.
  - If `stderr` is non-empty, it is fed back to the LLM as a new user message: "Execution failed with error: ...".
  - The LLM generates a fix and triggers a new execution.
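The Feedback step above can be sketched as a minimal self-correction hook. This is an illustrative helper, not the actual `task_manager.py` code; the function name `run_and_report` and the use of `sys.executable` are assumptions:

```python
import subprocess
import sys

def run_and_report(script_path: str) -> str:
    """Execute a script in a child process and return either its stdout
    or the correction message that gets fed back to the LLM."""
    result = subprocess.run(
        [sys.executable, script_path],
        capture_output=True, text=True, timeout=10,
    )
    if result.stderr:
        # Non-empty stderr triggers the correction cycle: the message below
        # is sent to the LLM as a new user message.
        return f"Execution failed with error: {result.stderr.strip()}"
    return result.stdout
```

On failure, the returned string re-enters the Decision step as conversation history, which is what closes the DEC loop.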
```mermaid
sequenceDiagram
    participant User
    participant GUI
    participant AIEngine
    participant GeminiAPI
    participant TaskManager
    participant OS
    User->>GUI: "Fix the bug on my screen"
    GUI->>TaskManager: take_screenshot()
    TaskManager-->>GUI: image_path
    GUI->>AIEngine: process_command("Fix bug", image_path)
    AIEngine->>GeminiAPI: Prompt + Image + Tools
    GeminiAPI-->>AIEngine: FunctionCall(read_clipboard)
    AIEngine->>TaskManager: read_clipboard()
    TaskManager-->>AIEngine: "Error: NullPointer..."
    AIEngine->>GeminiAPI: FunctionResult("Error: NullPointer...")
    GeminiAPI-->>AIEngine: FunctionCall(create_file, content="fix.py...")
    AIEngine->>TaskManager: create_file("fix.py", code)
    TaskManager->>OS: Write File
    TaskManager-->>AIEngine: "File Created"
    AIEngine-->>GUI: "I have applied the fix."
    GUI-->>User: "I have applied the fix."
```
Jarvis exposes the following tools to the LLM. These are defined in ai_engine.py using the Google Generative AI SDK format.
| Tool Name | Arguments | Description |
|---|---|---|
| `run_python_script` | `script_path` | Runs a .py file and returns stdout/stderr. |
| `install_python_library` | `library_name` | Installs a package via pip. |
| `create_file` | `file_path`, `content` | Creates a text file with content. |
| `open_application` | `app_name` | Intelligent fuzzy search for apps. |
| `take_screenshot` | None | Returns path to temporary screenshot. |
| `read_clipboard` | None | Returns clipboard text. |
| `get_current_time` | None | Returns ISO 8601 timestamp. |
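As an illustration of the schema format, one of the tools above could be declared as a plain-dict function declaration of the kind the Google Generative AI SDK accepts for its `tools` argument. The real definitions live in `ai_engine.py`; the descriptions below are paraphrased from the table, not quoted from the source:

```python
# Illustrative function declaration for the create_file tool.
create_file_declaration = {
    "name": "create_file",
    "description": "Creates a text file with the given content.",
    "parameters": {
        "type": "object",
        "properties": {
            "file_path": {"type": "string", "description": "Destination path."},
            "content": {"type": "string", "description": "Text to write."},
        },
        "required": ["file_path", "content"],
    },
}

# The model receives every declaration in one tools list.
tools = [{"function_declarations": [create_file_declaration]}]
```

When Gemini decides an action is needed, it responds with a function call naming one of these declarations plus JSON arguments, which the orchestrator dispatches to `task_manager.py`.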
- Whitelist: Writes are technically allowed anywhere except the Blacklist.
- Blacklist: `C:\Windows`, `C:\Program Files`, `C:\Program Files (x86)`.
- Implementation: `os.path.abspath()` is checked against a forbidden prefix list before file creation.
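A minimal sketch of that prefix check, assuming the blacklist above. It uses `ntpath` explicitly so the example runs on any platform; on Windows, `os.path` provides the same functions:

```python
import ntpath  # Windows path semantics; os.path resolves to this on Windows

# Forbidden prefixes, mirroring the blacklist above.
BLACKLIST = (r"C:\Windows", r"C:\Program Files", r"C:\Program Files (x86)")

def is_write_allowed(path: str) -> bool:
    """Resolve the path (collapsing '..' etc.) and reject it if it is a
    blacklisted directory or falls anywhere underneath one."""
    resolved = ntpath.normpath(ntpath.abspath(path)).lower()
    return not any(
        resolved == prefix.lower() or resolved.startswith(prefix.lower() + "\\")
        for prefix in BLACKLIST
    )
```

Resolving before comparing matters: a naive string check would let `C:\Users\..\Windows\evil.dll` slip past the blacklist.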
- Isolation: Code runs in a child process, not the main thread.
- Resource Limits:
  - `timeout=10`s: Prevents infinite loops.
  - `creationflags=subprocess.CREATE_NO_WINDOW`: Prevents popup spam (unless a window is explicitly requested via `ctypes`).
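Both limits above fit in a single `subprocess.run()` call. This is a sketch, not the actual executor; `CREATE_NO_WINDOW` exists only on Windows, so the example falls back to `0` elsewhere to stay runnable:

```python
import subprocess
import sys

# CREATE_NO_WINDOW is Windows-only; 0 is a no-op flag on other platforms.
NO_WINDOW = getattr(subprocess, "CREATE_NO_WINDOW", 0)

def run_sandboxed(script_path: str, timeout: int = 10) -> subprocess.CompletedProcess:
    """Run a script in a child process with a hard wall-clock limit and
    no console popup."""
    return subprocess.run(
        [sys.executable, script_path],
        capture_output=True,
        text=True,
        timeout=timeout,          # raises TimeoutExpired on runaway scripts
        creationflags=NO_WINDOW,  # suppresses the console window on Windows
    )
```

A `subprocess.TimeoutExpired` exception kills the child, so an infinite loop in generated code cannot hang the agent.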
- API Keys: Stored in `config.py`. Note: In a production environment, these should be moved to environment variables (`os.environ`).
- Traffic: All LLM traffic is encrypted via TLS 1.3 (Google API standard).
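The recommended migration can be as small as an environment-first lookup with a `config.py` fallback. The variable name `GEMINI_API_KEY` and the `config.API_KEY` constant are assumptions for illustration:

```python
import os

def load_api_key() -> str:
    """Prefer the environment (production); fall back to config.py (dev)."""
    key = os.environ.get("GEMINI_API_KEY")  # env var name is an assumption
    if key:
        return key
    import config  # local config.py exposing an API_KEY constant (assumed)
    return config.API_KEY
```

Keeping the fallback makes local development painless while keeping the key out of version control in deployed environments.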
- Concurrency: The GUI runs on the Main Thread (PyQt). All AI and OS operations are offloaded to `asyncio` event loops or background threads (`threading.Thread`) to prevent UI freezing.
- Memory Footprint: ~150 MB RAM (mostly Python runtime + PyQt overhead).
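The offloading pattern looks roughly like this, with the PyQt event loop replaced by a plain `queue.Queue` so the sketch stays self-contained; in the real GUI the result would be delivered via a Qt signal instead:

```python
import queue
import threading

def process_in_background(command: str, results: queue.Queue) -> threading.Thread:
    """Run a slow AI/OS operation off the main thread and post the result
    back for the UI side to pick up."""
    def worker():
        reply = f"processed: {command}"  # stands in for the LLM round-trip
        results.put(reply)              # hand the result back to the UI thread
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t
```

The main thread never blocks on the LLM call; it only drains the queue (or receives a signal), which is what keeps the window responsive during 1-2 s inference.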
- Latency:
- Voice Processing: ~500ms
- LLM Inference: ~1-2s (Gemini Flash)
- Action Execution: <100ms