Simple Task Execution:
Complex Agentic Workflow:
Note: Many local IDE Markdown previewers disable embedded video tags for security. Clicking the links above will open the videos in your default media player. When you push this to GitHub, the repository will recognize and natively play these files.
PocketPilot is a revolutionary, open-source Android automation framework that bridges the gap between Large Language Models (LLMs) and on-device actions. Think of it as a conversational mobile RPA (Robotic Process Automation) tool native to your smartphone. If you are looking for agentic AI in mobile or Android RPA solutions, PocketPilot provides an intelligent and adaptable approach.
Instead of relying on rigid visual scripting or brittle X/Y coordinates that break when an app's UI changes, PocketPilot reads the structure of the screen and lets an LLM decide what to do next. You describe the task in natural language, and the agent interprets the app's layout, adapts to UI changes on the fly, and works out the steps needed to carry out your intent.
- 🗣️ Natural Language Control: Issue commands like "Post my latest photo on Instagram" or "Turn on the living room AC" and let the AI plan the steps needed to accomplish them.
- 🤝 Accessibility & Assistive Tech: With full voice-to-action control that bypasses complex visual navigation, PocketPilot can serve as an assistive tool that helps users with motor or visual impairments interact with any app.
- 👁️ Context-Aware Perception: Uses Android AccessibilityService APIs to read the current screen and map it into a node-based, DOM-like tree.
- 🧠 Cognitive AI Planning: Powered by Google's Gemini API, which acts as the "brain" that navigates unknown or complex screen layouts to fulfill tasks safely.
- ⚡ Native Execution Engine: Executes node-based actions (taps, scrolls, text entry) natively, which is more reliable than traditional coordinate-based macro tools.
- 🔁 Teach & Replay (Phase 2): Train the agent on a complex workflow once, then let the cognitive engine replay and parameterize that interaction sequence later as a standalone skill.
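To illustrate why node-based targeting tends to survive layout changes better than fixed coordinates, here is a minimal sketch in Python (PocketPilot itself is written in Dart and Kotlin). The `UiNode` class, its fields, and the `find_clickable` helper are hypothetical names chosen for this example; they are not the project's actual data model.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class UiNode:
    """Hypothetical, simplified stand-in for one accessibility node."""
    node_id: int
    class_name: str
    text: str = ""
    clickable: bool = False
    children: List["UiNode"] = field(default_factory=list)


def walk(node: UiNode) -> Iterator[UiNode]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def find_clickable(root: UiNode, label: str) -> Optional[UiNode]:
    """Return the first clickable node whose text equals the label, ignoring case."""
    wanted = label.strip().lower()
    for node in walk(root):
        if node.clickable and node.text.strip().lower() == wanted:
            return node
    return None


# Two layouts of the same screen: the button moves deeper into the tree,
# but its label stays the same, so the lookup still succeeds.
old_layout = UiNode(0, "FrameLayout", children=[
    UiNode(1, "TextView", text="New post"),
    UiNode(2, "Button", text="Share", clickable=True),
])
new_layout = UiNode(0, "FrameLayout", children=[
    UiNode(1, "LinearLayout", children=[
        UiNode(2, "TextView", text="New post"),
        UiNode(3, "Button", text="Share", clickable=True),
    ]),
])

print(find_clickable(old_layout, "share").node_id)  # 2
print(find_clickable(new_layout, "share").node_id)  # 3
```

A coordinate-based macro recorded against the first layout would tap the wrong spot in the second; a lookup by label and role finds the button in both.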
We leverage a hybrid approach, combining the rapid cross-platform UI development of Flutter with the deep OS integrations of native Android.
PocketPilot is driven by an autonomous Observe → Plan → Act cognitive loop. It reasons about each screen rather than following a blind script.
- Observe: Upon receiving a task, PocketPilot uses its native Android AccessibilityService to quickly dump a full, structured UI tree of the current application.
- Plan: This detailed screen state, alongside the user's overarching goal, is streamed securely to the Gemini reasoning engine. Gemini evaluates the layout, selects the target element, and decides the next action (e.g., tap_node, type_text, scroll_down).
- Act: The Kotlin native core catches this semantic tool call from Dart, finds the corresponding screen node, and physically performs the gesture securely.
- Loop: PocketPilot captures the new screen state and repeats the cycle until the goal is declared complete or human intervention is required.
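The four steps above can be sketched as a small control loop. This is an illustrative Python outline under stated assumptions, not the project's Dart/Kotlin implementation: `run_task`, the `done` and `ask_user` tool names, the `max_steps` cap, and the dictionary shapes are all hypothetical. Only `tap_node` appears in the description above.

```python
from typing import Callable, Dict, List

Screen = Dict[str, str]   # hypothetical: a flattened summary of the UI tree
Action = Dict[str, str]   # hypothetical: {"tool": ...} plus any arguments


def run_task(
    goal: str,
    observe: Callable[[], Screen],
    plan: Callable[[str, Screen, List[Action]], Action],
    act: Callable[[Action], None],
    max_steps: int = 10,
) -> str:
    """Repeat observe -> plan -> act until the planner finishes or steps run out."""
    history: List[Action] = []
    for _ in range(max_steps):
        screen = observe()                    # Observe: capture the current UI state
        action = plan(goal, screen, history)  # Plan: ask the model for one next step
        if action["tool"] == "done":          # assumed signal: goal reached
            return "complete"
        if action["tool"] == "ask_user":      # assumed signal: human input needed
            return "needs_user"
        act(action)                           # Act: perform the gesture
        history.append(action)
    return "step_limit_reached"


# A toy "device" and a rule-based planner standing in for Gemini.
state = {"screen": "home"}


def observe() -> Screen:
    return dict(state)


def plan(goal: str, screen: Screen, history: List[Action]) -> Action:
    if screen["screen"] == "home":
        return {"tool": "tap_node", "target": "Settings"}
    return {"tool": "done"}


def act(action: Action) -> None:
    if action["tool"] == "tap_node" and action["target"] == "Settings":
        state["screen"] = "settings"


print(run_task("Open settings", observe, plan, act))  # complete
```

The step cap is one simple way to guarantee the loop terminates when the planner never reports completion; the real project may bound the loop differently.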
PocketPilot is built on a decoupled architecture, allowing fluid UI updates and AI reasoning without blocking native OS tasks.
graph TD
A[User Natural Language Prompt] -->|Inputs| B(Flutter App UI)
B --> C{Orchestration Layer Dart}
C -->|Requests Screen State| D[Android Platform Channel Kotlin]
subgraph Native Android Context
D --> E[Accessibility Service]
E -->|Extracts Context| F[Screen UI Tree / Nodes]
end
F -->|Returns Payload| C
C -->|Context + Prompt| G((Gemini AI Engine))
G -->|Reasons Next Step| G
G -->|Returns Formatted Tool Call| C
C -->|Sends Action Request| D
D -->|Executes Action| E
E -->|Modifies UI| H[(Third Party App)]
H -->|Provides New State| F
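The "Returns Formatted Tool Call" edge in the diagram suggests that the orchestration layer checks the model's output before forwarding it to the native side. Below is a minimal Python sketch of such a validation step, assuming the tool call arrives as a JSON object. The payload shape (`tool`, `args`), the argument names (`node_id`, `text`), and the `parse_tool_call` helper are assumptions made for illustration; only the three tool names come from this README.

```python
import json
from typing import Any, Dict

# Tool names come from the README; the required arguments are assumptions.
TOOL_ARGS: Dict[str, Dict[str, type]] = {
    "tap_node": {"node_id": int},
    "type_text": {"node_id": int, "text": str},
    "scroll_down": {},
}


def parse_tool_call(raw: str) -> Dict[str, Any]:
    """Parse and validate a JSON tool call, raising ValueError on bad input."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")

    tool = call.get("tool")
    if not isinstance(tool, str) or tool not in TOOL_ARGS:
        raise ValueError(f"unknown tool: {tool!r}")

    args = call.get("args", {})
    if not isinstance(args, dict):
        raise ValueError("args must be a JSON object")
    for name, expected in TOOL_ARGS[tool].items():
        value = args.get(name)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"{tool}: argument {name!r} must be {expected.__name__}")

    # Forward only the arguments the tool declares; drop anything extra.
    return {"tool": tool, "args": {name: args[name] for name in TOOL_ARGS[tool]}}


print(parse_tool_call('{"tool": "type_text", "args": {"node_id": 7, "text": "hi"}}'))
```

Rejecting unknown tools and malformed arguments at this boundary keeps unexpected model output from reaching the layer that performs gestures on the device.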
To get a local copy up and running, follow these simple steps.
- Android Device / Emulator: OS-level accessibility features are strictly required to scrape UI nodes and perform touches. (iOS is currently not supported due to OS sandbox constraints).
- Flutter SDK: Ensure you have the latest stable version of Flutter.
- Gemini API Key: Grab a free API key from Google AI Studio.
- Clone the repository
git clone https://github.com/sumanthnani10/PocketPilot.git
- Open the project in Android Studio or VS Code.
- Install Dart packages
cd PocketPilot
flutter pub get
- Run the application
flutter run
- Grant Accessibility Permissions (CRITICAL):
The very first time you launch PocketPilot on a device, navigate to your Android settings:
Settings > Accessibility > Installed Apps
Enable the PocketPilot Accessibility Service. The app will not work without this explicit permission.
- Configure the AI Engine: Launch PocketPilot, navigate to the Settings page, and securely enter your Gemini API Key.
- Phase 1a: Flutter UI structure & platform channels setup
- Phase 1b: Core Android Accessibility Service extraction (Screen parsing)
- Phase 1c: Gemini tool-calling integrations and conversational loop
- Phase 1d: Error handling, loop breaking, and user-intervention boundaries
- Phase 2a: Action recording system (Teach Mode)
- Phase 2b: Skill library UI and local persistence
- Phase 2c: Intelligent, parameterized skill replay
See the open issues for a full list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag enhancement.
Don't forget to give the project a ⭐! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the PolyForm Noncommercial License 1.0.0. See LICENSE for more information and commercial terms.
Sumanth - @sumanthnani10
Project Link: https://github.com/sumanthnani10/PocketPilot

