1 change: 1 addition & 0 deletions .gitignore
@@ -10,6 +10,7 @@ frontend/.vitepress/dist/
frontend/.vitepress/cache/
frontend/.vitepress/.temp/
.claude/*
.worktrees/

# npm auth
.npmrc
33 changes: 33 additions & 0 deletions frontend/get-started/configuration.md
@@ -108,6 +108,39 @@ For human-auth relay:
- shared relay hub launched by `openpocket human-auth-relay start` does not use separate per-agent relay state or per-agent hub API keys
- in managed mode, agent-local request state still stays under the agent's own `state/`

### Aliyun UI Agent mobile backend

OpenPocket now includes a first-class `aliyun-ui-agent/mobile` model profile.

Key points:

- the profile sets `models.<name>.backend` to `aliyun_ui_agent_mobile`
- the runtime routes this backend through the dedicated Aliyun GUI-agent client instead of the default OpenAI-compatible tool-calling path
- image input is delivered as a short-lived screenshot URL from the local relay stack, so the selected agent must have `humanAuth.useLocalRelay=true`
- if Aliyun must fetch screenshots from the public internet, use either:
  - the shared relay hub from `openpocket human-auth-relay start`
  - or per-agent ngrok via `humanAuth.tunnel.provider=ngrok`
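The requirements above can be checked before a run. The following is a minimal TypeScript sketch over a simplified config shape; `checkAliyunPreflight`, `AgentConfigSlice`, and the `needsPublicAccess` / `sharedRelayHubRunning` inputs are hypothetical names for illustration, not OpenPocket's actual API.

```typescript
// Hypothetical, simplified config shape: only the fields this check reads.
interface AgentConfigSlice {
  defaultModel: string;
  models: Record<string, { backend?: string }>;
  humanAuth: {
    useLocalRelay: boolean;
    tunnel: { provider: "none" | "ngrok" };
  };
}

// Returns a list of problems; an empty array means the preflight passed.
// `sharedRelayHubRunning` is supplied by the caller: this sketch does not detect the hub.
function checkAliyunPreflight(
  config: AgentConfigSlice,
  needsPublicAccess: boolean,
  sharedRelayHubRunning: boolean,
): string[] {
  const profile = config.models[config.defaultModel];
  if (profile?.backend !== "aliyun_ui_agent_mobile") {
    return []; // Not an Aliyun UI Agent profile, so nothing to check.
  }
  const problems: string[] = [];
  if (!config.humanAuth.useLocalRelay) {
    problems.push("humanAuth.useLocalRelay must be true to serve screenshot URLs");
  }
  if (
    needsPublicAccess &&
    !sharedRelayHubRunning &&
    config.humanAuth.tunnel.provider !== "ngrok"
  ) {
    problems.push(
      "public screenshot fetches need the shared relay hub or humanAuth.tunnel.provider=ngrok",
    );
  }
  return problems;
}

const example: AgentConfigSlice = {
  defaultModel: "aliyun-ui-agent/mobile",
  models: { "aliyun-ui-agent/mobile": { backend: "aliyun_ui_agent_mobile" } },
  humanAuth: { useLocalRelay: false, tunnel: { provider: "none" } },
};
// Both requirements fail for this config, so two problems are reported.
console.log(checkAliyunPreflight(example, true, false));
```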

Minimal example:

```json
{
  "defaultModel": "aliyun-ui-agent/mobile",
  "models": {
    "aliyun-ui-agent/mobile": {
      "baseUrl": "https://dashscope.aliyuncs.com/api/v2/apps/gui-owl/gui_agent_server",
      "model": "pre-gui_owl_7b",
      "apiKey": "",
      "apiKeyEnv": "DASHSCOPE_API_KEY",
      "maxTokens": 4096,
      "reasoningEffort": null,
      "temperature": null,
      "backend": "aliyun_ui_agent_mobile"
    }
  }
}
```
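In TypeScript, the same profile can be modeled as a typed object. This is a sketch: `ModelProfile` and `resolveApiKey` are hypothetical, and the precedence rule (a non-empty `apiKey` wins, otherwise the variable named by `apiKeyEnv` is read) is an assumption suggested by the empty `apiKey` plus `apiKeyEnv` pairing in the example, not documented loader behavior.

```typescript
// Hypothetical typed view of one entry under `models`, mirroring the JSON keys.
interface ModelProfile {
  baseUrl: string;
  model: string;
  apiKey: string;
  apiKeyEnv: string;
  maxTokens: number;
  reasoningEffort: string | null;
  temperature: number | null;
  backend?: "default" | "aliyun_ui_agent_mobile";
}

// Assumed precedence: inline apiKey first, then the env var named by apiKeyEnv.
// Returns undefined when neither source yields a non-blank key.
function resolveApiKey(
  profile: ModelProfile,
  env: Record<string, string | undefined>,
): string | undefined {
  if (profile.apiKey.trim() !== "") {
    return profile.apiKey;
  }
  const fromEnv = env[profile.apiKeyEnv];
  return fromEnv !== undefined && fromEnv.trim() !== "" ? fromEnv : undefined;
}

const aliyunMobile: ModelProfile = {
  baseUrl: "https://dashscope.aliyuncs.com/api/v2/apps/gui-owl/gui_agent_server",
  model: "pre-gui_owl_7b",
  apiKey: "",
  apiKeyEnv: "DASHSCOPE_API_KEY",
  maxTokens: 4096,
  reasoningEffort: null,
  temperature: null,
  backend: "aliyun_ui_agent_mobile",
};
```

In Node, a call such as `resolveApiKey(aliyunMobile, process.env)` would return `undefined` when `DASHSCOPE_API_KEY` is unset, which is a natural point to fail fast before any model call.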

## Backward Compatibility Keys

Loader maps old keys automatically, including:
15 changes: 15 additions & 0 deletions frontend/ops/troubleshooting.md
@@ -53,6 +53,21 @@
- verify model supports requested endpoint and multimodal input
- switch model profile and retry

## Aliyun UI Agent cannot fetch screenshot URL

- verify the selected model profile uses `backend: "aliyun_ui_agent_mobile"`
- ensure `humanAuth.useLocalRelay=true`
- if using managed agents, start the shared relay hub with `openpocket human-auth-relay start`
- if Aliyun must fetch over the public internet, verify that the ngrok URL or the shared relay hub's public URL is reachable from outside your LAN
- inspect logs for `[OpenPocket][human-auth]`, `[OpenPocket][relay-hub]`, and local relay startup failures
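A common cause is a screenshot URL that points at a loopback or private LAN address. The following offline heuristic is a sketch, not part of OpenPocket: `isLanOnlyHost` and `describeScreenshotUrl` are hypothetical helpers, and because they do not resolve DNS, an actual fetch from outside your network remains the definitive test.

```typescript
// Heuristic only: flags hosts an external service cannot reach.
// No DNS resolution is done, so a public name that resolves to a LAN IP still passes.
function isLanOnlyHost(hostname: string): boolean {
  const host = hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".local")) return true;
  // IPv6 loopback; URL.hostname keeps the brackets, e.g. "[::1]".
  if (host === "[::1]") return true;
  const parts = host.split(".");
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p))) {
    return false; // Not a dotted IPv4 literal; treat as a public DNS name.
  }
  const [a, b] = parts.map(Number);
  return (
    a === 127 || // loopback
    a === 10 || // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) || // 192.168.0.0/16
    (a === 169 && b === 254) // link-local
  );
}

// Produces a one-line diagnosis for a screenshot URL taken from logs or session files.
function describeScreenshotUrl(raw: string): string {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return "invalid URL";
  }
  if (url.protocol !== "https:" && url.protocol !== "http:") {
    return `unsupported protocol ${url.protocol}`;
  }
  return isLanOnlyHost(url.hostname)
    ? `LAN-only host ${url.hostname}: Aliyun cannot fetch this`
    : `public-looking host ${url.hostname}`;
}
```

For an illustrative URL, `describeScreenshotUrl("http://192.168.1.20:7788/screenshot.png")` returns `"LAN-only host 192.168.1.20: Aliyun cannot fetch this"`.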

## Aliyun UI Agent keeps returning `wait` or unsupported operations

- inspect the selected agent session file for the raw `Operation` string returned by Aliyun
- confirm the task is in `device_type=mobile` scope and the current screen is an Android phone UI, not a secure/blank surface
- if the screen is protected by `FLAG_SECURE` or is blacked out, use the human-auth takeover path instead of retrying model calls
- retry after enabling a public screenshot URL path (shared relay hub or ngrok), because stale/unreachable image URLs often degrade action quality
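The checks above amount to a small decision rule. The sketch below assumes the caller has already extracted an action name (such as `wait`) from each raw `Operation` string and knows whether the current screen is secure; `nextStep`, `Decision`, and the threshold of three consecutive stalls are hypothetical and chosen only for illustration.

```typescript
// Hypothetical outcomes of the troubleshooting rule above.
type Decision = "continue" | "retry_with_public_url" | "human_takeover";

// `recentActions` holds action names already parsed from Aliyun `Operation` strings
// (the parsing is out of scope here). The stall threshold is an illustrative default.
function nextStep(
  recentActions: string[],
  supportedActions: ReadonlySet<string>,
  screenIsSecure: boolean,
  maxConsecutiveStalls = 3,
): Decision {
  // A FLAG_SECURE or blacked-out screen will not improve with more model calls.
  if (screenIsSecure) return "human_takeover";

  // Count trailing actions that are `wait` or outside the supported set.
  let stalls = 0;
  for (let i = recentActions.length - 1; i >= 0; i--) {
    const action = recentActions[i].toLowerCase();
    if (action === "wait" || !supportedActions.has(action)) stalls++;
    else break;
  }
  return stalls >= maxConsecutiveStalls ? "retry_with_public_url" : "continue";
}
```

Counting only the trailing run means a single useful action resets the counter, so occasional `wait` responses during normal screen transitions do not trigger a retry.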

## Channel bot does not respond

- validate token for the selected agent (`channels.<type>.*` or env)
4 changes: 4 additions & 0 deletions frontend/reference/cli-and-gateway.md
@@ -219,14 +219,18 @@ Examples:
openpocket model show
openpocket model list
openpocket model set --name gpt-5.4
openpocket model set --name aliyun-ui-agent/mobile
openpocket --agent review-bot model set --provider google --model gemini-3.1-pro-preview
openpocket --agent review-bot model set --provider aliyun-ui-agent --model pre-gui_owl_7b
```

Notes:

- `model set --name <profile>` switches to an existing profile key
- `model set --provider <provider> --model <model-id>` creates/updates a profile from provider presets and switches the selected agent's default model
- model config is per agent after creation
- `Aliyun UI Agent (Mobile)` is a dedicated backend, not a normal OpenAI-compatible chat profile, even though it uses DashScope
- when using `aliyun-ui-agent/mobile`, the selected agent must expose screenshots through the local relay stack; for public internet access, use the shared relay hub or per-agent ngrok
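The two `model set` forms differ in whether they may create a profile. The sketch below models that difference on an in-memory config; `setByName`, `setByProvider`, and the `ProviderPreset` table are hypothetical, and both the profile-key lookup and the merge order (preset defaults, then existing fields, then the requested model id) are assumptions rather than documented CLI behavior.

```typescript
// Hypothetical in-memory view of one agent's model config.
interface ProfileLike {
  model: string;
  [key: string]: unknown;
}

interface AgentModelConfig {
  defaultModel: string;
  models: Record<string, ProfileLike>;
}

// Hypothetical preset entry; the real CLI's preset format is not shown in these docs.
interface ProviderPreset {
  profileKey: string;
  defaults: ProfileLike;
}

// `model set --name <profile>`: switch only; an unknown key is an error.
function setByName(config: AgentModelConfig, name: string): AgentModelConfig {
  if (!(name in config.models)) {
    throw new Error(`unknown model profile: ${name}`);
  }
  return { ...config, defaultModel: name };
}

// `model set --provider <p> --model <id>`: upsert a profile from a preset, then switch.
function setByProvider(
  config: AgentModelConfig,
  presets: Record<string, ProviderPreset>,
  provider: string,
  modelId: string,
): AgentModelConfig {
  const preset = presets[provider];
  if (!preset) throw new Error(`unknown provider: ${provider}`);
  // Assumed merge order: preset defaults, then any existing fields, then the model id.
  const existing = config.models[preset.profileKey] ?? {};
  const profile: ProfileLike = { ...preset.defaults, ...existing, model: modelId };
  return {
    defaultModel: preset.profileKey,
    models: { ...config.models, [preset.profileKey]: profile },
  };
}
```

Returning a new config object instead of mutating the input keeps the per-agent config easy to diff and write back.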

## Channel Commands

19 changes: 19 additions & 0 deletions frontend/reference/config-defaults.md
@@ -295,11 +295,29 @@ Managed agents created later with `openpocket create agent <id>` start from the
      "maxTokens": 4096,
      "reasoningEffort": null,
      "temperature": null
    },
    "aliyun-ui-agent/mobile": {
      "baseUrl": "https://dashscope.aliyuncs.com/api/v2/apps/gui-owl/gui_agent_server",
      "model": "pre-gui_owl_7b",
      "apiKey": "",
      "apiKeyEnv": "DASHSCOPE_API_KEY",
      "maxTokens": 4096,
      "reasoningEffort": null,
      "temperature": null,
      "backend": "aliyun_ui_agent_mobile"
    }
  }
}
```

### Aliyun UI Agent mobile note

- built-in profile key: `aliyun-ui-agent/mobile`
- backend discriminator: `models.<name>.backend = "aliyun_ui_agent_mobile"`
- screenshot delivery depends on the local relay stack, so public Aliyun fetches require either:
  - the shared relay hub (`openpocket human-auth-relay start`)
  - or per-agent ngrok (`humanAuth.tunnel.provider=ngrok`)
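Because `backend` is a closed set of string values, it maps naturally onto a TypeScript union with an exhaustive `switch`. In this sketch, `routeFor` and the `Route` labels are hypothetical; they only mirror the documented split between the default OpenAI-compatible tool-calling path and the dedicated Aliyun GUI-agent client.

```typescript
// The two documented backend values.
type ModelBackend = "default" | "aliyun_ui_agent_mobile";

// Hypothetical route labels, used here only to show the branching.
type Route =
  | { kind: "openai_compatible_tools" }
  | { kind: "aliyun_gui_agent"; needsLocalRelay: true };

function routeFor(backend: ModelBackend): Route {
  switch (backend) {
    case "default":
      return { kind: "openai_compatible_tools" };
    case "aliyun_ui_agent_mobile":
      // Screenshots reach Aliyun as URLs served by the local relay stack.
      return { kind: "aliyun_gui_agent", needsLocalRelay: true };
    default: {
      // Compile-time exhaustiveness check: a new backend value without a case fails here.
      const unreachable: never = backend;
      throw new Error(`unhandled backend: ${String(unreachable)}`);
    }
  }
}
```

The `never` assignment turns a forgotten branch into a type error rather than a silent fallthrough at runtime.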

## Managed Agent Overrides

When you run `openpocket create agent <id>`, OpenPocket clones a new agent config and rewrites these defaults:
@@ -356,6 +374,7 @@ Notes:
- `humanAuth.pollIntervalMs` is clamped to at least `500`.
- `humanAuth.tunnel.provider` accepts only `none|ngrok`.
- `humanAuth.tunnel.ngrok.startupTimeoutSec` is clamped to at least `3`.
- `models.<name>.backend` accepts only `default|aliyun_ui_agent_mobile`; other values fall back to `default`.
- `allowedChatIds` is coerced to a numeric array with non-finite values removed.
- model `baseUrl` is normalized for known providers:
  - Google Generative Language bare host -> `/v1beta`
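The normalization rules above can be expressed as small pure functions. This is a sketch under stated assumptions: the function names are hypothetical, and the fallbacks for invalid input (a caller-supplied default for numeric fields, `none` for an unrecognized tunnel provider) are guesses, since the notes only specify the floors and the accepted values.

```typescript
type TunnelProvider = "none" | "ngrok";
type Backend = "default" | "aliyun_ui_agent_mobile";

// Floor clamp, e.g. pollIntervalMs >= 500 or startupTimeoutSec >= 3.
// Assumption: non-numeric input uses `fallback` before clamping.
function clampMin(value: unknown, min: number, fallback: number): number {
  const n = Number(value);
  return Math.max(Number.isFinite(n) ? n : fallback, min);
}

// Documented: anything other than the two accepted values falls back to "default".
function normalizeBackend(value: unknown): Backend {
  return value === "aliyun_ui_agent_mobile" ? "aliyun_ui_agent_mobile" : "default";
}

// Assumption: unrecognized providers fall back to "none".
function normalizeTunnelProvider(value: unknown): TunnelProvider {
  return value === "ngrok" ? "ngrok" : "none";
}

// Coerce each entry with Number() and drop NaN/Infinity.
function normalizeAllowedChatIds(value: unknown): number[] {
  if (!Array.isArray(value)) return [];
  return value.map((v) => Number(v)).filter((n) => Number.isFinite(n));
}
```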
6 changes: 6 additions & 0 deletions src/agent/agent-runtime.ts
@@ -63,7 +63,10 @@ import {
import { runRuntimeAttempt } from "./runtime/attempt.js";
import { runRuntimeTask } from "./runtime/run.js";
import type { RunTaskRequest } from "./runtime/types.js";
import { AliyunUiAgentClient } from "./aliyun-ui-agent-client.js";
import { AliyunGuiPlusClient } from "./aliyun-gui-plus-client.js";
import { createPiSessionBridge } from "./pi-session-bridge.js";
import { LocalHumanAuthStack } from "../human-auth/local-stack.js";
import { scaleCoordinates, drawDebugMarker } from "../utils/image-scale.js";
import {
PhoneUseCapabilityProbe,
@@ -3850,6 +3853,9 @@ export class AgentRuntime {
isPermissionDialogApp: (currentApp) => this.isPermissionDialogApp(currentApp),
autoApprovePermissionDialog: (currentApp) => this.autoApprovePermissionDialog(currentApp),
saveModelInputArtifacts: (params) => this.saveModelInputArtifacts(params),
aliyunUiAgentClientFactory: (options) => new AliyunUiAgentClient(options),
aliyunGuiPlusClientFactory: (options) => new AliyunGuiPlusClient(options),
localHumanAuthStackFactory: (config, log) => new LocalHumanAuthStack(config, log),
},
attemptRequest,
),
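The runtime hands client constructors to the attempt as factory functions (`aliyunUiAgentClientFactory`, `aliyunGuiPlusClientFactory`, `localHumanAuthStackFactory`) instead of having the attempt call `new` itself. A plausible benefit is testability, since fakes can be injected. The sketch below illustrates that pattern with simplified, hypothetical types; `UiAgentClient.nextOperation`, `runOneStep`, and `makeFakeDeps` are not OpenPocket's real API.

```typescript
// Hypothetical, simplified types; the real option and client types are not shown in this diff.
interface UiAgentClientOptions {
  baseUrl: string;
  model: string;
  apiKey: string;
}

interface UiAgentClient {
  nextOperation(screenshotUrl: string, task: string): Promise<string>;
}

interface AttemptDeps {
  aliyunUiAgentClientFactory: (options: UiAgentClientOptions) => UiAgentClient;
}

// The attempt depends only on the factory signature, never on a concrete client class.
async function runOneStep(
  deps: AttemptDeps,
  options: UiAgentClientOptions,
  screenshotUrl: string,
  task: string,
): Promise<string> {
  const client = deps.aliyunUiAgentClientFactory(options);
  return client.nextOperation(screenshotUrl, task);
}

// Test double: records the options it was built with and returns a canned operation.
function makeFakeDeps(reply: string): { deps: AttemptDeps; seen: UiAgentClientOptions[] } {
  const seen: UiAgentClientOptions[] = [];
  const deps: AttemptDeps = {
    aliyunUiAgentClientFactory: (options) => {
      seen.push(options);
      return { nextOperation: async () => reply };
    },
  };
  return { deps, seen };
}
```

In production the runtime would pass something like `(options) => new AliyunUiAgentClient(options)`, as the hunk above does; in tests, `makeFakeDeps("wait")` exercises the attempt without any network access.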