
Add Aliyun GUI-OWL coordinate fix and GUI-Plus model support #112

Merged
tsubasakong merged 2 commits into main from feat/aliyun-gui-models
Mar 16, 2026

tsubasakong commented Mar 16, 2026

Summary

  • Fix GUI-OWL (pre-gui_owl_7b) coordinate rescaling from 1000x1000 normalized space to device pixels
  • Add GUI-Plus (gui-plus) model client with smart_resize coordinate conversion
  • Add Aliyun UI Agent infrastructure: relay signed screenshots, setup wizard, docs
  • Handle the model's SWIPE action as an alias for SCROLL, with direction and amount
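
To illustrate the SWIPE alias, here is a minimal sketch of how a parsed model action could be normalized before execution. The field names (`action`, `direction`, `amount`), the `normalizeScrollAction` helper, and the default amount are all hypothetical and not taken from the PR's code; the actual response schema may differ.

```typescript
// Hypothetical shapes for illustration; the real client types may differ.
type Direction = "up" | "down" | "left" | "right";

interface RawModelAction {
  action: string;
  direction?: string;
  amount?: number;
}

interface ScrollAction {
  type: "SCROLL";
  direction: Direction;
  amount: number;
}

// Assumption: fall back to a single scroll step when no amount is given.
const DEFAULT_SCROLL_AMOUNT = 1;

function isDirection(value: string): value is Direction {
  return value === "up" || value === "down" || value === "left" || value === "right";
}

// Maps both SCROLL and SWIPE onto one internal SCROLL action.
// Returns null for any other action or for an unrecognized direction.
function normalizeScrollAction(raw: RawModelAction): ScrollAction | null {
  const name = raw.action.trim().toUpperCase();
  if (name !== "SCROLL" && name !== "SWIPE") return null;

  const direction = (raw.direction ?? "").trim().toLowerCase();
  if (!isDirection(direction)) return null;

  const amount =
    typeof raw.amount === "number" && raw.amount > 0 ? raw.amount : DEFAULT_SCROLL_AMOUNT;

  return { type: "SCROLL", direction, amount };
}
```

Normalizing at the parsing boundary keeps the executor to a single scroll code path, whichever name the model emits.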

Why

The Aliyun GUI-OWL model outputs coordinates in a 1000x1000 normalized space. Without rescaling, taps miss their targets or land off-screen on devices with other resolutions (e.g. 720x1600). The newer GUI-Plus model is served through the standard DashScope API and offers improved capabilities, but it had no integration.
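
The rescaling described here can be sketched as follows. This is an illustrative helper rather than the code in `src/agent/aliyun-ui-agent-client.ts`: the names `rescaleNormalizedPoint`, `Point`, and `ScreenSize` are hypothetical, and clamping to the last valid pixel is an assumption.

```typescript
// Hypothetical types for illustration only.
interface Point {
  x: number;
  y: number;
}

interface ScreenSize {
  width: number;
  height: number;
}

// The model reports coordinates on a 1000x1000 grid, per the PR description.
const NORMALIZED_EXTENT = 1000;

// Converts a normalized point to device pixels.
// Assumption: results are clamped so a coordinate of 1000 maps to the last pixel.
function rescaleNormalizedPoint(p: Point, screen: ScreenSize): Point {
  const clamp = (value: number, max: number): number => Math.min(Math.max(value, 0), max);
  return {
    x: clamp(Math.round((p.x / NORMALIZED_EXTENT) * screen.width), screen.width - 1),
    y: clamp(Math.round((p.y / NORMALIZED_EXTENT) * screen.height), screen.height - 1),
  };
}
```

On a 720x1600 device, a model output of x = 900 used directly as a pixel value would fall past the right edge of the screen; rescaled, it becomes x = 648.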

Changes

  • src/agent/aliyun-ui-agent-client.ts — GUI-OWL client with 1000x1000 coordinate rescaling
  • src/agent/aliyun-gui-plus-client.ts — GUI-Plus client with smart_resize coordinate conversion and official Aliyun system prompt
  • src/types.ts — added aliyun_gui_plus backend type
  • src/config/model-provider-presets.ts — GUI-Plus provider preset
  • src/agent/runtime/attempt.ts — GUI-Plus execution loop (inline base64 screenshots, no ngrok needed)
  • src/human-auth/local-stack.ts — signed screenshot URL support for GUI-OWL
  • src/onboarding/setup-wizard.ts — Aliyun provider onboarding flow
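
The smart_resize conversion mentioned for the GUI-Plus client can be sketched roughly as below. This assumes smart_resize behaves like the similarly named helper in Qwen-VL-style image preprocessing: round each side to a multiple of a factor, then scale to stay within a pixel budget. It also assumes the model reports coordinates in the resized image's pixel space. The function names, the options object, and any factor or pixel bounds passed in are hypothetical; the real values depend on the model's preprocessor and are not stated in this PR.

```typescript
// Hypothetical types for illustration only.
interface Size {
  width: number;
  height: number;
}

interface Point {
  x: number;
  y: number;
}

// Assumption: these values come from the model's preprocessor configuration.
interface ResizeOptions {
  factor: number;
  minPixels: number;
  maxPixels: number;
}

// Computes the dimensions the model is assumed to see after preprocessing.
function smartResize(size: Size, opts: ResizeOptions): Size {
  const { factor, minPixels, maxPixels } = opts;
  let h = Math.round(size.height / factor) * factor;
  let w = Math.round(size.width / factor) * factor;

  if (h * w > maxPixels) {
    // Too large: shrink both sides by the same ratio, rounding down.
    const beta = Math.sqrt((size.height * size.width) / maxPixels);
    h = Math.floor(size.height / beta / factor) * factor;
    w = Math.floor(size.width / beta / factor) * factor;
  } else if (h * w < minPixels) {
    // Too small: grow both sides by the same ratio, rounding up.
    const beta = Math.sqrt(minPixels / (size.height * size.width));
    h = Math.ceil((size.height * beta) / factor) * factor;
    w = Math.ceil((size.width * beta) / factor) * factor;
  }
  return { width: w, height: h };
}

// Maps a point from the resized image space back to device pixels.
function modelPointToDevice(p: Point, device: Size, opts: ResizeOptions): Point {
  const resized = smartResize(device, opts);
  return {
    x: Math.round((p.x * device.width) / resized.width),
    y: Math.round((p.y * device.height) / resized.height),
  };
}
```

Unlike the fixed 1000x1000 grid used by GUI-OWL, the scale factors here depend on the screenshot's own dimensions, so the conversion has to recompute the resized size for each device resolution.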

Testing

node --test test/aliyun-ui-agent-client.test.mjs  # 4/4 pass
node --test test/model-provider-presets.test.mjs   # pass
node --test test/runtime-seams.test.mjs            # 18/18 pass

Manual testing: verified that GUI-OWL swipes and GUI-Plus SCROLL actions execute on a 720x1600 physical device.

Checklist

  • I ran relevant tests, or the Testing section explains why I did not.
  • I updated docs, or confirmed no doc changes were needed.
  • I confirmed the PR does not include secrets, credentials, or private data.

Closes #111

- Fix GUI-OWL (pre-gui_owl_7b) coordinate rescaling from 1000x1000
  normalized space to actual device pixel dimensions
- Add GUI-Plus model client with smart_resize coordinate conversion
- Add aliyun_gui_plus backend type and model provider preset
- Wire GUI-Plus execution loop into runtime attempt
- Handle SWIPE as alias for SCROLL in GUI-Plus responses
- Include official Aliyun system prompt adapted for mobile

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

vercel bot commented Mar 16, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| openpocket | Ready | Preview, Comment | Mar 16, 2026 1:36am |

Add supporting changes required for Aliyun model integration:
- Human-auth relay: signed screenshot URL support for GUI-OWL
- Setup wizard: Aliyun UI Agent provider onboarding flow
- CLI: backend recognition for Aliyun models
- Dashboard control-store: Aliyun backend handling
- Frontend docs: configuration and troubleshooting for Aliyun models
- Tests for all the above

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tsubasakong merged commit a7feb54 into main on Mar 16, 2026
5 of 7 checks passed