Add WebDriver-compatible HTTP API with OpenAPI spec and real browser integration #158
Co-authored-by: galaxyeye <1701451+galaxyeye@users.noreply.github.com>
…hronized list for events Co-authored-by: galaxyeye <1701451+galaxyeye@users.noreply.github.com>
@copilot Continue adding API support:
Co-authored-by: galaxyeye <1701451+galaxyeye@users.noreply.github.com>
Added the requested API support in commit 80a73af:
PerceptiveAgent APIs:
PulsarSession APIs:
All 22 integration tests pass. The OpenAPI spec is updated.
@copilot To fully implement these interfaces, and eventually deliver Kotlin/Python/Node clients and SDKs, what work remains? Give a plan.
Implementation Plan for Full API & Multi-Language SDKs
Phase 1: Complete Server Implementation (Backend)
1.1 Real Browser Integration
1.2 Enhanced Error Handling
1.3 Authentication & Security
1.4 Performance & Monitoring
Phase 2: OpenAPI & Documentation
2.1 Complete OpenAPI Spec
2.2 API Documentation
Phase 3: Kotlin Client SDK
3.1 Code Generation
3.2 SDK Features
3.3 Examples & Tests
Phase 4: Python Client SDK
4.1 Code Generation
4.2 SDK Features
4.3 Package Distribution
Phase 5: Node.js Client SDK
5.1 Code Generation
5.2 SDK Features
5.3 Package Distribution
Phase 6: Testing & Quality
6.1 Server Tests
6.2 SDK Tests
Phase 7: CI/CD & Release
7.1 Build Pipeline
7.2 Documentation Site
Next Immediate Steps:
Would you like me to start with any specific phase?
@copilot yes, go on from 1 |
Started implementing Phase 1.1 - Real Browser Integration. Progress so far:
Completed:
In Progress:
Next Steps:
The implementation maintains backward compatibility: when SessionManager is not available, controllers fall back to mock mode for testing.
Co-authored-by: galaxyeye <1701451+galaxyeye@users.noreply.github.com>
@copilot go on for all next steps |
…n controllers Co-authored-by: galaxyeye <1701451+galaxyeye@users.noreply.github.com>
Completed Phases 1.1, 1.2, and 1.3 in commits ff3dfe6 and 8a5ad3d:
Phase 1.1 - PulsarSession Integration ✅
Phase 1.2 - Agent Integration ✅
Phase 1.3 - Navigation Integration ✅
Current Status:
Remaining phases for complete real browser integration:
The API now has real browser capability for session management, AI agent operations, and page navigation.
Summary
This PR adds a WebDriver-compatible HTTP API with selector-first extensions for browser automation. It includes PerceptiveAgent and PulsarSession APIs, with real browser integration for the PulsarSession, Agent, and Navigation endpoints. The remaining endpoint groups are still mock implementations.
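"WebDriver-compatible" here includes the response convention the original prompt asks for: every body is wrapped as {"value": ...}. The sketch below shows how a client might unwrap such a body. The error shape (an object with "error" and "message" under "value") follows the W3C WebDriver convention and is an assumption about this server; the names `unwrap` and `WebDriverError` are hypothetical.

```python
import json
from typing import Any


class WebDriverError(Exception):
    """Raised when a response body carries a WebDriver-style error object."""

    def __init__(self, error: str, message: str) -> None:
        super().__init__(f"{error}: {message}")
        self.error = error
        self.message = message


def unwrap(body: str) -> Any:
    """Return the payload stored under "value", or raise on an error object."""
    payload = json.loads(body)
    if not isinstance(payload, dict) or "value" not in payload:
        raise ValueError('response is not wrapped in {"value": ...}')
    value = payload["value"]
    # Assumption: errors use the W3C WebDriver shape {"error": ..., "message": ...}.
    if isinstance(value, dict) and "error" in value:
        raise WebDriverError(str(value["error"]), str(value.get("message", "")))
    return value
```

For example, `unwrap('{"value": {"exists": true}}')` yields the inner dictionary, which is the `value.exists` shape the integration tests are asked to check.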
API Endpoints
Session:
- POST/GET/DELETE /session/{sessionId} - ✅ Real browser session support
Navigation:
- POST/GET /session/{sessionId}/url, GET .../documentUri, GET .../baseUri - ✅ Real browser navigation
Selector-first operations:
- POST /session/{sessionId}/selectors/{exists,waitFor,element,elements,click,fill,press,outerHtml,screenshot} - ⚠️ Mock implementation
Element by ID:
- POST .../element/{elementId}/click, POST .../value, GET .../attribute/{name}, GET .../text - ⚠️ Mock implementation
Script execution:
- POST .../execute/sync, POST .../execute/async - ⚠️ Mock implementation
Control:
- POST .../control/{delay,pause,stop} - ⚠️ Mock implementation
Events:
- POST/GET .../event-configs, GET .../events, POST .../events/subscribe - ⚠️ Mock implementation
PerceptiveAgent APIs ✅ REAL BROWSER INTEGRATION
- POST /session/{sessionId}/agent/run - Run autonomous agent task using real BrowserPerceptiveAgent
- POST /session/{sessionId}/agent/observe - Observe page and return potential actions from real agent
- POST /session/{sessionId}/agent/act - Execute a single action using real agent
- POST /session/{sessionId}/agent/extract - Extract structured data from page using real agent
- POST /session/{sessionId}/agent/summarize - Summarize page content using real agent
- POST /session/{sessionId}/agent/clearHistory - Clear agent history on real agent instance
PulsarSession APIs ✅ REAL BROWSER INTEGRATION
- POST /session/{sessionId}/normalize - Normalize URL with load arguments using real PulsarSession
- POST /session/{sessionId}/open - Open URL immediately (bypass cache) using real browser
- POST /session/{sessionId}/load - Load URL from storage or internet with real cache checking
- POST /session/{sessionId}/submit - Submit URL to crawl pool for asynchronous processing
Real Browser Integration (Phases 1.1, 1.2, 1.3 Complete)
SessionManager Service - NEW component for production use:
- PulsarSession and AgenticSession instances
Dual-Mode Support:
PulsarSession Integration:
PerceptiveAgent Integration:
Navigation Integration:
Files Added/Modified
- pulsar-rest/.../service/SessionManager.kt - NEW: Real session lifecycle management
- pulsar-rest/.../controller/SessionController.kt - Updated: Dual-mode support (mock/real)
- pulsar-rest/.../controller/PulsarSessionController.kt - Updated: Real PulsarSession integration
- pulsar-rest/.../controller/AgentController.kt - Updated: Real PerceptiveAgent integration ✅
- pulsar-rest/.../controller/NavigationController.kt - Updated: Real browser navigation ✅
- pulsar-rest/.../controller/ControllerUtils.kt - Shared utilities with error handling
- pulsar-rest/.../dto/ApiDtos.kt - Extended with agent and session DTOs
- pulsar-rest/.../store/InMemoryStore.kt - Mock storage for testing
- openapi/openapi.yaml - Updated with new endpoints and schemas
Implementation Status
✅ Completed:
- Swagger UI (/swagger-ui.html)
- WebDriver-style response wrapping {"value": ...}
How to Run
Start Server (Mock Mode):
Start Server (Real Browser Mode):
Requires PulsarContext bean configuration. SessionManager will be auto-wired when available.
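The optional wiring described here (real mode when a SessionManager is present, mock mode otherwise) can be pictured with a small language-neutral sketch. It is written in Python for brevity and is not the Kotlin code in this PR: the `create_session` method, the `mode` property, and the `{"value": {"sessionId": ...}}` response shape are assumptions made for illustration.

```python
from typing import Optional, Protocol


class SessionManager(Protocol):
    """Hypothetical stand-in for the real SessionManager service."""

    def create_session(self) -> str: ...


class InMemoryStore:
    """Mock storage, playing the role InMemoryStore.kt has in tests."""

    def __init__(self) -> None:
        self._count = 0

    def create_session(self) -> str:
        self._count += 1
        return f"mock-{self._count}"


class SessionController:
    """Uses the real backend when one is injected, the mock store otherwise."""

    def __init__(self, store: InMemoryStore,
                 session_manager: Optional[SessionManager] = None) -> None:
        self._store = store
        self._session_manager = session_manager  # None means mock mode

    @property
    def mode(self) -> str:
        return "real" if self._session_manager is not None else "mock"

    def create_session(self) -> dict:
        # Pick the backend once per call; the response shape is the same in both modes.
        if self._session_manager is not None:
            session_id = self._session_manager.create_session()
        else:
            session_id = self._store.create_session()
        return {"value": {"sessionId": session_id}}
```

Keeping the response shape identical in both branches is what lets the same integration tests run against either mode.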
Run Tests:
    ./mvnw -pl pulsar-rest test
Access Swagger UI: http://localhost:8182/swagger-ui.html
OpenAPI Spec: http://localhost:8182/openapi.yaml
Example Usage
Create Session (works in both modes):
Navigate (real browser navigation):
    curl -X POST http://localhost:8182/session/{sessionId}/url \
      -H "Content-Type: application/json" \
      -d '{"url": "https://example.com"}'
Load URL (real PulsarSession):
    curl -X POST http://localhost:8182/session/{sessionId}/load \
      -H "Content-Type: application/json" \
      -d '{"url": "https://example.com", "args": "-expire 1d"}'
Run Agent Task (real PerceptiveAgent):
    curl -X POST http://localhost:8182/session/{sessionId}/agent/run \
      -H "Content-Type: application/json" \
      -d '{"task": "Find all product prices on the page"}'
Check Selector Exists (mock):
    curl -X POST http://localhost:8182/session/{sessionId}/selectors/exists \
      -H "Content-Type: application/json" \
      -d '{"selector": "#main-content"}'
Click by Selector (mock):
    curl -X POST http://localhost:8182/session/{sessionId}/selectors/click \
      -H "Content-Type: application/json" \
      -d '{"selector": "button.submit"}'
Architecture
The implementation uses a layered architecture:
This allows the API to work seamlessly in both testing (mock) and production (real browser) environments without code changes.
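Because the HTTP surface is the same in both modes, a client can be written once. The sketch below mirrors the curl examples in this description using only the Python standard library. It is not the planned Python SDK: the helper names (`build_post`, `send`, `navigate`, `load`, `run_agent`) are hypothetical, and the base URL is taken from the examples in this PR.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8182"  # port taken from the examples in this PR


def build_post(session_id: str, path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST for /session/{sessionId}/<path>."""
    return urllib.request.Request(
        url=f"{BASE_URL}/session/{session_id}/{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request):
    """Send the request and return the payload stored under "value"."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["value"]


def navigate(session_id: str, url: str) -> urllib.request.Request:
    return build_post(session_id, "url", {"url": url})


def load(session_id: str, url: str, args: str) -> urllib.request.Request:
    # "args" carries load arguments such as "-expire 1d", as in the curl example.
    return build_post(session_id, "load", {"url": url, "args": args})


def run_agent(session_id: str, task: str) -> urllib.request.Request:
    return build_post(session_id, "agent/run", {"task": task})
```

With a server running and a valid session ID, `send(navigate(session_id, "https://example.com"))` would issue the same call as the Navigate curl example. Separating request construction from sending keeps the helpers testable without a live server.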
Next Steps for Full Production Readiness
- Integrate real BrowserPerceptiveAgent for agent endpoints ✅ Complete
- Add NavigationController real browser integration ✅ Complete
Checklist
This pull request was created as a result of the following prompt from Copilot chat.
> Create a new branch named feature/openapi-webdriver-api based on master in the platonai/Browser4 repository and open a pull request that adds a WebDriver-compatible HTTP API (selector-first extensions) with OpenAPI spec and a Spring Boot Kotlin server skeleton.
>
> Requirements / Deliverables:
> 1) Add OpenAPI spec file at openapi/openapi.yaml matching the previously prepared OpenAPI 3.1 skeleton (paths and components for session, navigation, selectors, element, interactions, screenshot, execute, control, events). Expose the file at /openapi.yaml in the server.
>
> 2) Add a new module pulsar-rest (or reuse existing pulsar-rest module if present) with a Spring Boot Kotlin app containing minimal, in-memory mock implementations:
> - PulsarRestApplication.kt (SpringBootApplication)
> - InMemoryStore.kt for sessions, elements, event-configs, subscriptions
> - DTOs in pulsar-rest/src/main/kotlin/ai/platon/pulsar/rest/api/dto/ApiDtos.kt for NewSessionRequest/Response, SetUrlRequest, SelectorRef, ElementRef, EventConfig, Event, ErrorResponse, and other needed types. Types should map to OpenAPI schemas.
> - Controllers with endpoints and behavior:
> * SessionController: POST /session, DELETE /session/{sessionId}, GET /session/{sessionId}
> * NavigationController: POST /session/{sessionId}/url, GET /session/{sessionId}/url, GET /session/{sessionId}/documentUri, GET /session/{sessionId}/baseUri
> * SelectorController: POST /session/{sessionId}/selectors/exists, /waitFor, /element, /elements, /click, /fill, /press, /outerHtml, /screenshot
> * ElementController: POST /session/{sessionId}/element, /elements; POST /session/{sessionId}/element/{elementId}/click and /value; GET /session/{sessionId}/element/{elementId}/attribute/{name} and /text
> * ScriptController: POST /session/{sessionId}/execute/sync and /async
> * ControlController: POST /session/{sessionId}/control/delay, /pause, /stop
> * EventsController: POST/GET /session/{sessionId}/event-configs, GET /session/{sessionId}/events, POST /session/{sessionId}/events/subscribe
> - Implement basic request validation and return WebDriver-style JSON wrapped responses { "value": ... } matching openapi. Use InMemoryStore to persist session and element mocks. For selector endpoints generate elementId by hashing selector string and return ElementRef accordingly.
>
> 3) Integrate springdoc-openapi UI to provide Swagger UI. Add dependency org.springdoc:springdoc-openapi-ui in pulsar-rest/pom.xml and configure to serve Swagger UI at /swagger-ui.html. Make sure /openapi.yaml is accessible (serve from resources/static or controller endpoint).
>
> 4) Update pulsar-rest/pom.xml to include spring-boot-starter-web, jackson-module-kotlin, springdoc-openapi-ui, spring-boot-starter-test dependencies as needed. Keep versions compatible with repository parent POM. Make minimal changes outside pulsar-rest and openapi/.
>
> 5) Add basic integration tests in pulsar-rest/src/test/kotlin/... using SpringBootTest and TestRestTemplate or MockMvc: tests should create a session, call POST /session/{id}/url, call selectors/exists and /selectors/element, asserting 200 responses and verifying the returned JSON shape contains expected keys (sessionId, value.exists or element reference).
>
> 6) CI and build: Ensure module builds. If build or dependency conflicts block implementation, stop and report the blocking error in the PR description and do not merge.
>
> 7) PR details: Open a PR from feature/openapi-webdriver-api into master with title 'Add WebDriver-compatible API (OpenAPI + server stubs + Swagger UI)'. PR body must include:
> - Summary of implemented endpoints, files added, and mock behavior limitations (no real browser integration yet).
> - How to run tests and start the server.
> - Example curl commands for create session, navigate, selectors/exists and selectors/click.
> - Any blockers or known issues.
>
> Constraints:
> - Keep changes isolated to pulsar-rest and openapi/ files; do not change other modules.
> - Ensure code is Kotlin idiomatic and logs incoming requests at debug level. Add X-Request-Id header in responses when feasible.
> - Maintain WebDriver response wrapping { "value": ... } to stay compatible.
>
> If any required module or build constraint prevents creating the pulsar-rest module, stop and report the blocking issue in the PR. If successful, provide the PR URL and list of changed files.
>
> Do not merge the PR. Create it in an open state for review.
>
> Repository: platonai/Browser4
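The prompt above asks the mock selector endpoints to derive an elementId by hashing the selector string. A minimal sketch of that idea follows, written in Python for illustration. The prompt does not name a hash function, so SHA-256 truncated to 16 hex characters is an assumption, as are the function names and the field names in the returned reference.

```python
import hashlib


def element_id_for(selector: str) -> str:
    """Derive a stable mock element ID from a selector string."""
    # Assumption: SHA-256, truncated to 16 hex characters for readability.
    digest = hashlib.sha256(selector.encode("utf-8")).hexdigest()
    return digest[:16]


def element_ref(selector: str) -> dict:
    """Wrap the derived ID in a WebDriver-style {"value": ...} envelope."""
    # Hypothetical ElementRef fields; the actual DTO lives in ApiDtos.kt.
    return {"value": {"elementId": element_id_for(selector), "selector": selector}}
```

A deterministic ID means repeated lookups of the same selector return the same reference, which is convenient for mock-mode tests that click or read an element found earlier.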