Skip to content

Puppeteer Task Execution #3

@SharanyaSD

Description

@SharanyaSD

Implement Puppeteer logic to execute actions based on Llama Vision Model outputs.

  1. Use Puppeteer to take screenshots of the browser for vision model input.
  2. Send screenshots to the FastAPI backend and receive actions with coordinates.
  3. Implement Puppeteer functions to perform actions (e.g., click, scroll) based on coordinates.
  4. Test navigation workflows with multiple pages (e.g., clicking a "Next" button).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions