Integrate the Llama Vision model to interpret UI screenshots and generate browser actions.
- Set up the Llama Vision model in the FastAPI backend to process UI screenshots.
- Create an endpoint that accepts a screenshot, predicts target coordinates, and returns an action (e.g., click, scroll).
- Write logic that translates Llama Vision predictions into Puppeteer commands based on the predicted coordinates and action type.
- Test basic tasks, such as clicking buttons or scrolling, on sample pages.
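The endpoint's core logic can be sketched as a validation step between the model and the browser: the vision model's raw output is parsed into a well-typed action before anything is sent to Puppeteer. This is a minimal stdlib-only sketch; the FastAPI route wiring and the actual Llama Vision call are omitted, and names like `BrowserAction` and `parse_model_output` are illustrative, not part of any existing API.

```python
import json
from dataclasses import dataclass

# Action types the backend is willing to forward to the browser.
ALLOWED_ACTIONS = {"click", "scroll", "type"}

@dataclass
class BrowserAction:
    action: str      # one of ALLOWED_ACTIONS
    x: int = 0       # target x coordinate, in screenshot pixels
    y: int = 0       # target y coordinate, in screenshot pixels
    text: str = ""   # payload for "type" actions

def parse_model_output(raw: str) -> BrowserAction:
    """Validate the JSON the vision model returns and coerce it into a
    BrowserAction; raises ValueError on malformed or unsupported output."""
    data = json.loads(raw)
    action = data.get("action")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return BrowserAction(
        action=action,
        x=int(data.get("x", 0)),
        y=int(data.get("y", 0)),
        text=str(data.get("text", "")),
    )
```

In a FastAPI route this would sit between decoding the uploaded screenshot and returning the response, so a garbled model reply turns into a 4xx error rather than a stray click.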
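For the Puppeteer side, one simple design is for the backend to emit the exact Puppeteer call the Node worker should execute. The sketch below assumes a plain dict of predictions and uses real Puppeteer APIs (`page.mouse.click`, `page.mouse.wheel`, `page.keyboard.type`); the function name and the dict schema are illustrative assumptions.

```python
import json

def to_puppeteer_command(pred: dict) -> str:
    """Map a predicted action dict to the Puppeteer call string the
    browser worker should run. Coordinates are assumed to be in the
    same pixel space as the screenshot the model saw."""
    action = pred["action"]
    if action == "click":
        return f'await page.mouse.click({pred["x"]}, {pred["y"]});'
    if action == "scroll":
        # Positive deltaY scrolls down; default to a modest step.
        return f'await page.mouse.wheel({{deltaY: {pred.get("dy", 300)}}});'
    if action == "type":
        # json.dumps gives a safely quoted JS string literal.
        return f'await page.keyboard.type({json.dumps(pred["text"])});'
    raise ValueError(f"unsupported action: {action!r}")
```

Sending command strings keeps the Node worker thin, at the cost of trusting the backend; a stricter alternative is to send structured JSON and keep a fixed switch statement on the Puppeteer side.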