Home page examples show the agent failing at the basics #133
danhallock
started this conversation in
General
I looked at the home page to see what this project is about. There are examples — great.
First up is a three-part video showing the product in action. The user asks the agent to order them Sensodyne toothpaste on Amazon. The agent browses to the Amazon homepage and searches for 'Sensodyne toothpaste'. Several toothpastes are returned, but only one is Sensodyne; nevertheless, the agent says that 'the search results page shows multiple Sensodyne products.' OK… Anyway, it selects the first result, which is in fact Sensodyne toothpaste, and adds it to the cart. The agent proceeds to the checkout page and says it has completed the tasks, then contradicts itself, noting that it needs to go through more of the checkout flow to finish the order. It does so, and places the order. It then tells the user, incorrectly, that although it added the item to the cart and proceeded to the Pending Order page, it 'cannot complete the order as it requires further user interaction for payment and shipping details' and that 'the order is currently pending.'
The video proceeds to the 'fill out your forms with ease' segment. This one seemed to work as advertised. I'll skip over the ChatGPT sidebar chat.
Okay. Scroll down the home page to the next demonstration. The user asks the agent to compare prices for AirPods Max on Amazon, Walmart and Target. The agent browses to these sites and searches. On Amazon, the AirPods Max are $499, but the search results also include Sony WH1000XM6 at $448. At Walmart, there are AirPods Max at $499 and $599, with refurbs at $349. At Target, there are AirPods Max at $549, with refurbs at $394. The agent summarizes this by saying that Amazon has the lowest price at $448, which is the price of the Sony headphones it found, not the AirPods, and it does not mention the refurb options at the other two stores.
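To make the comparison problem concrete, here is a rough sketch of the check I would have expected, using only the prices visible in the video. The `Listing` type and the `lowest_price` helper are names I made up for illustration; I have no idea how the project actually represents search results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    """One search result as seen in the demo video (hypothetical structure)."""
    store: str
    product: str
    price: int  # USD, as displayed in the video
    refurbished: bool = False


# Prices as they appear in the home page demo.
LISTINGS = [
    Listing("Amazon", "AirPods Max", 499),
    Listing("Amazon", "Sony WH1000XM6", 448),
    Listing("Walmart", "AirPods Max", 499),
    Listing("Walmart", "AirPods Max", 599),
    Listing("Walmart", "AirPods Max", 349, refurbished=True),
    Listing("Target", "AirPods Max", 549),
    Listing("Target", "AirPods Max", 394, refurbished=True),
]


def lowest_price(listings, product, include_refurbished=False):
    """Return (price, stores) for the cheapest listing of `product` only.

    Listings for other products are ignored, and refurbished units are
    excluded unless explicitly requested. Returns None if nothing matches.
    """
    matches = [
        item for item in listings
        if item.product == product
        and (include_refurbished or not item.refurbished)
    ]
    if not matches:
        return None
    best = min(item.price for item in matches)
    stores = sorted({item.store for item in matches if item.price == best})
    return best, stores


# What the agent effectively did: take the minimum over every result.
naive = min(LISTINGS, key=lambda item: item.price if not item.refurbished else float("inf"))

new_only = lowest_price(LISTINGS, "AirPods Max")
with_refurbs = lowest_price(LISTINGS, "AirPods Max", include_refurbished=True)
```

With the product filter in place, the lowest new price is $499, tied between Amazon and Walmart, and the lowest price including refurbs is $349 at Walmart. The $448 figure only appears when the Sony listing is left in the pool.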
The next example is to go to the Tesla Model 3 inventory page and extract title, mileage and prices to a Google Doc. Of the six vehicles visible in the results, three of them are represented correctly in the Google Doc that it creates, which has 21 results. Some of the others in the Google Doc match either the miles or the price from one of the displayed results, but not both. (No idea from the little video how the extraction went for vehicles further down the page.)
For the "Research" example, I don't have enough information to judge its output.
I came into this just a bit curious, not knowing much of anything about this project, but are these really meant to represent it at its best?