Hi, @gordonhu608
Thanks for the great work.
I tried inference_chat.py with some other kinds of questions, such as comparing the locations of two objects, but G2VLM does not seem to answer them properly. I'm not sure whether I'm using the model correctly; could you help me figure it out? My case:
Fig:

Question:
which one is closer to me? The blue chair or the white shoes
Answer:
The observer is 1.4 meters away from the center of shoes (red point)
It seems the model can only answer questions about depth, not comparisons between objects.
Thank you!