Where do the bounding boxes used in creating the AS-V2 dataset come from?

Thank you for the excellent work on ASMv2. 
In the paper, you mention that when creating the AS-V2 dataset, the bounding boxes of objects are included in the prompt given to GPT-4V. However, the paper does not explain how these bounding boxes were obtained.
Could you describe the workflow for acquiring the bounding boxes?