+
+| Model | Instance Segmentation: LVIS cgF1 | Instance Segmentation: LVIS AP | Instance Segmentation: SA-Co/Gold cgF1 | Box Detection: LVIS cgF1 | Box Detection: LVIS AP | Box Detection: COCO AP | Box Detection: COCO APo | Box Detection: SA-Co/Gold cgF1 |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| Human | - | - | 72.8 | - | - | - | - | 74.0 |
+| OWLv2* | 29.3 | 43.4 | 24.6 | 30.2 | 45.5 | 46.1 | 23.9 | 24.5 |
+| DINO-X | - | 38.5 | 21.3 | - | 52.4 | 56.0 | - | 22.5 |
+| Gemini 2.5 | 13.4 | - | 13.0 | 16.1 | - | - | - | 14.4 |
+| SAM 3 | 37.2 | 48.5 | 54.1 | 40.6 | 53.6 | 56.4 | 55.7 | 55.7 |
+
* Partially trained on LVIS. APo refers to COCO-O accuracy.
+
+
+
+## Video Results
+
+