We propose the Semantic Complex Scenarios Video Object Segmentation (SeCVOS) benchmark, designed to assess a model’s ability to perform high-level semantic reasoning across complex visual narratives. SeCVOS contains 160 carefully curated multi-shot videos characterized by 1) highly discontinuous frame sequences, 2) frequent reappearance of objects across disparate scenes, and 3) abrupt shot transitions and dynamic camera motion.

| Benchmark | #Videos | Avg. Duration (s) | Disappearance Rate | Avg. #Scenes |
|---|---|---|---|---|
| DAVIS | 90 | 2.87 | 16.1% | 1.06 |
| YTVOS | 507 | 4.51 | 13.0% | 1.03 |
| MOSE | 311 | 8.68* | 41.5% | 1.06 |
| SA-V | 155 | 17.24 | 25.5% | 1.09 |
| LVOS | 140 | 78.36 | 7.8% | 1.47 |
| SeCVOS (ours) | 160 | 29.36 | 30.2% | 4.26 |
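
To put the multi-shot nature of SeCVOS in numbers, the following minimal sketch computes how many times more scenes per video SeCVOS has than each other benchmark. The values are copied from the "Avg. #Scene" column of the table above; the names `AVG_SCENES` and `scene_ratio` are illustrative and not part of any released tooling.

```python
# Average number of scenes per video, copied from the comparison table.
AVG_SCENES = {
    "DAVIS": 1.06,
    "YTVOS": 1.03,
    "MOSE": 1.06,
    "SA-V": 1.09,
    "LVOS": 1.47,
    "SeCVOS": 4.26,
}


def scene_ratio(reference="SeCVOS"):
    """Return reference / other for the average scene count of every other benchmark."""
    ref = AVG_SCENES[reference]
    return {
        name: round(ref / value, 2)
        for name, value in AVG_SCENES.items()
        if name != reference
    }


if __name__ == "__main__":
    for name, ratio in scene_ratio().items():
        print(f"SeCVOS vs {name}: {ratio}x scenes per video")
```

By this measure, SeCVOS averages roughly 2.9 times as many scenes per video as LVOS, the next-highest benchmark in the table, and about 4 times as many as the others.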

Our annotations are released under the CC BY-NC-SA 4.0 license and are available strictly for non-commercial research.

We uphold the rights of individuals and copyright holders. If you appear in any of the annotated videos, or hold the copyright to a video, and wish to have its annotation removed from our dataset, please contact us. Send an email to zhangzhixiong@pjlab.org.cn with a subject line beginning with SeCVOS, or open an issue whose title follows the same format. We commit to reviewing your request promptly and taking suitable action.