A scene detection script based on CoTracker2.
The idea is that the visibility of tracked points drops sharply when a scene transition happens.
In our experiments on anime clips, it produced more reliable results than PySceneDetect.
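
As a rough illustration of this idea (not the actual implementation in video2clip.py), a cut can be flagged whenever the count of visible tracked points drops sharply between consecutive frames; the `drop_ratio` threshold below is a hypothetical value:

```python
import numpy as np

def detect_cuts(visibility: np.ndarray, drop_ratio: float = 0.5) -> list:
    """Flag frames where the number of visible tracked points drops
    sharply relative to the previous frame.

    visibility: (T, N) boolean array, True where point n is visible in
    frame t (e.g. the per-frame visibility output of CoTracker2).
    drop_ratio: hypothetical threshold, not the value used by the script.
    """
    counts = visibility.sum(axis=1).astype(float)
    cuts = []
    for t in range(1, len(counts)):
        prev = max(counts[t - 1], 1.0)  # guard against division by zero
        if counts[t] / prev < drop_ratio:
            cuts.append(t)  # candidate scene transition at frame t
    return cuts
```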
Set up the environment:

```
conda env create -f conda_env.yaml
conda activate videoclip
```
- Dump the video file list given VIDEO_ROOT_DIR:

```
python video2clip.py dump_video_list --video_dir VIDEO_ROOT_DIR --create_metadata
```

It will print the dumped METADATA_PATH and VIDEO_LIST_PATH.
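
If you want to inspect the dumped list from Python, something like the following works, assuming VIDEO_LIST_PATH is a plain-text file with one video path per line (an assumption; check the actual output of dump_video_list):

```python
# Assumption: the dumped list is plain text with one video path per
# line; verify against the actual output of dump_video_list.
with open("VIDEO_LIST_PATH") as f:
    video_paths = [line.strip() for line in f if line.strip()]
print(f"found {len(video_paths)} videos")
```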
- Run the scene detection script:

```
python video2clip.py video2clip \
    --video_list VIDEO_LIST_PATH \
    --metadata METADATA_PATH \
    --video_root_dir VIDEO_ROOT_DIR \
    --save_dir SAVE_DIR
```
Or run it in multi-process mode on multiple CUDA devices:

```
python video2clip.py video2clip \
    --video_list VIDEO_LIST_PATH \
    --metadata METADATA_PATH \
    --save_dir SAVE_DIR \
    --video_root_dir VIDEO_ROOT_DIR \
    --devices 0,1,2,3
```
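
The --devices flag handles the multi-process launch internally; for reference, a minimal sketch of the general pattern, pinning one subprocess to each GPU via CUDA_VISIBLE_DEVICES. The use of --start_idx to offset each worker is an assumption for illustration, not video2clip's actual internals:

```python
import os
import subprocess

def launch_per_gpu(devices, video_list, metadata, root_dir, save_dir, num_videos):
    """Spawn one video2clip worker per GPU (illustrative pattern only)."""
    shard = num_videos // len(devices)
    procs = []
    for rank, dev in enumerate(devices):
        # Pin each worker to a single GPU.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(dev))
        cmd = [
            "python", "video2clip.py", "video2clip",
            "--video_list", video_list,
            "--metadata", metadata,
            "--video_root_dir", root_dir,
            "--save_dir", save_dir,
            # Assumption: --start_idx offsets each worker into the list.
            "--start_idx", str(rank * shard),
        ]
        procs.append(subprocess.Popen(cmd, env=env))
    for p in procs:
        p.wait()
```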
More options:

```
Usage: video2clip.py video2clip [OPTIONS]

Options:
  --video_list TEXT
  --metadata TEXT
  --save_dir TEXT
  --start_idx INTEGER
  --short_side_max INTEGER  frames with short side larger than this value
                            will be downscaled
  --seq_chunk INTEGER
  --min_seq_len INTEGER     sequence length less than this value will be
                            discarded
  --max_seq_len INTEGER
  --ffmpeg TEXT
  --seed INTEGER
  --devices TEXT
  --video_root_dir TEXT
  --resume                  try to resume from last run
  --help                    Show this message and exit.
```
- The video2clip script encodes its output as AV1, which can add considerable overhead to both encoding and decoding.
- It uses optical flow and SSIM to de-duplicate and filter out stationary scenes; frames with too much text are also filtered out. These parameters are currently not exposed as options, so you may need to modify the source code to suit your needs (see the sketch after this list).
- It also saves tracking results and other info to annotation files with the suffix .json.gz. To load these annotation files:

```python
from utils.io_utils import json2dict

print(json2dict("FILE_PATH.json.gz"))
```
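
As a rough sketch of the SSIM-based stationary-scene filtering mentioned above (assuming scikit-image is available; the threshold is hypothetical and the actual logic in video2clip.py may differ):

```python
import numpy as np
from skimage.metrics import structural_similarity

def is_stationary(frames, ssim_thresh: float = 0.95) -> bool:
    """Rough illustration of SSIM-based stationary-scene filtering.

    frames: list of grayscale uint8 images with identical shapes.
    ssim_thresh: hypothetical threshold; the values hard-coded in
    video2clip.py may differ.
    """
    scores = [
        structural_similarity(frames[i], frames[i + 1])
        for i in range(len(frames) - 1)
    ]
    # Nearly identical consecutive frames => stationary (or duplicate) clip.
    return float(np.mean(scores)) >= ssim_thresh
```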