Extract semantically distinct still images from video.
SceneSplit analyzes video files and extracts representative frames that capture meaningful visual changes. Unlike simple keyframe extraction that relies on codec markers, SceneSplit uses vision embeddings to detect semantic transitions—scene changes, subject shifts, and visual content boundaries.
- Semantic analysis — Uses ResNet50 embeddings to detect meaningful visual changes, not just codec keyframes
- Offline processing — All analysis runs locally after initial model download
- Two-knob configuration — Control output granularity (`--detail`) and processing fidelity (`--quality`)
- Auto-download model — ResNet50 ONNX model (~100MB) downloads automatically on first run
- Structured output — Numbered images plus `metadata.json` with timestamps and frame indices

Requirements:

- Rust 1.70+ — Install Rust
- OpenCV 4.x — Required for video decoding
- ONNX Runtime — Required for ML inference

Install the native dependencies with Homebrew:

    brew install opencv onnxruntime

or with apt:

    sudo apt install libopencv-dev
    # ONNX Runtime: download from https://github.com/microsoft/onnxruntime/releases

Then clone and build:

    git clone https://github.com/your-org/scenesplit.git
    cd scenesplit
    cargo build --release

The binary is at `target/release/scenesplit`.

    # Extract frames with default settings
    scenesplit video.mp4

    # Minimal output (5-10 frames)
    scenesplit -d key video.mp4

    # Comprehensive extraction with highest quality
    scenesplit -d all -q best -o ./frames video.mp4

Synopsis:

    scenesplit [OPTIONS] <VIDEO>

| Argument | Description |
|---|---|
| `<VIDEO>` | Path to the input video file |

| Option | Default | Description |
|---|---|---|
| `-d, --detail <LEVEL>` | `summary` | Granularity level |
| `-q, --quality <PRESET>` | `balanced` | Processing quality |
| `-m, --model <PATH>` | auto-download | Custom ONNX model file |
| `-o, --output <DIR>` | `./scenesplit_output/` | Output directory |
| `-s, --quiet` | off | Suppress progress output |

| Level | Frames | Use Case |
|---|---|---|
| `key` | 5-10 | Major scene changes only |
| `summary` | 10-20 | Representative frames |
| `all` | 20-30 | Comprehensive semantic changes |
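
When wrapping the CLI in other Rust code, the detail levels can be modeled as a small enum. The following is a sketch only: `DetailLevel` and `expected_frames` are hypothetical names, not SceneSplit's API, and the frame ranges are copied from the table above as approximate targets.

```rust
use std::ops::RangeInclusive;
use std::str::FromStr;

/// Hypothetical mirror of the values accepted by `--detail`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DetailLevel {
    Key,
    Summary,
    All,
}

impl DetailLevel {
    /// Approximate number of output frames, taken from the table above.
    fn expected_frames(self) -> RangeInclusive<u32> {
        match self {
            DetailLevel::Key => 5..=10,
            DetailLevel::Summary => 10..=20,
            DetailLevel::All => 20..=30,
        }
    }
}

impl FromStr for DetailLevel {
    type Err = String;

    /// Parses the lowercase names used on the command line.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "key" => Ok(DetailLevel::Key),
            "summary" => Ok(DetailLevel::Summary),
            "all" => Ok(DetailLevel::All),
            other => Err(format!("unknown detail level: {other}")),
        }
    }
}
```

With this in place, `"summary".parse::<DetailLevel>()` yields `Ok(DetailLevel::Summary)`, and an unrecognized name is reported as an error instead of being passed to the CLI.
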

| Preset | Speed | Fidelity |
|---|---|---|
| `fast` | Fastest | Lower |
| `balanced` | Moderate | Good |
| `best` | Slowest | Highest |

SceneSplit creates a directory containing:

    scenesplit_output/
    ├── frame_001.png
    ├── frame_002.png
    ├── frame_003.png
    ├── ...
    └── metadata.json

Example `metadata.json`:

    {
      "source_video": "video.mp4",
      "detail_level": "summary",
      "quality_preset": "balanced",
      "frames": [
        {
          "filename": "frame_001.png",
          "frame_index": 0,
          "timestamp_seconds": 0.0
        },
        {
          "filename": "frame_002.png",
          "frame_index": 45,
          "timestamp_seconds": 1.5
        }
      ]
    }

Supported video formats:

- MP4
- AVI
- MOV
- MKV
- WebM
Format support depends on your OpenCV build and available codecs.
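
A downstream tool will usually want to map a time in the video back to an extracted frame. The sketch below assumes the `frames` array from `metadata.json` has already been deserialized (for example with a JSON crate, not shown here) into a hypothetical `FrameEntry` struct whose fields mirror the sample metadata. `nearest_frame` and `estimated_fps` are illustrative helpers, and the frame-rate estimate assumes a constant frame rate with each timestamp equal to the frame index divided by fps.

```rust
/// Hypothetical mirror of one entry in the `frames` array of metadata.json.
#[derive(Debug, Clone, PartialEq)]
struct FrameEntry {
    filename: String,
    frame_index: u64,
    timestamp_seconds: f64,
}

/// Returns the entry whose timestamp is closest to `t` seconds,
/// or `None` when the list is empty.
fn nearest_frame(frames: &[FrameEntry], t: f64) -> Option<&FrameEntry> {
    frames.iter().min_by(|a, b| {
        let da = (a.timestamp_seconds - t).abs();
        let db = (b.timestamp_seconds - t).abs();
        da.total_cmp(&db)
    })
}

/// Estimates the source frame rate from a single entry.
/// Assumption: constant fps and timestamp = frame_index / fps.
/// Returns `None` for the first frame, where the timestamp is zero.
fn estimated_fps(entry: &FrameEntry) -> Option<f64> {
    if entry.timestamp_seconds > 0.0 {
        Some(entry.frame_index as f64 / entry.timestamp_seconds)
    } else {
        None
    }
}
```

For the sample metadata, the second entry (frame index 45 at 1.5 seconds) is consistent with a 30 fps source under that assumption.
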
- Frame extraction — Samples frames from video at a rate determined by quality preset
- Embedding computation — Passes frames through ResNet50 to generate semantic feature vectors
- Similarity analysis — Computes cosine similarity between consecutive embeddings
- Segmentation — Detects boundaries where similarity drops below threshold (controlled by detail level)
- Selection — Chooses representative frames from each segment
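
The similarity, segmentation, and selection steps can be illustrated with a minimal, dependency-free Rust sketch. It is not SceneSplit's actual implementation: the function names, the threshold value used in the usage note, and the choice of the middle sample as each segment's representative are assumptions made for illustration, and the embeddings are assumed to be computed already.

```rust
/// Cosine similarity between two equal-length embedding vectors.
/// Returns 0.0 if either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same length");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns the sample indices at which a new segment starts.
/// Sample 0 always starts a segment; sample i starts one when its
/// similarity to sample i - 1 drops below `threshold`.
fn segment_starts(embeddings: &[Vec<f32>], threshold: f32) -> Vec<usize> {
    let mut starts = Vec::new();
    if embeddings.is_empty() {
        return starts;
    }
    starts.push(0);
    for i in 1..embeddings.len() {
        if cosine_similarity(&embeddings[i - 1], &embeddings[i]) < threshold {
            starts.push(i);
        }
    }
    starts
}

/// Picks one representative sample per segment.
/// Assumption: the middle sample of each segment is used.
fn representatives(starts: &[usize], total: usize) -> Vec<usize> {
    starts
        .iter()
        .enumerate()
        .map(|(k, &start)| {
            let end = starts.get(k + 1).copied().unwrap_or(total);
            start + (end - start) / 2
        })
        .collect()
}
```

Calling `segment_starts(&embeddings, 0.8)` and passing the result to `representatives` yields one sample index per detected segment. In this sketch a higher threshold produces more boundaries and therefore more output frames, which is one plausible way a detail level could map onto the threshold.
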
MIT