S²M²cinematic

Bidirectional Stereo Matching for Stereoscopic Source Material

First stereo matching model to predict negative disparities from stereoscopic source material pairs — enabling accurate Ground Truth generation for objects placed in front of the screen plane (pop-out / negative parallax effects).

The Problem

Every stereo matching model in the literature — S²M², FoundationStereo, RAFT-Stereo, and all others — predicts only positive disparities. This is correct for automotive and robotics, where objects never appear in front of the camera baseline.

In stereoscopic source material, this assumption fails completely.

Depth compositing artists deliberately place objects in front of the convergence plane (the screen surface) to create pop-out effects. These objects produce negative pixel shifts between the left and right eye views. A stereo matching engine that clips negative disparities to zero silently discards this information — producing systematically wrong Ground Truth for any scene with deliberate pop-out composition.

Screen plane (zero disparity)      ─────────────────────────
Objects behind screen (positive)   ─────────────── ←→ positive shift
Objects in front of screen (pop)   ←─────────────────────────→ negative shift
                                        (clipped to 0 by all existing models)

The Solution

Three minimal changes to S²M² unlock bidirectional matching:

# 1. Remove Optimal Transport positivity mask
use_positivity = False        # was: True  (torch.triu() in OT solver)

# 2. Remove output clamp (monkey-patched in forward())
# was: disp = disp.clamp(min=0)
# now: unclamped — negative disparities flow through

# 3. Bidirectional occlusion mask
# was: corr_x >= 0
# now: (corr_x >= 0) & (corr_x < w)

These changes are necessary but not sufficient. Without fine-tuning, the base model produces noisy, unphysical negative values. SignMagnitudeLoss trains the model to produce negative disparities that correspond to actual physical pop-out geometry, as measured by NCC block-matching.

Key Innovation: SignMagnitudeLoss

SignMagnitudeLoss = 0.30 × L_hinge(sign) + 0.70 × L_regression(magnitude)
                 + w_photo × L_photometric
                 + w_anchor × L_gt_anchor
                 + w_smooth × L_smoothness

Component	Weight	Purpose
Hinge (sign)	30%	Penalizes wrong sign — ensures pop-out regions go negative
Regression (magnitude)	70%	Pulls toward NCC-measured disparity magnitude
Photometric	—	Left-right reconstruction consistency
GT Anchor	—	Prevents collapse of positive disparity quality
Smoothness	—	Spatial regularization

Why magnitude regression matters: A pure sign loss teaches the model that pop-out regions should be negative, but not how negative. The 70% magnitude regression component drives the prediction toward the physically measured depth of each pop-out element — not just its direction.

The NCC block-matcher provides the regression target: actual pixel displacement between left and right eye views, without semantic priors, at 1/2 resolution.

Results — Epoch 1, Phase 1 (Encoder Frozen)

Validation on four reference frames after one complete training epoch, encoder frozen, trained on ~103K GT-anchor frames from 23 original stereoscopic source materials spanning 1953–2018.

Frame	Scene Type	Original S²M²	S²M²cinematic	NegPix	Asym. Ratio
Ref-A	Dramatic work, strong positive parallax	0.0 px	−24.5 px	7.4%	0.87
Ref-B	—	0.0 px	−17.8 px	17.2%	2.66
Ref-C	—	0.2 px	−15.2 px	52.4%	0.57
Ref-D	Large-format work, strong pop-out	0.0 px	−23.3 px	13.0%	0.90

Column notes:

Original S²M²: always ≤ 0.2px — the positivity constraint makes negative values structurally impossible
S²M²cinematic: negative values = objects in front of the screen plane
NegPix: fraction of pixels with negative disparity in this frame
Asymmetry Ratio: left/right disparity balance (1.0 = perfectly symmetric). Original S²M²: 0.09 and 13.05 (severely skewed). S²M²cinematic: 0.57–2.66 (genuinely bidirectional)

Training loss after Epoch 1:

Train = 8.5214    Val = 8.0864    (Val < Train → no overfitting)

Training continues through Phase 2 (unfrozen encoder). Final weights will be stronger — the proof of concept is established after Epoch 1.

Architecture

Input: Full-SBS stereo pair (left + right, side-by-side)
         │
         ▼
  S²M² L backbone (180.7M parameters, pretrained weights from S²M² repository)
  ┌─────────────────────────────────────────────────────┐
  │  use_positivity = False   ← OT positivity mask off  │
  │  clamp(min=0) removed     ← negative output allowed │
  │  bidirectional occlusion  ← symmetric search        │
  └─────────────────────────────────────────────────────┘
         │
         ▼
  SignMagnitudeLoss fine-tuning on stereoscopic source GT
  ┌─────────────────────────────────────────────────────┐
  │  NCC block-matching  → signed magnitude target       │
  │  Hinge loss (30%)    → correct sign enforcement      │
  │  Regression (70%)    → correct magnitude             │
  │  GT Anchor loss      → preserves positive quality    │
  └─────────────────────────────────────────────────────┘
         │
         ▼
  Signed disparity map [H, W]
     > 0  →  object behind screen (standard depth)
     = 0  →  convergence plane (screen surface)
     < 0  →  object in front of screen (pop-out)

Training Phases

Phase	Epochs	Encoder	Resolution	Trainable Params
Phase 1	1–5	Frozen	1920×1056	44.2M
Phase 2	6–50	Unfrozen	960×512	180.7M
Phase 3 (optional)	3–5	Unfrozen	960×512	180.7M

Early stopping in Phase 2: patience=7, min-delta=0.0005.

Repository Contents

s2m2cinematic/
├── finetune.py              Main fine-tuning script (two-phase training)
├── losses.py                SignMagnitudeLoss + PhotometricLoss + GTAnchorLoss
├── dataset.py               StereoDataset (Full-SBS stereo source pairs)
├── quicktest.py             100-step proof-of-concept test on single frames
├── validate.py              Comparison: cinematic vs. original on reference frames
├── prepare_gt.py            Sample GT-anchor disparities from stereoscopic source corpus
├── export_engine.py         ONNX graph surgery + TRT FP16 export
├── translucency_loss.py     TranslucencyLoss for semi-transparent surfaces
└── s2m2translucent_loss.py  Combined loss with translucency awareness

Not included: Pre-trained weights, training frames, GT disparity files. See Note on Weights and Data.

Requirements

Python 3.10+
PyTorch 2.8+ with CUDA 12.x
S²M² codebase
NVIDIA GPU ≥ 24 GB VRAM (tested on Blackwell-architecture GPU)
Full-SBS stereoscopic source material (your own collection)

For TRT export on NVIDIA Blackwell (sm_120 / RTX 5090), export_engine.py includes an automatic ONNX graph surgery workaround for 25 AveragePool nodes incompatible with the sm_120 architecture.

Quick Start

1. Prepare GT anchors

python prepare_gt.py \
    --root /path/to/your/data \
    --disparity_dir /path/to/your/data/gt_anchor \
    --n_per_source 2500

2. Quick proof-of-concept (~5 minutes)

python quicktest.py \
    --root /path/to/your/data \
    --disparity_dir /path/to/your/data/gt_anchor \
    --frames frame_001000 \
    --steps 100

Expected: negative Min disparity on frames with genuine pop-out content.

3. Full fine-tuning

python finetune.py \
    --root /path/to/your/data \
    --disparity_dir /path/to/your/data/gt_anchor \
    --model_type L \
    --p1_image_height 1056 \
    --p1_image_width 1920 \
    --image_height 512 \
    --image_width 960 \
    --restart

4. Validate

python validate.py \
    --root /path/to/your/data \
    --weights checkpoints/s2m2cinematic_best.pth \
    --frames frame_001000 frame_005000 frame_010000

5. Export to TensorRT FP16

python export_engine.py \
    --weights checkpoints/s2m2cinematic_best.pth

Note on Weights and Data

Pre-trained weights are not included.

The model was fine-tuned on original stereoscopic source material (Full-SBS format). The legal status of using such material as AI training data under German copyright law (§44b UrhG) and EU DSM Directive Art. 4 is subject to ongoing review.

You can train your own model using Full-SBS stereoscopic source material from your personal collection. No source-specific data is included in this repository.

Comparison with Existing Models

Model	Negative Disparities	Physical Measurement	Stereo Domain
S²M² (original)	❌ clipped to 0	✅ stereo matching	❌
FoundationStereo	❌ clipped to 0	⚠️ semantic priors	❌
RAFT-Stereo	❌ clipped to 0	✅ stereo matching	❌
MiDaS / DPT	❌ affine-invariant	❌ pseudo-labels	❌
Depth Anything	❌ affine-invariant	❌ pseudo-labels	❌
S²M²cinematic	✅ bidirectional	✅ NCC-anchored	✅

Relation to CCS

S²M²cinematic is the Ground Truth engine for the Creative Cinematic Stereographing (CCS) project, which trains a monocular depth model for 2D-to-3D conversion on physically measured, bidirectional disparities from original stereoscopic source material.

The key insight: any GT engine that clips negative disparities to zero produces systematically wrong Ground Truth for stereoscopic source material. Pop-out elements — the most deliberate depth decisions in a depth compositing artist's work — are silently converted to zero disparity. A depth model trained on this GT learns an incomplete picture of compositional depth.

Citation

@misc{artoist2026s2m2cinematic,
  author       = {DepthAuteur},
  title        = {S2M2cinematic: Bidirectional Stereo Matching
                  for Stereoscopic Source Material},
  year         = {2026},
  month        = {March},
  address      = {},
  howpublished = {\url{https://github.com/DepthAuteur/s2m2cinematic}},
  note         = {Fine-tuning infrastructure for bidirectional stereo
                  matching. Pre-trained weights not included.}
}

If you use this work, please also cite the original S²M²:

@inproceedings{min2025s2m2,
  title     = {S2M2: Stereo Matching on Stereo Matching},
  author    = {Min, Juhyung and others},
  booktitle = {ICCV},
  year      = {2025}
}

Acknowledgments

Built on S²M² by Min et al. (ICCV 2025).

Architect: DepthAuteur Architectural review: DepthAuteur, Claude Opus 4.6 (Anthropic). Implementation: DepthAuteur, Claude Sonnet 4.6 (Anthropic).

License

Code: Apache 2.0 — see LICENSE.

Pre-trained weights (when released): CC BY-NC 4.0.

DepthAuteur — Cinephile · March 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

S²M²cinematic

Bidirectional Stereo Matching for Stereoscopic Source Material

The Problem

The Solution

Key Innovation: SignMagnitudeLoss

Results — Epoch 1, Phase 1 (Encoder Frozen)

Architecture

Training Phases

Repository Contents

Requirements

Quick Start

1. Prepare GT anchors

2. Quick proof-of-concept (~5 minutes)

3. Full fine-tuning

4. Validate

5. Export to TensorRT FP16

Note on Weights and Data

Comparison with Existing Models

Relation to CCS

Citation

Acknowledgments

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
CITATION.md		CITATION.md
CONCEPT.md		CONCEPT.md
LICENSE		LICENSE
README.md		README.md
RESULTS.md		RESULTS.md
dataset.py		dataset.py
export_engine.py		export_engine.py
finetune.py		finetune.py
losses.py		losses.py
prepare_gt.py		prepare_gt.py
quicktest.py		quicktest.py
s2m2translucent_loss.py		s2m2translucent_loss.py
translucency_loss.py		translucency_loss.py
validate.py		validate.py

Folders and files

Latest commit

History

Repository files navigation

S²M²cinematic

Bidirectional Stereo Matching for Stereoscopic Source Material

The Problem

The Solution

Key Innovation: SignMagnitudeLoss

Results — Epoch 1, Phase 1 (Encoder Frozen)

Architecture

Training Phases

Repository Contents

Requirements

Quick Start

1. Prepare GT anchors

2. Quick proof-of-concept (~5 minutes)

3. Full fine-tuning

4. Validate

5. Export to TensorRT FP16

Note on Weights and Data

Comparison with Existing Models

Relation to CCS

Citation

Acknowledgments

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages