Fix normalization in vision transformer outputs #178

Golbstein · 2025-12-20T16:24:10Z

When cat_token=True, the code manually splits the output tensor to fit self.norm (which expects embed_dim). It appears the normalization step was accidentally skipped for the first half (local features), while the second half is correctly normalized.

This results in a single output tensor containing mixed scales, which contradicts the behavior of cat_token=False where the full signal is normalized.
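
To make the mixed-scale problem concrete, here is a minimal, dependency-free sketch. It is not the repository's code: `EMBED_DIM`, `layer_norm`, `normalize_half`, and `normalize_both` are hypothetical names, plain Python lists stand in for tensors, and the LayerNorm has no learnable affine parameters.

```python
import math

EMBED_DIM = 4  # hypothetical width; the real model uses a much larger embed_dim


def layer_norm(vec, eps=1e-6):
    """LayerNorm over one vector, without learnable scale/shift (assumption)."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def normalize_half(token):
    """Behaviour described above: only the second half goes through the norm."""
    return token[:EMBED_DIM] + layer_norm(token[EMBED_DIM:])


def normalize_both(token):
    """Proposed behaviour: each embed_dim-wide half is normalized on its own."""
    return layer_norm(token[:EMBED_DIM]) + layer_norm(token[EMBED_DIM:])


def rms(vec):
    """Root-mean-square magnitude, used here as a rough measure of scale."""
    return math.sqrt(sum(v * v for v in vec) / len(vec))


# One concatenated token of width 2 * EMBED_DIM with large raw activations.
token = [10.0, 30.0, -20.0, 40.0, 12.0, 28.0, -18.0, 44.0]

mixed = normalize_half(token)
fixed = normalize_both(token)

print("half-normalized:", rms(mixed[:EMBED_DIM]), rms(mixed[EMBED_DIM:]))
print("both normalized:", rms(fixed[:EMBED_DIM]), rms(fixed[EMBED_DIM:]))
```

With only one half normalized, the two halves of the same token sit at very different magnitudes. Normalizing each half separately brings both to roughly unit scale.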

Golbstein · 2025-12-20T16:41:25Z

This matters because DualDPT then takes the concatenated signal, with one half normalized and the other unnormalized, and normalizes it again:

            x = feats[take_idx][:, patch_start_idx:]
            x = self.norm(x)

here:

Depth-Anything-3/src/depth_anything_3/model/dualdpt.py

Line 208 in 2c21ea8

def _forward_impl(
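
The downstream effect can be sketched the same way. This assumes the later norm spans the full concatenated width, which is not confirmed here. `layer_norm` and `spread` are hypothetical helpers, and the values are made up for illustration.

```python
import math


def layer_norm(vec, eps=1e-6):
    """LayerNorm over one vector, without learnable scale/shift (assumption)."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def spread(vec):
    """Population standard deviation: how much the values vary within a slice."""
    mean = sum(vec) / len(vec)
    return math.sqrt(sum((v - mean) ** 2 for v in vec) / len(vec))


raw_half = [10.0, 30.0, -20.0, 40.0]                 # never normalized
normed_half = layer_norm([12.0, 28.0, -18.0, 44.0])  # already unit scale

# Second normalization over the whole concatenated token (assumed behaviour).
renormed = layer_norm(raw_half + normed_half)

print("spread of first half :", spread(renormed[:4]))
print("spread of second half:", spread(renormed[4:]))
```

In this sketch the statistics of the second normalization are dominated by the unnormalized half, so the already-normalized half is compressed to a small fraction of its original variation.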
