@Golbstein
Contributor

When cat_token=True, the code manually splits the output tensor to fit self.norm (which expects embed_dim). It appears the normalization step was accidentally skipped for the first half (local features), while the second half is correctly normalized.

This results in a single output tensor containing mixed scales, which contradicts the behavior of cat_token=False where the full signal is normalized.
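A minimal sketch of the effect, in plain Python rather than the repository's PyTorch code. The `layer_norm` helper is a hypothetical stand-in for self.norm with no learned scale or shift, and `concat_as_described` / `concat_both_normalized` are illustrative names, not functions from the codebase. The token values are made up to make the scale gap visible:

```python
import math


def layer_norm(vec, eps=1e-6):
    """Normalize one feature vector to zero mean and unit variance.

    Assumption: stands in for self.norm, without learned affine parameters.
    """
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def rms(vec):
    """Root-mean-square magnitude, used here as a simple measure of scale."""
    return math.sqrt(sum(v * v for v in vec) / len(vec))


def concat_as_described(token, embed_dim):
    """First half passed through raw, second half normalized (the behavior reported above)."""
    return token[:embed_dim] + layer_norm(token[embed_dim:])


def concat_both_normalized(token, embed_dim):
    """Both halves normalized separately, so each half fits a norm of size embed_dim."""
    return layer_norm(token[:embed_dim]) + layer_norm(token[embed_dim:])


EMBED_DIM = 4
# Hypothetical concatenated token of width 2 * EMBED_DIM.
token = [50.0, -30.0, 80.0, -100.0, 0.5, -0.2, 0.1, -0.4]

mixed = concat_as_described(token, EMBED_DIM)
fixed = concat_both_normalized(token, EMBED_DIM)

print("reported behavior:", rms(mixed[:EMBED_DIM]), rms(mixed[EMBED_DIM:]))
print("both normalized:  ", rms(fixed[:EMBED_DIM]), rms(fixed[EMBED_DIM:]))
```

With the first half left raw, the two halves of the same tensor differ in magnitude by well over an order of magnitude; normalizing both halves brings each to roughly unit scale.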

@Golbstein
Contributor Author

This matters because the dual DPT head then takes the concatenated signal, with one half normalized and the other not, and normalizes it:

            x = feats[take_idx][:, patch_start_idx:]
            x = self.norm(x)

here:
