Fix: Align auxiliary output features with input view order #183

diankun-wu · 2025-12-23T16:36:59Z

Summary

This PR fixes a misalignment issue in aux_output (exported intermediate features) within _get_intermediate_layers_not_chunked.

Problem

Currently, when the model selects a dynamic reference view and reorders the input tensor x (placing the reference view at index 0), the main output is correctly restored to the original input order using restore_original_order.

However, x is appended to aux_output directly in its reordered state. As a result, the order of features in output.aux does not match the order of the input images for any exported layer where i >= alt_start. This breaks downstream tasks that rely on index alignment (e.g., visualization, consistency checks).
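A minimal sketch of the mismatch, assuming the view axis is axis 0 and using NumPy for brevity. `move_reference_first` and `restore_order` are hypothetical stand-ins written for this illustration; they are not the repository's `restore_original_order` and may differ from it in signature and tensor layout.

```python
import numpy as np


def move_reference_first(x, ref_idx):
    """Reorder views along axis 0 so that view `ref_idx` comes first.

    Hypothetical helper: returns the reordered array and the permutation used.
    """
    num_views = x.shape[0]
    perm = np.array([ref_idx] + [i for i in range(num_views) if i != ref_idx])
    return x[perm], perm


def restore_order(x_reordered, perm):
    """Undo `move_reference_first` by applying the inverse permutation."""
    inverse = np.argsort(perm)
    return x_reordered[inverse]


# Four views, each tagged with its own input index as a 1-D "feature".
views = np.arange(4, dtype=np.float32).reshape(4, 1)
reordered, perm = move_reference_first(views, ref_idx=2)

misaligned_aux = reordered                     # exported as-is: reference view sits at index 0
aligned_aux = restore_order(reordered, perm)   # exported after restoring the input order
```

Here `misaligned_aux` holds the views in the order 2, 0, 1, 3, while `aligned_aux` matches the input order again.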

Changes

- Replicated the view restoration logic used for the main output in the auxiliary output block.
- Added checks to ensure restore_original_order is only called when reordering actually occurred (checking b_idx and thresholds).
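The guard can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `export_aux_feature` and its `perm` argument are hypothetical, and the actual checks on b_idx and the thresholds are reduced here to "a permutation exists and the layer is at or past alt_start".

```python
import numpy as np


def export_aux_feature(x, layer_idx, alt_start, perm=None):
    """Return features for export, in input view order (views on axis 0).

    `perm` is the permutation applied when the reference view was moved to
    index 0, or None when no reordering took place (hypothetical convention).
    """
    reordering_active = perm is not None and layer_idx >= alt_start
    if not reordering_active:
        return x  # still in input order, nothing to undo
    return x[np.argsort(perm)]  # apply the inverse permutation


x = np.array([[10.0], [20.0], [30.0]])   # three views in input order
perm = np.array([1, 0, 2])               # view 1 chosen as reference
x_reordered = x[perm]

early = export_aux_feature(x, layer_idx=3, alt_start=8, perm=perm)
late = export_aux_feature(x_reordered, layer_idx=8, alt_start=8, perm=perm)
no_reorder = export_aux_feature(x, layer_idx=8, alt_start=8, perm=None)
```

All three results come back in input order; only `late` needed the inverse permutation.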

diankun-wu · 2025-12-23T17:02:34Z

Verification

To verify the fix, I visualized the exported intermediate features (layers 0-39) for an 8-frame sequence, applying PCA to the high-dimensional features to convert them into RGB images.
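For readers without the repository helper at hand, the PCA-to-RGB step can be approximated with plain NumPy. The `pca_to_rgb` function below is a hypothetical stand-in, not `pca_to_rgb_4d_bf16_percentile`: it uses simple per-component min-max scaling and handles a single [H, W, D] feature map, so its colors will likely differ from the repository's output.

```python
import numpy as np


def pca_to_rgb(features):
    """Project an [H, W, D] feature map to an [H, W, 3] uint8 image via PCA.

    Assumes D >= 3 and H * W >= 3 so that three principal components exist.
    """
    h, w, d = features.shape
    flat = features.reshape(-1, d).astype(np.float64)
    flat = flat - flat.mean(axis=0, keepdims=True)       # center each channel
    # Rows of vt are the principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    projected = flat @ vt[:3].T                          # [H*W, 3]
    # Min-max scale each component to [0, 1] independently.
    lo = projected.min(axis=0, keepdims=True)
    hi = projected.max(axis=0, keepdims=True)
    scaled = (projected - lo) / np.maximum(hi - lo, 1e-12)
    return (scaled * 255).round().astype(np.uint8).reshape(h, w, 3)


rng = np.random.default_rng(0)
demo_features = rng.normal(size=(4, 5, 16))   # synthetic stand-in for one layer of one frame
demo_rgb = pca_to_rgb(demo_features)
```

Because each frame and layer gets its own projection here, colors are only comparable within a single image, which is enough to see whether the image content matches the expected frame.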

Observation:

Before Fix: The features exhibited a sudden misalignment starting from the alt_start layer.
- Layers < alt_start: Features were correctly aligned with the input index.
- Layers >= alt_start: Once the model's view-reordering logic was triggered, aux_output became misaligned.
After Fix: The aux_output features remain correctly aligned with the input image order across all layers.
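The same observation can be checked numerically instead of visually. The sketch below is an assumption-laden illustration: it compares one layer's features from a run under test against the same layer from a run known to be aligned, and matches frames by cosine similarity. `best_match_per_frame` and `is_aligned` are hypothetical helpers, not part of the repository.

```python
import numpy as np


def best_match_per_frame(reference, candidate):
    """For each frame in `candidate`, return the index of the most similar
    frame in `reference`, by cosine similarity of the flattened features.

    Both arrays have shape [Frames, ...] with identical trailing dimensions.
    """
    ref = reference.reshape(reference.shape[0], -1).astype(np.float64)
    cand = candidate.reshape(candidate.shape[0], -1).astype(np.float64)
    ref /= np.linalg.norm(ref, axis=1, keepdims=True) + 1e-12
    cand /= np.linalg.norm(cand, axis=1, keepdims=True) + 1e-12
    return (cand @ ref.T).argmax(axis=1)


def is_aligned(reference, candidate):
    """True when every candidate frame matches the reference frame at the same index."""
    matches = best_match_per_frame(reference, candidate)
    return bool(np.array_equal(matches, np.arange(len(matches))))


rng = np.random.default_rng(0)
aligned_feats = rng.normal(size=(4, 2, 3, 8))    # synthetic [Frames, H, W, D]
shuffled_feats = aligned_feats[[1, 0, 2, 3]]     # reference view swapped to the front
```

With this synthetic data, `is_aligned(aligned_feats, shuffled_feats)` reports a mismatch and `best_match_per_frame` recovers which input frame each exported slot actually holds.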

Visual Comparison

Frame Index	Before Fix (Misaligned)	After Fix (Correctly Aligned)
Frame 0, Layers 0 -> 39	(media not preserved)	(media not preserved)
Frame 1, Layers 0 -> 39	(media not preserved)	(media not preserved)

Reproduction/Visualization Script

import glob
import os

import imageio
import numpy as np
import torch
from depth_anything_3.api import DepthAnything3
from src.depth_anything_3.utils.pca_utils import pca_to_rgb_4d_bf16_percentile

# Setup paths and model
example_path = "assets/examples/arkitscenes_41069025_8frames"
images = sorted(glob.glob(os.path.join(example_path, "*.png")))
device = torch.device("cuda")
model = DepthAnything3.from_pretrained("checkpoints/DA3NESTED-GIANT-LARGE-1.1")
model = model.to(device=device)

# Inference with feature export
prediction = model.inference(
    images,
    export_feat_layers=list(range(40)),  # export all 40 layers
    process_res=640,
)

# Process features
# prediction.aux is a dict where keys are layer indices
all_features = [torch.from_numpy(prediction.aux[key]) for key in prediction.aux]  # one feature tensor per exported layer
all_layer_feature = torch.stack(all_features)  # [Layers, Frames, H, W, D]
all_layer_feature = all_layer_feature.permute(1, 0, 2, 3, 4)  # [Frames, Layers, H, W, D]
T, L, H, W, D = all_layer_feature.shape

# Save visualization
output_dir = "DA3_feature_vis_verification"
os.makedirs(output_dir, exist_ok=True)

for frame_idx in range(all_layer_feature.shape[0]):
    layer_feature = all_layer_feature[frame_idx] # [Layers, H, W, D]
    video_feats = layer_feature.numpy()

    # PCA visualization (Layer by Layer progression)
    rgb_video_list = []
    for i in range(video_feats.shape[0]):
        rgb_video_list.append(pca_to_rgb_4d_bf16_percentile(
            video_feats[i:i+1],
            device='cuda',
            return_uint8=True,
        ))
    rgb_video = np.concatenate(rgb_video_list, axis=0)

    # Resize for better visibility
    rgb_tensor = torch.from_numpy(rgb_video).permute(0, 3, 1, 2).float()
    rgb_tensor = torch.nn.functional.interpolate(rgb_tensor, size=(H*14, W*14), mode='bilinear', align_corners=False)
    rgb_video = rgb_tensor.permute(0, 2, 3, 1).byte().numpy()
    
    # Save as MP4
    imageio.mimwrite(
        os.path.join(output_dir, f"frame_{frame_idx}_feat_evolution.mp4"),
        rgb_video,
        fps=2
    )
    print(f"Saved visualization for Frame {frame_idx}")

fix: align aux_output features with original input view order

657b13b
