@diankun-wu

Summary

This PR fixes a misalignment issue in aux_output (exported intermediate features) within _get_intermediate_layers_not_chunked.

Problem

Currently, when the model selects a dynamic reference view and reorders the input tensor x (placing the reference view at index 0), the main output is correctly restored to the original input order using restore_original_order.

However, the auxiliary branch appends x to aux_output while it is still in its reordered state. As a result, the order of features in output.aux no longer matches the order of the input images for exported layers where i >= alt_start, which breaks downstream tasks that rely on index alignment (e.g., visualization, consistency checks).
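The mismatch described above can be reproduced with a toy example. This is a minimal sketch that uses plain lists of frame labels instead of tensors. The helpers `move_reference_first` and `restore_order` are hypothetical stand-ins written for illustration; they are not the repository's reordering code or its `restore_original_order`, and the choice of `ref_idx = 2` is an assumption.

```python
def move_reference_first(items, ref_idx):
    """Hypothetical stand-in for the reordering: put the reference view at index 0."""
    return [items[ref_idx]] + items[:ref_idx] + items[ref_idx + 1:]


def restore_order(items, ref_idx):
    """Inverse of move_reference_first: move element 0 back to position ref_idx."""
    rest = items[1:]
    return rest[:ref_idx] + [items[0]] + rest[ref_idx:]


frames = ["frame_0", "frame_1", "frame_2", "frame_3"]
ref_idx = 2  # assumption: frame_2 was selected as the reference view

reordered = move_reference_first(frames, ref_idx)

main_output = restore_order(reordered, ref_idx)    # main path: order is restored
aux_before_fix = reordered                         # aux path before the fix: appended as-is
aux_after_fix = restore_order(reordered, ref_idx)  # aux path after the fix

print(main_output)     # ['frame_0', 'frame_1', 'frame_2', 'frame_3']
print(aux_before_fix)  # ['frame_2', 'frame_0', 'frame_1', 'frame_3']
print(aux_after_fix)   # ['frame_0', 'frame_1', 'frame_2', 'frame_3']
```

Before the fix, index 0 of the auxiliary features holds the reference view rather than the first input image, so any per-index comparison with the main output pairs the wrong frames.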

Changes

  • Replicated the view restoration logic used for the main output in the auxiliary output block.
  • Added checks to ensure restore_original_order is only called when reordering actually occurred (checking b_idx and thresholds).
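The guarded restoration in the two changes above can be sketched as follows. This is a simplified model under stated assumptions, not the code in `_get_intermediate_layers_not_chunked`: per-layer features are plain lists, `collect_aux` and `restore_order` are hypothetical helpers, and `min_views` is an illustrative placeholder for whatever threshold the model actually checks.

```python
def restore_order(items, ref_idx):
    """Hypothetical inverse of 'reference view moved to index 0'."""
    rest = items[1:]
    return rest[:ref_idx] + [items[0]] + rest[ref_idx:]


def collect_aux(layer_outputs, export_layers, alt_start, ref_idx, num_views, min_views=3):
    """Collect exported features, restoring input order only if reordering happened.

    layer_outputs: per-layer list of per-view features (reordered from alt_start on)
    ref_idx: index of the selected reference view, or None if none was selected
    min_views: hypothetical threshold below which no reordering is applied
    """
    aux = {}
    for i in export_layers:
        feats = layer_outputs[i]
        was_reordered = ref_idx is not None and i >= alt_start and num_views >= min_views
        aux[i] = restore_order(feats, ref_idx) if was_reordered else list(feats)
    return aux


views = ["a", "b", "c"]
alt_start, ref_idx = 2, 1
reordered = [views[ref_idx]] + views[:ref_idx] + views[ref_idx + 1:]
# Layers before alt_start keep input order; later layers see the reordered views.
layer_outputs = [views if i < alt_start else reordered for i in range(4)]

aux = collect_aux(layer_outputs, range(4), alt_start, ref_idx, num_views=len(views))
print(aux)
```

The guard matters because calling the restoration on features that were never reordered would itself introduce a misalignment.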

@diankun-wu
Author

Verification

To verify the fix, I visualized the exported intermediate features (layers 0-39) for an 8-frame sequence. I applied PCA to project the high-dimensional features onto three components and rendered those as RGB.
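For readers unfamiliar with this kind of visualization, the PCA-to-RGB step can be sketched with NumPy alone. This is a generic illustration, not the repository's `pca_to_rgb_4d_bf16_percentile`; the function name `pca_to_rgb`, the percentile bounds, and the synthetic input are assumptions.

```python
import numpy as np


def pca_to_rgb(features, low=1.0, high=99.0):
    """Project [H, W, D] features onto their top 3 principal components as uint8 RGB."""
    h, w, d = features.shape
    flat = features.reshape(-1, d).astype(np.float64)
    flat = flat - flat.mean(axis=0, keepdims=True)  # center each channel
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    proj = flat @ vt[:3].T  # [H*W, 3]
    # Scale each component between its low and high percentiles so that
    # a few outliers do not wash out the colors.
    lo = np.percentile(proj, low, axis=0)
    hi = np.percentile(proj, high, axis=0)
    scaled = np.clip((proj - lo) / np.maximum(hi - lo, 1e-8), 0.0, 1.0)
    return (scaled * 255).round().astype(np.uint8).reshape(h, w, 3)


rng = np.random.default_rng(0)
demo = rng.normal(size=(8, 8, 16))  # synthetic stand-in for one layer's features
rgb = pca_to_rgb(demo)
print(rgb.shape, rgb.dtype)  # (8, 8, 3) uint8
```

Because each call fits its own PCA basis, colors are only comparable within a single projected map; the spatial structure of each map is what shows which frame it belongs to.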

[Image: arkitscenes_41069025_8frames]

Observation:

  • Before Fix: The features exhibited a sudden misalignment starting from the alt_start layer.
    • Layers < alt_start: Features were correctly aligned with the input index.
    • Layers >= alt_start: Once the model's view-reordering logic was triggered, aux_output became misaligned.
  • After Fix: The aux_output features remain correctly aligned with the input image order across all layers.

Visual Comparison

| Frame Index | Before Fix (Misaligned) | After Fix (Correctly Aligned) |
| --- | --- | --- |
| Frame 0, Layer 0->39 | frame_0 | frame_0 |
| Frame 1, Layer 0->39 | frame_1 | frame_1 |
Reproduction/Visualization Script

import glob, os, torch
from depth_anything_3.api import DepthAnything3
import numpy as np
from src.depth_anything_3.utils.pca_utils import pca_to_rgb_4d_bf16_percentile
import imageio

# Setup paths and model
example_path = "assets/examples/arkitscenes_41069025_8frames"
images = sorted(glob.glob(os.path.join(example_path, "*.png")))
device = torch.device("cuda")
model = DepthAnything3.from_pretrained("checkpoints/DA3NESTED-GIANT-LARGE-1.1")
model = model.to(device=device)

# Inference with feature export
prediction = model.inference(
    images,
    export_feat_layers=list(range(40)),  # export all 40 layers
    process_res=640,
)

# Process features
# prediction.aux is a dict where keys are layer indices
all_features = [torch.from_numpy(prediction.aux[key]) for key in prediction.aux]  # one feature array per exported layer, in dict order
all_layer_feature = torch.stack(all_features)  # [Layers, Frames, H, W, D]
all_layer_feature = all_layer_feature.permute(1, 0, 2, 3, 4)  # [Frames, Layers, H, W, D]
T, L, H, W, D = all_layer_feature.shape

# Save visualization
output_dir = "DA3_feature_vis_verification"
os.makedirs(output_dir, exist_ok=True)

for frame_idx in range(T):  # T = number of frames
    layer_feature = all_layer_feature[frame_idx] # [Layers, H, W, D]
    video_feats = layer_feature.numpy()

    # PCA visualization (Layer by Layer progression)
    rgb_video_list = []
    for i in range(video_feats.shape[0]):
        rgb_video_list.append(pca_to_rgb_4d_bf16_percentile(
            video_feats[i:i+1],
            device='cuda',
            return_uint8=True,
        ))
    rgb_video = np.concatenate(rgb_video_list, axis=0)

    # Resize for better visibility
    rgb_tensor = torch.from_numpy(rgb_video).permute(0, 3, 1, 2).float()
    rgb_tensor = torch.nn.functional.interpolate(rgb_tensor, size=(H*14, W*14), mode='bilinear', align_corners=False)
    rgb_video = rgb_tensor.permute(0, 2, 3, 1).byte().numpy()
    
    # Save as MP4
    imageio.mimwrite(
        os.path.join(output_dir, f"frame_{frame_idx}_feat_evolution.mp4"),
        rgb_video,
        fps=2
    )
    print(f"Saved visualization for Frame {frame_idx}")
