Depth seg new op #862

archernsy · 2025-12-22T07:26:53Z

No description provided.

gemini-code-assist · 2025-12-22T07:27:41Z

Summary of Changes

Hello @archernsy, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the video processing functionalities by integrating two advanced computer vision operators. The DepthAnythingV2Mapper enables the generation of detailed depth maps from video content, providing crucial 3D scene understanding. Concurrently, the SAMAutoMaskMapper introduces a robust automatic object segmentation pipeline, utilizing SAM1 for initial object detection on keyframes and SAM2 for efficient, temporally consistent mask propagation throughout the video. These additions broaden the scope of video data enrichment and analysis within the framework.

Highlights

New Depth Estimation Operator: Introduces DepthAnythingV2Mapper for generating detailed depth maps from video frames using the Depth Anything V2 model, enhancing 3D scene understanding capabilities.
New Automatic Mask Generation Operator: Adds SAMAutoMaskMapper which leverages SAM1 for initial mask detection on keyframes and SAM2 for efficient, temporally consistent mask propagation throughout video sequences.
Integration of Advanced Computer Vision Models: Incorporates complex and state-of-the-art model architectures such as DINOv2, DPT, SAM1, and SAM2, along with their respective utility components and custom CUDA kernels, into the data processing framework.
Enhanced Video Processing Capabilities: Significantly expands the library's functionalities to include sophisticated video analysis tasks, providing tools for both depth perception and robust object segmentation.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

archernsy · 2025-12-22T07:31:50Z

This PR introduces two new operators:

DepthAnythingv2: For single-frame depth extraction from videos.
Autoseg2: A fully automatic video segmentation tool (based on SAM1 and SAM2). It can automatically segment all content in a video, dividing frames into different objects or regions without requiring manual prompts.
All feedback and suggestions are welcome

gemini-code-assist

Code Review

This pull request introduces two new operators, DepthAnythingV2Mapper and SAMAutoMaskMapper, for depth estimation and automatic video segmentation. The changes are extensive and involve adding a significant amount of model code, seemingly from external repositories. My review focuses on the integration of these models as data_juicer operators. I've identified several critical and high-severity issues related to dependency management (cloning git repos at runtime), hardcoded absolute paths, performance (inefficient model loading), and potential bugs (incorrect sorting). These issues could affect the portability, maintainability, and performance of the new operators and should be addressed.

gemini-code-assist · 2025-12-22T07:34:39Z

data_juicer/ops/mapper/sam_auto_mask_mapper.py

+        self,
+        sam1_checkpoint_path: str = "./checkpoints/sam1/sam_vit_h_4b8939.pth",
+        sam2_checkpoint_path: str = "./checkpoints/sam2/sam2_hiera_large.pt",
+        sam2_model_cfg: str = "/mnt/workspace/junyuanxiao/code/AutoSeg-SAM2/sam2_configs/sam2_hiera_l.yaml",


The default value for sam2_model_cfg is a hardcoded absolute path to a specific user's workspace (/mnt/workspace/junyuanxiao/code/AutoSeg-SAM2/sam2_configs/sam2_hiera_l.yaml). This will cause the code to fail on any other machine. This should be changed to a relative path, a path configurable via an environment variable, or downloaded to a cache directory. The logic in _load_models to find this config file also contains several hardcoded paths and is brittle. A more robust way to handle model configurations and dependencies is needed.

gemini-code-assist · 2025-12-22T07:34:39Z

data_juicer/ops/mapper/depth_anything_v2/dpt.py

+    def infer_image(self, raw_image, input_size=518):
+        image, (h, w) = self.image2tensor(raw_image, input_size)
+
+        depth = self.forward(image)
+
+        depth = F.interpolate(depth[:, None], (h, w), mode="bilinear", align_corners=True)[0, 0]
+
+        return depth.cpu().numpy()


To fix the hardcoded device issue in image2tensor, this method should determine the model's device and pass it down.

Suggested change

def infer_image(self, raw_image, input_size=518):

image, (h, w) = self.image2tensor(raw_image, input_size)

depth = self.forward(image)

depth = F.interpolate(depth[:, None], (h, w), mode="bilinear", align_corners=True)[0, 0]

return depth.cpu().numpy()

def infer_image(self, raw_image, input_size=518):

device = next(self.parameters()).device

image, (h, w) = self.image2tensor(raw_image, input_size, device=device)

depth = self.forward(image)

depth = F.interpolate(depth[:, None], (h, w), mode="bilinear", align_corners=True)[0, 0]

return depth.cpu().numpy()

gemini-code-assist · 2025-12-22T07:34:39Z

data_juicer/ops/mapper/depth_anything_v2_mapper.py

+            from .depth_anything_v2.dpt import DepthAnythingV2
+
+            # Load model
+            depth_anything = DepthAnythingV2(**self.model_configs[self.encoder])
+            model_path = os.path.join(self.checkpoint_path, f"depth_anything_v2_{self.encoder}.pth")
+
+            if not os.path.exists(model_path):
+                raise FileNotFoundError(f"Model checkpoint not found at {model_path}")
+
+            depth_anything.load_state_dict(torch.load(model_path, map_location='cpu'))
+
+            device = "cuda" if self.use_cuda() else "cpu"
+            if rank is not None and self.use_cuda():
+                device = f"cuda:{rank}"
+
+            depth_anything = depth_anything.to(device).eval()


The model is loaded within the process_single method. This is inefficient as it will be reloaded for every sample processed by this worker if the result is not already cached. This can significantly slow down processing. The model should be loaded once per worker, for example in the __init__ method, and stored as an instance attribute (e.g., self.model). A helper method can be used for lazy loading on the first call if you want to avoid loading the model during initialization of all workers.

gemini-code-assist · 2025-12-22T07:34:39Z

data_juicer/ops/mapper/depth_anything_v2/dpt.py

+    def image2tensor(self, raw_image, input_size=518):        
+        transform = Compose([
+            Resize(
+                width=input_size,
+                height=input_size,
+                resize_target=False,
+                keep_aspect_ratio=True,
+                ensure_multiple_of=14,
+                resize_method='lower_bound',
+                image_interpolation_method=cv2.INTER_CUBIC,
+            ),
+            NormalizeImage(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
+            PrepareForNet(),
+        ])
+
+        h, w = raw_image.shape[:2]
+
+        image = cv2.cvtColor(raw_image, cv2.COLOR_BGR2RGB) / 255.0
+
+        image = transform({'image': image})['image']
+        image = torch.from_numpy(image).unsqueeze(0)
+
+        DEVICE = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'
+        image = image.to(DEVICE)
+
+        return image, (h, w)


The image2tensor method hardcodes the device selection. This can cause device mismatches if the model is on a different device (e.g., a specific GPU like cuda:1). The device should be passed in as an argument from the caller. This makes the component more modular and prevents potential runtime errors.

def image2tensor(self, raw_image, input_size=518, device='cpu'): transform = Compose([ Resize( width=input_size, height=input_size, resize_target=False, keep_aspect_ratio=True, ensure_multiple_of=14, resize_method='lower_bound', image_interpolation_method=cv2.INTER_CUBIC, ), NormalizeImage(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]), PrepareForNet(), ]) h, w = raw_image.shape[:2] image = cv2.cvtColor(raw_image, cv2.COLOR_BGR2RGB) / 255.0 image = transform({'image': image})['image'] image = torch.from_numpy(image).unsqueeze(0) image = image.to(device) return image, (h, w)

gemini-code-assist · 2025-12-22T07:34:39Z

data_juicer/ops/mapper/depth_anything_v2_mapper.py

+        if not os.path.exists(depth_anything_v2_repo_path):
+            import subprocess
+            subprocess.run([
+                "git",
+                "clone",
+                "https://github.com/DepthAnything/Depth-Anything-V2.git",
+                depth_anything_v2_repo_path,
+            ], check=True)


This operator clones a git repository (Depth-Anything-V2) into a cache directory during initialization. This introduces several potential issues:

Side Effects: It's an unexpected side effect for an operator to perform network operations and write to the filesystem outside of its designated output directories.

Dependencies: It creates a dependency on git being installed and network access to GitHub.

Fragility: The clone might fail due to network issues, permissions, or if the repository is moved or deleted.
It would be more robust to either vendor the necessary code within this repository or treat it as a proper dependency that users must install beforehand, with clear instructions in the documentation.

gemini-code-assist · 2025-12-22T07:34:39Z

data_juicer/ops/mapper/depth_anything_v2/dinov2.py

+        # w0, h0 = w0 + 0.1, h0 + 0.1
+
+        sqrt_N = math.sqrt(N)
+        sx, sy = float(w0) / sqrt_N, float(h0) / sqrt_N
+        patch_pos_embed = nn.functional.interpolate(
+            patch_pos_embed.reshape(1, int(sqrt_N), int(sqrt_N), dim).permute(0, 3, 1, 2),
+            scale_factor=(sx, sy),
+            # (int(w0), int(h0)), # to solve the upsampling shape issue


There are commented-out lines of code here that seem to be leftovers from development or debugging. They should be removed to improve code clarity.

gemini-code-assist · 2025-12-22T07:34:39Z

data_juicer/ops/mapper/depth_anything_v2/dinov2_layers/block.py

+            )
+        elif self.training and self.sample_drop_ratio > 0.0:
+            x = x + self.drop_path1(attn_residual_func(x))
+            x = x + self.drop_path1(ffn_residual_func(x))  # FIXME: drop_path2


There's a FIXME comment here, and it seems correct. You are using self.drop_path1 for the FFN residual path, but it should probably be self.drop_path2. Using the same drop path instance might not be intended and could be a bug if drop probabilities are different (though here they are the same). It's better to use the correct one for clarity and correctness.

Suggested change

x = x + self.drop_path1(ffn_residual_func(x)) # FIXME: drop_path2

x = x + self.drop_path2(ffn_residual_func(x))

gemini-code-assist · 2025-12-22T07:34:39Z

data_juicer/ops/mapper/depth_anything_v2_mapper.py

+        checkpoint_path: str = "./checkpoints",
+        encoder: str = "vitl",
+        input_size: int = 518,
+        frame_num: int = 10,  # 改为 int 类型，默认值改为正整数


This comment is in Chinese, while the rest of the docstring and code is in English. This is likely a leftover development note. For consistency and clarity, it should be removed as the docstring already explains the frame_num parameter.

Suggested change

frame_num: int = 10, # 改为 int 类型，默认值改为正整数

frame_num: int = 10,

gemini-code-assist · 2025-12-22T07:34:39Z

data_juicer/ops/mapper/sam_auto_mask_mapper.py

+
+        # Add the repository path to sys.path
+        import sys
+        sys.path.append(auto_seg_sam2_repo_path)


Modifying sys.path at runtime is generally discouraged as it can lead to unpredictable behavior and makes dependency management difficult. It's better to structure the project so that such modifications are not necessary, for example by vendoring the dependency or installing it as a proper package.

gemini-code-assist · 2025-12-22T07:34:40Z

data_juicer/ops/mapper/sam_auto_mask_mapper.py

+        """
+        准备符合 SAM2 要求的帧文件（创建临时目录并复制重命名文件）
+        """


The docstring for this method is in Chinese, which is inconsistent with the English used in the rest of the file. Please translate it for better maintainability.

Suggested change

"""

准备符合 SAM2 要求的帧文件（创建临时目录并复制重命名文件）

"""

"""

Prepare frame files in a format required by SAM2 (create a temporary

directory and copy/rename files).

"""

archernsy added 2 commits December 22, 2025 12:59

depthanythingv2-autoseg-new-ops

e17c1a9

depthanythingv2-autoseg-new-ops

79c83eb

Copilot AI review requested due to automatic review settings December 22, 2025 07:26

archernsy requested a deployment to Testing December 22, 2025 07:26 — with GitHub Actions Waiting

Copilot started reviewing on behalf of archernsy December 22, 2025 07:27 View session

yxdyc requested review from Cathy0908 and Qirui-jiao and removed request for Copilot December 22, 2025 07:29

yxdyc added the dj:op issues/PRs about some specific OPs label Dec 22, 2025

gemini-code-assist bot reviewed Dec 22, 2025

View reviewed changes

	x = x + self.drop_path1(ffn_residual_func(x)) # FIXME: drop_path2
	x = x + self.drop_path2(ffn_residual_func(x))

	frame_num: int = 10, # 改为 int 类型，默认值改为正整数
	frame_num: int = 10,

-        """
-        准备符合 SAM2 要求的帧文件（创建临时目录并复制重命名文件）
-        """
+        """
+        Prepare frame files in a format required by SAM2 (create a temporary
+        directory and copy/rename files).
+        """

Depth seg new op #862

Are you sure you want to change the base?

Depth seg new op #862

Uh oh!

Conversation

archernsy commented Dec 22, 2025

Uh oh!

gemini-code-assist bot commented Dec 22, 2025

Summary of Changes

Highlights

Footnotes

Uh oh!

archernsy commented Dec 22, 2025

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Dec 22, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Dec 22, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Dec 22, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Dec 22, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Dec 22, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Dec 22, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Dec 22, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Dec 22, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Dec 22, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Dec 22, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants