-
Notifications
You must be signed in to change notification settings - Fork 307
Depth seg new op #862
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Depth seg new op #862
Conversation
Summary of ChangesHello @archernsy, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the video processing functionalities by integrating two advanced computer vision operators. The Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
This PR introduces two new operators: DepthAnythingv2: For single-frame depth extraction from videos. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces two new operators, DepthAnythingV2Mapper and SAMAutoMaskMapper, for depth estimation and automatic video segmentation. The changes are extensive and involve adding a significant amount of model code, seemingly from external repositories. My review focuses on the integration of these models as data_juicer operators. I've identified several critical and high-severity issues related to dependency management (cloning git repos at runtime), hardcoded absolute paths, performance (inefficient model loading), and potential bugs (incorrect sorting). These issues could affect the portability, maintainability, and performance of the new operators and should be addressed.
| self, | ||
| sam1_checkpoint_path: str = "./checkpoints/sam1/sam_vit_h_4b8939.pth", | ||
| sam2_checkpoint_path: str = "./checkpoints/sam2/sam2_hiera_large.pt", | ||
| sam2_model_cfg: str = "/mnt/workspace/junyuanxiao/code/AutoSeg-SAM2/sam2_configs/sam2_hiera_l.yaml", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default value for sam2_model_cfg is a hardcoded absolute path to a specific user's workspace (/mnt/workspace/junyuanxiao/code/AutoSeg-SAM2/sam2_configs/sam2_hiera_l.yaml). This will cause the code to fail on any other machine. This should be changed to a relative path, a path configurable via an environment variable, or downloaded to a cache directory. The logic in _load_models to find this config file also contains several hardcoded paths and is brittle. A more robust way to handle model configurations and dependencies is needed.
| def infer_image(self, raw_image, input_size=518): | ||
| image, (h, w) = self.image2tensor(raw_image, input_size) | ||
|
|
||
| depth = self.forward(image) | ||
|
|
||
| depth = F.interpolate(depth[:, None], (h, w), mode="bilinear", align_corners=True)[0, 0] | ||
|
|
||
| return depth.cpu().numpy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To fix the hardcoded device issue in image2tensor, this method should determine the model's device and pass it down.
| def infer_image(self, raw_image, input_size=518): | |
| image, (h, w) = self.image2tensor(raw_image, input_size) | |
| depth = self.forward(image) | |
| depth = F.interpolate(depth[:, None], (h, w), mode="bilinear", align_corners=True)[0, 0] | |
| return depth.cpu().numpy() | |
| def infer_image(self, raw_image, input_size=518): | |
| device = next(self.parameters()).device | |
| image, (h, w) = self.image2tensor(raw_image, input_size, device=device) | |
| depth = self.forward(image) | |
| depth = F.interpolate(depth[:, None], (h, w), mode="bilinear", align_corners=True)[0, 0] | |
| return depth.cpu().numpy() |
| from .depth_anything_v2.dpt import DepthAnythingV2 | ||
|
|
||
| # Load model | ||
| depth_anything = DepthAnythingV2(**self.model_configs[self.encoder]) | ||
| model_path = os.path.join(self.checkpoint_path, f"depth_anything_v2_{self.encoder}.pth") | ||
|
|
||
| if not os.path.exists(model_path): | ||
| raise FileNotFoundError(f"Model checkpoint not found at {model_path}") | ||
|
|
||
| depth_anything.load_state_dict(torch.load(model_path, map_location='cpu')) | ||
|
|
||
| device = "cuda" if self.use_cuda() else "cpu" | ||
| if rank is not None and self.use_cuda(): | ||
| device = f"cuda:{rank}" | ||
|
|
||
| depth_anything = depth_anything.to(device).eval() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The model is loaded within the process_single method. This is inefficient as it will be reloaded for every sample processed by this worker if the result is not already cached. This can significantly slow down processing. The model should be loaded once per worker, for example in the __init__ method, and stored as an instance attribute (e.g., self.model). A helper method can be used for lazy loading on the first call if you want to avoid loading the model during initialization of all workers.
| def image2tensor(self, raw_image, input_size=518): | ||
| transform = Compose([ | ||
| Resize( | ||
| width=input_size, | ||
| height=input_size, | ||
| resize_target=False, | ||
| keep_aspect_ratio=True, | ||
| ensure_multiple_of=14, | ||
| resize_method='lower_bound', | ||
| image_interpolation_method=cv2.INTER_CUBIC, | ||
| ), | ||
| NormalizeImage(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]), | ||
| PrepareForNet(), | ||
| ]) | ||
|
|
||
| h, w = raw_image.shape[:2] | ||
|
|
||
| image = cv2.cvtColor(raw_image, cv2.COLOR_BGR2RGB) / 255.0 | ||
|
|
||
| image = transform({'image': image})['image'] | ||
| image = torch.from_numpy(image).unsqueeze(0) | ||
|
|
||
| DEVICE = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu' | ||
| image = image.to(DEVICE) | ||
|
|
||
| return image, (h, w) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The image2tensor method hardcodes the device selection. This can cause device mismatches if the model is on a different device (e.g., a specific GPU like cuda:1). The device should be passed in as an argument from the caller. This makes the component more modular and prevents potential runtime errors.
def image2tensor(self, raw_image, input_size=518, device='cpu'):
transform = Compose([
Resize(
width=input_size,
height=input_size,
resize_target=False,
keep_aspect_ratio=True,
ensure_multiple_of=14,
resize_method='lower_bound',
image_interpolation_method=cv2.INTER_CUBIC,
),
NormalizeImage(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
PrepareForNet(),
])
h, w = raw_image.shape[:2]
image = cv2.cvtColor(raw_image, cv2.COLOR_BGR2RGB) / 255.0
image = transform({'image': image})['image']
image = torch.from_numpy(image).unsqueeze(0)
image = image.to(device)
return image, (h, w)| if not os.path.exists(depth_anything_v2_repo_path): | ||
| import subprocess | ||
| subprocess.run([ | ||
| "git", | ||
| "clone", | ||
| "https://github.com/DepthAnything/Depth-Anything-V2.git", | ||
| depth_anything_v2_repo_path, | ||
| ], check=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This operator clones a git repository (Depth-Anything-V2) into a cache directory during initialization. This introduces several potential issues:
- Side Effects: It's an unexpected side effect for an operator to perform network operations and write to the filesystem outside of its designated output directories.
- Dependencies: It creates a dependency on
gitbeing installed and network access to GitHub. - Fragility: The clone might fail due to network issues, permissions, or if the repository is moved or deleted.
It would be more robust to either vendor the necessary code within this repository or treat it as a proper dependency that users must install beforehand, with clear instructions in the documentation.
| # w0, h0 = w0 + 0.1, h0 + 0.1 | ||
|
|
||
| sqrt_N = math.sqrt(N) | ||
| sx, sy = float(w0) / sqrt_N, float(h0) / sqrt_N | ||
| patch_pos_embed = nn.functional.interpolate( | ||
| patch_pos_embed.reshape(1, int(sqrt_N), int(sqrt_N), dim).permute(0, 3, 1, 2), | ||
| scale_factor=(sx, sy), | ||
| # (int(w0), int(h0)), # to solve the upsampling shape issue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ) | ||
| elif self.training and self.sample_drop_ratio > 0.0: | ||
| x = x + self.drop_path1(attn_residual_func(x)) | ||
| x = x + self.drop_path1(ffn_residual_func(x)) # FIXME: drop_path2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a FIXME comment here, and it seems correct. You are using self.drop_path1 for the FFN residual path, but it should probably be self.drop_path2. Using the same drop path instance might not be intended and could be a bug if drop probabilities are different (though here they are the same). It's better to use the correct one for clarity and correctness.
| x = x + self.drop_path1(ffn_residual_func(x)) # FIXME: drop_path2 | |
| x = x + self.drop_path2(ffn_residual_func(x)) |
| checkpoint_path: str = "./checkpoints", | ||
| encoder: str = "vitl", | ||
| input_size: int = 518, | ||
| frame_num: int = 10, # 改为 int 类型,默认值改为正整数 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is in Chinese, while the rest of the docstring and code is in English. This is likely a leftover development note. For consistency and clarity, it should be removed as the docstring already explains the frame_num parameter.
| frame_num: int = 10, # 改为 int 类型,默认值改为正整数 | |
| frame_num: int = 10, |
|
|
||
| # Add the repository path to sys.path | ||
| import sys | ||
| sys.path.append(auto_seg_sam2_repo_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| """ | ||
| 准备符合 SAM2 要求的帧文件(创建临时目录并复制重命名文件) | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring for this method is in Chinese, which is inconsistent with the English used in the rest of the file. Please translate it for better maintainability.
| """ | |
| 准备符合 SAM2 要求的帧文件(创建临时目录并复制重命名文件) | |
| """ | |
| """ | |
| Prepare frame files in a format required by SAM2 (create a temporary | |
| directory and copy/rename files). | |
| """ |
No description provided.