
Propose fix perceptual loss sqrt nan#8414

Open
cvbourne wants to merge 6 commits into Project-MONAI:dev from cvbourne:propose-fix-perceptual-loss-sqrt-nan

Conversation


@cvbourne cvbourne commented Apr 7, 2025

Fixes #8412

Description

This PR fixes a numerical stability issue in the PerceptualLoss implementation where the normalize_tensor function can produce NaN gradients when the input values are very small.

  • Moved epsilon inside the square root calculation instead of after it
  • Increased default from 1e-10 to 1e-8 for better stability
  • Added test
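The failure mode can be reproduced with a small standalone sketch (not part of the PR; the function names normalize_old and normalize_new are illustrative). With an all-zero input, the old formulation backpropagates through sqrt(0), whose derivative is infinite, producing NaN gradients; moving eps inside the sqrt keeps the derivative finite:

```python
import torch

def normalize_old(x: torch.Tensor, eps: float = 1e-10) -> torch.Tensor:
    # eps added after the sqrt: sqrt(0) has an infinite derivative,
    # so an all-zero input produces NaN gradients on backward
    norm_factor = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True))
    return x / (norm_factor + eps)

def normalize_new(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # eps added inside the sqrt keeps its argument strictly positive,
    # so the derivative is finite everywhere
    norm_factor = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True) + eps)
    return x / norm_factor

x_old = torch.zeros(2, 3, 10, 10, requires_grad=True)
normalize_old(x_old).sum().backward()
print(torch.isnan(x_old.grad).any())  # NaN gradients with the old formulation

x_new = torch.zeros(2, 3, 10, 10, requires_grad=True)
normalize_new(x_new).sum().backward()
print(torch.isnan(x_new.grad).any())  # no NaNs with the fix
```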

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.

@cvbourne cvbourne mentioned this pull request Apr 7, 2025
@KumoLiu KumoLiu requested a review from marksgraham April 8, 2025 15:14
Contributor

KumoLiu commented Apr 8, 2025

Thanks for the update, the changes look fine to me.
Could you please fix the failing checks so I can trigger the blossom tests? Thanks.

@KumoLiu KumoLiu requested review from ericspod and virginiafdez April 8, 2025 15:16
Comment on lines +274 to 276
def normalize_tensor(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
norm_factor = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True) + eps)
return x / (norm_factor + eps)
Member

Suggested change
def normalize_tensor(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
norm_factor = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True) + eps)
return x / (norm_factor + eps)
def normalize_tensor(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
norm_factor = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True) + eps)
return x / norm_factor

Do we want to remove eps from the denominator? As proposed eps will contribute twice to the final result.
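As a quick numeric check of this point (a standalone sketch, not part of the PR): for a zero input with eps = 1e-8, the proposed denominator is sqrt(eps) + eps, while the single-eps version is sqrt(eps), so eps shifts the result twice:

```python
import torch

eps = 1e-8
x = torch.zeros(1, 3, 4, 4)
s = torch.sum(x**2, dim=1, keepdim=True)  # all zeros here

denom_double = torch.sqrt(s + eps) + eps  # eps counted twice: ~1.0001e-4
denom_single = torch.sqrt(s + eps)        # eps counted once:  ~1e-4

print(denom_double.flatten()[0].item())
print(denom_single.flatten()[0].item())
```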

Author

Agreed. Will remove.

Member

This file should go into an appropriate subdirectory in the tests directory. We've changed the directory structure there recently so probably tests/losses.

Author

Roger.

# Create tensor
x = torch.zeros(2, 3, 10, 10, requires_grad=True)

optimizer = optim.Adam([x], lr=0.01)
Member

I don't think the optimizer is needed for this test?

Author

Not needed, will remove.

x = torch.zeros(2, 3, 10, 10, requires_grad=True)

optimizer = optim.Adam([x], lr=0.01)
x_scaled = x * scale
Member

Since x is all 0, x_scaled will always be 0, unless you're expecting float imprecision to create values here. If so, I would add a comment mentioning this.

Author

Will add a comment.

Contributor

I don't understand the point of this test with regard to the next one; instead of a zeros tensor, couldn't it be a random one that is then multiplied by a really small number?
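A revised test along these lines might look like the following sketch (hypothetical; it inlines the fixed normalize_tensor rather than loading a pretrained PerceptualLoss backbone, and the scale values are illustrative):

```python
import torch

def normalize_tensor(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # fixed version: eps inside the sqrt only
    norm_factor = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True) + eps)
    return x / norm_factor

# random input scaled to very small magnitudes, per the reviewer's suggestion
for scale in (1e-4, 1e-6, 1e-8):
    x = torch.randn(2, 3, 10, 10, requires_grad=True)
    out = normalize_tensor(x * scale)
    out.sum().backward()
    assert not torch.isnan(x.grad).any(), f"NaN gradients at scale {scale}"
```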

Contributor

@virginiafdez virginiafdez left a comment

The changes look good to me. I'd modify one of the tests, but the rest is fine.


Contributor

@virginiafdez virginiafdez left a comment

Besides my comment about the point of one of the tests, I think this PR can be merged, as long as the errors happening on the automatic tests are fixed.

Contributor

KumoLiu commented May 9, 2025

Hi @cvbourne, could you please help resolve the DCO issue and also help take a look at the failed pipeline? Thanks.

@ericspod ericspod moved this to Backlog in MONAI v1.6 Feb 24, 2026
Contributor

coderabbitai bot commented Mar 1, 2026

📝 Walkthrough

This PR improves the numerical stability of the normalize_tensor function in the perceptual loss module by increasing the default epsilon from 1e-10 to 1e-8 and moving the epsilon inside the squared-sum calculation, before the square root. A new test file validates gradient stability for small-valued and zero tensor inputs.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 33.33%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Title check — ✅ Passed: the title references the main fix (perceptual loss sqrt NaN issue) but is awkwardly phrased with 'Propose fix' rather than stating the fix directly.
  • Description check — ✅ Passed: the description covers the issue, key changes, and test addition, but omits the docstring and documentation updates mentioned in the template.


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/test_perceptual_loss_stability.py (1)

36-39: ⚠️ Potential issue | 🟡 Minor

Zero tensor negates scale parameter.

torch.zeros * scale is always zeros. To test small values, use random tensor:

Proposed fix
-        x = torch.zeros(2, 3, 10, 10, requires_grad=True)
-
-        optimizer = optim.Adam([x], lr=0.01)
-        x_scaled = x * scale
+        x = torch.randn(2, 3, 10, 10, requires_grad=True)
+        x_scaled = x * scale

This also addresses the unused name parameter warning (ARG002) since parameterized tests require the name argument for test identification.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_perceptual_loss_stability.py` around lines 36 - 39, The test
currently creates x = torch.zeros(...) then computes x_scaled = x * scale which
is always zero; replace the zero tensor with a small random tensor (e.g.,
torch.randn(...) * small_factor or torch.empty(...).normal_(mean=0,
std=small_value)) so scaling actually affects values, and ensure the
parameterized test's name parameter is used (or include it in the test id) to
avoid the unused-name warning; update references to x, x_scaled, optimizer in
the test_perceptual_loss_stability test accordingly.
🧹 Nitpick comments (2)
tests/test_perceptual_loss_stability.py (1)

17-17: Remove unused import.

torch.optim import can be removed when optimizer is deleted.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_perceptual_loss_stability.py` at line 17, Remove the now-unused
import "import torch.optim as optim" from
tests/test_perceptual_loss_stability.py because the optimizer was removed;
search for the symbol "optim" and delete the import statement so the file no
longer contains an unused dependency.
monai/losses/perceptual.py (1)

277-279: Missing docstring.

Per coding guidelines, add a docstring describing parameters (x, eps) and return value.

Proposed docstring
 def normalize_tensor(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
+    """Normalize tensor across channel dimension.
+
+    Args:
+        x: Input tensor with shape (B, C, ...).
+        eps: Small constant for numerical stability.
+
+    Returns:
+        Channel-normalized tensor with same shape as input.
+    """
     norm_factor = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True) + eps)
-    return x / (norm_factor + eps)
+    return x / norm_factor
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/losses/perceptual.py` around lines 277 - 279, Add a proper docstring to
the normalize_tensor function that documents parameters and return value:
describe param x (torch.Tensor) as the input tensor with channel dimension at
dim=1, param eps (float) as the small numerical epsilon used to stabilize the
norm computation, and describe the return as a torch.Tensor containing the input
scaled by the L2 norm per-channel (with shape preserved). Also mention behavior
with shapes and that eps prevents division by zero and the dtype of the returned
tensor matches the input.


📥 Commits

Reviewing files that changed from the base of the PR and between 1e3d29b and 6d2c886.

📒 Files selected for processing (2)
  • monai/losses/perceptual.py
  • tests/test_perceptual_loss_stability.py

Comment on lines +277 to 279
def normalize_tensor(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
norm_factor = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True) + eps)
return x / (norm_factor + eps)
Contributor

⚠️ Potential issue | 🟡 Minor

Double epsilon still present.

Per past review discussion, eps contributes twice: once inside sqrt() and again in the denominator. Author agreed to remove the second one.

Proposed fix
 def normalize_tensor(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
     norm_factor = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True) + eps)
-    return x / (norm_factor + eps)
+    return x / norm_factor
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/losses/perceptual.py` around lines 277 - 279, The normalize_tensor
function currently applies eps twice (inside the sqrt and again added to
norm_factor), so change normalize_tensor to only add eps inside the sqrt:
compute norm_factor = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True) + eps) and
return x / norm_factor (remove the extra + eps in the denominator) to avoid
double-counting the epsilon while keeping numerical stability.
