
Propose fix perceptual loss sqrt nan#8414

Open
cvbourne wants to merge 6 commits into Project-MONAI:dev from cvbourne:propose-fix-perceptual-loss-sqrt-nan

Conversation


@cvbourne cvbourne commented Apr 7, 2025

Fixes #8412

Description

This PR fixes a numerical stability issue in the PerceptualLoss implementation where the normalize_tensor function can produce NaN gradients when the input values are very small.

  • Moved epsilon inside the square root calculation instead of after it
  • Increased default from 1e-10 to 1e-8 for better stability
  • Added test
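The failure mode can be reproduced with a small standalone sketch (not part of the PR; the function names normalize_old and normalize_new are illustrative). With an all-zero input, the old formulation backpropagates through sqrt(0), whose derivative is infinite, producing NaN gradients; moving eps inside the sqrt keeps the derivative finite:

```python
import torch

def normalize_old(x: torch.Tensor, eps: float = 1e-10) -> torch.Tensor:
    # eps added after the sqrt: sqrt(0) has an infinite derivative,
    # so an all-zero input produces NaN gradients on backward
    norm_factor = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True))
    return x / (norm_factor + eps)

def normalize_new(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # eps added inside the sqrt keeps its argument strictly positive,
    # so the derivative is finite everywhere
    norm_factor = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True) + eps)
    return x / norm_factor

x_old = torch.zeros(2, 3, 10, 10, requires_grad=True)
normalize_old(x_old).sum().backward()
print(torch.isnan(x_old.grad).any())  # NaN gradients with the old formulation

x_new = torch.zeros(2, 3, 10, 10, requires_grad=True)
normalize_new(x_new).sum().backward()
print(torch.isnan(x_new.grad).any())  # no NaNs with the fix
```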

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.

@cvbourne cvbourne mentioned this pull request Apr 7, 2025
@KumoLiu KumoLiu requested a review from marksgraham April 8, 2025 15:14
Contributor

KumoLiu commented Apr 8, 2025

Thanks for the update, the changes look fine to me.
Could you please fix the failing checks so I can trigger the blossom tests? Thanks.

@KumoLiu KumoLiu requested review from ericspod and virginiafdez April 8, 2025 15:16
Comment on lines +274 to 276
def normalize_tensor(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
norm_factor = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True) + eps)
return x / (norm_factor + eps)
Member

Suggested change
def normalize_tensor(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
norm_factor = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True) + eps)
return x / (norm_factor + eps)
def normalize_tensor(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
norm_factor = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True) + eps)
return x / norm_factor

Do we want to remove eps from the denominator? As proposed eps will contribute twice to the final result.
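As a quick numeric check of this point (a standalone sketch, not part of the PR): for a zero input with eps = 1e-8, the proposed denominator is sqrt(eps) + eps, while the single-eps version is sqrt(eps), so eps shifts the result twice:

```python
import torch

eps = 1e-8
x = torch.zeros(1, 3, 4, 4)
s = torch.sum(x**2, dim=1, keepdim=True)  # all zeros here

denom_double = torch.sqrt(s + eps) + eps  # eps counted twice: ~1.0001e-4
denom_single = torch.sqrt(s + eps)        # eps counted once:  ~1e-4

print(denom_double.flatten()[0].item())
print(denom_single.flatten()[0].item())
```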

Author

Agreed. Will remove.

Member

This file should go into an appropriate subdirectory in the tests directory. We've changed the directory structure there recently so probably tests/losses.

Author

Roger.

# Create tensor
x = torch.zeros(2, 3, 10, 10, requires_grad=True)

optimizer = optim.Adam([x], lr=0.01)
Member

I don't think the optimizer is needed for this test?

Author

Not needed, will remove.

x = torch.zeros(2, 3, 10, 10, requires_grad=True)

optimizer = optim.Adam([x], lr=0.01)
x_scaled = x * scale
Member

Since x is all 0, x_scaled will always be 0, unless you're expecting float imprecision to create values here. If so, I would add a comment mentioning this.

Author

Will add a comment.

Contributor

I don't understand the point of this test with regard to the next one; instead of a zeros tensor, couldn't it be a random one that is then multiplied by a really small number?
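A revised test along these lines might look like the following sketch (hypothetical; it inlines the fixed normalize_tensor rather than loading a pretrained PerceptualLoss backbone, and the scale values are illustrative):

```python
import torch

def normalize_tensor(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # fixed version: eps inside the sqrt only
    norm_factor = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True) + eps)
    return x / norm_factor

# random input scaled to very small magnitudes, per the reviewer's suggestion
for scale in (1e-4, 1e-6, 1e-8):
    x = torch.randn(2, 3, 10, 10, requires_grad=True)
    out = normalize_tensor(x * scale)
    out.sum().backward()
    assert not torch.isnan(x.grad).any(), f"NaN gradients at scale {scale}"
```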

Contributor

@virginiafdez virginiafdez left a comment

The changes look good to me. I'd modify one of the tests, but the rest is fine.


Contributor

@virginiafdez virginiafdez left a comment

Besides my comment about the point of one of the tests, I think this PR can be merged, as long as the errors happening on the automatic tests are fixed.

Contributor

KumoLiu commented May 9, 2025

Hi @cvbourne, could you please help resolve the DCO issue and also help take a look at the failed pipeline? Thanks.

@ericspod ericspod moved this to Backlog in MONAI v1.6 Feb 24, 2026
Contributor

coderabbitai bot commented Mar 1, 2026

📝 Walkthrough

This PR improves the numerical stability of the normalize_tensor function in the perceptual loss module by increasing the default epsilon from 1e-10 to 1e-8 and moving the epsilon inside the squared-sum calculation, before the square root. A new test file validates gradient stability for small-valued and zero tensor inputs.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 33.33%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Title check — ✅ Passed: the title references the main fix (perceptual loss sqrt NaN issue) but is awkwardly phrased with 'Propose fix' rather than stating the fix directly.
  • Description check — ✅ Passed: the description covers the issue, key changes, and test addition, but omits the docstring and documentation updates mentioned in the template.


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/test_perceptual_loss_stability.py (1)

36-39: ⚠️ Potential issue | 🟡 Minor

Zero tensor negates scale parameter.

torch.zeros * scale is always zeros. To test small values, use random tensor:

Proposed fix
-        x = torch.zeros(2, 3, 10, 10, requires_grad=True)
-
-        optimizer = optim.Adam([x], lr=0.01)
-        x_scaled = x * scale
+        x = torch.randn(2, 3, 10, 10, requires_grad=True)
+        x_scaled = x * scale

This also addresses the unused name parameter warning (ARG002) since parameterized tests require the name argument for test identification.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_perceptual_loss_stability.py` around lines 36 - 39, The test
currently creates x = torch.zeros(...) then computes x_scaled = x * scale which
is always zero; replace the zero tensor with a small random tensor (e.g.,
torch.randn(...) * small_factor or torch.empty(...).normal_(mean=0,
std=small_value)) so scaling actually affects values, and ensure the
parameterized test's name parameter is used (or include it in the test id) to
avoid the unused-name warning; update references to x, x_scaled, optimizer in
the test_perceptual_loss_stability test accordingly.
🧹 Nitpick comments (2)
tests/test_perceptual_loss_stability.py (1)

17-17: Remove unused import.

torch.optim import can be removed when optimizer is deleted.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_perceptual_loss_stability.py` at line 17, Remove the now-unused
import "import torch.optim as optim" from
tests/test_perceptual_loss_stability.py because the optimizer was removed;
search for the symbol "optim" and delete the import statement so the file no
longer contains an unused dependency.
monai/losses/perceptual.py (1)

277-279: Missing docstring.

Per coding guidelines, add a docstring describing parameters (x, eps) and return value.

Proposed docstring
 def normalize_tensor(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
+    """Normalize tensor across channel dimension.
+
+    Args:
+        x: Input tensor with shape (B, C, ...).
+        eps: Small constant for numerical stability.
+
+    Returns:
+        Channel-normalized tensor with same shape as input.
+    """
     norm_factor = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True) + eps)
-    return x / (norm_factor + eps)
+    return x / norm_factor
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/losses/perceptual.py` around lines 277 - 279, Add a proper docstring to
the normalize_tensor function that documents parameters and return value:
describe param x (torch.Tensor) as the input tensor with channel dimension at
dim=1, param eps (float) as the small numerical epsilon used to stabilize the
norm computation, and describe the return as a torch.Tensor containing the input
scaled by the L2 norm per-channel (with shape preserved). Also mention behavior
with shapes and that eps prevents division by zero and the dtype of the returned
tensor matches the input.


📥 Commits

Reviewing files that changed from the base of the PR and between 1e3d29b and 6d2c886.

📒 Files selected for processing (2)
  • monai/losses/perceptual.py
  • tests/test_perceptual_loss_stability.py

Comment on lines +277 to 279
def normalize_tensor(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
norm_factor = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True) + eps)
return x / (norm_factor + eps)
Contributor

⚠️ Potential issue | 🟡 Minor

Double epsilon still present.

Per past review discussion, eps contributes twice: once inside sqrt() and again in the denominator. Author agreed to remove the second one.

Proposed fix
 def normalize_tensor(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
     norm_factor = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True) + eps)
-    return x / (norm_factor + eps)
+    return x / norm_factor
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/losses/perceptual.py` around lines 277 - 279, The normalize_tensor
function currently applies eps twice (inside the sqrt and again added to
norm_factor), so change normalize_tensor to only add eps inside the sqrt:
compute norm_factor = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True) + eps) and
return x / norm_factor (remove the extra + eps in the denominator) to avoid
double-counting the epsilon while keeping numerical stability.
