
Conversation

@justincdavis

Summary

This PR adds the CV-CUDA backend kernel for the Normalize transform.

How to use

```python
import cvcuda
import torchvision.transforms.v2.functional as F

cvc_tensor = cvcuda.Tensor((1, 224, 224, 3), cvcuda.Type.F32, cvcuda.TensorLayout.NHWC)
# Dispatches to F.normalize_cvcuda
normalized_tensor = F.normalize(cvc_tensor, [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
```

Run unit tests

```shell
pytest test/test_transforms_v2.py::TestNormalizeCVCUDA
```

```
...
60 passed in 0.59s
```

@pytorch-bot

pytorch-bot bot commented Nov 19, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/9279

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 974ffca with merge base 1e53952:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla

meta-cla bot commented Nov 19, 2025

Hi @justincdavis!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

Member

@AntoineSimoulin AntoineSimoulin left a comment

Hey @justincdavis, thanks for submitting the PR, this is looking good :) I suggested some minor changes. I think we mainly need to make sure the tests pass when cvcuda is not installed!

```python
    (F.normalize_video, tv_tensors.Video),
    pytest.param(
        F._misc._normalize_cvcuda,
        _import_cvcuda().Tensor,
```
Member

@justincdavis it seems that _import_cvcuda().Tensor is still raising an error if cvcuda is not installed. Maybe we can just use cvcuda.Tensor here and see if this works better?

Author

Thank you for pointing this out! I replaced the actual cvcuda.Tensor type with the string "cvcuda.Tensor"; inside the function we then resolve the string back to the cvcuda.Tensor type. LMK if this looks like a reasonable solution!
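The lazy resolution described here could look roughly like the sketch below (function name and structure are hypothetical, not the actual PR code): the parametrization stores the string "cvcuda.Tensor", so collecting the tests never imports cvcuda, and the real type is only resolved when a CV-CUDA test actually runs.

```python
import importlib

# Hypothetical sketch of the string-based resolution described above;
# not the actual PR code. Storing the string in the parametrization
# means test collection never triggers an import of cvcuda.
def resolve_input_type(make_input_type):
    if make_input_type == "cvcuda.Tensor":
        # Only reached when a CV-CUDA test actually runs, so the
        # import error surfaces inside that test alone.
        cvcuda = importlib.import_module("cvcuda")
        return cvcuda.Tensor
    return make_input_type
```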

@justincdavis
Author

justincdavis commented Nov 24, 2025

Following up on my comment in the _normalize_cvcuda function itself: CV-CUDA requires that the mean and scale tensors be on-device when we call cvcuda.normalize, so a host->device memcpy must occur twice for each normalize call with the CV-CUDA backend. We could reduce the impact of this with a helper function that creates the tuple[cvcuda.Tensor, cvcuda.Tensor] from the mean/std parameters. From what I see in the codebase, this would be a new kind of feature in torchvision for a functional transform.

```python
# CV-CUDA requires float32 tensors for the mean/std parameters.
# At small batch sizes, building them is costly relative to the normalize operation.
# If CV-CUDA is known to be the backend, this could be optimized:
#   For the Normalize class: by creating the tensors at class initialization time.
#   For the functional API: by caching the tensors in a helper with
#     functools.lru_cache (would it even be worth it?).
# Since CV-CUDA is 1) not the default backend and 2) only strictly faster at
# large batch sizes, ignore this for now.
```
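The functools.lru_cache idea floated in the comment could be sketched like this (all names are hypothetical; a plain tuple stands in for the tuple[cvcuda.Tensor, cvcuda.Tensor] so the caching behavior is visible without CV-CUDA installed):

```python
import functools

# Hypothetical sketch of the lru_cache helper discussed above; in the real
# backend the cached value would be the on-device pair of cvcuda.Tensors
# built from mean/std. A plain tuple stands in here for illustration.
@functools.lru_cache(maxsize=16)
def _cached_param_tensors(mean, std):
    # mean/std must arrive as tuples: lru_cache requires hashable keys,
    # and the lists users pass to F.normalize are not hashable.
    return (list(mean), list(std))

def get_param_tensors(mean, std):
    # Normalize list inputs to tuples before hitting the cache, so
    # repeated calls with the same mean/std skip the rebuild entirely.
    return _cached_param_tensors(tuple(mean), tuple(std))
```

Whether the cache pays off depends on how often the same mean/std pair recurs; as the comment notes, it may not be worth it given that CV-CUDA is not the default backend.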

@AntoineSimoulin
Member

Hey @justincdavis, this is looking good to me. I don't think the failing test is related to this PR; it seems like a false-positive alert to me! Can you sign our Contributor License Agreement (see the meta-cla bot comment in the discussion)?

@meta-cla meta-cla bot added the cla signed label Dec 2, 2025
@justincdavis force-pushed the feat/normalize_cvcuda branch from 778ad32 to d3ef0bd on December 4, 2025 19:03
@justincdavis force-pushed the feat/normalize_cvcuda branch from 6b7dd65 to 0f8910e on December 4, 2025 19:07
Contributor

@zy1git zy1git left a comment

I left some comments. Some of them refer to PRs merged in the past three weeks.

Member

@NicolasHug NicolasHug left a comment

Thanks a lot for the PR @justincdavis , I left a review for @zy1git to address.

Comment on lines 5663 to 5666

```python
if is_cvcuda:
    assert_close(actual, expected, rtol=0, atol=1e-6)
else:
    assert_equal(actual, expected)
```
Member

I'm surprised atol=1e-6 is needed, I thought it was the default for float32. Let's try without it and see if the CI is happy?

Contributor

I removed the rtol=0, atol=1e-6 and the test passed. However, the defaults for assert_close are rtol=1.3e-6, atol=1e-5, so if the original rtol=0, atol=1e-6 passes the test, the default version will certainly pass as well.
atol=1e-6 is the stricter threshold. I will change it to the defaults in a new commit; please let me know if you would rather keep the original values.
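For reference, assert_close-style comparisons pass elementwise when |actual - expected| <= atol + rtol * |expected|, which is why the default setting is strictly looser than rtol=0, atol=1e-6. A plain-Python illustration of the rule (not the torch implementation):

```python
# Elementwise rule used by assert_close-style checks:
# |actual - expected| <= atol + rtol * |expected|
def close(actual, expected, rtol, atol):
    return abs(actual - expected) <= atol + rtol * abs(expected)

# A float32-rounding-sized error (~5e-7) near 1.0 passes both settings:
strict = close(1.0000005, 1.0, rtol=0.0, atol=1e-6)       # PR's original setting
default = close(1.0000005, 1.0, rtol=1.3e-6, atol=1e-5)   # assert_close float32 defaults
```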

```python
if is_cvcuda:
    image = F.cvcuda_to_tensor(image)[0].cpu()

expected = self._reference_normalize_image(image, mean=mean, std=std)
```
Member

Note for self: double-checking that this is doing the right conversion. image is not a tensor, and we're using _reference_normalize_image as the ref; we'll be comparing our cvcuda tensor to this reference tensor. Seems OK.

Comment on lines 5651 to 5652

```python
if is_cvcuda and dtype != torch.float32:
    pytest.skip("CVCUDA only supports float32 for normalize")
```
Member

This could be an xfail instead of a skip. See https://docs.pytest.org/en/stable/how-to/skipping.html

Contributor

I changed pytest.skip to pytest.xfail.

Member

Note for self: need to check that all the relevant tests have been properly parametrized.

```python
# torchvision only supports uint and float; right now CV-CUDA doesn't expose
# float16, so only check float32. In the future, add float16 once it is
# exposed in CV-CUDA.
if not (image.dtype == cvcuda.Type.F32):
    raise ValueError(f"Input tensor should be a float tensor. Got {image.dtype}.")
```
Member

We should have a test that asserts non-float32 leads to an error and checks the error message.
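Such a test could be sketched as below. The dtype check here is a stand-in so the pattern runs without CV-CUDA; the real test would use pytest.raises(ValueError, match=...) against the backend with a uint8 cvcuda.Tensor (all names hypothetical):

```python
# Stand-in for the dtype check quoted above, so the error-message
# assertion pattern can run without CV-CUDA installed.
def _check_dtype(dtype_name):
    if dtype_name != "F32":
        raise ValueError(f"Input tensor should be a float tensor. Got {dtype_name}.")

# Assert both that the error is raised and what its message says.
try:
    _check_dtype("U8")
except ValueError as exc:
    message = str(exc)
else:
    raise AssertionError("expected ValueError for non-float32 input")
```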


Contributor

@zy1git zy1git left a comment

I addressed the comments by pushing a new commit. Feel free to take a look.

