[core] Refactor hub attn kernels #12475
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Very good to only load the invoked attention implementation! Thanks for adding this.
I think the control flow here is a bit difficult to follow. We should aim to minimize the number of new objects/concepts introduced in this module since there's already quite a lot of routing going on in here. My recommendations:

```python
def _set_attention_backend(backend: AttentionBackendName) -> None:
    _check_attention_backend_requirements(backend)
    _maybe_download_kernel_for_backend(backend)
```

In attention dispatch, let's create a `@dataclass`:

```python
@dataclass
class _HubKernelConfig:
    """Configuration for downloading and using a hub-based attention kernel."""

    repo_id: str
    function_attr: str
    revision: Optional[str] = None
    kernel_fn: Optional[Callable] = None


# Registry for hub-based attention kernels
_HUB_KERNELS_REGISTRY: Dict["AttentionBackendName", _HubKernelConfig] = {
    AttentionBackendName._FLASH_3_HUB: _HubKernelConfig(
        repo_id="kernels-community/flash-attn3", function_attr="flash_attn_func", revision="fake-ops-return-probs"
    )
}
```

Then in your hub function, fetch the downloaded kernel from the registry:

```python
func = _HUB_KERNELS_REGISTRY[AttentionBackendName._FLASH_3_HUB].kernel_fn
out = func(
    q=query,
    ...
)
```

We shouldn't attempt kernel downloads from the dispatch function. It should already be downloaded/available beforehand.
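To make the suggestion concrete, here is a minimal sketch (not the actual implementation in this PR) of how the hub-backed attention op could consume the cached registry entry at dispatch time; the function name `_flash_attention_3_hub` and the keyword arguments mirror the snippet above and are illustrative only:

```python
# Minimal sketch, assuming the _HubKernelConfig / _HUB_KERNELS_REGISTRY objects above.
# The op only reads the cached kernel_fn; it never triggers a download itself.
def _flash_attention_3_hub(query, key, value, is_causal: bool = False):
    config = _HUB_KERNELS_REGISTRY[AttentionBackendName._FLASH_3_HUB]
    if config.kernel_fn is None:
        raise RuntimeError(
            "The FA3 hub kernel has not been downloaded yet. "
            "Select the backend (which triggers the download) before dispatch."
        )
    # Keyword names follow the reviewer's snippet; the real kernel signature may differ.
    return config.kernel_fn(q=query, k=key, v=value, causal=is_causal)
```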
Okay but
This is also good 👍🏽

Something like:

```python
def _maybe_download_kernel_for_backend(backend: AttentionBackendName) -> None:
    if backend not in _HUB_KERNELS_REGISTRY:
        return

    config = _HUB_KERNELS_REGISTRY[backend]
    if config.kernel_fn is not None:
        return

    try:
        from kernels import get_kernel

        kernel_module = get_kernel(config.repo_id, revision=config.revision)
        kernel_func = getattr(kernel_module, config.function_attr)
        # Cache the downloaded kernel function in the config object
        config.kernel_fn = kernel_func
    except Exception as e:
        # Re-raise with context so the repo/function being fetched is visible in the traceback
        raise RuntimeError(
            f"Could not load `{config.function_attr}` from `{config.repo_id}`."
        ) from e
```
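For anyone who hasn't used the `kernels` package before, here's a standalone illustration of the download step in isolation, using the repo id, revision, and attribute name from the registry entry above (run it only where the kernel's hardware/software requirements are met):

```python
# Standalone illustration of the download step used in _maybe_download_kernel_for_backend.
# get_kernel pulls the kernel repo from the Hugging Face Hub (cached locally) and
# returns a module-like object whose attributes are the kernel's exported functions.
from kernels import get_kernel

kernel_module = get_kernel("kernels-community/flash-attn3", revision="fake-ops-return-probs")
flash_attn_func = getattr(kernel_module, "flash_attn_func")
print(callable(flash_attn_func))  # True if the kernel exposes the expected entry point
```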
Co-authored-by: Dhruv Nair <dhruv@huggingface.co>
@DN6 check now. Your feedback should have been addressed. I was able to completely get rid of
Nice cleanup! Thanks
```python
_HUB_KERNELS_REGISTRY: Dict["AttentionBackendName", _HubKernelConfig] = {
    # TODO: temporary revision for now. Remove when merged upstream into `main`.
    AttentionBackendName._FLASH_3_HUB: _HubKernelConfig(
        repo_id="kernels-community/flash-attn3", function_attr="flash_attn_func", revision="fake-ops-return-probs"
    )
}
```
What about the other backends?
This will be populated as we incorporate others.
I mean, shouldn't FA2 be here already?
Ah okay, sounds good!
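For context on how the registry is expected to grow once the FA and SAGE PRs land (see the description below), here is a hedged sketch of what an additional entry might look like; the backend member, repo id, and attribute name are placeholders, not the values those PRs will actually use:

```python
# Hypothetical future entry; the backend member, repo_id, and function_attr are placeholders.
_HUB_KERNELS_REGISTRY[AttentionBackendName._FLASH_HUB] = _HubKernelConfig(
    repo_id="kernels-community/flash-attn",  # placeholder repo id
    function_attr="flash_attn_func",         # placeholder attribute name
)
```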
What does this PR do?
Refactors how we load the attention kernels from the Hub.
Currently, when a user sets the `DIFFUSERS_ENABLE_HUB_KERNELS` env var, we always download every supported kernel. Right now that is just FA3, but we have ongoing PRs that add FA and SAGE support: #12387 and #12439. So we would download ALL of them even when they're not required. This is not good.

This PR makes it so that only the relevant kernel gets downloaded, without breaking `torch.compile` compliance (fullgraph and no recompilation triggers).

Cc: @MekkCyber
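For reviewers who want to try this out locally, a hypothetical usage sketch; the env var value, the `set_attention_backend` helper, and the `"_flash_3_hub"` backend string are assumptions to double-check against the attention backend docs:

```python
# Hypothetical usage sketch; the names flagged below are assumptions, not confirmed API.
import os

os.environ["DIFFUSERS_ENABLE_HUB_KERNELS"] = "yes"  # assumed truthy value

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Assumed helper / backend name: with this PR, only the FA3 hub kernel should be
# downloaded here, not FA or SAGE.
pipe.transformer.set_attention_backend("_flash_3_hub")

image = pipe("a photo of a cat", num_inference_steps=28).images[0]
```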