
Decouple FSDP all-gather tagging from activation checkpointing #336

Draft
fmassa wants to merge 1 commit into main from fmassa/ac_fsdp_tagging

Conversation

fmassa (Contributor) commented Mar 5, 2026

The `enable_ac` flag in AutoParallel was a temporary workaround from when PyTorch's tracing couldn't capture user-specified activation checkpointing. PyTorch now properly propagates user AC annotations (`node.meta["recompute"]` and `node.meta["ac_graph_id"]`) during AOT tracing, so AutoParallel no longer needs to apply its own AC staging policy.

The only non-user-AC concern bundled into `enable_ac` was tagging FSDP all-gather collectives for recomputation/saving, which is a parallelism strategy decision that should happen unconditionally based on `reshard_after_forward`.
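The tagging decision can be sketched as follows. This is a hypothetical stand-in for illustration only (the `Node` dataclass and function body are invented here, not the actual AutoParallel pass): walk the traced graph and mark FSDP all-gather nodes for recomputation whenever `reshard_after_forward` is set.

```python
# Illustrative sketch, NOT the real AutoParallel implementation:
# Node and tag_fsdp_collectives_for_recomputation are stand-ins
# mimicking FX-style nodes with a .meta dict.
from dataclasses import dataclass, field

@dataclass
class Node:
    target: str                      # op name, e.g. "all_gather_into_tensor"
    meta: dict = field(default_factory=dict)

def tag_fsdp_collectives_for_recomputation(nodes, reshard_after_forward):
    """Mark FSDP all-gathers so the partitioner recomputes them in backward."""
    for n in nodes:
        if "all_gather" in n.target and reshard_after_forward:
            # With reshard_after_forward, unsharded params are freed after
            # forward, so the all-gather must run again in backward rather
            # than have its output saved.
            n.meta["recompute"] = True
    return nodes

graph = [Node("all_gather_into_tensor"), Node("mm"), Node("relu")]
tag_fsdp_collectives_for_recomputation(graph, reshard_after_forward=True)
# graph[0] is now tagged; compute ops (mm, relu) are left untouched
```

The point of the PR is that this tagging runs unconditionally as a parallelism decision, independent of any user activation-checkpointing policy.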

This PR:

  • Renames `ac_joint_pass` to `tag_fsdp_collectives_for_recomputation`, keeping only the FSDP tagging logic
  • Removes the AC staging/policy functions (`mark_nodes_as_must_save_to_stage_recomputation`, `_apply_ac_policy`, `_mark_nodes_as_must_save`, and related helpers), which are now handled by user-specified `torch.utils.checkpoint.checkpoint()`
  • Removes the `enable_ac` and `ac_stage_size_in_GiB` parameters from `AutoParallel.__init__`
  • Makes FSDP collective tagging unconditional (no longer gated behind `enable_ac`)

Authored with Claude.
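For context, the user-driven AC path the PR defers to looks like the following minimal example (the `Block` module here is illustrative, not from this PR): wrapping a module call in `torch.utils.checkpoint.checkpoint` drops its intermediates after forward and recomputes them during backward, and these user annotations now survive AOT tracing.

```python
import torch
from torch.utils.checkpoint import checkpoint

# Illustrative module; any submodule whose activations you want
# recomputed instead of saved works the same way.
class Block(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(8, 8)

    def forward(self, x):
        return torch.relu(self.lin(x))

block = Block()
x = torch.randn(2, 8, requires_grad=True)

# Intermediates of block are recomputed in backward rather than stored.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```

With this in place, AutoParallel no longer needs its own `enable_ac` staging policy; the user's checkpoint annotations flow through tracing unchanged.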

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Mar 5, 2026
@fmassa fmassa marked this pull request as draft March 6, 2026 15:15
wconstab (Contributor) left a comment:

lgtm
