Pull requests: NVIDIA/Megatron-LM
Update setup.py
community-request
complexity: low
Expert Review
Apply this label to indicate that your PR is ready for expert review.
add the logic to enable chunked mlp during training
community-request
#3656
opened Mar 2, 2026 by
pengdurice
3 of 6 tasks
add mix_hidden_states option in conversion
Expert Review
Final Review
PR is in the "final review" stage
Prevent double serialization inside Flask server
Expert Review
feat(checkpoint): zero-copy storage sharing in CheckpointWithoutOutput
complexity: low
Final Review
#3649
opened Mar 2, 2026 by
Victarry
5 tasks done
Fix TransformerConfig validation for mixed dense/MoE upcycling
community-request
#3647
opened Mar 1, 2026 by
rkteddy
1 of 6 tasks
Fix upcycling state dict conversion for mixed dense/MoE models
community-request
Expert Review
#3646
opened Mar 1, 2026 by
rkteddy
1 of 6 tasks
Enhance and fix NVTX for training
complexity: low
Expert Review
#3642
opened Feb 28, 2026 by
yaox12
6 tasks
refactor to support emerging optimizers beyond muon
#3638
opened Feb 27, 2026 by
FDecaYed
6 tasks
[Draft][main] enable manual_dgrad_release for tst
Run MBridge tests
Attach this for testing this PR against MBridge main
[Draft][dev] enable manual_dgrad_release for tst
Run MBridge tests
feat: support unified model bagel with generation and understanding
community-request
#3635
opened Feb 27, 2026 by
sophiayyya
•
Draft
1 of 6 tasks
Fix illegal memory access with mamba inference
Expert Review
fix: handle zero-size tensors in MoE token dispatchers
community-request
Expert Review
needs-follow-up
Issue needs follow-up
#3626
opened Feb 26, 2026 by
callum-ward-inflection
6 tasks done
Correctly generate state dict in MultiTokenPredictionBlock
Final Review
Upgrade GitHub Actions to latest versions
community-request
needs-follow-up
#3609
opened Feb 26, 2026 by
salmanmkc
Fix cp and not per token loss calculation in schedules.py
#3607
opened Feb 26, 2026 by
wplf
6 tasks
[feature] MegaScope Tensor Tracer
community-request
#3606
opened Feb 26, 2026 by
superay-a
4 of 6 tasks