Open
Labels
bug (Something isn't working)
Description
⚙️ Your current environment
The output of python collect_env.py
### Environment Information ###
Operating System: `Linux-5.4.250-4-velinux1u1-amd64-x86_64-with-glibc2.35`
Python Version: `3.11.13 (main, Jun 5 2025, 13:12:00) [GCC 11.2.0]`
llm-compressor Version: `0.7.1`
compressed-tensors Version: `0.11.0`
transformers Version: `4.55.2`
torch Version: `2.8.0`
CUDA Devices: `['NVIDIA H20', 'NVIDIA H20', 'NVIDIA H20', 'NVIDIA H20', 'NVIDIA H20', 'NVIDIA H20', 'NVIDIA H20', 'NVIDIA H20']`
AMD Devices: `None`
🐛 Describe the bug
When running `examples/quantizing_moe/deepseek_r1_example.py`, an out-of-memory (OOM) error occurs at step 235 (`model.layers.3.mlp.experts.226.down_proj`), and the process takes approximately 1 hour to get there. With version v0.7.1, by contrast, memory usage is normal and step 235 is reached in only 25 minutes.
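To make the memory difference between the two versions easier to compare, a small snapshot helper along these lines might be useful. This is only a sketch: `memory_snapshot` is a hypothetical name, not part of llm-compressor, and it assumes a Linux host (where `ru_maxrss` is reported in KiB). The report does not say whether the OOM is on the host or on the GPU, so the helper records both, and it skips the GPU part if `torch` or CUDA is unavailable.

```python
import resource
import time

try:
    import torch
except ImportError:  # torch is optional for this sketch
    torch = None


def memory_snapshot():
    """Return peak host RSS and per-GPU peak allocated memory, in MiB.

    Hypothetical helper for illustration; not an llm-compressor API.
    """
    # On Linux, ru_maxrss is the peak resident set size in KiB.
    host_peak_mib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

    gpu_peak_mib = []
    if torch is not None and torch.cuda.is_available():
        for device in range(torch.cuda.device_count()):
            # Peak memory occupied by tensors on this device so far, in bytes.
            gpu_peak_mib.append(torch.cuda.max_memory_allocated(device) / 2**20)

    return {
        "time": time.time(),
        "host_peak_mib": host_peak_mib,
        "gpu_peak_mib": gpu_peak_mib,
    }


snap = memory_snapshot()
print(snap)
```

Calling this at the same points in both runs (for example at the start and shortly before step 235) would give comparable numbers; where exactly to call it inside the example script is left open here.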
🛠️ Steps to reproduce
No response