Open
Labels
AutoDeploy, <NV> AutoDeploy Backend, Low Precision (lower-precision formats (INT8/INT4/FP8) for TRTLLM quantization: AWQ, GPTQ), feature request (new feature or request, including new model, dtype, or functionality support)
Description
🚀 The feature, motivation and pitch
The MiniMax-M2 checkpoint uses dynamic FP8 quantization in the HF-native format; see https://huggingface.co/MiniMaxAI/MiniMax-M2/blob/main/config.json#L96-L109.
Let's add support for this format to our quantization sources. See https://nvidia.slack.com/archives/C08T55LHSG4/p1767814621927509.
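For context, "dynamic" here means the quantization scale is derived from the tensor's runtime absolute maximum rather than from offline calibration. Below is a minimal dependency-free sketch of that idea; the constant, function names, and the clamp-based simulation of the float8_e4m3 range are illustrative assumptions, not TRT-LLM's or HF's actual implementation.

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3


def dynamic_fp8_scale(values):
    """Compute a per-tensor scale from the runtime abs-max (dynamic quantization)."""
    amax = max(abs(v) for v in values)
    return amax / FP8_E4M3_MAX if amax > 0 else 1.0


def quantize(values, scale):
    # Divide by the scale and clamp to the FP8 range; a real kernel would
    # additionally cast to float8_e4m3 (hypothetical simplification here).
    return [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in values]


# Example: abs-max of 896 maps exactly onto the FP8 range with scale 2.0.
scale = dynamic_fp8_scale([1.0, -896.0])   # -> 2.0
q = quantize([1.0, -896.0], scale)         # -> [0.5, -448.0]
```

Dequantization is the reverse (`q * scale`); since the scale is recomputed per tensor at runtime, no calibration dataset is needed, which is what makes the HF-native dynamic FP8 format attractive for direct checkpoint loading.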
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.