[Feature]: AutoDeploy: support HF-native fp8 dynamic quantization #10519

@lucaslie

Description

🚀 The feature, motivation and pitch

The MiniMax checkpoint uses dynamic fp8 quantization in the HF-native format; see https://huggingface.co/MiniMaxAI/MiniMax-M2/blob/main/config.json#L96-L109.

Let's add support for this format in our quantization sources. See https://nvidia.slack.com/archives/C08T55LHSG4/p1767814621927509
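As a rough sketch of what a quantization source could check for, the snippet below inspects a checkpoint's config for the HF-native dynamic fp8 format. It assumes the `quantization_config` block uses the `quant_method` and `activation_scheme` keys as in the linked MiniMax-M2 config; the helper name `is_hf_dynamic_fp8` is hypothetical, not an existing AutoDeploy API.

```python
# Hypothetical sketch: detect HF-native dynamic fp8 quantization from a
# model's config.json dict. Key names follow the linked MiniMax-M2 config;
# verify against the actual checkpoint before relying on them.

def is_hf_dynamic_fp8(config: dict) -> bool:
    """Return True if the config declares HF-native dynamic fp8 quantization."""
    qcfg = config.get("quantization_config") or {}
    return (
        qcfg.get("quant_method") == "fp8"
        and qcfg.get("activation_scheme") == "dynamic"
    )

# Illustrative config fragment (not copied verbatim from the checkpoint):
example = {
    "quantization_config": {
        "quant_method": "fp8",
        "activation_scheme": "dynamic",
        "modules_to_not_convert": ["lm_head"],
    }
}
print(is_hf_dynamic_fp8(example))  # prints True
```

A real implementation would also need to respect exclusion lists such as `modules_to_not_convert` when deciding which layers to quantize.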

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Assignees

Labels

  • AutoDeploy: <NV> AutoDeploy Backend
  • Low Precision: Lower-precision formats (INT8/INT4/FP8) for TRTLLM quantization (AWQ, GPTQ)
  • feature request: New feature or request. This includes new model, dtype, functionality support

Type

Projects

Status

Ready

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests