[Quantization Args] Add scale and zp dtype #508
Dipika todo: should try a W4A16 model with zp to make sure the zp is saved correctly.
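The round trip that todo is meant to check can be sketched in plain Python. This is a minimal illustration, not compressed-tensors code: it assumes an unsigned 4-bit grid `[0, 15]` for asymmetric weights, and `asym_params`, `quantize_asym`, and `dequantize_asym` are hypothetical helpers. The point is that an asymmetric scheme produces a zero point alongside the scale, and both must survive saving for the dequantized weights to stay within half a quantization step of the originals.

```python
from typing import List, Tuple

# Assumed unsigned 4-bit grid for asymmetric weights (illustrative only).
QMIN, QMAX = 0, 15


def asym_params(values: List[float]) -> Tuple[float, int]:
    """Hypothetical helper: map [min, max] onto [QMIN, QMAX], returning (scale, zp)."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    # Fall back to 1.0 for an all-zero input to avoid dividing by zero.
    scale = (hi - lo) / (QMAX - QMIN) or 1.0
    # The zero point is the grid value that represents 0.0; it must fit the grid too.
    zp = max(QMIN, min(QMAX, round(QMIN - lo / scale)))
    return scale, zp


def quantize_asym(values: List[float], scale: float, zp: int) -> List[int]:
    """Hypothetical helper: round onto the grid, shift by zp, clamp to [QMIN, QMAX]."""
    return [max(QMIN, min(QMAX, round(v / scale) + zp)) for v in values]


def dequantize_asym(q: List[int], scale: float, zp: int) -> List[float]:
    """Hypothetical helper: undo the zp shift and rescale."""
    return [(x - zp) * scale for x in q]


# Usage: a round trip on a few made-up weights.
weights = [-0.9, -0.1, 0.0, 0.35, 1.2]
scale, zp = asym_params(weights)
q = quantize_asym(weights, scale, zp)
recon = dequantize_asym(q, scale, zp)
```

If the zp were dropped or saved with the wrong dtype on disk, `dequantize_asym` would shift every weight by `zp * scale`, which is the kind of error a W4A16 + zp check should catch.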
I'm unsure about zp_dtype = None meaning symmetric quantization if we're going to keep symmetric as its own field. It feels like either symmetric should be deprecated or zp_dtype should be ignored when symmetric is true. I strongly dislike scale_dtype = None meaning dynamic quantization; that seems entirely unintuitive. While zp_dtype = None can be read as "there is no zp", and therefore symmetric quantization, scale_dtype = None has no such logical path to dynamic quantization. It also has the same problem of duplicating the information already in the dynamic field.
The point is to make it clear in the metadata what is compressed on disk. When doing symmetric or dynamic quantization, these values are not saved or set in the checkpoint: symmetric quantization stores no zp, and dynamic quantization stores neither a scale nor a zp. Having dtypes set for them in the config would be extremely confusing. Dynamic quantization can also run with any fp dtype, depending on how you load your model, since the scale just matches the dtype of the activations, so defining it in the config doesn't make a lot of sense. As for zp_dtype, it is ignored if symmetric and is set as None in the config.
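The rule described in this reply can be sketched as a small post-init normalization step. This is an illustration under stated assumptions, not the PR's implementation: `QuantArgsSketch` is a hypothetical dataclass (not the real `QuantizationArgs`), dtype names are plain strings instead of torch dtypes, and the default values are made up. `symmetric` and `dynamic` stay the source of truth, and the dtype fields are cleared whenever nothing of that kind is written to disk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuantArgsSketch:
    """Hypothetical stand-in for a quantization-args config (not the real QuantizationArgs)."""

    num_bits: int = 4
    symmetric: bool = True
    dynamic: bool = False
    # dtype names are plain strings here; real code would use torch dtypes
    scale_dtype: Optional[str] = "bfloat16"
    zp_dtype: Optional[str] = "int8"

    def __post_init__(self) -> None:
        # Symmetric quantization writes no zero point to disk, so record no zp dtype.
        if self.symmetric:
            self.zp_dtype = None
        # Dynamic scales (and zps) are computed at runtime in the activation dtype,
        # so nothing is stored and neither dtype is recorded.
        if self.dynamic:
            self.scale_dtype = None
            self.zp_dtype = None


# Usage
static_sym = QuantArgsSketch(symmetric=True)
static_asym = QuantArgsSketch(symmetric=False)
dyn = QuantArgsSketch(dynamic=True)
```

Deriving the dtype fields from the existing flags, rather than letting `None` encode the scheme, is one way to address the concern about duplicated information: the serialized config then only names dtypes for tensors that actually exist in the checkpoint.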
src/compressed_tensors/compressors/quantized_compressors/fp4_quantized.py
brian-dellabetta left a comment:
One question on skip scale
dc235db
Summary
- `is_fp4` and some of the fp4-specific functionality that was tied closely to the global scale generation
- We are not applying this logic for now but would like to discuss with the team to gather thoughts:
  - Set `zp_dtype` to `None` if running symmetric quantization.
  - Set `scale_dtype` to `None` if running dynamic or local quantization.
- Rename `round_to_quantized_type` -> `round_to_quantized_type_args` and add clamping functionality to this method
- Add `round_to_quantized_type_dtype`, which similarly clamps and rounds if given a dtype as an input, not a set of qargs

Question:
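The split between an args-based and a dtype-based rounding helper can be sketched as follows. This is a hedged illustration only: `round_to_args_sketch` and `round_to_dtype_sketch` are hypothetical names that mirror the roles of `round_to_quantized_type_args` and `round_to_quantized_type_dtype`, `ArgsSketch` and the `INT_RANGES` table are assumptions, and the sketch covers only integer types on plain lists (the real functions operate on tensors and may also handle float formats). The args version just resolves a dtype and delegates, so the clamp-and-round logic lives in one place.

```python
from dataclasses import dataclass
from typing import List

# Assumed integer ranges keyed by dtype name; real code would use torch dtypes.
INT_RANGES = {
    "int4": (-8, 7),
    "uint4": (0, 15),
    "int8": (-128, 127),
}


@dataclass
class ArgsSketch:
    """Hypothetical quantization args carrying only what rounding needs."""

    num_bits: int
    signed: bool = True


def round_to_dtype_sketch(values: List[float], dtype: str) -> List[int]:
    """Clamp to the dtype's representable range, then round to the nearest integer.

    Python's round() uses round-half-to-even, so 3.5 -> 4 and -2.5 -> -2.
    """
    lo, hi = INT_RANGES[dtype]
    return [int(round(min(max(v, lo), hi))) for v in values]


def round_to_args_sketch(values: List[float], args: ArgsSketch) -> List[int]:
    """Resolve a dtype name from the args, then reuse the dtype-based helper."""
    prefix = "int" if args.signed else "uint"
    return round_to_dtype_sketch(values, f"{prefix}{args.num_bits}")


# Usage
vals = [-9.6, -2.5, 0.4, 3.5, 12.0]
signed4 = round_to_dtype_sketch(vals, "int4")
unsigned4 = round_to_args_sketch(vals, ArgsSketch(num_bits=4, signed=False))
```

For integer targets, clamping before or after rounding gives the same result because the bounds are themselves integers; the clamp is what keeps out-of-range values from wrapping or overflowing when cast to the storage type.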
Example Updates:
KV Cache Scheme:
NVFP4:
FP8 Dynamic:
W4A16 + Asym