GPTQ Lite implementation #555
base: main

Changes from all commits: 850ef67, ccfcfeb, 9992c08, 32ada2b, a183f34, 1587293, c6e1f07
```diff
@@ -1111,6 +1111,39 @@ class SVDQuantConfig(QuantizeAlgorithmConfig):
     )
 
 
+class GPTQLiteConfig(QuantizeAlgorithmConfig):
+    """The config for GPTQ lite.
+
+    GPTQ lite is a variant of GPTQ that does not exactly follow the official GPTQ
+    implementation.
+
+    GPTQ lite does not perform sequential quantization of layers. This means that the
+    updated activations are not used to process the next layer.
+
+    GPTQ lite also uses dynamic scales computed during the weight update phase. The
+    original GPTQ implementation uses static scales computed on the weights before
+    beginning blockwise update.
+    """
+
+    method: Literal["gptq_lite"] = ModeloptField("gptq_lite")
+    percdamp: float | None = ModeloptField(
+        default=0.01,
+        gt=0.0,
+        le=1.0,
+        title="Percentage damping factor.",
+        description="The percentage of average Hessian diagonal used for damping.",
```
Collaborator: if you have a reference from the original paper about what these are, could you also share the link too?

Contributor: Could you also add some instructions here, so users can know what's the impact of increasing/decreasing this parameter?
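To make the reviewers' question about `percdamp` concrete, here is a minimal sketch of how a percentage damping factor is typically applied to the Hessian diagonal in GPTQ-style methods. This is an illustration under that assumption, not the code in this PR; the helper name `damp_hessian` is hypothetical.

```python
import numpy as np

def damp_hessian(H: np.ndarray, percdamp: float = 0.01) -> np.ndarray:
    """Add percdamp * mean(diag(H)) to the Hessian diagonal.

    Increasing percdamp makes the Hessian inverse better conditioned
    (more numerically stable, but weight updates shrink toward plain
    round-to-nearest); decreasing it trusts the raw Hessian more but
    risks an ill-conditioned or singular matrix. Sketch only -- this
    is not the modelopt implementation.
    """
    damp = percdamp * np.mean(np.diag(H))
    H = H.copy()
    H[np.diag_indices_from(H)] += damp
    return H

# Example: a rank-deficient proxy Hessian (too few calibration samples)
# becomes positive definite after damping, so Cholesky succeeds.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))   # 8 channels, only 3 samples
H = X @ X.T                   # PSD but rank <= 3, i.e. singular
Hd = damp_hessian(H, percdamp=0.01)
L = np.linalg.cholesky(Hd)    # would be unreliable on the raw H
```

This also suggests an answer to the second comment: larger `percdamp` trades accuracy of the Hessian-guided update for robustness on poorly conditioned layers.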
```diff
+    )
+    block_size: int | None = ModeloptField(
+        default=128,
+        title="Block size for GPTQ weight update.",
+        description="The block size for GPTQ weight update.",
+    )
```
Contributor (on lines +1135 to +1139): This should be the multiple of block_size used in quantization. We should explain it in the description as well.
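For context on what `block_size` controls, here is a sketch of the standard GPTQ blockwise column update: columns are quantized one at a time within a block, with the remaining columns updated lazily once per block. This assumes the PR follows the usual GPTQ schedule; `gptq_blockwise_update` and its simple round-to-nearest quantizer are hypothetical stand-ins, not the PR's code.

```python
import numpy as np

def gptq_blockwise_update(W: np.ndarray, Hinv: np.ndarray, block_size: int = 128,
                          quantize=lambda w: np.round(w)) -> np.ndarray:
    """Quantize columns of W block by block, propagating quantization error.

    Hinv plays the role of the inverse (damped) Hessian factor from GPTQ;
    `quantize` is a placeholder per-column quantizer. Sketch only.
    """
    W = W.copy()
    n = W.shape[1]
    Q = np.zeros_like(W)
    for i1 in range(0, n, block_size):
        i2 = min(i1 + block_size, n)
        Werr = np.zeros_like(W[:, i1:i2])
        for j in range(i1, i2):
            q = quantize(W[:, j])
            Q[:, j] = q
            err = (W[:, j] - q) / Hinv[j, j]
            # eager update of the columns inside the current block
            W[:, j:i2] -= np.outer(err, Hinv[j, j:i2])
            Werr[:, j - i1] = err
        # lazy update of all columns after the block
        W[:, i2:] -= Werr @ Hinv[i1:i2, i2:]
    return Q

# With an identity Hinv there is no cross-column correction, so the
# result reduces to plain rounding regardless of block_size.
W = np.array([[0.2, 1.7, -0.4, 2.9]])
Q = gptq_blockwise_update(W, np.eye(4), block_size=2)
```

A larger `block_size` amortizes the lazy tail update over more columns (faster), while the math is unchanged; this is why it mainly needs to divide the weight dimensions cleanly, as the comment above suggests documenting.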
```diff
+    hessian_state_path: str | None = ModeloptField(
+        default=None,
+        title="Path to the Hessian state file.",
+        description="The path to the Hessian state file.",
+    )
```

Collaborator: Maybe state: if the path exists, we load the hessian from the path instead of re-computing them.
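The semantics the collaborator suggests for `hessian_state_path` can be sketched as a load-or-compute cache. The helper below is hypothetical (not part of this PR) and assumes an `.npz` file of per-layer Hessians; it only illustrates the proposed docstring behavior.

```python
import os
import numpy as np

def load_or_compute_hessian(path, compute_fn):
    """Load cached Hessian state from `path` if it exists;
    otherwise run `compute_fn` (calibration) and save the result.

    `compute_fn` returns a dict mapping layer names to Hessian arrays.
    Hypothetical helper illustrating the suggested semantics.
    """
    if path is not None and os.path.exists(path):
        with np.load(path) as f:
            return {k: f[k] for k in f.files}
    state = compute_fn()
    if path is not None:
        np.savez(path, **state)
    return state
```

Under this reading, a second quantization run with the same path skips the (expensive) calibration pass entirely.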
```diff
 
 
 QuantizeQuantCfgType = dict[
     str | Callable,
     QuantizerAttributeConfig
```
Reviewer: Can you estimate how much effort is needed if we need to add this constraint? I am thinking if we can have a quick test to see what's the accuracy impact.