Conversation

@sugunav14 (Contributor) commented Nov 13, 2025

What does this PR do?

Type of change: New feature

Overview: Adds support for the GPTQ algorithm. This PR implements a modified version of the official GPTQ algorithm; the key differences (see the sketch after this list) are:

  1. The updated activations from each layer are not used for Hessian computation.
  2. QDQ updates are made based on the quantizer block_sizes.
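
For context, a minimal sketch of the GPTQ-style blockwise update being adapted here, following the original paper (arXiv:2210.17323). gptq_block_update and fake_quant are illustrative names, not this PR's API; where fake_quant appears below, the PR instead applies the module's quantizer with its configured block_sizes.

import torch

def fake_quant(w, num_bits=4):
    # Placeholder round-to-nearest QDQ on one weight column; stands in for the
    # module's quantizer (which uses its configured block_sizes in this PR).
    amax = w.abs().amax().clamp(min=1e-8)
    scale = amax / (2 ** (num_bits - 1) - 1)
    return (w / scale).round().clamp(-(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1) * scale

def gptq_block_update(weight, hessian, block_size, percdamp=0.01):
    # Quantize weight columns block by block, compensating each column's
    # quantization error on the not-yet-quantized columns via the inverse Hessian.
    W, H = weight.clone().float(), hessian.clone().float()
    cols = W.shape[1]
    # Dampen the Hessian diagonal for numerical stability (see percdamp above).
    H += percdamp * torch.mean(torch.diag(H)) * torch.eye(cols, device=H.device)
    # Upper-triangular Cholesky factor of H^-1, as in the official GPTQ code.
    Hinv = torch.linalg.cholesky(torch.cholesky_inverse(torch.linalg.cholesky(H)), upper=True)
    for i1 in range(0, cols, block_size):
        i2 = min(i1 + block_size, cols)
        Err = torch.zeros_like(W[:, i1:i2])
        for j in range(i1, i2):
            q = fake_quant(W[:, j])
            err = (W[:, j] - q) / Hinv[j, j]
            W[:, j] = q
            # Propagate the error to later columns inside the block.
            W[:, j + 1:i2] -= err.unsqueeze(1) * Hinv[j, j + 1:i2].unsqueeze(0)
            Err[:, j - i1] = err
        # Propagate the block's accumulated error to all remaining columns.
        W[:, i2:] -= Err @ Hinv[i1:i2, i2:]
    return W.to(weight.dtype)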

Usage

Set the "algorithm" field in quant_cfg to "gptq_lite".

Note: does not currently work with AWQ.

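A minimal usage sketch; the config chosen and the calibration loop are illustrative, only the "gptq_lite" value comes from this PR:

import modelopt.torch.quantization as mtq

# Any non-AWQ weight-quantization config should work; INT4 blockwise shown here.
quant_cfg = mtq.INT4_BLOCKWISE_WEIGHT_ONLY_CFG
quant_cfg["algorithm"] = "gptq_lite"

# calib_loop(model) should run representative calibration batches through the model.
model = mtq.quantize(model, quant_cfg, forward_loop=calib_loop)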

Testing

  • Added unit tests covering the helper functions and the end-to-end flow
  • Reproduced results from the original paper

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: Yes
  • Did you add or update any necessary documentation?: No
  • Did you update Changelog?: No

Additional Information

Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
@sugunav14 sugunav14 requested review from a team as code owners November 13, 2025 22:44
@sugunav14 sugunav14 requested a review from RalphMao November 13, 2025 22:44
@sugunav14 sugunav14 marked this pull request as draft November 13, 2025 22:44
@copy-pr-bot bot commented Nov 13, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

@codecov bot commented Nov 13, 2025

Codecov Report

❌ Patch coverage is 18.23899% with 130 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.92%. Comparing base (e74a468) to head (c6e1f07).
⚠️ Report is 24 commits behind head on main.

Files with missing lines                     Patch %   Lines
modelopt/torch/quantization/model_calib.py  11.26%    126 Missing ⚠️
modelopt/torch/utils/perf.py                 20.00%    4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #555      +/-   ##
==========================================
- Coverage   74.36%   73.92%   -0.45%     
==========================================
  Files         182      182              
  Lines       18216    18395     +179     
==========================================
+ Hits        13547    13598      +51     
- Misses       4669     4797     +128     


@sugunav14 sugunav14 self-assigned this Nov 15, 2025
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
@sugunav14 sugunav14 requested a review from cjluo-nv November 17, 2025 23:59
@sugunav14 sugunav14 marked this pull request as ready for review November 17, 2025 23:59
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
gt=0.0,
le=1.0,
title="Percentage damping factor.",
description="The percentage of average Hessian diagonal used for damping.",
Collaborator commented:

if you have a reference from the original paper about what these are, could you also share the link?

batch_size = input.shape[0]

# Incremental averaging: scale down old hessian
hessian *= n_samples / (n_samples + batch_size)
Collaborator commented:

what's the dtype of hessian? Do we need to upcast to fp32 for this division?
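
One way to address this: keep the accumulator in fp32 and upcast the activations before the rank update. A sketch with the same signature as update_hessian above; the internals are illustrative, not the PR's implementation:

import torch

def update_hessian(x, hessian, n_samples):
    batch_size = x.shape[0]
    x = x.reshape(-1, x.shape[-1]).float()           # flatten tokens, upcast to fp32
    hessian = hessian.float()                        # accumulate in fp32
    hessian *= n_samples / (n_samples + batch_size)  # scale down the old estimate
    hessian += (2.0 / (n_samples + batch_size)) * x.t() @ x  # add the new batch term
    return hessian, n_samples + batch_size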

hessian, n_samples = update_hessian(input[0], state["hessian"], state["n_samples"])
hessian_state[module.name] = {"hessian": hessian, "n_samples": n_samples}
torch.cuda.empty_cache()
gc.collect()
Collaborator commented:

do we have to do gc.collect() here? It's going to be very slow
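
One option, assuming this runs in a per-batch forward hook: drop the per-call collection and reclaim memory once per layer (or once after calibration), e.g.

hessian, n_samples = update_hessian(input[0], state["hessian"], state["n_samples"])
hessian_state[module.name] = {"hessian": hessian, "n_samples": n_samples}
# torch.cuda.empty_cache() / gc.collect() deferred to the end of the
# calibration loop, where they run once instead of once per forward call.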


# Phase 1: Collect statistics for quantizers
enable_stats_collection(model)
max_calibrate(model, forward_loop)
Collaborator commented:

do you need forward_loop here? Is this for weight amax calib only?

state = hessian_state[module.name]
hessian = state["hessian"].to(module.weight.device)
blockwise_weight_update(module, hessian, block_size, percdamp)
torch.cuda.empty_cache()
Collaborator commented:

maybe you can del the hessian after applying blockwise_weight_update?
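
For example, a sketch of that suggestion:

state = hessian_state[module.name]
hessian = state["hessian"].to(module.weight.device)
blockwise_weight_update(module, hessian, block_size, percdamp)
del hessian, state["hessian"]  # release the (possibly large) Hessian tensors
torch.cuda.empty_cache()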

hessian_state_path: str | None = ModeloptField(
default=None,
title="Path to the Hessian state file.",
description="The path to the Hessian state file.",
Collaborator commented:

Maybe state: if the path exists, we load the Hessian from the path instead of re-computing it.
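
A sketch of that behavior; compute_hessian_state is a hypothetical stand-in for the PR's Hessian-collection pass:

import os

import torch

if hessian_state_path is not None and os.path.exists(hessian_state_path):
    # Reuse a previously saved Hessian state instead of re-running calibration.
    hessian_state = torch.load(hessian_state_path, map_location="cpu")
else:
    hessian_state = compute_hessian_state(model, forward_loop)  # hypothetical helper
    if hessian_state_path is not None:
        torch.save(hessian_state, hessian_state_path)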
