Conversation

@sugunav14 (Contributor) commented Nov 13, 2025

What does this PR do?

Type of change: New feature

Overview: Adds support for the GPTQ algorithm. This PR implements a modified version of the official GPTQ algorithm; the key differences (see the sketch after this list) are:

  1. The updated activations from each layer are not used for Hessian computation.
  2. QDQ updates are made based on the quantizer block_sizes.
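
For context, a minimal sketch of the GPTQ-style blockwise update being adapted here, following the original paper (arXiv:2210.17323). gptq_block_update and fake_quant are illustrative names, not this PR's API; where fake_quant appears below, the PR instead applies the module's quantizer with its configured block_sizes.

import torch

def fake_quant(w, num_bits=4):
    # Placeholder round-to-nearest QDQ on one weight column; stands in for the
    # module's quantizer (which uses its configured block_sizes in this PR).
    amax = w.abs().amax().clamp(min=1e-8)
    scale = amax / (2 ** (num_bits - 1) - 1)
    return (w / scale).round().clamp(-(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1) * scale

def gptq_block_update(weight, hessian, block_size, percdamp=0.01):
    # Quantize weight columns block by block, compensating each column's
    # quantization error on the not-yet-quantized columns via the inverse Hessian.
    W, H = weight.clone().float(), hessian.clone().float()
    cols = W.shape[1]
    # Dampen the Hessian diagonal for numerical stability (see percdamp above).
    H += percdamp * torch.mean(torch.diag(H)) * torch.eye(cols, device=H.device)
    # Upper-triangular Cholesky factor of H^-1, as in the official GPTQ code.
    Hinv = torch.linalg.cholesky(torch.cholesky_inverse(torch.linalg.cholesky(H)), upper=True)
    for i1 in range(0, cols, block_size):
        i2 = min(i1 + block_size, cols)
        Err = torch.zeros_like(W[:, i1:i2])
        for j in range(i1, i2):
            q = fake_quant(W[:, j])
            err = (W[:, j] - q) / Hinv[j, j]
            W[:, j] = q
            # Propagate the error to later columns inside the block.
            W[:, j + 1:i2] -= err.unsqueeze(1) * Hinv[j, j + 1:i2].unsqueeze(0)
            Err[:, j - i1] = err
        # Propagate the block's accumulated error to all remaining columns.
        W[:, i2:] -= Err @ Hinv[i1:i2, i2:]
    return W.to(weight.dtype)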

Usage

Set the "algorithm" field in quant_cfg to "gptq_lite".

Note: does not currently work with AWQ.

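A minimal usage sketch; the config chosen and the calibration loop are illustrative, only the "gptq_lite" value comes from this PR:

import modelopt.torch.quantization as mtq

# Any non-AWQ weight-quantization config should work; INT4 blockwise shown here.
quant_cfg = mtq.INT4_BLOCKWISE_WEIGHT_ONLY_CFG
quant_cfg["algorithm"] = "gptq_lite"

# calib_loop(model) should run representative calibration batches through the model.
model = mtq.quantize(model, quant_cfg, forward_loop=calib_loop)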

Testing

  • Added unit tests covering the helper functions and the end-to-end flow
  • Reproduced results from the original paper

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: Yes
  • Did you add or update any necessary documentation?: No
  • Did you update Changelog?: No

Additional Information

Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
@sugunav14 sugunav14 requested review from a team as code owners November 13, 2025 22:44
@sugunav14 sugunav14 requested a review from RalphMao November 13, 2025 22:44
@sugunav14 sugunav14 marked this pull request as draft November 13, 2025 22:44
@copy-pr-bot bot commented Nov 13, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

@codecov bot commented Nov 13, 2025

Codecov Report

❌ Patch coverage is 18.23899% with 130 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.92%. Comparing base (e74a468) to head (c6e1f07).
⚠️ Report is 24 commits behind head on main.

Files with missing lines                     Patch %   Lines
modelopt/torch/quantization/model_calib.py  11.26%    126 Missing ⚠️
modelopt/torch/utils/perf.py                 20.00%    4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #555      +/-   ##
==========================================
- Coverage   74.36%   73.92%   -0.45%     
==========================================
  Files         182      182              
  Lines       18216    18395     +179     
==========================================
+ Hits        13547    13598      +51     
- Misses       4669     4797     +128     


@sugunav14 sugunav14 self-assigned this Nov 15, 2025
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
@sugunav14 sugunav14 requested a review from cjluo-nv November 17, 2025 23:59
@sugunav14 sugunav14 marked this pull request as ready for review November 17, 2025 23:59
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
gt=0.0,
le=1.0,
title="Percentage damping factor.",
description="The percentage of average Hessian diagonal used for damping.",
Collaborator commented:

if you have a reference from the original paper about what these are, could you also share the link?

batch_size = input.shape[0]

# Incremental averaging: scale down old hessian
hessian *= n_samples / (n_samples + batch_size)
Collaborator commented:

what's the dtype of hessian? Do we need to upcast to fp32 for this division?
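
One way to address this: keep the accumulator in fp32 and upcast the activations before the rank update. A sketch with the same signature as update_hessian above; the internals are illustrative, not the PR's implementation:

import torch

def update_hessian(x, hessian, n_samples):
    batch_size = x.shape[0]
    x = x.reshape(-1, x.shape[-1]).float()           # flatten tokens, upcast to fp32
    hessian = hessian.float()                        # accumulate in fp32
    hessian *= n_samples / (n_samples + batch_size)  # scale down the old estimate
    hessian += (2.0 / (n_samples + batch_size)) * x.t() @ x  # add the new batch term
    return hessian, n_samples + batch_size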

hessian, n_samples = update_hessian(input[0], state["hessian"], state["n_samples"])
hessian_state[module.name] = {"hessian": hessian, "n_samples": n_samples}
torch.cuda.empty_cache()
gc.collect()
Collaborator commented:

do we have to do gc.collect() here? It's going to be very slow
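
One option, assuming this runs in a per-batch forward hook: drop the per-call collection and reclaim memory once per layer (or once after calibration), e.g.

hessian, n_samples = update_hessian(input[0], state["hessian"], state["n_samples"])
hessian_state[module.name] = {"hessian": hessian, "n_samples": n_samples}
# torch.cuda.empty_cache() / gc.collect() deferred to the end of the
# calibration loop, where they run once instead of once per forward call.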


# Phase 1: Collect statistics for quantizers
enable_stats_collection(model)
max_calibrate(model, forward_loop)
Collaborator commented:

do you need forward_loop here? Is this for weight amax calib only?

state = hessian_state[module.name]
hessian = state["hessian"].to(module.weight.device)
blockwise_weight_update(module, hessian, block_size, percdamp)
torch.cuda.empty_cache()
Collaborator commented:

maybe you can del the hessian after applying blockwise_weight_update?
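
For example, a sketch of that suggestion:

state = hessian_state[module.name]
hessian = state["hessian"].to(module.weight.device)
blockwise_weight_update(module, hessian, block_size, percdamp)
del hessian, state["hessian"]  # release the (possibly large) Hessian tensors
torch.cuda.empty_cache()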

hessian_state_path: str | None = ModeloptField(
default=None,
title="Path to the Hessian state file.",
description="The path to the Hessian state file.",
Collaborator commented:

Maybe state: if the path exists, we load the Hessian from the path instead of re-computing it.
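
A sketch of that behavior; compute_hessian_state is a hypothetical stand-in for the PR's Hessian-collection pass:

import os

import torch

if hessian_state_path is not None and os.path.exists(hessian_state_path):
    # Reuse a previously saved Hessian state instead of re-running calibration.
    hessian_state = torch.load(hessian_state_path, map_location="cpu")
else:
    hessian_state = compute_hessian_state(model, forward_loop)  # hypothetical helper
    if hessian_state_path is not None:
        torch.save(hessian_state, hessian_state_path)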
