[Torch FX] Compress PT2E Support #3663
Open

anzr299 wants to merge 125 commits into openvinotoolkit:develop from anzr299:an/fx/compress_pt2e (+14,189 −63).
Commits (all authored by anzr299, messages verbatim):

- `190f9d5` init
- `c52fcca` fixes
- `4e56cb5` add message for unsupported external quantizers
- `9651ceb` add algorithm
- `14daeb5` impotr openvino quantizer from nncf instead of executorch
- `3746815` Add observers and openvino quantizer to nncf
- `0815dc5` fix
- `1b8d940` minor fix
- `7d35374` fix
- `427ebc2` fix some more bugs; observers was importing from torchao. causing mis…
- `24dbfb6` add compress pt2e to init
- `4bb8c1a` fix quantizer init file. Remove extra code.
- `8902842` small fix for the big problem:)
- `3842538` fix quantizer preset definition
- `2e70c2e` fix openvino quantizer for ptq. call _algo instead of legacy _min_max…
- `b1c9aad` fix quantizer defaults
- `33fe01c` microfix
- `d8e1006` precommit fix
- `88a8472` revert openvino quantizer to old
- `7a8e51a` create ovquantizer in executorch dir
- `fed5052` update executorch quantizer location.
- `2866473` check if openvino quantizer has weight compression in openvino adapter
- `7171d56` review comments
- `3e3b067` revert ignored scope changes; make sensitivity metric None to check i…
- `5b7b210` precommit fix
- `71a479f` pre commit format
- `b24a59c` rename executorch quantizer to test_quantizer
- `d12225a` fix last precommit
- `9870ee2` remove unused mypy ignore
- `8015629` get the mode as struct
- `0804218` fix algorithm
- `1f1fda3` remove quantizer and observers from nncf. Instead import from executorch
- `623ce46` rework wc algorithm so that get_weight_comrpession_params becomes mor…
- `d14a6eb` fix bugs; use sensitivity metric instead of mixed precision algo
- `e91b455` update algorithm with new reworking
- `448bf84` changes
- `8e23572` review changes
- `36ddf53` change WeightsCompressionPT2E to ExperimentalWeightsCompression
- `07b730b` change ExperimentalWeightsCompression to WeightsCompression
- `d5dd422` add comments
- `076a76b` add typehints
- `2ce9eec` add docstrings
- `1bebf3e` add typehint for quantize pt2e
- `ea81cfd` Merge branch 'openvinotoolkit:develop' into an/fx/compress_pt2e
- `e82920f` return original develop branch changes
- `82cc10b` update typehints and docs
- `beae508` format
- `8bd95df` update type hinting of openvino adapter
- `aac9d3f` add test
- `4278cfd` update reference graphs; use more samples for calibration dataset. Th…
- `6fd5216` remove groupsize values as return statement from get_weight_compressi…
- `118b611` update algorithm
- `e9f3cd4` change WeightCompression to OriginalWeightCompression in experimental…
- `a969e58` update docstrings as discussed offline
- `71d0597` revert torchaoadapter code
- `bf671ff` precommit fix
- `5f1c2de` rename test_quantizer to test_quantizer_compression.py
- `6f81879` review changes
- `eb0ff16` review changes
- `8afeb9d` precommit fix
- `f491c8d` update quantizer test to include scales; remve sensitivity metric fro…
- `b9f3eff` update test and references
- `09dabf6` minor
- `68316a5` add workflow for executorch test
- `58b8992` update workflow and makefile
- `e7bae1f` update execiutorch test requirements.
- `4b0d8ea` Merge branch 'openvinotoolkit:develop' into an/fx/compress_pt2e
- `d4da34f` fix precommit
- `2b91658` override constraint in executorch workflow
- `93c3f19` minor fix
- `932b296` update workflow for fix
- `67ab135` update workflow file
- `a23acaf` install executorch after pytorch
- `6462284` install torch nightly
- `cf7e8d3` update requirements and revert workflow changes
- `a07dc07` fix minor workflow file issue
- `0506bca` install with no build isolation
- `f8675ad` include executorch requirements
- `52a7d5a` include openvino in requirements
- `9e02948` fix
- `a578fce` fix
- `8ae6a80` update requirements
- `c7210b8` add conftest and __init__
- `2f8b296` use older pytorch commit
- `75ccdcb` change torch versions to 2.10.0.dev20250922+cpu
- `75cc255` install executorch directly from requirements txt
- `3cdfe74` comments
- `e4f9286` seperate executorch installation
- `009c587` precommit fix
- `6e379c8` conftest precommit
- `e45f796` update ref location for executorch
- `f2ece8c` define ratio in compress_pt2e API and not Quantizer itself; Update test
- `387d69c` Apply suggestion from @daniil-lyakhov
- `4ace0df` Merge branch 'openvinotoolkit:develop' into an/fx/compress_pt2e
- `00c8897` pre-commit fix
- `dd34b9b` Apply suggestions from code review
- `b9509bc` review changes
- `d960d9a` add mypy; review changes
- `758bd67` precommit fix; seperate mixed precision algorithm application from th…
- `193c404` add credit for transformers
- `8807c10` all optional arguemnts are keyword-only
- `d86a90a` review changes
- `f2f01f2` review changes
- `0dc7f64` executorch fix
- `f6d3739` remove extra comments
- `7ffa572` fix duplication of tests
- `64803c3` avoid square complexity
- `9d25bed` review changes
- `859328e` change private methods to public which are used externally
- `33a4b77` review changes
- `ffd601d` review changes
- `8484f1a` remove extra function
- `5aadf1c` minor fix
- `f93eed2` remove private var assignments from experimeental WC algo init
- `a5bb632` review changes
- `5665bed` add description for MP and validation methods in algo
- `ee86a20` review changes
- `b31a1ab` update docstring
- `c6557f6` fix error
- `ee9a2de` review changes
- `b1fcfa9` review changes
- `6c56b91` remove extra kwarg
- `e5ea21b` fix executorch test
- `7bf3c78` review changes
- `f2d9968` review changes
src/nncf/experimental/quantization/algorithms/weight_compression/__init__.py (10 additions, 0 deletions)
```python
# Copyright (c) 2025 Intel Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#      http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
```
src/nncf/experimental/quantization/algorithms/weight_compression/algorithm.py (141 additions, 0 deletions)
```python
# Copyright (c) 2025 Intel Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#      http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from typing import Iterable, Optional

import torch

from nncf import AdvancedCompressionParameters
from nncf import CompressionFormat
from nncf import CompressWeightsMode
from nncf import Dataset
from nncf import SensitivityMetric
from nncf.common.graph.graph import NNCFGraph
from nncf.common.graph.graph import NNCFNode
from nncf.common.logging import nncf_logger
from nncf.common.tensor_statistics.statistic_point import StatisticPointsContainer
from nncf.common.utils.backend import BackendType
from nncf.experimental.quantization.quantizer import Quantizer
from nncf.quantization.algorithms.algorithm import Algorithm
from nncf.quantization.algorithms.weight_compression.algorithm import WeightCompression as OriginalWeightCompression


class WeightsCompression(Algorithm):
    """
    Post-training Weight Compression algorithm implementation.

    Compresses weights of Linear and Embedding layers to 8-bit integer or
    to 4-bit integer/float depending on mode, ratio and group size.
    """

    def __init__(
        self,
        quantizer: Quantizer,
        ratio: float,
        subset_size: int,
        awq: bool,
        scale_estimation: bool,
        gptq: bool,
        lora_correction: bool,
        sensitivity_metric: SensitivityMetric,
        compression_format: CompressionFormat,
        advanced_parameters: AdvancedCompressionParameters,
    ) -> None:
        """
        :param quantizer: Quantizer to use in the WeightCompression algorithm.
        :param ratio: The ratio between primary and backup precisions (e.g. 0.9 means 90% of the layers specified as
            `ratio_defining_params` by the quantizer are quantized to INT4).
        :param subset_size: Number of data samples used to calculate the activation statistics for assigning
            different quantization precisions.
        :param awq: Determines whether to use the modified AWQ algorithm.
        :param scale_estimation: Determines whether to use scale estimation for 4-bit layers.
        :param gptq: Determines whether to use the GPTQ algorithm.
        :param lora_correction: Determines whether to use the LoRA Correction algorithm.
        :param sensitivity_metric: The sensitivity metric for assigning quantization precision to layers. To
            preserve the accuracy of the model, the more sensitive layers receive a higher precision.
        :param compression_format: Describes the format in which the model is saved after weight compression.
        :param advanced_parameters: Advanced parameters for the algorithms in the compression pipeline.
        """
        self._quantizer = quantizer
        wc_config = quantizer.get_weight_compression_config()

        mode = wc_config.get("mode", CompressWeightsMode.INT8_ASYM)

        self._algo = OriginalWeightCompression(
            mode=CompressWeightsMode(mode),
            ratio=ratio,
            group_size=wc_config.get("group_size", None),
            ignored_scope=None,
            all_layers=wc_config.get("all_layers", None),
            sensitivity_metric=sensitivity_metric,
            awq=awq,
            subset_size=subset_size,
            scale_estimation=scale_estimation,
            gptq=gptq,
            lora_correction=lora_correction,
            backup_mode=wc_config.get("backup_mode", None),
            compression_format=compression_format,
            advanced_parameters=advanced_parameters,
        )

    def available_backends(self) -> list[BackendType]:
        return [BackendType.TORCH_FX]

    def apply(
        self,
        model: torch.fx.GraphModule,
        graph: NNCFGraph,
        statistic_points: Optional[StatisticPointsContainer] = None,
        dataset: Optional[Dataset] = None,
    ) -> torch.fx.GraphModule:
        self._algo.set_backend_entity(model)

        all_weight_params, ratio_defining_params, skipped_weight_params = (
            self._quantizer.get_weight_compression_parameters(model, graph)
        )
        # Collect statistics for the weight compression
        statistics, statistic_points = self._algo.collect_statistics_and_statistic_points(
            model, graph, statistic_points, dataset, ratio_defining_params, all_weight_params
        )
        # Apply the mixed precision algorithm to the ratio-defining parameters
        self._algo.apply_mixed_precision(ratio_defining_params, model, graph, statistic_points)
        self._algo.validate_group_size(ratio_defining_params)

        # Print statistics
        nncf_logger.info(
            self._algo.get_bitwidth_distribution_str(all_weight_params, ratio_defining_params, skipped_weight_params)
        )

        # Filter all_weight_params by excluding nodes that should remain in their original floating-point precision
        all_weight_params = [w_params for w_params in all_weight_params if w_params.compression_config is not None]
        return self._algo.apply_with_parameters(
            model,
            graph,
            dataset,
            statistics,
            all_weight_params,
        )

    def get_statistic_points(
        self,
        model: torch.fx.GraphModule,
        graph: NNCFGraph,
        nodes_and_port_ids: Iterable[tuple[NNCFNode, int]],
    ) -> StatisticPointsContainer:
        """
        Returns statistic points, for which StatisticsCollector should collect statistics.

        :param model: Model for statistics collection.
        :param graph: Model graph.
        :param nodes_and_port_ids: Nodes and port ids for which statistics should be collected.
        :return: Statistic points, for which StatisticsCollector should collect statistics.
        """
        return self._algo.get_statistic_points(model, graph, nodes_and_port_ids)
```
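The constructor above pulls `mode`, `group_size`, `all_layers`, and `backup_mode` out of the quantizer-provided config dict, falling back to defaults when a key is absent, while `ratio` comes from the `compress_pt2e` API rather than the quantizer. A minimal self-contained sketch of that defaulting pattern (`DummyQuantizer` and the plain-string mode values are illustrative stand-ins, not NNCF's actual API):

```python
# Illustrative sketch of how WeightsCompression resolves settings from the
# quantizer's weight-compression config. DummyQuantizer is hypothetical.


class DummyQuantizer:
    """Stands in for a Quantizer carrying weight-compression settings."""

    def __init__(self, config: dict):
        self._config = config

    def get_weight_compression_config(self) -> dict:
        return self._config


def resolve_compression_settings(quantizer: DummyQuantizer, ratio: float) -> dict:
    """Mirror the defaulting logic in WeightsCompression.__init__."""
    wc_config = quantizer.get_weight_compression_config()
    return {
        # Default used when the quantizer does not specify a mode
        "mode": wc_config.get("mode", "int8_asym"),
        # ratio is passed through the API, not read from the quantizer
        "ratio": ratio,
        "group_size": wc_config.get("group_size", None),
        "all_layers": wc_config.get("all_layers", None),
        "backup_mode": wc_config.get("backup_mode", None),
    }


settings = resolve_compression_settings(
    DummyQuantizer({"mode": "int4_sym", "group_size": 128}), ratio=0.8
)
print(settings["mode"])         # int4_sym (taken from the quantizer config)
print(settings["backup_mode"])  # None (key absent, falls back to the default)
```

This keeps quantizer-owned choices (how weights are quantized) separate from pipeline-owned choices such as `ratio`, which matches the commit `f2ece8c` ("define ratio in compress_pt2e API and not Quantizer itself").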