Update: our team will evaluate this further before opening the migration up to more people in the community.
Context:
Previously we used AffineQuantizedTensor for many of our use cases, including int4, float8, intx, and floatx. It introduces some complicated abstractions, such as Layout; people have said it is hard to understand, and there are many indirections in the code.
In an effort to simplify the code base and make it easier to contribute to, we have been adding new features with a different structure in mind. We now want to organize tensors by "dtype" and "packing format": for example, we'll have Int4PreshuffledTensor, Int8Tensor, and Float8Tensor instead of AffineQuantizedTensor with multiple layouts (see the usage sketch after the links below).
Please check out our updated docs for the new tensor subclass organization and the design guide:
- quantization overview: https://docs-preview.pytorch.org/pytorch/ao/2723/quantization_overview.html
- contributor guide: https://docs-preview.pytorch.org/pytorch/ao/2723/contributor_guide.html
- Examples of tensor subclasses following new design: https://github.com/pytorch/ao/tree/main/torchao/quantization/quantize_/workflows
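For illustration, here is a minimal usage sketch of the new structure from the user side. `quantize_` and `Int4WeightOnlyConfig` are existing torchao names, but the exact kwargs, packing formats, and resulting subclass vary by version, so treat this as a sketch and check the docs linked above for the authoritative API:

```python
# Minimal sketch, assuming a recent torchao build with the new tensor
# subclasses and a CUDA device; config kwargs may differ across versions.
import torch
from torchao.quantization import quantize_, Int4WeightOnlyConfig

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to(torch.bfloat16).cuda()

# Under the new design, the quantized weight is a tensor subclass named by
# dtype + packing format (e.g. Int4PreshuffledTensor, Int8Tensor, Float8Tensor)
# rather than an AffineQuantizedTensor parameterized by a Layout object.
quantize_(model, Int4WeightOnlyConfig(group_size=128))

# Depending on the torchao version, this prints one of the new Int4*Tensor
# subclasses (older versions still produce AffineQuantizedTensor).
print(type(model[0].weight))
```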
List of things to migrate:
INT8
- [move to prototype] https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/block_sparse_layout.py @jainapurva
- [migrate] https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/plain_layout.py @namgyu-youn introduce new int8 quantization API #3241
- [move to prototype] https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/semi_sparse_layout.py @namgyu-youn Introduce new W8A8-FP-CSR quantization API #3258 (no need to migrate to new tensor structure)
[migration done, TODO: delete old path after all migration is done] INT4 weight only
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/int4_cpu_layout.py @Xia-Weiwen https://github.com/pytorch/ao/blob/main/torchao/quantization/quantize_/workflows/int4/int4_opaque_tensor.py
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/int4_xpu_layout.py @liangan1 Add Int4PlainInt32Tensor #2845
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/marlin_sparse_layout.py @liangel-02 Int4 sparse marlin tensor #2771
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/tensor_core_tiled_layout.py @jerryzh168 Add Int4TilePackedTo4dTensor #2791
- HQQ support for tensor core tiled layout @jerryzh168 Add hqq support for Int4TilePackedTo4dTensor #2912
[move to prototype] INT4 weight + int8 activation
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/cutlass_int4_packed_layout.py
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/dyn_int8_act_int4_wei_cpu_layout.py
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/marlin_qqq_tensor.py
UINTx Weight Only
- [move to prototype or migrate (check with Hicham)] https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/gemlite_layout.py
- [move to prototype] https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/uintx_layout.py
[migration done, TODO: delete old path after all migration is done] Int8DynamicActivationIntxWeightConfig
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/packed_linear_int8_dynamic_activation_intx_weight_layout.py @metascroy Introduce IntxOpaqueTensor to replace PackedInt8DynamicActivationIntxWeightLayout in AQT #2742
- https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/q_dq_layout.py @metascroy Add IntxUnpackedTensor #2732
FP8
- [migrate] https://github.com/pytorch/ao/blob/main/torchao/dtypes/floatx/cutlass_semi_sparse_layout.py @namgyu-youn Introduce new W8A8-FP-CSR quantization API #3258 and @bbeckca [WIP] Move float8 cutlass sparse layout to Float8SemiSparseTensor #3182
FPx