This code is developed based on QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs.
We recommend installing the Python environment with the original codebase's requirements:

```bash
conda create -n quarot python=3.9
conda activate quarot
pip install -r requirements.txt
```

Additionally, to apply the Hadamard transformation, build the fast-hadamard-transform package from source.
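As a minimal sketch, the build might look like the following (the upstream Dao-AILab repository URL is an assumption, and a CUDA toolchain compatible with your installed PyTorch is required):

```bash
# Build fast-hadamard-transform from source (repository URL assumed;
# requires a CUDA toolkit matching your PyTorch build).
git clone https://github.com/Dao-AILab/fast-hadamard-transform
cd fast-hadamard-transform
pip install .
```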
Currently, this code supports LLaMA-2 and LLaMA-3 models (we did not test OPT or LLaMA-1).
You can simply run main.py to reproduce the results in the paper. The important arguments are listed below, followed by an example invocation:
- `--model`: the model name (or path to the weights)
- `--bsz`: the batch size for PPL evaluation
- `--rotate`: whether we want to rotate the model
- `--lm_eval`: whether we want to run LM-Eval for zero-shot tasks
- `--tasks`: the tasks for LM-Eval
- `--cal_dataset`: the calibration dataset for GPTQ quantization
- `--a_bits`: the number of bits for activation quantization
- `--w_bits`: the number of bits for weight quantization
- `--v_bits`: the number of bits for value quantization
- `--k_bits`: the number of bits for key quantization
- `--w_clip`: whether we want to clip the weights
- `--a_clip_ratio`: the clipping ratio for activations
- `--k_clip_ratio`: the clipping ratio for keys
- `--v_clip_ratio`: the clipping ratio for values
- `--w_asym`: whether we want to use asymmetric quantization for weights
- `--a_asym`: whether we want to use asymmetric quantization for activations
- `--v_asym`: whether we want to use asymmetric quantization for values
- `--k_asym`: whether we want to use asymmetric quantization for keys
- `--a_groupsize`: the group size for activation quantization
- `--w_groupsize`: the group size for weight quantization
- `--v_groupsize`: the group size for value quantization
- `--k_groupsize`: the group size for key quantization
- `--use_v2`: turn on GPTQv2 quantization (recommended)
- `--enable_aq_calibration`: activation quantization during calibration (recommended)
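For example, a W4A4KV4 run with rotation and GPTQv2 might look like the sketch below; the model identifier and task name are illustrative placeholders, so check main.py for the exact defaults and accepted values:

```bash
# Illustrative invocation: rotate the model and quantize weights,
# activations, keys, and values to 4 bits with GPTQv2.
# The model path and task name are placeholders; adjust to your setup.
python main.py \
    --model meta-llama/Llama-2-7b-hf \
    --rotate \
    --w_bits 4 --a_bits 4 --k_bits 4 --v_bits 4 \
    --w_clip \
    --use_v2 \
    --enable_aq_calibration \
    --lm_eval --tasks piqa
```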
We also provide a script, `run_llama.sh`, to reproduce the results.
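It can typically be launched directly; inspect the script itself for any arguments it expects, such as a model path:

```bash
# Run the provided reproduction script (see the script for its
# exact interface and any required arguments).
bash run_llama.sh
```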