I have successfully quantized the facebook/opt-125m model using the opt.py script with the following command:
CUDA_VISIBLE_DEVICES=0 python opt.py facebook/opt-125m c4 --wbits 4 --quant ldlq --incoh_processing --save quantized_model
This command produces a quantized checkpoint saved as quantized_model. My question is: to run the resulting 4-bit model for inference, should I replace the original weights from https://huggingface.co/facebook/opt-125m/tree/main with the weights from quantized_model, or is there another intended way to load them?
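For context, here is my understanding of how the saved checkpoint would be loaded back, rather than replacing the hub weights on disk. This is a minimal sketch only: the assumption that --save writes a state_dict via torch.save, the file name, and the toy module standing in for OPT are mine, not taken from the script.

```python
# Sketch of the load-from-checkpoint pattern, NOT the script's actual loader.
# Assumption: `opt.py --save` writes a state_dict with torch.save; a toy
# nn.Linear stands in for facebook/opt-125m so this runs without a download.
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 4)  # stand-in for the quantized OPT model

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "quantized_model.pt")  # illustrative file name
    torch.save(model.state_dict(), path)            # what --save presumably does

    restored = nn.Linear(4, 4)                      # fresh model, original init
    restored.load_state_dict(torch.load(path))      # overwrite with saved weights
```

If the quantized layers change module shapes or names, load_state_dict would fail on a vanilla from_pretrained model, which is part of what I am unsure about.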