I have successfully quantized the facebook/opt-125m model using the opt.py script with the following command:
CUDA_VISIBLE_DEVICES=0 python opt.py facebook/opt-125m c4 --wbits 4 --quant ldlq --incoh_processing --save quantized_model
This command produces a quantized checkpoint saved as quantized_model. My question is: to run the resulting 4-bit model for inference, should I replace the original weights from https://huggingface.co/facebook/opt-125m/tree/main with the weights from quantized_model, or is there another intended way to load them?
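For context, here is my understanding of how the saved checkpoint would be loaded back, rather than replacing the hub weights on disk. This is a minimal sketch only: the assumption that --save writes a state_dict via torch.save, the file name, and the toy module standing in for OPT are mine, not taken from the script.

```python
# Sketch of the load-from-checkpoint pattern, NOT the script's actual loader.
# Assumption: `opt.py --save` writes a state_dict with torch.save; a toy
# nn.Linear stands in for facebook/opt-125m so this runs without a download.
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 4)  # stand-in for the quantized OPT model

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "quantized_model.pt")  # illustrative file name
    torch.save(model.state_dict(), path)            # what --save presumably does

    restored = nn.Linear(4, 4)                      # fresh model, original init
    restored.load_state_dict(torch.load(path))      # overwrite with saved weights
```

If the quantized layers change module shapes or names, load_state_dict would fail on a vanilla from_pretrained model, which is part of what I am unsure about.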