Description
Hello,
I’m using LLM Compressor (v0.x) to quantize the model meta-llama/Llama-3.2-1B with the following recipe:
```python
from llmcompressor.modifiers.awq import AWQModifier

recipe = AWQModifier(
    ignore=["lm_head"],
    scheme="W4A16",
    targets=["Linear"],
)
```
After running `oneshot(...)` and saving with:

```python
model.save_pretrained(
    SAVE_DIR,
    safe_serialization=True,
    save_compressed=True,
    state_dict=state_dict,
)
```
I obtained two output files:

- model.safetensors
- pytorch_model.bin

Both files are roughly 1.44 GB in size.
However, when I quantize the same model with AutoAWQ using the same W4A16 scheme, the resulting file is only about 1 GB.
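For context, here is the rough back-of-envelope estimate that made me expect something closer to 1 GB. The parameter split (~0.97B Linear weights and ~0.26B embedding weights for Llama-3.2-1B) and the group size of 128 are my own assumptions, not values reported by either library:

```python
# Back-of-envelope size estimate for W4A16 (assumed numbers, see above).
linear_params = 0.97e9   # Linear weights targeted by the recipe, quantized to 4 bits
embed_params = 0.26e9    # token embeddings (lm_head is ignored), kept at 16 bits
group_size = 128         # assumed quantization group size

packed_weights = linear_params * 0.5                     # 4 bits = 0.5 bytes per weight
scales_and_zeros = (linear_params / group_size) * 2 * 2  # 16-bit scale + zero point per group
fp16_tensors = embed_params * 2                          # 2 bytes per 16-bit value

total_gb = (packed_weights + scales_and_zeros + fp16_tensors) / 1e9
print(f"expected on-disk size ≈ {total_gb:.2f} GB")      # ≈ 1.0 GB
```

That estimate lines up with the ~1 GB AutoAWQ file, which is why the 1.44 GB output surprised me.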
Main question
Is this file size difference (~1.44 GB with LLM Compressor vs. ~1 GB with AutoAWQ) expected for W4A16 quantization?
Or could it indicate that:

- the weights were not fully packed into 4-bit format,
- additional metadata or full-precision tensors were saved alongside the compressed weights,
- or that save_pretrained reverted to storing redundant copies (e.g., both FP and quantized states)?
Additional notes
The quantized model reloads and generates correctly, but I want to confirm whether the ~1.44 GB file size is normal for a 1B-parameter model at W4A16, or if it suggests that the compression is only partial (e.g., still storing FP16 weights somewhere).
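In case it is useful, this is the kind of check I intend to run to see where the bytes actually go: it simply sums tensor bytes per dtype in the saved safetensors file (the path is a placeholder for the SAVE_DIR used above). If the 4-bit packing worked, I would expect most of the bytes to sit under an integer dtype, with the 16-bit bytes roughly matching the embeddings.

```python
# Sum the on-disk tensor bytes per dtype in the compressed checkpoint.
# A large fp16/bf16 share beyond the embeddings would suggest that
# full-precision copies were saved alongside the packed weights.
import os
from collections import defaultdict
from safetensors import safe_open

path = os.path.join("SAVE_DIR", "model.safetensors")  # adjust to your save directory
bytes_by_dtype = defaultdict(int)

with safe_open(path, framework="pt") as f:
    for name in f.keys():
        tensor = f.get_tensor(name)
        bytes_by_dtype[str(tensor.dtype)] += tensor.numel() * tensor.element_size()

for dtype, nbytes in sorted(bytes_by_dtype.items(), key=lambda kv: -kv[1]):
    print(f"{dtype:>15}: {nbytes / 1e9:.2f} GB")
```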
Thanks a lot for your time and for maintaining this great project! 🙏