Commit 245ab4e
Fixing linear_activation_tensor dynamic quant (#622)
Summary: dynamic quant was broken for generate due to a missing repr function.
Test Plan: sh benchmarks.sh
20240806170037, tok/s= 9.54, mem/s= 63.14 GB/s, peak_mem= 8.61 GB, model_size= 6.62 GB quant: int8dq, mod: Llama-2-7b-chat-hf, kv_quant: False, compile: True, compile_prefill: False, dtype: torch.bfloat16, device: cuda
repro: python generate.py --quantization int8dq --checkpoint_path ../../../checkpoints/meta-llama/Llama-2-7b-chat-hf/model.pth --device cuda --precision torch.bfloat16 --compile --num_samples 5 --max_new_tokens 200 --top_k 200 --temperature 0.8
Reviewers:
Subscribers:
Tasks:
Tags:

1 parent: febeaac
1 file changed: 3 additions & 0 deletions
[Diff view residue: the code columns of the diff did not survive extraction. The diff adds three lines at new line numbers 42–44 in the changed file, with unchanged context at original lines 39–44.]
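Since the diff body itself is not recoverable, the following is a minimal, hypothetical sketch of the kind of fix the summary describes: a torch.Tensor wrapper subclass used for dynamic quantization that gains a `__repr__` so the model can be printed or logged during generate. The class and attribute names (`LinearActivationQuantizedTensor`, `original_weight_tensor`, `input_quant_func`) are assumptions inferred from the commit title and common torchao naming, not taken from this diff.

```python
# Hypothetical sketch, NOT the actual 3-line diff from this commit.
# Assumption: the dynamic quant path wraps weights in a torch.Tensor
# subclass, and generate breaks when the model is printed/logged because
# that subclass defines no __repr__.
import torch


class LinearActivationQuantizedTensor(torch.Tensor):
    """Wrapper subclass: quantizes activations on the fly for a stored weight."""

    @staticmethod
    def __new__(cls, original_weight_tensor, input_quant_func):
        return torch.Tensor._make_wrapper_subclass(
            cls,
            original_weight_tensor.shape,
            dtype=original_weight_tensor.dtype,
            device=original_weight_tensor.device,
            requires_grad=False,
        )

    def __init__(self, original_weight_tensor, input_quant_func):
        self.original_weight_tensor = original_weight_tensor
        self.input_quant_func = input_quant_func

    # The missing piece: without an explicit __repr__, printing a wrapper
    # subclass (e.g. when logging the model in generate.py) can raise instead
    # of returning a string, which matches "broken for generate due to no repr".
    def __repr__(self):
        return (
            f"{self.__class__.__name__}("
            f"weight={self.original_weight_tensor}, "
            f"input_quant_func={self.input_quant_func})"
        )
```

With a `__repr__` defined, printing or logging a model whose weights have been swapped for this subclass produces a readable string instead of erroring, which is consistent with the generate path working again in the benchmark above.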