Thank you for your excellent work!
However, we are having trouble reproducing the results. We used the meta-llama/Llama-2-13b-chat-hf model and ran the original eval_llama.sh script, but we observed a speed-up ratio of only 1.05 and an acceptance ratio of 0.69. Could there be an issue with our settings?
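For reference, here is how we computed the two numbers above. This is our own measurement sketch, not code from eval_llama.sh; the function names and the round-number inputs are purely illustrative, in case our definition of the metrics differs from yours:

```python
# Illustrative sketch of our metric definitions (not from eval_llama.sh).

def speedup_ratio(baseline_tokens_per_s: float, spec_tokens_per_s: float) -> float:
    """Throughput of speculative decoding divided by plain autoregressive decoding."""
    return spec_tokens_per_s / baseline_tokens_per_s

def acceptance_ratio(accepted_draft_tokens: int, proposed_draft_tokens: int) -> float:
    """Fraction of draft-model tokens that the target model accepts."""
    return accepted_draft_tokens / proposed_draft_tokens

# Round-number example matching the values we observed (inputs are made up):
print(round(speedup_ratio(30.0, 31.5), 2))    # 1.05
print(round(acceptance_ratio(69, 100), 2))    # 0.69
```

If the script defines either metric differently (e.g. acceptance measured per draft step rather than per token), that could explain part of the gap.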
Any guidance on this would be greatly appreciated!