Description
System environment:
(Orion) PS D:\Huggin face\Orion-14B-App-Demo-CN\demo> nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Feb__8_05:53:42_Coordinated_Universal_Time_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0
demo.py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("\Huggin face\Orion-14B-Chat-Int4", trust_remote_code=True,use_safetensors=True)
model = AutoModelForCausalLM.from_pretrained("\Huggin face\Orion-14B-Chat-Int4", torch_dtype=torch.bfloat16,device_map="auto", trust_remote_code=True,use_safetensors=True)
messages = [{"role": "user", "content": "hi,who are you?"}]
response = model.chat(tokenizer, messages, streaming=False)
print(response)
(Orion) PS D:\Huggin face\Orion-14B-App-Demo-CN\demo> python demo.py
bin D:\Users\Administrator\anaconda3\envs\Orion\Lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
鲯榅鲯鲯榅 mathemat鲯鲯榅榅鲯鲯鲯鲯榅鲯榅鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯榅鲯榅榅榅鲯鲯鲯鲯鲯鲯榅鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯榅鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯榅鲯鲯榅鲯鲯榅鲯榅鲯榅鲯鲯榅榅榅榅鲯榅鲯鲯鲯榅鲯鲯鲯鲯鲯榅鲯鲯鲯鲯鲯鲯榅鲯鲯鲯鲯榅鲯鲯鲯鲯榅鲯鲯榅鲯鲯鲯鲯鲯鲯鲯鲯榅鲯鲯鲯榅鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯榅榅鲯鲯鲯鲯鲯榅鲯鲯鲯榅 鲯鲯鲯鲯鲯鲯鲯榅鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯榅鲯鲯鲯鲯鲯榅鲯榅鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯榅鲯鲯鲯榅鲯鲯榅鲯鲯鲯鲯鲯榅鲯榅榅鲯鲯榅鲯鲯鲯鲯榅鲯鲯鲯鲯鲯鲯鲯榅鲯鲯鲯榅榅鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯榅鲯
Whatever question I ask, the model answers with this kind of gibberish.
Also: for the OrionStarAI/Orion-14B-Chat-Int4 model files downloaded directly from Hugging Face (safetensors format), do I need to manually run the quant.py script to quantize them before inference works correctly? Would appreciate an answer from anyone who knows, thanks!
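An editorial sketch related to the quantization question (an assumption, not a confirmed answer for this repo): checkpoints that are already quantized typically ship a `quantization_config` block in their `config.json`, in which case no manual quant.py step should be needed. One way to check the downloaded folder:

```python
import json

# Hypothetical config.json contents for illustration; inspect the real
# file in the Orion-14B-Chat-Int4 directory the same way.
config_text = '{"model_type": "orion", "quantization_config": {"bits": 4}}'
config = json.loads(config_text)

if "quantization_config" in config:
    # Checkpoint declares itself quantized; loading it as-is should work.
    print("already quantized:", config["quantization_config"]["bits"], "bit")
else:
    print("no quantization_config found; weights may be full-precision")
```

If the block is present, forcing `torch_dtype=torch.bfloat16` on a 4-bit checkpoint, as in demo.py above, is also a possible source of garbage output and may be worth removing as a test.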