Orion-14B-Chat-Int4 private deployment problem, seeking help #38

@mailtobshen

Description

System environment:
(Orion) PS D:\Huggin face\Orion-14B-App-Demo-CN\demo> nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Feb__8_05:53:42_Coordinated_Universal_Time_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0

demo.py

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("\Huggin face\Orion-14B-Chat-Int4", trust_remote_code=True, use_safetensors=True)
model = AutoModelForCausalLM.from_pretrained("\Huggin face\Orion-14B-Chat-Int4", torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True, use_safetensors=True)

messages = [{"role": "user", "content": "hi,who are you?"}]
response = model.chat(tokenizer, messages, streaming=False)
print(response)

(Orion) PS D:\Huggin face\Orion-14B-App-Demo-CN\demo> python demo.py
bin D:\Users\Administrator\anaconda3\envs\Orion\Lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
鲯榅鲯鲯榅 mathemat鲯鲯榅榅鲯鲯鲯鲯榅鲯榅鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯榅鲯榅榅榅鲯鲯鲯鲯鲯鲯榅鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯榅鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯榅鲯鲯榅鲯鲯榅鲯榅鲯榅鲯鲯榅榅榅榅鲯榅鲯鲯鲯榅鲯鲯鲯鲯鲯榅鲯鲯鲯鲯鲯鲯榅鲯鲯鲯鲯榅鲯鲯鲯鲯榅鲯鲯榅鲯鲯鲯鲯鲯鲯鲯鲯榅鲯鲯鲯榅鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯榅榅鲯鲯鲯鲯鲯榅鲯鲯鲯榅 鲯鲯鲯鲯鲯鲯鲯榅鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯榅鲯鲯鲯鲯鲯榅鲯榅鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯榅鲯鲯鲯榅鲯鲯榅鲯鲯鲯鲯鲯榅鲯榅榅鲯鲯榅鲯鲯鲯鲯榅鲯鲯鲯鲯鲯鲯鲯榅鲯鲯鲯榅榅鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯鲯榅鲯

No matter what I ask, the model's reply is gibberish like this.

Also, for the OrionStarAI/Orion-14B-Chat-Int4 model files (in safetensors format) downloaded directly from Hugging Face, do I need to manually run the quant.py script to quantize them before inference works correctly? Would appreciate any guidance, thanks!
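One hypothesis worth ruling out (an assumption on my part, not confirmed anywhere in this thread): the script above loads the Int4-quantized weights with torch_dtype=torch.bfloat16, and int4 quantization kernels commonly support only float16, which can produce garbled output rather than an error. A minimal variant of demo.py to test that hypothesis, using the same local path as above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Same local model directory as in the original demo.py
MODEL_DIR = r"\Huggin face\Orion-14B-Chat-Int4"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_DIR, trust_remote_code=True, use_safetensors=True
)
# float16 instead of bfloat16: int4 kernels often lack bf16 support (assumption)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_DIR,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
    use_safetensors=True,
)

messages = [{"role": "user", "content": "hi, who are you?"}]
response = model.chat(tokenizer, messages, streaming=False)
print(response)
```

If float16 still produces gibberish, the dtype is not the culprit and the quant.py question above becomes the next thing to check.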
