This repository was archived by the owner on Aug 30, 2024. It is now read-only.

Once upon a time, a little NE_ASSERT: /root/w0/workspace/neuralspeed-wheel-build/nlp_repo/neural_speed/core/ne_layers.c:2651: ne_nelements(a) == ne0 * ne1 * ne2 #326

Description

@zwx109473

An NE_ASSERT failure aborts Qwen-7B model inference when using neural-speed 1.0 together with intel-extension-for-transformers 1.4.2. The full log is below:

model_quantize_internal: model size = 29454.52 MB
model_quantize_internal: quant size = 5006.17 MB
/root/miniconda3/envs/zl_cpu/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: clean_up_tokenization_spaces was not set. It will be set to True by default. This behavior will be depracted in transformers v4.45, and will be then set to False by default. For more details check this issue: huggingface/transformers#31884
warnings.warn(
AVX:1 AVX2:1 AVX512F:1 AVX_VNNI:0 AVX512_VNNI:1 AMX_INT8:0 AMX_BF16:0 AVX512_BF16:0 AVX512_FP16:0
beam_size: 1, do_sample: 0, top_k: 40, top_p: 0.950, continuous_batching: 0, max_request_num: 1, early_stopping: 0, scratch_size_ratio: 1.000
model.cpp: loading model from runtime_outs/ne_qwen_q_int4_bestla_cint8_g32.bin
Loading the bin file with NE format...
load_ne_hparams 0.hparams.n_vocab = 151936
load_ne_hparams 1.hparams.n_embd = 4096
load_ne_hparams 2.hparams.n_mult = 22016
load_ne_hparams 3.hparams.n_head = 32
load_ne_hparams 4.hparams.n_head_kv = 0
load_ne_hparams 5.hparams.n_layer = 32
load_ne_hparams 6.hparams.n_rot = 128
load_ne_hparams 7.hparams.ftype = 0
load_ne_hparams 8.hparams.max_seq_len = 32768
load_ne_hparams 9.hparams.alibi_bias_max = 0.000
load_ne_hparams 10.hparams.clip_qkv = 0.000
load_ne_hparams 11.hparams.par_res = 0
load_ne_hparams 12.hparams.word_embed_proj_dim = 0
load_ne_hparams 13.hparams.do_layer_norm_before = 0
load_ne_hparams 14.hparams.multi_query_group_num = 0
load_ne_hparams 15.hparams.ffn_hidden_size = 11008
load_ne_hparams 16.hparams.inner_hidden_size = 0
load_ne_hparams 17.hparams.n_experts = 0
load_ne_hparams 18.hparams.n_experts_used = 0
load_ne_hparams 19.hparams.n_embd_head_k = 0
load_ne_hparams 20.hparams.norm_eps = 0.000001
load_ne_hparams 21.hparams.freq_base = 10000.000
load_ne_hparams 22.hparams.freq_scale = 1.000
load_ne_hparams 23.hparams.rope_scaling_factor = 0.000
load_ne_hparams 24.hparams.original_max_position_embeddings = 0
load_ne_hparams 25.hparams.use_yarn = 0
load_ne_vocab 26.vocab.bos_token_id = 151643
load_ne_vocab 27.vocab.eos_token_id = 151643
load_ne_vocab 28.vocab.pad_token_id = -1
load_ne_vocab 29.vocab.sep_token_id = -1
init: n_vocab = 151936
init: n_embd = 4096
init: n_mult = 22016
init: n_head = 32
init: n_head_kv = 0
init: n_layer = 32
init: n_rot = 128
init: ftype = 0
init: max_seq_len= 32768
init: n_ff = 11008
init: n_parts = 1
load: ctx size = 5006.31 MB
load: scratch0 = 4096.00 MB
load: scratch1 = 2048.00 MB
load: scratch2 = 4096.00 MB
load: mem required = 15246.31 MB (+ memory per state)
.......................................................................................
model_init_from_file: support_bestla_kv = 0
model_init_from_file: kv self size = 256.00 MB
Once upon a time, a little NE_ASSERT: /root/w0/workspace/neuralspeed-wheel-build/nlp_repo/neural_speed/core/ne_layers.c:2651: ne_nelements(a) == ne0 * ne1 * ne2
Aborted (core dumped)
