Commit 2da64a2

models : fix backend assignment for Granite/Nemotron graphs (ggml-org#18599)
* models : fix backend assignment for Granite/Nemotron graphs
* cont : add ref
* cont : move call to build_inp_embd()
1 parent b37124d commit 2da64a2

1 file changed: +4 -0 lines changed

src/llama-graph.cpp

Lines changed: 4 additions & 0 deletions
@@ -1326,6 +1326,10 @@ ggml_tensor * llm_graph_context::build_inp_embd(ggml_tensor * tok_embd) const {
 
     res->add_input(std::move(inp));
 
+    // make sure the produced embeddings are immediately materialized in the ggml graph
+    // ref: https://github.com/ggml-org/llama.cpp/pull/18599
+    ggml_build_forward_expand(gf, cur);
+
     return cur;
 }
 
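
The effect of expanding the embeddings early can be illustrated with a toy model. This is a minimal sketch, not ggml's implementation: `Node`, `Graph`, `expand` and `build_order` are hypothetical names introduced here. It assumes only that graph expansion appends a tensor's not-yet-visited inputs before the tensor itself, and that a tensor already in the graph keeps its position.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Toy stand-in for a tensor: a name plus its inputs (NOT the real ggml_tensor).
struct Node {
    std::string         name;
    std::vector<Node *> src;
};

// Toy stand-in for a compute graph: nodes in the order they were first added.
struct Graph {
    std::vector<Node *>        order;
    std::unordered_set<Node *> seen;
};

// Post-order expansion: add all unvisited inputs first, then the node itself.
// A node that is already in the graph keeps its original position.
void expand(Graph & g, Node * n) {
    if (!g.seen.insert(n).second) {
        return;
    }
    for (Node * s : n->src) {
        expand(g, s);
    }
    g.order.push_back(n);
}

// Builds out = f(other, embd) and returns the resulting node order.
// With expand_embd_early set, embd is added as soon as it is produced,
// which mirrors the idea of the change above (in the toy model only).
std::vector<std::string> build_order(bool expand_embd_early) {
    Node embd {"embd",  {}};
    Node other{"other", {}};
    Node out  {"out",   {&other, &embd}};

    Graph g;
    if (expand_embd_early) {
        expand(g, &embd);
    }
    expand(g, &out);

    std::vector<std::string> names;
    for (const Node * n : g.order) {
        names.push_back(n->name);
    }
    return names;
}
```

In this sketch, `build_order(false)` yields the order `other, embd, out`, while `build_order(true)` yields `embd, other, out`. The embeddings node gets a fixed early position instead of one that depends on how later nodes happen to reference it. Whether and how node order influences backend assignment in llama.cpp is not modeled here.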
