Model: ggml-model-f32-q4_0.bin (CPU backend)
Even for a prompt as simple as "hello", the current package takes several minutes to respond, but calling LLamaSharp directly responds immediately.
I'm just recording this situation for your reference; I will dig into it when I have time. Maybe it is caused by a memory limit in the web API, a constraint the console application doesn't have.
Sample code from the LLamaSharp demo:

// Stream the model's reply token by token until the anti-prompt "User:" appears.
foreach (var text in session.Chat(prompt, new InferenceParams { Temperature = 0.6f, AntiPrompts = new List<string> { "User:" } }))
{
    Console.Write(text);
}
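To illustrate the difference I suspect (streaming tokens as they arrive, like the demo's foreach, versus buffering the whole reply before returning it, as a naive web API endpoint might), here is a minimal sketch that needs no model. GenerateTokens is a hypothetical stand-in for session.Chat, not part of LLamaSharp:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for session.Chat: yields tokens one at a time.
static IEnumerable<string> GenerateTokens()
{
    foreach (var t in new[] { "Hello", ", ", "world", "!" })
        yield return t; // each token is available as soon as it is produced
}

// Streaming (console demo style): the caller sees each token immediately.
foreach (var token in GenerateTokens())
    Console.Write(token);
Console.WriteLine();

// Buffering (naive endpoint style): nothing is visible until the whole
// string has been assembled, so slow generation looks like a hung request.
var buffered = string.Concat(GenerateTokens());
Console.WriteLine(buffered);
```

If the web API buffers the entire completion like the second variant, a slow CPU-backend generation would explain responses taking minutes to arrive in Postman.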
Below is a screenshot of the slow response when using Postman to interact with the web API sample code.
