Model: ggml-model-f32-q4_0.bin (CPU backend)
Even for a prompt as simple as "hello", the current package takes several minutes to respond, but calling LLamaSharp directly responds immediately.
I'm just recording this situation for your reference; I will dig into it when I have time. Maybe it is caused by a memory limit in the web API, a constraint the console application doesn't have.
Sample code from the LLamaSharp demo:

// Stream the model's reply token by token until the anti-prompt "User:" appears.
foreach (var text in session.Chat(prompt, new InferenceParams { Temperature = 0.6f, AntiPrompts = new List<string> { "User:" } }))
{
    Console.Write(text);
}
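To illustrate the difference I suspect (streaming tokens as they arrive, like the demo's foreach, versus buffering the whole reply before returning it, as a naive web API endpoint might), here is a minimal sketch that needs no model. GenerateTokens is a hypothetical stand-in for session.Chat, not part of LLamaSharp:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for session.Chat: yields tokens one at a time.
static IEnumerable<string> GenerateTokens()
{
    foreach (var t in new[] { "Hello", ", ", "world", "!" })
        yield return t; // each token is available as soon as it is produced
}

// Streaming (console demo style): the caller sees each token immediately.
foreach (var token in GenerateTokens())
    Console.Write(token);
Console.WriteLine();

// Buffering (naive endpoint style): nothing is visible until the whole
// string has been assembled, so slow generation looks like a hung request.
var buffered = string.Concat(GenerateTokens());
Console.WriteLine(buffered);
```

If the web API buffers the entire completion like the second variant, a slow CPU-backend generation would explain responses taking minutes to arrive in Postman.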
Below is a screenshot of the slow response when using Postman to interact with the web API sample code.
