
Much slower with the semantic integration than interacting with LLamaSharp directly #5

@bc4250

Description


Model: ggml-model-f32-q4_0.bin

Using the CPU backend.

Even for a simple "hello", the current package takes several minutes to respond, while using LLamaSharp directly it responds immediately.

I'm just recording this situation for your reference; I will dig into it if I have time. It may be due to a memory limit on the web API, a constraint the console application doesn't have.

Sample code from the LLamaSharp demo:

foreach (var text in session.Chat(prompt, new InferenceParams() { Temperature = 0.6f, AntiPrompts = new List<string> { "User:" } }))
{
    Console.Write(text);
}

A screenshot of the slow response when using Postman to interact with the web API sample code:

