GPU memory usage increases sharply during chat requests #40

@Janus-Xu

Model

Orion-14B-Chat-Int4

Description

When a conversation starts, GPU memory usage rises from the initial 9 GB to 13 GB.
With 4 concurrent sessions, usage grows to 22 GB and shows no sign of stopping. The memory is released only some time after the sessions have completely ended.
This makes it easy to trigger a crash.

Question

  1. Is there a way to prevent the rapid linear growth of GPU memory usage?
  2. Is this caused by the cache policy being enabled? Can host (CPU) memory be used instead of GPU memory?
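
For context on question 2, here is a rough back-of-the-envelope sketch of how a key/value cache would grow if it is the cause. It is only an estimate under stated assumptions: the cache is assumed to be stored in 16-bit floats (weight quantization such as Int4 does not necessarily shrink it), and the layer count, head count, and head dimension below are hypothetical placeholders, not verified values for Orion-14B-Chat-Int4. The function name `kv_cache_bytes` is also made up for illustration.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Estimate bytes held by the key/value cache of a decoder-only transformer.

    Each layer keeps two tensors (keys and values), each shaped
    [batch_size, num_kv_heads, seq_len, head_dim].
    """
    return (2 * num_layers * batch_size * num_kv_heads
            * seq_len * head_dim * bytes_per_elem)


# Hypothetical configuration (placeholders, NOT confirmed for Orion-14B).
LAYERS, KV_HEADS, HEAD_DIM = 40, 40, 128

per_token = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, seq_len=1)
one_session = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, seq_len=4096)
four_sessions = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, seq_len=4096,
                               batch_size=4)

print(f"per token:      {per_token / 2**20:.2f} MiB")
print(f"one session:    {one_session / 2**30:.2f} GiB")
print(f"four sessions:  {four_sessions / 2**30:.2f} GiB")
```

Under these assumptions the cache grows linearly with both the number of tokens in a conversation and the number of concurrent sessions, which would match the linear growth described above. Whether that is what actually happens here depends on how the serving code manages the cache.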
