Add KV Cache Manager to track block metrics #415

@maschad

Description

Building on #410, we would like to track:

  • Accelerator (e.g. GPU) total KV cache block allocation for the node's model(s);

  • Accelerator average number of used blocks over the last time period.

Since NVML does not expose this information directly, we have to track KV cache block usage over time ourselves. The results could be stored as a Prometheus metric or in our own database.
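As a rough illustration of the tracking described above, here is a minimal sketch of an in-memory tracker. It is not an existing implementation: the class `KVCacheBlockTracker`, its methods, and the choice of a simple sample mean over a sliding window are all assumptions made for this example. Timestamps are passed in explicitly so the logic is easy to test.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class KVCacheBlockTracker:
    """Hypothetical tracker for total and recently used KV cache blocks."""

    def __init__(self, total_blocks: int, window_seconds: float) -> None:
        if total_blocks < 0:
            raise ValueError("total_blocks must be non-negative")
        if window_seconds <= 0:
            raise ValueError("window_seconds must be positive")
        self.total_blocks = total_blocks
        self.window_seconds = window_seconds
        # Each sample is (timestamp_seconds, used_blocks), oldest first.
        self._samples: Deque[Tuple[float, int]] = deque()

    def record(self, used_blocks: int, now: float) -> None:
        """Store one usage sample taken at time `now`."""
        if not 0 <= used_blocks <= self.total_blocks:
            raise ValueError("used_blocks must be within [0, total_blocks]")
        self._samples.append((now, used_blocks))
        self._evict(now)

    def _evict(self, now: float) -> None:
        """Drop samples older than the tracking window."""
        cutoff = now - self.window_seconds
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def average_used_blocks(self, now: float) -> Optional[float]:
        """Mean of used blocks over samples in the window, or None if empty."""
        self._evict(now)
        if not self._samples:
            return None
        return sum(used for _, used in self._samples) / len(self._samples)
```

In practice the caller would likely sample usage on a fixed interval with a monotonic clock, then export `total_blocks` and the rolling average as gauges or persist them, depending on which storage option is chosen.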
