# llm-infernece
Here are 3 public repositories matching this topic:
- eLLM: infers LLMs on CPUs in real time. (Rust, updated Nov 3, 2025)
- Layered prefill changes the scheduling axis from tokens to layers, removing redundant MoE weight reloads while keeping decode stall-free. The result is lower TTFT (time to first token), lower end-to-end latency, and lower energy per token without hurting TBT (time between tokens) stability. (Python, updated Oct 27, 2025)
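To make the layered-prefill idea concrete, here is a minimal sketch contrasting the two scheduling axes. All names (`Layer`, `load_weights`, `token_chunked_prefill`, `layered_prefill`) are invented for illustration and do not reflect the repo's actual API; the point is only the weight-load count.

```python
# Sketch of layered prefill vs. token-chunked prefill scheduling.
# Hypothetical placeholder classes/functions, not the repo's code.

class Layer:
    """Stand-in for one MoE transformer layer whose expert weights
    must be resident in fast memory before it can run."""
    def __init__(self, idx):
        self.idx = idx
        self.loaded = False

    def load_weights(self):
        # The expensive step: pulling expert weights into fast memory.
        self.loaded = True

    def forward(self, hidden):
        assert self.loaded
        return hidden  # placeholder compute

def token_chunked_prefill(layers, chunks):
    """Baseline: each token chunk walks all layers, so in the worst
    case every chunk re-touches every layer's weights (L * C loads)."""
    loads = 0
    for chunk in chunks:
        for layer in layers:
            layer.load_weights(); loads += 1
            chunk = layer.forward(chunk)
    return loads

def layered_prefill(layers, chunks):
    """Layered scheduling: load each layer's weights once, push every
    pending chunk through it, then move to the next layer (L loads)."""
    loads = 0
    for layer in layers:
        layer.load_weights(); loads += 1
        chunks = [layer.forward(c) for c in chunks]
    return loads

layers = [Layer(i) for i in range(4)]
chunks = [f"chunk{i}" for i in range(8)]
print(token_chunked_prefill(layers, chunks))  # 32 weight loads
print(layered_prefill(layers, chunks))        # 4 weight loads
```

Under these assumptions, iterating over layers on the outside is what removes the redundant MoE weight reloads; decode can be interleaved between layer passes, which is how the technique keeps decode stall-free.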
## Add this topic to your repo

To associate your repository with the llm-infernece topic, visit your repo's landing page and select "manage topics."