# llm-infernece
Here are 3 public repositories matching this topic:
- eLLM: infers LLMs on CPUs in real time. (Rust, updated Nov 3, 2025)
- Layered prefill changes the scheduling axis from tokens to layers, removing redundant MoE weight reloads while keeping decode stall-free. The result is lower TTFT (time to first token), lower end-to-end latency, and lower energy per token without hurting TBT (time between tokens) stability. (Python, updated Oct 27, 2025)
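To make the layered-prefill idea concrete, here is a minimal sketch contrasting the two scheduling axes. All names (`Layer`, `load_weights`, `token_chunked_prefill`, `layered_prefill`) are invented for illustration and do not reflect the repo's actual API; the point is only the weight-load count.

```python
# Sketch of layered prefill vs. token-chunked prefill scheduling.
# Hypothetical placeholder classes/functions, not the repo's code.

class Layer:
    """Stand-in for one MoE transformer layer whose expert weights
    must be resident in fast memory before it can run."""
    def __init__(self, idx):
        self.idx = idx
        self.loaded = False

    def load_weights(self):
        # The expensive step: pulling expert weights into fast memory.
        self.loaded = True

    def forward(self, hidden):
        assert self.loaded
        return hidden  # placeholder compute

def token_chunked_prefill(layers, chunks):
    """Baseline: each token chunk walks all layers, so in the worst
    case every chunk re-touches every layer's weights (L * C loads)."""
    loads = 0
    for chunk in chunks:
        for layer in layers:
            layer.load_weights(); loads += 1
            chunk = layer.forward(chunk)
    return loads

def layered_prefill(layers, chunks):
    """Layered scheduling: load each layer's weights once, push every
    pending chunk through it, then move to the next layer (L loads)."""
    loads = 0
    for layer in layers:
        layer.load_weights(); loads += 1
        chunks = [layer.forward(c) for c in chunks]
    return loads

layers = [Layer(i) for i in range(4)]
chunks = [f"chunk{i}" for i in range(8)]
print(token_chunked_prefill(layers, chunks))  # 32 weight loads
print(layered_prefill(layers, chunks))        # 4 weight loads
```

Under these assumptions, iterating over layers on the outside is what removes the redundant MoE weight reloads; decode can be interleaved between layer passes, which is how the technique keeps decode stall-free.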
## Add this topic to your repo

To associate your repository with the llm-infernece topic, visit your repo's landing page and select "manage topics."