A curated list for Efficient Large Language Models
14-stage Fusion Pipeline for LLM token compression: reversible compression, AST-aware code analysis, and intelligent content routing. Zero LLM inference cost. MIT licensed.
Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.
[ICML 2024] Pruner-Zero: Evolving Symbolic Pruning Metric from Scratch for LLMs
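The core idea behind metrics like the one Pruner-Zero searches for can be sketched in a few lines. The metric below (|W| x |grad|) is one illustrative candidate from the weight/gradient expression space, chosen here for simplicity; Pruner-Zero evolves such expressions automatically rather than fixing one by hand.

```python
import numpy as np

def prune_by_metric(weights, grads, sparsity=0.5):
    """Zero out the lowest-scoring weights under a symbolic pruning metric.

    The metric |W| * |grad| is an illustrative example, not the paper's
    evolved formula.
    """
    score = np.abs(weights) * np.abs(grads)
    k = int(score.size * sparsity)
    threshold = np.partition(score.ravel(), k)[k]  # k-th smallest score
    mask = score >= threshold                      # keep high-scoring weights
    return weights * mask, mask

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
G = rng.normal(size=(4, 4))
pruned, mask = prune_by_metric(W, G, sparsity=0.5)
```

At 50% sparsity, half of the 16 weights are zeroed; the surviving weights are untouched.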
D^2-MoE: Delta Decompression for MoE-based LLMs Compression
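Delta decompression exploits the fact that experts in an MoE layer are often similar: each expert can be stored as a shared base matrix plus a low-rank delta. A minimal sketch with plain SVD (the actual D^2-MoE procedure is more involved):

```python
import numpy as np

def compress_experts(experts, rank=4):
    """Store a shared base plus a rank-truncated SVD of each expert's delta.

    Simplified sketch: W_i ~= base + U_i S_i V_i^T, with base the mean
    of all expert weights.
    """
    base = np.mean(experts, axis=0)
    factors = []
    for W in experts:
        U, S, Vt = np.linalg.svd(W - base, full_matrices=False)
        factors.append((U[:, :rank] * S[:rank], Vt[:rank]))  # (U*S, V^T)
    return base, factors

def decompress(base, factors, i):
    US, Vt = factors[i]
    return base + US @ Vt

# toy demo: 4 similar 8x8 experts built from a shared matrix plus noise
rng = np.random.default_rng(1)
shared = rng.normal(size=(8, 8))
experts = np.stack([shared + 0.1 * rng.normal(size=(8, 8)) for _ in range(4)])
base, factors = compress_experts(experts, rank=8)  # full rank: lossless
W0 = decompress(base, factors, 0)
```

With a truncated rank, storage drops from one full matrix per expert to one shared matrix plus two thin factors per expert, at the cost of a small reconstruction error.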
[ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing LLMs: The Truth Is Rarely Pure and Never Simple.
Papers on LLM compression.
LLM Inference on AWS Lambda
[CAAI AIR'24] Minimize Quantization Output Error with Bias Compensation
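The bias-compensation idea can be sketched as follows: after quantizing W to W_q, add a bias equal to the mean output error over a calibration set, which makes the compensated output unbiased on that set. This is a generic sketch of the concept, not the paper's exact procedure.

```python
import numpy as np

def quantize_sym(W, bits=4):
    # uniform symmetric quantization to the given bit-width
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W).max() / qmax
    return np.round(W / scale) * scale

def bias_compensation(W, W_q, X_calib):
    """Bias that cancels the mean output error on calibration inputs.

    With y = W x and y_q = W_q x + b, choosing b = E[(W - W_q) x]
    zeroes the average output error over the calibration set.
    """
    err = (W - W_q) @ X_calib.T   # per-sample output error, (out, samples)
    return err.mean(axis=1)       # average over calibration samples

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 32))
X = rng.normal(size=(256, 32)) + 0.5   # calibration activations (nonzero mean)
W_q = quantize_sym(W, bits=4)
b = bias_compensation(W, W_q, X)
fixed = W_q @ X.T + b[:, None]
```

By construction, the mean residual of `fixed` against the full-precision output is zero over the calibration samples.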
QuantLRM: Quantization of Large Reasoning Models via Fine-Tuning Signals. Supports Qwen, Olmo3, Llama, etc.
An implementation of MoDeGPT LLM compression from the ICLR 2025 paper Modular Decomposition for Large Language Model Compression.
Behavioral auditing & repair toolkit for LLMs. Measures 8 dimensions via confidence probes.
AI agent skill implementing Google's TurboQuant compression algorithm (ICLR 2026): 6x KV cache memory reduction, 8x speedup, zero accuracy loss. Compatible with Claude Code, Codex CLI, and all Agent Skills-compatible tools.
Near-optimal vector quantization for LLM KV cache compression. Python implementation of TurboQuant (ICLR 2026): PolarQuant + QJL for 3-bit quantization with minimal accuracy loss and up to 8x memory reduction.
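As a baseline for what low-bit KV cache quantization looks like, here is a generic per-channel 3-bit uniform quantizer (not TurboQuant's PolarQuant/QJL transforms, which achieve far better accuracy at the same bit-width):

```python
import numpy as np

def quantize_kv_3bit(kv):
    """Per-channel asymmetric 3-bit quantization of a KV-cache tensor.

    Generic uniform-quantization sketch: each channel is mapped to
    8 levels (2^3) between its min and max.
    kv: (tokens, channels) float array.
    """
    lo = kv.min(axis=0)
    hi = kv.max(axis=0)
    scale = np.where(hi > lo, (hi - lo) / 7.0, 1.0)  # 7 gaps between 8 levels
    q = np.clip(np.round((kv - lo) / scale), 0, 7).astype(np.uint8)
    return q, lo, scale

def dequantize_kv(q, lo, scale):
    return q.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
kv = rng.normal(size=(64, 16)).astype(np.float32)
q, lo, scale = quantize_kv_3bit(kv)
rec = dequantize_kv(q, lo, scale)
```

Storing 3 bits per value instead of 16 gives roughly a 5.3x raw reduction before packing overheads; the per-channel `lo`/`scale` metadata is amortized across all tokens. The reconstruction error of each value is bounded by half the channel's step size.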
[ICLR 2026] When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reasoning Models. Supports interpretation of Qwen, Llama, etc.
A standard PyTorch implementation of Google DeepMind's paper Language Modeling Is Compression, with no reliance on Haiku or JAX. Drawing on the original repository (https://github.com/google-deepmind/language_modeling_is_compression), this code reproduces the key results from the paper.
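The paper's central observation is that an arithmetic coder driven by a language model's next-token distribution emits about -log2 p(token) bits per token, so the compressed size equals the model's cumulative cross-entropy. A minimal sketch with a toy fixed distribution standing in for the model:

```python
import numpy as np

def code_length_bits(probs_per_step, tokens):
    """Ideal arithmetic-coding length of a token sequence in bits.

    Each token costs -log2 p(token) under the model's predictive
    distribution at that position; a real arithmetic coder gets
    within a couple of bits of this total.
    """
    return float(sum(-np.log2(p[t]) for p, t in zip(probs_per_step, tokens)))

# toy "model": a fixed unigram distribution over a 4-symbol alphabet
p = np.array([0.7, 0.1, 0.1, 0.1])
seq = [0, 0, 1, 0, 3]
bits = code_length_bits([p] * len(seq), seq)
```

Because the model assigns high probability to the frequent symbol, the total is below the 2 bits/symbol a uniform code would need; a better predictor compresses better, which is the paper's point.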
Token Price Estimation for LLMs
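Token price estimation reduces to multiplying prompt and completion token counts by the provider's per-direction rates. A minimal sketch; the rates below are placeholders, not any provider's actual pricing:

```python
def estimate_cost(prompt_tokens, completion_tokens, in_rate, out_rate):
    """Estimate a request's cost in USD from token counts.

    in_rate / out_rate are USD per million tokens (placeholder values
    in the example below, not real pricing).
    """
    return (prompt_tokens * in_rate + completion_tokens * out_rate) / 1_000_000

cost = estimate_cost(1200, 300, in_rate=3.0, out_rate=15.0)
```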
Research code for LLM Compression using Functional Algorithms, exploring stratified manifold learning, clustering, and compression techniques. Experiments span synthetic datasets (Swiss Roll, Manifold Singularities) and real-world text embeddings (DBpedia-14). The goal is to preserve semantic structure while reducing model complexity.
NYCU Edge AI Final Project Using SGLang