# llm-compression

Here are 20 public repositories matching this topic...

14-stage Fusion Pipeline for LLM token compression — reversible compression, AST-aware code analysis, intelligent content routing. Zero LLM inference cost. MIT licensed.

  • Updated Apr 1, 2026
  • Python
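
To give a concrete flavor of the "AST-aware code analysis" this pipeline advertises, here is a minimal Python sketch that strips docstrings from source code before sending it to an LLM. This is an illustration of the general idea only, not code from the repository: it is lossy and single-step, whereas the pipeline above is described as reversible and 14 stages deep.

```python
import ast

def strip_docstrings(source: str) -> str:
    """Remove docstrings from Python source to cut token count.

    A minimal, lossy sketch of AST-aware compression; a real
    pipeline would record what was removed so it can be restored.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.FunctionDef,
                             ast.AsyncFunctionDef, ast.ClassDef)):
            body = node.body
            if (body
                    and isinstance(body[0], ast.Expr)
                    and isinstance(body[0].value, ast.Constant)
                    and isinstance(body[0].value.value, str)):
                if len(body) > 1:
                    body.pop(0)           # drop the docstring
                else:
                    body[0] = ast.Pass()  # keep the block syntactically valid
    return ast.unparse(tree)              # requires Python 3.9+

src = '''
def add(a, b):
    """Return the sum of a and b."""
    return a + b
'''
print(strip_docstrings(src))  # docstring gone, behavior unchanged
```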

AI agent skill implementing Google's TurboQuant compression algorithm (ICLR 2026) — 6x KV cache memory reduction, 8x speedup, zero accuracy loss. Compatible with Claude Code, Codex CLI, and all Agent Skills-compatible tools.

  • Updated Mar 28, 2026
  • Python

Near-optimal vector quantization for LLM KV cache compression. Python implementation of TurboQuant (ICLR 2026) — PolarQuant + QJL for 3-bit quantization with minimal accuracy loss and up to 8x memory reduction.

  • Updated Mar 28, 2026
  • Python
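
Both TurboQuant entries above center on low-bit quantization of the KV cache. The sketch below is a generic per-channel 3-bit uniform quantizer in NumPy, meant only to show the memory-versus-fidelity trade-off involved; it is NOT the TurboQuant / PolarQuant / QJL algorithm, which uses near-optimal vector quantization rather than a uniform grid, and a real implementation would pack 3-bit codes instead of spending a byte per code.

```python
import numpy as np

def quantize_3bit(x: np.ndarray):
    """Per-channel symmetric uniform quantization to 3 bits (8 levels)."""
    levels = 2 ** 3 - 1                          # codes 0..7
    scale = np.abs(x).max(axis=-1, keepdims=True) / (levels / 2)
    scale = np.where(scale == 0, 1.0, scale)     # avoid divide-by-zero
    q = np.clip(np.round(x / scale) + levels // 2, 0, levels)
    return q.astype(np.uint8), scale             # real code packs bits

def dequantize(q: np.ndarray, scale: np.ndarray, levels: int = 7):
    return (q.astype(np.float32) - levels // 2) * scale

# Toy "KV cache": (heads, seq_len, head_dim)
kv = np.random.randn(8, 128, 64).astype(np.float32)
q, s = quantize_3bit(kv)
err = np.abs(kv - dequantize(q, s)).mean()
print(f"mean abs reconstruction error: {err:.4f}")
```

With bit-packing, the 3-bit codes plus per-channel scales occupy roughly a tenth of the original float32 cache, which is where headline figures like "up to 8x memory reduction" come from; the accuracy claims depend on the much better codebooks the actual algorithm builds.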

Research code for LLM Compression using Functional Algorithms, exploring stratified manifold learning, clustering, and compression techniques. Experiments span synthetic datasets (Swiss Roll, Manifold Singularities) and real-world text embeddings (DBpedia-14). The goal is to preserve semantic structure while reducing model complexity.

  • Updated Sep 12, 2025
  • Jupyter Notebook
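
As a rough illustration of clustering-based compression of embeddings (not the repository's stratified-manifold method), the following sketch quantizes points from the Swiss Roll, one of the synthetic datasets mentioned above, to a small KMeans codebook and measures reconstruction error. The codebook size and dataset parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_swiss_roll

# Toy stand-in for the repo's experiments: each 3-D point is replaced
# by the index of its nearest centroid, so storage drops from three
# float64s (24 bytes) to one uint8 index plus a shared codebook.
X, _ = make_swiss_roll(n_samples=2000, noise=0.05, random_state=0)

k = 64  # codebook size (illustrative)
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
codes = km.predict(X)                   # compressed representation
recon = km.cluster_centers_[codes]      # decompression by lookup

mse = np.mean((X - recon) ** 2)
print(f"codebook size {k}, reconstruction MSE: {mse:.4f}")
```

Sweeping k trades storage against MSE; preserving *semantic* structure, as the project above aims to do for text embeddings, requires more than minimizing geometric reconstruction error.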
