Commit 6f6093a

unamedkr and claude committed
README: add v0.5.0 release badge, documentation section, fix LOC counts
Both EN and KO READMEs updated:

- Release badge linking to v0.5.0
- WASM demo badge linking to GitHub Pages
- Score badge updated to 99.2%
- LOC count: 55K → 67K (reflects Gemma 4 + Metal additions)
- Context diagram: 55K → 350K (measured with KV compression)
- Documentation section: API ref, custom quant guide, roadmap, changelog, tech report, WASM demo, all linked
- FAQ: updated LOC reference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 797cf20 commit 6f6093a
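
The badge changes listed in the commit message use shields.io static badge URLs, where the path has the form `label-message-color` and special characters are escaped (for example `99.2%` becomes `99.2%25`). A minimal sketch of that encoding follows; the helper name `badge_url` is hypothetical and not part of the repository. Spaces are percent-encoded here, whereas the WASM badge in this commit uses the equivalent `_` shorthand.

```python
from urllib.parse import quote


def badge_url(label: str, message: str, color: str) -> str:
    """Build a shields.io static badge URL (hypothetical helper for illustration)."""

    def esc(text: str) -> str:
        # In the static badge path, '-' separates fields and '_' renders as a
        # space, so literal dashes and underscores are doubled before encoding.
        text = text.replace("-", "--").replace("_", "__")
        # Percent-encode everything else that is unsafe in a URL path segment.
        return quote(text, safe="")

    return f"https://img.shields.io/badge/{esc(label)}-{esc(message)}-{color}"


if __name__ == "__main__":
    print(badge_url("release", "v0.5.0", "blue"))
    print(badge_url("score", "99.2%", "brightgreen"))
    print(badge_url("platforms", "macOS | Linux | Windows | WASM", "orange"))
```

The three calls reproduce the release, score, and platforms badge image URLs that appear in the diff.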

2 files changed: 40 additions and 12 deletions.

README.ko.md

Lines changed: 20 additions & 6 deletions
@@ -6,15 +6,16 @@
 
 <p align="center">
 무손실 KV 캐시 압축. <a href="#-단일-헤더-모드"><b>quant.h</b></a> 단일 헤더 라이브러리로도 제공됩니다.<br>
-55K LOC. 임베딩 가능. 오후 한나절이면 전체 코드를 읽을 수 있습니다.
+67K LOC. 임베딩 가능. 오후 한나절이면 전체 코드를 읽을 수 있습니다.
 </p>
 
 <p align="center">
+<a href="https://github.com/quantumaikr/quant.cpp/releases/tag/v0.5.0"><img src="https://img.shields.io/badge/release-v0.5.0-blue" alt="Release"></a>
 <a href="#"><img src="https://img.shields.io/badge/license-Apache%202.0-blue" alt="License"></a>
 <a href="#"><img src="https://img.shields.io/badge/tests-34%20pass-brightgreen" alt="Tests"></a>
-<a href="#"><img src="https://img.shields.io/badge/score-99.7%25-brightgreen" alt="Score"></a>
+<a href="#"><img src="https://img.shields.io/badge/score-99.2%25-brightgreen" alt="Score"></a>
 <a href="#"><img src="https://img.shields.io/badge/models-7%20verified-blue" alt="Models"></a>
-<a href="#"><img src="https://img.shields.io/badge/WASM-192KB-purple" alt="WASM"></a>
+<a href="https://quantumaikr.github.io/quant.cpp/"><img src="https://img.shields.io/badge/WASM_데모-192KB-purple" alt="WASM"></a>
 <a href="#"><img src="https://img.shields.io/badge/platforms-macOS%20%7C%20Linux%20%7C%20Windows%20%7C%20WASM-orange" alt="Platforms"></a>
 </p>
 
@@ -31,7 +32,7 @@ LLM 메모리의 병목은 모델 가중치가 아니라 **KV 캐시**입니다.
 │ 모델 (4GB) │ KV 캐시 (FP16) │
 │ │ ████████████████████████ 8K 컨텍스트 ← OOM │
 ├──────────────┼──────────────────────────────────────────────┤
-│ 모델 (4GB) │ KV (4-bit) ███ →→→→→ 55K 컨텍스트 │
+│ 모델 (4GB) │ KV (4-bit) ███ →→→→→ 350K 컨텍스트 │
 │ │ ↑ 6.9배 작음 │
 └──────────────┴──────────────────────────────────────────────┘
 ```
@@ -76,7 +77,7 @@ LLM 메모리의 병목은 모델 가중치가 아니라 **KV 캐시**입니다.
 | | quant.cpp | llama.cpp | vLLM | MLX | ONNX RT |
 |:--|:---------:|:---------:|:----:|:---:|:-------:|
 | KV 압축 | **7x, +0% PPL** | +10.6% PPL | -- | -- | -- |
-| 코드 크기 | **55K LOC** | 250K+ | 100K+ | 50K+ | 500K+ |
+| 코드 크기 | **67K LOC** | 250K+ | 100K+ | 50K+ | 500K+ |
 | 의존성 | **제로** | ggml | PyTorch | Apple fw | 런타임 |
 | 임베더블 | **단일 헤더** | -- | -- | -- | 복잡 |
 | WASM | **192KB** | -- | -- | -- | -- |
@@ -300,7 +301,7 @@ curl http://localhost:8080/v1/chat/completions \
 <details>
 <summary><b>llama.cpp와 뭐가 다른가요?</b></summary>
 
-llama.cpp는 전체 기능을 갖춘 추론 프레임워크 (250K+ LOC). quant.cpp는 읽고, 수정하고, 임베딩할 수 있는 미니멀 엔진 (55K LOC). 다른 문제를 위한 다른 도구입니다: llama.cpp는 속도를, quant.cpp는 메모리(KV 압축)와 임베더빌리티(단일 헤더)를 최적화합니다.
+llama.cpp는 전체 기능을 갖춘 추론 프레임워크 (250K+ LOC). quant.cpp는 읽고, 수정하고, 임베딩할 수 있는 미니멀 엔진 (67K LOC). 다른 문제를 위한 다른 도구입니다: llama.cpp는 속도를, quant.cpp는 메모리(KV 압축)와 임베더빌리티(단일 헤더)를 최적화합니다.
 
 </details>
 
@@ -352,6 +353,19 @@ Linux, macOS, Windows (MSVC/MinGW), iOS, Android, WASM에서 동작합니다.
 
 ---
 
+## 문서
+
+| 문서 | 설명 |
+|:-----|:-----|
+| **[API 레퍼런스](docs/api.md)** | quant.h + libturboquant 전체 C API (730줄) |
+| **[커스텀 양자화 가이드](docs/custom-quantization.md)** | 함수 3개로 새 KV 양자화 타입 추가 |
+| **[로드맵](ROADMAP.md)** | 프로젝트 방향과 계획 |
+| **[변경 이력](CHANGELOG.md)** | 버전별 릴리스 노트 |
+| **[기술 리포트](docs/papers/quant_cpp_tech_report.md)** | 아키텍처와 벤치마크 (Arxiv 초안) |
+| **[WASM 데모](https://quantumaikr.github.io/quant.cpp/)** | 브라우저에서 바로 체험 — 설치 불필요 |
+
+---
+
 ## 참고 논문
 
 - [TurboQuant](https://arxiv.org/abs/2504.19874) (ICLR 2026) — KV 캐시 압축 이론

README.md

Lines changed: 20 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -6,15 +6,16 @@
 
 <p align="center">
 Lossless KV cache compression. Also ships as <a href="#-single-header-mode"><b>quant.h</b></a> — a single-header library.<br>
-55K LOC. Embeddable. Read it in an afternoon.
+67K LOC. Embeddable. Read it in an afternoon.
 </p>
 
 <p align="center">
+<a href="https://github.com/quantumaikr/quant.cpp/releases/tag/v0.5.0"><img src="https://img.shields.io/badge/release-v0.5.0-blue" alt="Release"></a>
 <a href="#"><img src="https://img.shields.io/badge/license-Apache%202.0-blue" alt="License"></a>
 <a href="#"><img src="https://img.shields.io/badge/tests-34%20pass-brightgreen" alt="Tests"></a>
-<a href="#"><img src="https://img.shields.io/badge/score-99.7%25-brightgreen" alt="Score"></a>
+<a href="#"><img src="https://img.shields.io/badge/score-99.2%25-brightgreen" alt="Score"></a>
 <a href="#"><img src="https://img.shields.io/badge/models-7%20verified-blue" alt="Models"></a>
-<a href="#"><img src="https://img.shields.io/badge/WASM-192KB-purple" alt="WASM"></a>
+<a href="https://quantumaikr.github.io/quant.cpp/"><img src="https://img.shields.io/badge/WASM_demo-192KB-purple" alt="WASM"></a>
 <a href="#"><img src="https://img.shields.io/badge/platforms-macOS%20%7C%20Linux%20%7C%20Windows%20%7C%20WASM-orange" alt="Platforms"></a>
 </p>
 
@@ -31,7 +32,7 @@ LLM memory is dominated by the **KV cache**, not model weights. At 32K context,
 │ Model (4GB) │ KV Cache (FP16) │
 │ │ ████████████████████████ 8K context ← OOM │
 ├──────────────┼──────────────────────────────────────────────┤
-│ Model (4GB) │ KV (4-bit) ███ →→→→→ 55K context │
+│ Model (4GB) │ KV (4-bit) ███ →→→→→ 350K context │
 │ │ ↑ 6.9x smaller │
 └──────────────┴──────────────────────────────────────────────┘
 ```
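
The context diagram in the hunk above pairs an 8K-context FP16 cache with a 4-bit cache labelled 6.9x smaller. As a rough sketch of the arithmetic behind such a diagram, KV cache size grows linearly with context length, so a fixed memory budget holds proportionally more tokens as the bits per stored value drop. The layer count, KV head count, and head dimension below are hypothetical values chosen for illustration, not taken from quant.cpp or any model it lists; only the 8K and 6.9x figures come from the diagram. Scaling 8K by 6.9 gives roughly 55K tokens, which matches the pre-commit figure. The commit message describes the new 350K figure as measured, and this sketch does not attempt to reproduce it.

```python
FP16_BITS = 16


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bits_per_value):
    """Bytes needed to store keys and values for every layer."""
    # The factor 2 accounts for one key vector and one value vector per token.
    n_values = 2 * n_layers * n_kv_heads * head_dim * context_len
    return n_values * bits_per_value / 8


def max_context(budget_bytes, n_layers, n_kv_heads, head_dim, bits_per_value):
    """Largest context length whose KV cache fits in budget_bytes."""
    per_token = kv_cache_bytes(n_layers, n_kv_heads, head_dim, 1, bits_per_value)
    return int(budget_bytes // per_token)


# Hypothetical model shape (assumption, for illustration only).
SHAPE = dict(n_layers=32, n_kv_heads=8, head_dim=128)

# Memory used by an FP16 cache at the diagram's 8K context.
budget = kv_cache_bytes(context_len=8192, bits_per_value=FP16_BITS, **SHAPE)

# Effective bits per value implied by the diagram's "6.9x smaller" label.
effective_bits = FP16_BITS / 6.9

fp16_ctx = max_context(budget, bits_per_value=FP16_BITS, **SHAPE)
compressed_ctx = max_context(budget, bits_per_value=effective_bits, **SHAPE)

if __name__ == "__main__":
    print(f"budget: {budget / 2**30:.2f} GiB")
    print(f"FP16 context: {fp16_ctx}")
    print(f"compressed context: {compressed_ctx}")
```

Because the model shape cancels out of the ratio, the compressed context is about 6.9 times the FP16 context for any choice of layers, heads, and head dimension.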
@@ -76,7 +77,7 @@ LLM memory is dominated by the **KV cache**, not model weights. At 32K context,
 | | quant.cpp | llama.cpp | vLLM | MLX | ONNX RT |
 |:--|:---------:|:---------:|:----:|:---:|:-------:|
 | KV compression | **7x, +0% PPL** | +10.6% PPL | -- | -- | -- |
-| Code size | **55K LOC** | 250K+ | 100K+ | 50K+ | 500K+ |
+| Code size | **67K LOC** | 250K+ | 100K+ | 50K+ | 500K+ |
 | Dependencies | **zero** | ggml | PyTorch | Apple fw | runtime |
 | Embeddable | **single header** | -- | -- | -- | complex |
 | WASM | **192KB** | -- | -- | -- | -- |
@@ -300,7 +301,7 @@ Build with `-DTQ_BUILD_SERVER=ON`. Streaming SSE supported. KV compression confi
 <details>
 <summary><b>How is this different from llama.cpp?</b></summary>
 
-llama.cpp is a full-featured inference framework (250K+ LOC). quant.cpp is a minimal engine (55K LOC) you can read, modify, and embed. Different tools for different problems: llama.cpp optimizes speed, quant.cpp optimizes memory (KV compression) and embeddability (single header).
+llama.cpp is a full-featured inference framework (250K+ LOC). quant.cpp is a minimal engine (67K LOC) you can read, modify, and embed. Different tools for different problems: llama.cpp optimizes speed, quant.cpp optimizes memory (KV compression) and embeddability (single header).
 
 </details>
 
@@ -352,6 +353,19 @@ Tested extensively (2-bit delta, NF2, online SVD, multi-hash). None reached acce
 
 ---
 
+## Documentation
+
+| Document | Description |
+|:---------|:------------|
+| **[API Reference](docs/api.md)** | Full C API for quant.h and libturboquant (730 lines) |
+| **[Custom Quantization](docs/custom-quantization.md)** | Add your own KV type in 3 functions |
+| **[ROADMAP](ROADMAP.md)** | Project direction and planned features |
+| **[CHANGELOG](CHANGELOG.md)** | Version history and release notes |
+| **[Tech Report](docs/papers/quant_cpp_tech_report.md)** | Architecture and benchmarks (Arxiv draft) |
+| **[WASM Demo](https://quantumaikr.github.io/quant.cpp/)** | Try it in your browser — no install needed |
+
+---
+
 ## References
 
 - [TurboQuant](https://arxiv.org/abs/2504.19874) (ICLR 2026) — KV cache compression theory
