arm-aarch64

Here is 1 public repository matching this topic...

atcuality2021 / manthanquant

3-bit Lloyd-Max KV Cache Compression for LLM Inference on NVIDIA DGX Spark GB10 — 5.12x compression, 0.983 cosine similarity, pure numpy on ARM unified memory

compression numpy transformers quantization lloyd-max kv-cache unified-memory vllm llm-inference vibe-coding claude-code gb10 nvidia-dgx-spark arm-aarch64

Updated Apr 3, 2026
Python

