atcuality2021

Harish Kumar atcuality2021

Popular repositories

vllm-gb10-gemma4 Public

Complete vLLM + Gemma 4 for NVIDIA DGX Spark GB10 — one command install with benchmarks

Shell 1 1
manthanquant Public

3-bit Lloyd-Max KV Cache Compression for LLM Inference on NVIDIA DGX Spark GB10 — 5.12x compression, 0.983 cosine similarity, pure numpy on ARM unified memory

Python 1
vllm-gb10 Public

Custom native vLLM for NVIDIA DGX Spark GB10 (ARM aarch64, Blackwell sm_121)

Shell 1
vllm-gemma4-patch Public

Gemma 4 support patch for vLLM 0.18.x — backports PR #38826

Shell 1
manthanquant-x86 Public

TurboQuant KV Cache Compression for vLLM on x86 GPUs — 5.12x compression, 0.983 cosine similarity (BiltIQ AI)

Python