scottski78

Scottski

Popular repositories

vllm-turboquant-gb10

Build guide for vLLM 0.18.1 with TurboQuant KV cache compression on NVIDIA GB10 (Grace Blackwell) aarch64 / CUDA 13.0 / SM 12.1

gb10-nccl-switched-fabric

Practical guide to multi-node NCCL over a switched RoCE fabric on NVIDIA GB10 (DGX Spark class), documenting the gaps in NVIDIA's official playbooks.

the-forge

Multi-model orchestrated inference platform: a LangGraph state machine routes queries across three GPU nodes over a 200 Gbps RoCE fabric.

Python
Local-RAG-Engine-Private-Document-Intelligence-with-Gemma-4

A lightweight, high-performance Retrieval-Augmented Generation (RAG) pipeline designed to run entirely offline on macOS. This system allows users to perform conversational AI queries against a priv…

Python