NUMA-aware GPU provisioning and orchestration for stateless MoE workloads of all sizes - *Claude Code native*
kubernetes terraform moe numa ray multi-cloud gitops mlops mixture-of-experts huggingface-spaces runpod vllm ollama litellm sglang claude-code qwen3 cost-optimization-cloud-devops glm-5 disaggregated-inference
Updated Mar 1, 2026 - Python