Hardened RAG pipeline with Llama 3.2 (3B) & Arize Phoenix. Features 4-bit Unsloth optimization, OpenTelemetry auditing, and a KV-cache stability patch for T4 GPUs. P99 Latency: 19.2s.
Updated Mar 31, 2026 - Python
The "RAG Failure Diagnostics Clinic" is a framework-agnostic tool designed to triage and classify LLM + RAG pipeline failures into 12 reusable patterns. It analyzes bug descriptions to provide clear reasoning and suggest minimal structural fixes for resolving complex RAG incid