The AI-Powered SRE Agent for Kubernetes
KubeSentinel is a custom Kubernetes Controller that transforms your cluster from "Self-Healing" to "Self-Diagnosing".
It automatically detects pod failures (CrashLoopBackOff, OOMKilled), fetches the relevant logs, and uses Large Language Models (LLMs) to instantly analyze the root cause and suggest fixes.
KubeSentinel runs as a native Kubernetes deployment and interacts with the API Server to watch for state changes.
graph TD
subgraph EKS_Cluster
A[KubeSentinel Agent]
B(K8s API Server)
C[Failing Pod]
end
D[LLM Provider]
C -- "1. Crashes (OOM/Panic)" --> B
B -- "2. Event Update" --> A
A -- "3. Fetch Logs & Describe" --> B
A -- "4. Analyze Context" --> D
D -- "5. Root Cause & Fix" --> A
A -- "6. Publish K8s Event" --> B
- Real-time Detection: Instantly catches
CrashLoopBackOff,ImagePullBackOff, andOOMKilled. - Smart Analysis: Uses GenAI to parse stack traces and error codes (no more grepping regex).
- Native Integration: Reports findings directly as Kubernetes Events (
kubectl get events). - Production Ready: Built with Terraform (AWS EKS) and packaged with Helm.
- Core: Golang, Client-go, Controller Pattern
- Infrastructure: Terraform, AWS (EKS, VPC)
- Packaging: Docker (Distroless), Helm Charts
- AI Integration: Pluggable interface for OpenAI/Gemini
- Kubernetes 1.24+
- Helm 3.0+
helm install kubesentinel ./deploy/charts/kubesentinel \
--set serviceAccount.create=trueTrigger a crash in a test pod, then run:
kubectl get events --sort-by='.lastTimestamp'You will see a new event from KubeSentinel with the AI analylsis.
- Integration with Slack / PagerDuty
- Auto-Remediation (Apply fixes automatically)
- FinOps Module (Detect wasted resources)
Built with ❤️ by Shishir Shetty