Open
Description
Track Sentinel and cluster failover events and correlate them with slowlog/COMMANDLOG spikes for post-incident analysis.
Problem
During failovers, latency spikes and command failures are common but the cause isn't always obvious from metrics alone. Being able to see "a failover happened at 03:12, and here's the slowlog spike that preceded/followed it" would make post-incident analysis significantly easier.
Proposed Scope
- Detect failover events by monitoring `INFO replication` role changes and `CLUSTER INFO` state transitions
- Persist failover events with timestamps, old/new primary, and trigger reason where available
- Correlate failover timestamps with existing slowlog and anomaly detection data
- Surface in the UI timeline alongside existing anomaly events
- Add `failover.started` and `failover.completed` webhook event types
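
As a rough illustration of the detection step, the sketch below compares two successive `INFO replication` (or `CLUSTER INFO`) snapshots and reports a transition when a watched field changes. It is only a sketch under assumptions: `parse_info`, `Transition`, and `detect_transition` are hypothetical names, not part of this project, and the snapshots are assumed to be the raw `key:value` text those commands return (for example `role:master` / `role:slave`, or `cluster_state:ok` / `cluster_state:fail`).

```python
from dataclasses import dataclass
from typing import Dict, Optional


def parse_info(raw: str) -> Dict[str, str]:
    """Parse raw INFO-style text ("key:value" lines) into a dict.

    Section headers (lines starting with "#") and blank lines are skipped.
    """
    fields: Dict[str, str] = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


@dataclass(frozen=True)
class Transition:
    """Hypothetical record of one observed field change on a node."""
    node: str
    field: str
    old_value: str
    new_value: str
    timestamp: float  # when the new snapshot was taken (unix seconds)


def detect_transition(
    node: str,
    field: str,
    previous: Dict[str, str],
    current: Dict[str, str],
    timestamp: float,
) -> Optional[Transition]:
    """Return a Transition if `field` differs between two snapshots."""
    old_value = previous.get(field)
    new_value = current.get(field)
    if old_value is None or new_value is None or old_value == new_value:
        return None
    return Transition(node, field, old_value, new_value, timestamp)
```

Because detection is polling-based in this sketch, the recorded timestamp is only as precise as the polling interval; the actual promotion happened somewhere between the two snapshots.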
Prior Art / Context
Requested by the community: correlating failovers with slowlog spikes is a common post-incident debugging need for teams running Sentinel or Cluster topologies.
Related
- Existing cluster topology visualization
- Existing per-slot heatmaps and migration tracking
- Anomaly detection correlator (could add a `FAILOVER` pattern)
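
One possible shape for the correlation itself, shown purely as a sketch: given a failover timestamp and already-collected slowlog entries, split the entries that fall inside a time window into those that preceded and those that followed the failover. `SlowlogEntry`, `correlate_failover`, and the 60-second default window are assumptions made for illustration, not the existing correlator's API; the entry fields mirror the unix timestamp and microsecond duration that `SLOWLOG GET` reports.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SlowlogEntry:
    """Hypothetical, simplified view of one slowlog entry."""
    timestamp: float   # unix seconds
    duration_us: int   # execution time in microseconds
    command: str


def correlate_failover(
    failover_ts: float,
    entries: List[SlowlogEntry],
    window_s: float = 60.0,  # assumed default, not a project setting
) -> Dict[str, List[SlowlogEntry]]:
    """Group slowlog entries within `window_s` of a failover.

    "before" holds entries in [failover_ts - window_s, failover_ts),
    "after" holds entries in [failover_ts, failover_ts + window_s].
    """
    before = [
        e for e in entries
        if failover_ts - window_s <= e.timestamp < failover_ts
    ]
    after = [
        e for e in entries
        if failover_ts <= e.timestamp <= failover_ts + window_s
    ]
    return {"before": before, "after": after}
```

Keeping "before" and "after" separate preserves the distinction the issue cares about: a spike that preceded the failover may point at a cause, while one that followed it is more likely a consequence.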