I build backend and distributed systems that stay correct under failure and make failures easier to diagnose.
I have built production backend systems at Thales Group, contributed merged fixes to the Temporal Go SDK, and built systems backed by measured evidence: 0 duplicate commits across 1,500 race reproductions, and detection of states where probes report healthy but the system is unsafe under failure.
If you're hiring for backend, infrastructure, reliability, or production engineering roles, start here:
- Faultline — crash-safe job execution, fencing tokens, race validation
- KubePulse — resilience validation, recovery measurement, unsafe-state detection
- Temporal Go SDK PRs — merged OSS fixes in workflow/runtime behavior
| Project | What it proves |
|---|---|
| Faultline | I can design execution systems that preserve correctness under crashes, lease expiry, and race conditions |
| KubePulse | I can validate real recovery behavior, not just surface-level health signals |
| AutoOps-Insight | I can turn noisy operational failures into structured incident signals and operator-facing decisions |
| DetTrace | I can isolate first-failure points and reconstruct divergent system behavior deterministically |
- Temporal Go SDK: 2 merged PRs and 1 open PR across workflow test reliability and context propagation behavior
- Azure Go SDK: 2 PRs under review in retry/error handling and trace context propagation
- How I Built a Distributed Job Queue That Stays Correct Under Crashes, Races, and Network Faults
- I Thought I Built Observability. Then an Incident Proved I Didn’t.
- Detecting Silent Regressions in GenAI Systems at Scale
Most entry-level profiles show projects that work.
This profile is built around systems that are tested under:
- crashes
- retries
- lease expiry
- stale writes
- degraded dependencies
- misleading health signals
The goal is not just building software that runs. It is building software that stays correct, exposes unsafe behavior, and leaves behind enough evidence to debug failures precisely.
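
To make the stale-write case concrete, here is a minimal sketch of how a fencing token can reject a write from a worker whose lease has expired. It is illustrative only and is not Faultline's actual code; `LeaseManager`, `FencedStore`, and `StaleTokenError` are hypothetical names, and the in-memory store stands in for whatever durable storage a real system would use.

```python
from dataclasses import dataclass, field


class StaleTokenError(Exception):
    """Raised when a commit carries a token older than one already accepted."""


@dataclass
class LeaseManager:
    """Hypothetical lease service: every new lease gets a strictly larger token."""
    _next_token: int = 0

    def acquire(self) -> int:
        self._next_token += 1
        return self._next_token


@dataclass
class FencedStore:
    """Hypothetical store that tracks the highest token accepted per job."""
    highest: dict = field(default_factory=dict)
    commits: dict = field(default_factory=dict)

    def commit(self, job_id: str, result: str, token: int) -> None:
        # Reject any writer whose token is older than one already seen for this job.
        if token < self.highest.get(job_id, 0):
            raise StaleTokenError(f"token {token} is stale for {job_id}")
        self.highest[job_id] = token
        self.commits[job_id] = result


leases = LeaseManager()
store = FencedStore()

token_a = leases.acquire()  # worker A takes the lease, then stalls
token_b = leases.acquire()  # lease expires; worker B takes over with a newer token
store.commit("job-1", "result-from-b", token_b)

try:
    store.commit("job-1", "result-from-a", token_a)  # A wakes up late
except StaleTokenError:
    pass  # the late write is rejected, so B's result is not overwritten
```

The check lives in the store rather than in the worker because a stalled worker cannot reliably know that its own lease has expired.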
Backend infrastructure · Distributed systems · Reliability engineering · Incident analysis · Developer tooling
LinkedIn · GitHub · Medium · kriti0608@gmail.com
If you're hiring for backend, infrastructure, reliability, or production engineering roles, start with Faultline and KubePulse.

I build systems that:

- execute correctly under failure (Faultline: crash-safe execution, replayable races, and correctness under partial failure)
- detect unsafe system behavior (KubePulse: resilience validation, timing-aware diagnostics, and unsafe-state detection under faults)
- diagnose failures precisely (DetTrace: deterministic replay, first-divergence isolation, and replay-based debugging for concurrent, distributed, and control-loop systems)

