Kriti Behl

I build backend and distributed systems that stay correct under failure and make failures easier to diagnose.

I built production backend systems at Thales Group, contributed merged fixes to the Temporal Go SDK, and built systems backed by measured evidence: 0 duplicate commits across 1,500 race reproductions, and detection of states where probes report healthy while the system is unsafe under failure.

If you're hiring for backend, infrastructure, reliability, or production engineering roles, start here:

  • Faultline — crash-safe job execution, fencing tokens, race validation
  • KubePulse — resilience validation, recovery measurement, unsafe-state detection
  • Temporal Go SDK PRs — merged OSS fixes in workflow/runtime behavior

What these projects prove

| Project | What it proves |
| --- | --- |
| Faultline | I can design execution systems that preserve correctness under crashes, lease expiry, and race conditions |
| KubePulse | I can validate real recovery behavior, not just surface-level health signals |
| AutoOps-Insight | I can turn noisy operational failures into structured incident signals and operator-facing decisions |
| DetTrace | I can isolate first-failure points and reconstruct divergent system behavior deterministically |

Open Source Contributions

  • Temporal Go SDK: 2 merged PRs and 1 open PR across workflow test reliability and context propagation behavior
  • Azure Go SDK: 2 PRs under review in retry/error handling and trace context propagation

Selected Writing


Why this profile is different

Most entry-level profiles show projects that work.

This profile is built around systems that are tested under:

  • crashes
  • retries
  • lease expiry
  • stale writes
  • degraded dependencies
  • misleading health signals

The goal is not just building software that runs. It is building software that stays correct, exposes unsafe behavior, and leaves behind enough evidence to debug failures precisely.
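
To make "stale writes" and "lease expiry" concrete, here is a minimal sketch of the fencing-token idea in Python. It is an illustration under simplifying assumptions (a single in-memory store, no real clock or network), not Faultline's actual code; `LeaseStore`, `acquire`, and `commit` are hypothetical names chosen for this example.

```python
import threading


class LeaseStore:
    """Hypothetical in-memory store: one commit per job, guarded by fencing tokens."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current_token = {}  # job_id -> token of the most recent lease
        self._committed = {}      # job_id -> committed result

    def acquire(self, job_id):
        # Every new lease gets a strictly larger token, so any earlier
        # holder of the lease becomes identifiable as stale.
        with self._lock:
            token = self._current_token.get(job_id, 0) + 1
            self._current_token[job_id] = token
            return token

    def commit(self, job_id, token, result):
        with self._lock:
            if job_id in self._committed:
                return False  # duplicate commit: the job already has a result
            if token != self._current_token.get(job_id):
                return False  # stale write: the lease was re-granted since
            self._committed[job_id] = result
            return True


store = LeaseStore()
stale = store.acquire("job-1")  # worker A takes the lease, then stalls
fresh = store.acquire("job-1")  # lease expires and worker B takes over

results = [
    store.commit("job-1", stale, "from A"),        # A wakes up late
    store.commit("job-1", fresh, "from B"),        # B holds the current lease
    store.commit("job-1", fresh, "from B again"),  # retry after success
]
print(results)  # [False, True, False]
```

The check lives in the store rather than in the worker, so a worker that wrongly believes it still holds the lease cannot overwrite newer state.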

Focus Areas

Backend infrastructure · Distributed systems · Reliability engineering · Incident analysis · Developer tooling


Connect

LinkedIn · GitHub · Medium · kriti0608@gmail.com


If you're hiring for backend, infrastructure, reliability, or production engineering roles, start with Faultline and KubePulse.

What I build

I build systems that:

  1. execute correctly under failure
    Faultline: crash-safe execution, replayable races, and correctness under partial failure

  2. detect unsafe system behavior
    KubePulse: resilience validation, timing-aware diagnostics, and unsafe-state detection under faults

  3. diagnose failures precisely
    DetTrace: deterministic replay, first-divergence isolation, and replay-based debugging for concurrent, distributed, and control-loop systems

Pinned

  1. faultline (Python)
    Crash-safe distributed job execution with fencing tokens, lease recovery, and deterministic failure validation.

  2. AutoOps-Insight (Python)
    Reliability analytics for CI failures: recurring signature detection, release-risk reporting, Prometheus metrics, an API/CLI, and a dashboard.

  3. dettrace (C++)
    Deterministic replay and distributed incident forensics for first-failure and blast-radius analysis.

  4. KubePulse (Python)
    Kubernetes resilience validation for real recovery behavior, probe integrity, and rollout scorecards.