Machine-readable “truth artifacts” (receipts) for reliability failures: rate limits, retries, timeouts, flaky CI.
Each receipt links to primary evidence and normalizes:
hazard → constraints → knobs → verification
This repo is a public library of small, audit-friendly receipts that:
- point to primary evidence (issue, logs, PR, incident writeup)
- extract the failure mode (“hazard”)
- propose constraints/knobs that prevent recurrence
- describe how to verify the mitigation
A receipt is eligible if it has:
- Primary evidence (issue, PR, logs, release note, or postmortem)
- A clear failure mode worth naming (hazard)
- At least one constraint (“must be true”) and one knob (“we can tune this”)
- A verification path (how you’d prove the mitigation worked)
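The eligibility criteria above can be approximated mechanically. The following is a minimal sketch, not the repo's actual validator: it assumes a receipt has already been loaded as a Python dict using the field names described for receipt.v0 (source.url, hazard.summary, constraints, knobs, verification). The function name eligibility_gaps and the sample values are hypothetical.

```python
def eligibility_gaps(receipt: dict) -> list[str]:
    """Return the eligibility criteria a candidate is missing (empty = eligible)."""
    gaps = []
    if not receipt.get("source", {}).get("url"):
        gaps.append("primary evidence (source.url)")
    if not receipt.get("hazard", {}).get("summary"):
        gaps.append("named failure mode (hazard.summary)")
    if not receipt.get("constraints"):
        gaps.append("at least one constraint")
    if not receipt.get("knobs"):
        gaps.append("at least one knob")
    if not receipt.get("verification"):
        gaps.append("verification path")
    return gaps


# Illustrative candidate; the URL and text are placeholders, not real evidence.
candidate = {
    "source": {"url": "https://example.com/issue/1"},
    "hazard": {"summary": "429 storm after parallel retries"},
    "constraints": ["honor Retry-After before retrying"],
    "knobs": [],
    "verification": ["rerun CI at the original parallelism without 429s"],
}
print(eligibility_gaps(candidate))  # ['at least one knob']
```

A check like this only covers the structural minimum; whether the evidence is legible and the mitigation generalizes remains an editorial call.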
Receipts may be sourced from public issues/PRs/releases even when the incident is already fixed. A fixed incident still qualifies if the evidence is legible and the mitigation generalizes.
This repo is intentionally high-signal. Not every “bug” or “rate limit” is a receipt.
A candidate is receipt-worthy only if it meets all of:
- Real failure, not a wishlist
  - Evidence of an actual incident: error output, status codes, timeouts, CI failure logs, user impact.
  - Not just “implement X” / “add rate limiting” / generic hardening.
  - If the issue has no incident evidence (only “please add rate limiting”), it’s not a receipt.
- Reusable pattern
  - The failure mode generalizes beyond one codebase (e.g., 429 + Retry-After, retry budget exhaustion, deadline exceeded masking root cause).
  - If it’s a one-off patch detail, it doesn’t belong here.
- Leverage
  - The constraints/knobs materially change outcomes (reduce recurrence, bound blast radius, make diagnosis faster).
  - A receipt should teach a guardrail, not narrate a fix.
Radar can surface candidates; this repo is the final filter.
If it wouldn’t help a different team six months from now, don’t capture it.
- index.json: machine entrypoint (registry of receipts)
- schemas/receipt.v0.json: JSON schema for curated truth artifacts
- schemas/decision_event.v1.schema.json: canonical execution receipt schema
- pitstop_truth/ingest.py: JSONL ingest + decision_event.v1 validation
- receipts/YYYY/MM/<receipt-id>/receipt.json: one curated receipt per folder
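As an illustration of the folder layout, the sketch below derives a receipt's repo-relative path from its ID. It assumes IDs follow the PT-YYYY-MM-DD-<slug> format used in this repo and that the folder year and month match the date in the ID; receipt_path is a hypothetical helper, not a script shipped here.

```python
from pathlib import PurePosixPath


def receipt_path(receipt_id: str) -> PurePosixPath:
    """Map PT-YYYY-MM-DD-<slug> to receipts/YYYY/MM/<receipt-id>/receipt.json."""
    # Split off only the first three separators; the slug may contain '-' itself.
    prefix, year, month, _rest = receipt_id.split("-", 3)
    if prefix != "PT":
        raise ValueError(f"unexpected id prefix: {receipt_id!r}")
    return PurePosixPath("receipts", year, month, receipt_id, "receipt.json")


# The ID below is a made-up example.
print(receipt_path("PT-2026-02-26-github-issue-example"))
# receipts/2026/02/PT-2026-02-26-github-issue-example/receipt.json
```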
capability.json — structured summary of this corpus for programmatic discovery.
Contains:
- corpus statistics (receipt count, hazard classes, repos analyzed)
- documented failure patterns with canonical receipts
- detection tool reference (pitstop-check)
- classification model (WAIT / CAP / STOP)
- query URLs for index and schema
Intended for agents and tools that need to understand what this corpus contains without parsing individual receipts.
This repo works with two distinct receipt types:
Execution receipts (decision_event.v1): machine-emitted execution records, produced by Pitstop Guard in pitstop-commons and consumable by pitstop-scan.
- Schema: schemas/decision_event.v1.schema.json
- Format: JSONL (one event per line)
- Ingested here by: pitstop_truth.ingest
These receipts capture runtime facts:
- operation + endpoint
- budget + retry envelope
- outcome + error classification
- latency + cost
- policy decision
They are canonical and mechanical, not editorial.
Example ingest:
python -m pitstop_truth.ingest \
--in ../pitstop-commons/proto/receipts.run.jsonl \
--schema schemas/decision_event.v1.schema.json \
--out receipts/_ingest/receipts.jsonl
Execution receipts are append-only and may be high volume.
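To make the JSONL format concrete, here is a minimal sketch of line-by-line ingest using only the standard library. It is not the code in pitstop_truth/ingest.py: the real ingest validates against schemas/decision_event.v1.schema.json with jsonschema, while this sketch substitutes a simple required-key check. The keys in REQUIRED_KEYS and the sample events are hypothetical.

```python
import io
import json

# Hypothetical required keys; the authoritative list lives in
# schemas/decision_event.v1.schema.json.
REQUIRED_KEYS = {"operation", "outcome"}


def read_events(stream):
    """Yield (line_number, event, problem) for each non-blank JSONL line."""
    for lineno, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError as exc:
            yield lineno, None, f"invalid JSON: {exc.msg}"
            continue
        if not isinstance(event, dict):
            yield lineno, None, "not a JSON object"
            continue
        missing = sorted(REQUIRED_KEYS - event.keys())
        problem = f"missing keys: {missing}" if missing else None
        yield lineno, event, problem


# In-memory stand-in for a receipts.run.jsonl file.
sample = io.StringIO(
    '{"operation": "GET /v1/items", "outcome": "ok"}\n'
    '{"operation": "GET /v1/items"}\n'
    "not json\n"
)
results = list(read_events(sample))
for lineno, event, problem in results:
    print(lineno, "ok" if problem is None else problem)
```

Reading one line at a time keeps memory flat, which matters because execution receipts are append-only and can be high volume.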
Curated truth artifacts (receipt.v0): human-authored, normalized reliability learnings.
- Schema: schemas/receipt.v0.json
- Stored as: receipts/YYYY/MM/<receipt-id>/receipt.json
- Indexed via: index.json
These receipts answer:
hazard → constraints → knobs → verification
They are editorial and high-signal.
- Many execution receipts may roll up into a single curated truth artifact.
- Truth artifacts describe the generalizable lesson.
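One way to picture the roll-up is to count recurring failures across execution receipts and treat the frequent ones as candidates for a curated artifact. The sketch below is an assumption-laden illustration: the event keys endpoint and error_class are hypothetical stand-ins for the "operation + endpoint" and "outcome + error classification" facts, and the repo does not prescribe this procedure.

```python
from collections import Counter

# Hypothetical event shape; real field names are defined by decision_event.v1.
events = [
    {"endpoint": "api.example.com/v1/items", "error_class": "rate_limit_429"},
    {"endpoint": "api.example.com/v1/items", "error_class": "rate_limit_429"},
    {"endpoint": "api.example.com/v1/users", "error_class": "timeout"},
    {"endpoint": "api.example.com/v1/users", "error_class": None},  # success
]


def rollup(events):
    """Count failures per (endpoint, error_class), ignoring successful events."""
    return Counter(
        (e["endpoint"], e["error_class"]) for e in events if e.get("error_class")
    )


counts = rollup(events)
for (endpoint, error_class), n in counts.most_common():
    print(f"{n:3d}  {error_class}  {endpoint}")
```

A repeated pair is only a hint; the curated receipt still needs primary evidence and a lesson that generalizes.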
- decision_event.v1 is validated via jsonschema during ingest (pitstop_truth/ingest.py).
- receipt.v0 is validated via scripts/validate_receipts.py (jsonschema over receipts/**/receipt.json).
A receipt is a single JSON file that conforms to schemas/receipt.v0.json.
- schema_version: must be "receipt.v0"
- id: PT-YYYY-MM-DD-<slug>
  - <slug> must be lowercase a–z / 0–9 / '-'
  - minimum length: 8 characters
  - Convention: prefix the slug with a source hint such as github-issue-, github-pr-, log-, incident-
- created_at: ISO 8601 datetime (UTC recommended, e.g. 2026-02-26T07:31:39Z)
- source:
  - url (primary evidence)
  - kind: github_issue | github_pr | log | incident_postmortem | other
  - optional: repo, issue_or_pr, plus any additional metadata (allowed)
- hazard:
  - class: array of hazard class strings (e.g. rate_limit_429, retry_budget_exhausted)
  - summary: one-line description of the failure mode
  - signals: array of machine-readable-ish strings extracted from evidence (error text, headers, log fragments, “--parallel 1 fixes”, etc.)
- constraints: array of “must be true” guardrails that prevent recurrence
- knobs: array of configurable controls (limits, backoff params, concurrency caps, driver flags, etc.)
- verification: array of steps to prove mitigation worked
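The id rule lends itself to a quick pre-check before running the schema validator. This is a sketch under the rules stated above (PT-YYYY-MM-DD-<slug>, slug of lowercase a-z, 0-9, or '-', at least 8 characters); the regex and the check_id helper are illustrative and may not match the exact pattern in schemas/receipt.v0.json.

```python
import re
from datetime import datetime

# PT-YYYY-MM-DD-<slug>; slug is lowercase a-z / 0-9 / '-', 8+ characters.
ID_PATTERN = re.compile(r"PT-(\d{4}-\d{2}-\d{2})-([a-z0-9-]{8,})")


def check_id(receipt_id: str) -> bool:
    """Return True if the id has the expected shape and a real calendar date."""
    match = ID_PATTERN.fullmatch(receipt_id)
    if not match:
        return False
    try:
        datetime.strptime(match.group(1), "%Y-%m-%d")
    except ValueError:
        return False  # e.g. month 13 or day 32
    return True


# All ids below are made-up examples.
print(check_id("PT-2026-02-26-github-issue-example"))  # True
print(check_id("PT-2026-02-26-short"))                 # False: slug too short
print(check_id("PT-2026-13-01-github-issue-example"))  # False: invalid month
```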
Optional fields may be omitted.
- notes: freeform context (background, edge cases, why the mitigation mattered, etc.)
- tags: array of strings for lightweight grouping (e.g. ci, rate_limit, timeout, driver, control_plane)
- mitigation_signature: compact, comparable summary of the guardrail pattern. Intended for clustering and deduplication across receipts. Typical structure:
  - hazards: normalized hazard labels
  - constraints: distilled guardrail invariants
  - knobs: key control surfaces
  - anti_patterns: common failure shapes this prevents
- routing_impact: router / executor-facing implications of the hazard. Describes how a runtime system should behave when this pattern is detected. May include:
  - default_action (e.g. cooldown_and_route_away, failfast_with_hint)
  - cooldown semantics or scope keys
  - probe strategy guidance
  - classification rules
  - detection thresholds (log signatures, repeat counts, etc.)
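Since mitigation_signature is meant for clustering and deduplication, one possible approach is to reduce it to an order-insensitive key and group receipts by that key. The sketch below assumes hazards, constraints, and knobs are each arrays of strings; signature_key, cluster, and the sample receipts are hypothetical, and the repo may cluster differently.

```python
from collections import defaultdict


def signature_key(receipt: dict) -> tuple:
    """Build an order-insensitive key from a receipt's mitigation_signature."""
    sig = receipt.get("mitigation_signature", {})
    return tuple(
        tuple(sorted(sig.get(field, [])))
        for field in ("hazards", "constraints", "knobs")
    )


def cluster(receipts: list[dict]) -> dict[tuple, list[str]]:
    """Group receipt ids that share the same signature key."""
    groups = defaultdict(list)
    for receipt in receipts:
        groups[signature_key(receipt)].append(receipt["id"])
    return dict(groups)


# Made-up receipts; ids and labels are placeholders.
receipts = [
    {"id": "PT-2026-01-05-github-issue-aaaa",
     "mitigation_signature": {"hazards": ["rate_limit_429"],
                              "constraints": ["honor_retry_after"],
                              "knobs": ["parallelism_cap", "backoff_jitter"]}},
    {"id": "PT-2026-02-10-github-pr-bbbbbb",
     "mitigation_signature": {"hazards": ["rate_limit_429"],
                              "constraints": ["honor_retry_after"],
                              "knobs": ["backoff_jitter", "parallelism_cap"]}},
    {"id": "PT-2026-02-11-log-cccccccc",
     "mitigation_signature": {"hazards": ["timeout"],
                              "constraints": [],
                              "knobs": ["deadline_ms"]}},
]
groups = cluster(receipts)
for key, ids in groups.items():
    if len(ids) > 1:
        print("possible duplicates:", ids)
```

Exact-match grouping is deliberately strict; near-duplicates with slightly different labels would need a looser similarity measure.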
Schema: schemas/receipt.v0.json
Schema enforcement:
- decision_event.v1 is validated via jsonschema during ingest (pitstop_truth/ingest.py).
- receipt.v0 is validated via scripts/validate_receipts.py (jsonschema across receipts/**/receipt.json).
Run locally:
python3 scripts/validate_receipts.py
- Create a new receipt folder:
DATE="YYYY-MM-DD"
YYYY="${DATE%%-*}"
MM="${DATE#*-}"; MM="${MM%%-*}"
RID="PT-$DATE-<slug>" # where slug starts with github-issue-... / github-pr-... etc
mkdir -p "receipts/$YYYY/$MM/$RID"
- Add receipt.json in that folder and validate it locally:
python3 scripts/validate_receipts.py
python3 -m json.tool index.json >/dev/null
- Upsert it into index.json using the helper script:
python3 scripts/add_to_index.py \
--id "$RID" \
--date "$DATE" \
--repo "owner/name" \
--source-url "https://..." \
--path "receipts/$YYYY/$MM/$RID/receipt.json" \
--hazard "rate_limit_429" \
--hazard "retry_budget_exhausted" \
--signal "--parallel 1 fixes" \
--signal "429 after backoff" \
--knob "parallelism_cap" \
--knob "backoff_jitter"
- Validate receipts + index (schema + JSON):
python3 scripts/validate_receipts.py && python3 -m json.tool index.json >/dev/null && echo "ok ✅"
- Commit + push:
python3 -m json.tool index.json >/dev/null
git add index.json "receipts/$YYYY/$MM/$RID/receipt.json"
git commit -m "receipt: add $RID"
git push
- IDs: PT-YYYY-MM-DD-<slug> (slug convention: github-issue-..., github-pr-..., log-..., incident-...)
- Immutability: receipts should be treated as immutable once published; if you must revise, add a new receipt or note a superseding receipt in notes.
- Index stability: index.json is the canonical list for machines; paths are repo-relative.
Apache-2.0