Skip to content

BK-139: Research bug prevention beyond testing#373

Merged
haalfi merged 5 commits intomasterfrom
claude/analyze-bug-prevention-W0LIe
Apr 4, 2026
Merged

BK-139: Research bug prevention beyond testing#373
haalfi merged 5 commits intomasterfrom
claude/analyze-bug-prevention-W0LIe

Conversation

@haalfi
Copy link
Copy Markdown
Owner

@haalfi haalfi commented Apr 3, 2026

Summary

  • Research document analyzing all 20 bugs from 0.21.1, categorizing by root cause (cross-backend inconsistency, resource leaks, error swallowing, edge-case inputs, cache coherency, caller data mutation), and evaluating 6 prevention strategies beyond testing.
  • BK-139 backlog item with 7 prioritized deliverables: _safe_wrap helper (+ latent S3 bug fix), check_error_handling.py AST script, ruff BLE rules, Hypothesis PBT for partition/config/path, extended conformance suite (~58 tests), stateful backend model, ResourceWarning safety net.
  • Bonus finding: S3Backend.read() has the same unprotected acquire-then-wrap pattern as BUG-142/158 — a latent resource leak bug.

Test plan

  • Docs-only change — no code, no coverage impact
  • Research doc follows existing format (sdd/research/research-*.md)
  • Backlog item follows ID conventions (BK-139, next after BK-138)

https://claude.ai/code/session_01HRCdVGrFkWwX9S1XGBd8EG

claude added 2 commits April 3, 2026 19:34
Analyze 20 bugs from 0.21.1, categorize by root cause, evaluate 6 prevention strategies: Design-by-Contract, Property-Based Testing (Hypothesis), extended conformance suite, resource safety patterns, static analysis for error handling, and machine-checked review rules.

Key findings: adopt Hypothesis for 4 targets, add _safe_wrap helper (found latent S3 bug), build check_error_handling.py AST script, enable ruff BLE rules, extend conformance with ~58 edge-case tests.

https://claude.ai/code/session_01HRCdVGrFkWwX9S1XGBd8EG
Seven prioritized deliverables from research-bug-prevention-beyond-testing.md: _safe_wrap helper, AST error-handling checker, ruff BLE rules, Hypothesis PBT, extended conformance suite, stateful backend model, ResourceWarning safety net.

https://claude.ai/code/session_01HRCdVGrFkWwX9S1XGBd8EG
Copy link
Copy Markdown
Owner Author

@haalfi haalfi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of research doc and backlog entry. 3 findings posted as inline comments.

Research validation (user-requested):

  • Latent S3 bug claim: CONFIRMED. S3Backend.read() (_s3.py:130-134) acquires a file handle then wraps it in _ErrorMappingStream / BufferedReader without try/except protection. SFTP (_sftp.py:303-308, BUG-142 fix) and Azure (_azure.py:316-321, BUG-158 fix) both have this protection. The pattern is identical.
  • DbC rejection: SOUND. Zero-runtime-dep constraint rules out icontract. The conformance suite already serves as the behavioral contract system. Industry precedent (fsspec, smart_open, SQLAlchemy) confirms no major storage lib uses DbC libraries.
  • PBT adoption for P1-P4: WELL-JUSTIFIED. The 4 targets have clear oracles (roundtrip, no-corruption, idempotency, dict-equivalence) making them textbook PBT candidates. Dev-only Hypothesis dependency has zero runtime impact.
  • _safe_wrap helper: CORRECT PATTERN. Exactly generalizes the individual fixes applied to SFTP (BUG-142) and Azure (BUG-158). The proposed signature matches what both backends already do inline.
  • Extended conformance suite: REASONABLE. The gap analysis (parameter combos, edge inputs, error fidelity, metadata, resource cleanup, operational consistency) maps directly to the 0.21.1 bug classes.
  • Static analysis approach: REASONABLE. ruff BLE rules + custom AST script is proportionate. The AST script targets the semantic gap (broad except with silent return and no errno check) that ruff cannot detect.

Generated by Claude Code

Copy link
Copy Markdown
Owner Author

@haalfi haalfi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Research and decisions review -- 3 inline comments posted. See first review comment for full validation of research findings and decisions (all confirmed sound).


Generated by Claude Code

claude added 2 commits April 4, 2026 08:50
- Add missing Status/Scope/Related header fields to research doc

- Fix cluster count (seven, not six), bug count (21, not 20), add BUG-157

- File BUG-159 for latent S3/S3PyArrow read() stream-wrapping leak

https://claude.ai/code/session_01HRCdVGrFkWwX9S1XGBd8EG
- Reorder priority: resource safety + PBT first (highest ROI)

- Add P4 as highest-value PBT target note

- Add CI impact warning for extended conformance (~400 parameterized cases)

- Confirm ruff TRY/PGH don't cover silent-swallow pattern; BLE001 only flags Exception not IOError

- Add maintenance risk note for AST script; defer behind conformance error fidelity tests

- Add risk column to priority table

- Update BK-139 deliverable order to match

https://claude.ai/code/session_01HRCdVGrFkWwX9S1XGBd8EG
Copy link
Copy Markdown
Owner Author

@haalfi haalfi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review comments

Bug: BUG-159 description overstates the S3PyArrow leak pattern

sdd/BACKLOG.md lines added in this PR describe S3PyArrowBackend.read() (_s3_pyarrow.py:196-200) as having "the same issue" as BUG-142 (SFTP) and BUG-158 (Azure). The actual code is:

def read(self, path: str) -> BinaryIO:
    with self._pyarrow_errors(path):
        pa_file = self._pa_fs.open_input_file(self._pa_path(path))
        raw = _PyArrowBinaryIO(pa_file)
        return cast("BinaryIO", _ErrorMappingStream(raw, self._classify_error, path))

This is a single-layer wrap (no BufferedReader). The leaked resource on failure would be a PyArrow NativeFile (pa_file), not an s3fs file handle. The SFTP/Azure bugs involved a two-layer acquire-then-double-wrap pattern. The description says "same unprotected acquire-then-wrap pattern" — which is structurally true — but calling it the "same issue" and lumping it with the SFTP/Azure fixes is imprecise and will mislead whoever implements the fix. The reproduce instruction ("monkeypatch _ErrorMappingStream.__init__ to raise") is valid for detecting the leak, but the fix and the resource type differ. The BACKLOG entry should describe what actually leaks (a pa_file PyArrow NativeFile) and note it is single-wrap, not double-wrap.


Spec: Bug count is inconsistent — taxonomy sums to 22, narrative claims 21

sdd/research/research-bug-prevention-beyond-testing.md states "21 bugs" in two places: the PR description, the Context paragraph ("The 0.21.1 patch release fixed 21 bugs"), and the taxonomy section header ("Categorizing the 21 bugs"). However, the taxonomy table lists the following bug IDs:

  • Cross-backend inconsistency: BUG-150, 151, 152, 155 (4)
  • Resource leak: BUG-142, 144, 156, 158 (4)
  • Error swallowing: BUG-145, 146, 147 (3)
  • Edge-case inputs: BUG-136, 139, 140, 141 (4)
  • Cache coherency: BUG-137, 138 (2)
  • Mutation of caller data: BUG-148 (1)
  • Edge-case behavior: BUG-143, 153, 154, 157 (4)

Total: 22 unique bug IDs. The actual BACKLOG-DONE count for v0.21.1 BUG- entries is 23, minus BUG-149 (investigated and closed as not-a-defect, correctly excluded) = 22 real fixes. The "21" figure is off by one throughout the document.


Consistency: BK-139 deliverable #4 contradicts the research doc's analysis of TRY rules

sdd/BACKLOG.md BK-139 deliverable #4 reads:

Enable ruff BLE + TRY rule sets (1-line config change)

But the research doc (§2.5 ruff gap analysis table) explicitly states:

TRY* (tryceratops) | Not enabled | No — TRY rules cover raise style, logging, else clauses. None flag silent returns.

The research doc's verdict for §2.5 is "One custom AST script + enable ruff BLE rules" — TRY is listed in the gap analysis as not catching the bug class of interest. There is no rationale in the document for including TRY in the backlog deliverable. If the intent is to enable TRY for a different reason (raise style, logging patterns), that reasoning should be stated in the backlog item. As written, the backlog item contradicts the research that motivated it.


Generated by Claude Code

- Fix bug count: 22 bugs (not 21), last cluster has 4 entries not 3

- BUG-159: distinguish S3 double-layer wrap (s3fs handle) from S3PyArrow single-layer wrap (PyArrow NativeFile)

- Remove ruff TRY from priority table and BK-139 — research explicitly concluded TRY rules don't catch the target pattern

https://claude.ai/code/session_01HRCdVGrFkWwX9S1XGBd8EG
@haalfi haalfi merged commit 61207ae into master Apr 4, 2026
13 checks passed
@haalfi haalfi deleted the claude/analyze-bug-prevention-W0LIe branch April 4, 2026 13:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants