BK-139: Research bug prevention beyond testing by haalfi · Pull Request #373 · haalfi/remote-store

haalfi · 2026-04-03T19:49:10Z

Summary

Research document analyzing all 20 bugs from 0.21.1, categorizing by root cause (cross-backend inconsistency, resource leaks, error swallowing, edge-case inputs, cache coherency, caller data mutation), and evaluating 6 prevention strategies beyond testing.
BK-139 backlog item with 7 prioritized deliverables: _safe_wrap helper (+ latent S3 bug fix), check_error_handling.py AST script, ruff BLE rules, Hypothesis PBT for partition/config/path, extended conformance suite (~58 tests), stateful backend model, ResourceWarning safety net.
Bonus finding: S3Backend.read() has the same unprotected acquire-then-wrap pattern as BUG-142/158 — a latent resource leak bug.
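
The acquire-then-wrap hazard described above, and the kind of helper proposed to guard against it, can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the PR names a `_safe_wrap` helper but does not show its signature, so `safe_wrap`, `FakeHandle`, and `failing_wrapper` are hypothetical names chosen for the example.

```python
from typing import Callable, Protocol, TypeVar


class Closeable(Protocol):
    def close(self) -> None: ...


H = TypeVar("H", bound=Closeable)
W = TypeVar("W")


def safe_wrap(handle: H, wrap: Callable[[H], W]) -> W:
    """Wrap an already-acquired handle; close it if wrapping fails.

    Hypothetical signature: the PR names `_safe_wrap` but does not show it.
    """
    try:
        return wrap(handle)
    except BaseException:
        # The caller never receives the handle, so nothing else can close it.
        handle.close()
        raise


class FakeHandle:
    """Stand-in for a backend file handle (illustration only)."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def failing_wrapper(handle: FakeHandle) -> object:
    """Simulates a wrapper constructor that raises."""
    raise RuntimeError("wrapper construction failed")


handle = FakeHandle()
try:
    safe_wrap(handle, failing_wrapper)
except RuntimeError:
    pass
print(handle.closed)  # the handle was closed before the error propagated
```

The point of centralizing the pattern is that each backend's `read()` no longer needs its own inline try/except around the wrapper constructors.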

Test plan

Docs-only change — no code, no coverage impact
Research doc follows existing format (sdd/research/research-*.md)
Backlog item follows ID conventions (BK-139, next after BK-138)

https://claude.ai/code/session_01HRCdVGrFkWwX9S1XGBd8EG

Analyze 20 bugs from 0.21.1, categorize by root cause, evaluate 6 prevention strategies: Design-by-Contract, Property-Based Testing (Hypothesis), extended conformance suite, resource safety patterns, static analysis for error handling, and machine-checked review rules. Key findings: adopt Hypothesis for 4 targets, add _safe_wrap helper (found latent S3 bug), build check_error_handling.py AST script, enable ruff BLE rules, extend conformance with ~58 edge-case tests. https://claude.ai/code/session_01HRCdVGrFkWwX9S1XGBd8EG

Seven prioritized deliverables from research-bug-prevention-beyond-testing.md: _safe_wrap helper, AST error-handling checker, ruff BLE rules, Hypothesis PBT, extended conformance suite, stateful backend model, ResourceWarning safety net. https://claude.ai/code/session_01HRCdVGrFkWwX9S1XGBd8EG
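
The last deliverable, a ResourceWarning safety net, is not specified further in this PR. One plausible reading is a stream wrapper that emits `ResourceWarning` when it is garbage-collected without having been closed, so that leaks surface in test runs. The sketch below assumes that reading; `TrackedStream` and `leaked_warnings` are hypothetical names.

```python
import gc
import warnings
from collections.abc import Callable


class TrackedStream:
    """Hypothetical stream wrapper that warns if collected while still open."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.closed = False

    def close(self) -> None:
        self.closed = True

    def __del__(self) -> None:
        if not self.closed:
            warnings.warn(
                f"unclosed stream {self.name!r}", ResourceWarning, stacklevel=2
            )


def leaked_warnings(action: Callable[[], None]) -> list[str]:
    """Run `action` and return the ResourceWarning messages it triggered."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", ResourceWarning)
        action()
        gc.collect()  # force finalization on runtimes without refcounting
    return [
        str(w.message) for w in caught if issubclass(w.category, ResourceWarning)
    ]


def leaky() -> None:
    TrackedStream("a.bin")  # created and dropped without close()


def tidy() -> None:
    stream = TrackedStream("b.bin")
    stream.close()
```

A test suite could call `leaked_warnings` around each backend operation, or configure the warning filter to turn `ResourceWarning` into an error.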

haalfi

Review of research doc and backlog entry. 3 findings posted as inline comments.

Research validation (user-requested):

Latent S3 bug claim: CONFIRMED. S3Backend.read() (_s3.py:130-134) acquires a file handle then wraps it in _ErrorMappingStream / BufferedReader without try/except protection. SFTP (_sftp.py:303-308, BUG-142 fix) and Azure (_azure.py:316-321, BUG-158 fix) both have this protection. The pattern is identical.
DbC rejection: SOUND. Zero-runtime-dep constraint rules out icontract. The conformance suite already serves as the behavioral contract system. Industry precedent (fsspec, smart_open, SQLAlchemy) confirms no major storage lib uses DbC libraries.
PBT adoption for P1-P4: WELL-JUSTIFIED. The 4 targets have clear oracles (roundtrip, no-corruption, idempotency, dict-equivalence) making them textbook PBT candidates. Dev-only Hypothesis dependency has zero runtime impact.
_safe_wrap helper: CORRECT PATTERN. Exactly generalizes the individual fixes applied to SFTP (BUG-142) and Azure (BUG-158). The proposed signature matches what both backends already do inline.
Extended conformance suite: REASONABLE. The gap analysis (parameter combos, edge inputs, error fidelity, metadata, resource cleanup, operational consistency) maps directly to the 0.21.1 bug classes.
Static analysis approach: REASONABLE. ruff BLE rules + custom AST script is proportionate. The AST script targets the semantic gap (broad except with silent return and no errno check) that ruff cannot detect.
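
The semantic gap named in the last item (a broad `except` that returns silently without checking `errno`) could be detected with Python's `ast` module roughly as follows. This is a sketch only: the actual `check_error_handling.py` is not shown in this PR, and the `BROAD` set, the function names, and the heuristics are assumptions made for illustration.

```python
import ast

# Exception classes treated as "broad" for this sketch (an assumption).
BROAD = {"Exception", "BaseException", "OSError", "IOError"}


def _caught_names(handler: ast.ExceptHandler) -> set[str]:
    """Return the exception class names an `except` clause catches."""
    if handler.type is None:  # bare `except:`
        return {"BaseException"}
    if isinstance(handler.type, ast.Tuple):
        nodes = handler.type.elts
    else:
        nodes = [handler.type]
    return {n.id for n in nodes if isinstance(n, ast.Name)}


def find_silent_swallows(source: str) -> list[int]:
    """Line numbers of broad handlers that return without raise or errno check."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ExceptHandler):
            continue
        if not (_caught_names(node) & BROAD):
            continue
        inner = [n for stmt in node.body for n in ast.walk(stmt)]
        returns = any(isinstance(n, ast.Return) for n in inner)
        reraises = any(isinstance(n, ast.Raise) for n in inner)
        checks_errno = any(
            isinstance(n, ast.Attribute) and n.attr == "errno" for n in inner
        )
        if returns and not reraises and not checks_errno:
            hits.append(node.lineno)
    return sorted(hits)


SAMPLE = '''
def exists(path):
    try:
        return stat(path)
    except OSError:
        return False

def exists_checked(path):
    try:
        return stat(path)
    except OSError as exc:
        if exc.errno != 2:
            raise
        return False
'''

print(find_silent_swallows(SAMPLE))  # [5]
```

Only the first handler is reported: the second inspects `errno` and re-raises anything unexpected, which is the behavior the check is meant to encourage.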

Generated by Claude Code

haalfi

Research and decisions review -- 3 inline comments posted. See first review comment for full validation of research findings and decisions (all confirmed sound).

Generated by Claude Code

sdd/research/research-bug-prevention-beyond-testing.md

- Add missing Status/Scope/Related header fields to research doc
- Fix cluster count (seven, not six), bug count (21, not 20), add BUG-157
- File BUG-159 for latent S3/S3PyArrow read() stream-wrapping leak

https://claude.ai/code/session_01HRCdVGrFkWwX9S1XGBd8EG

- Reorder priority: resource safety + PBT first (highest ROI)
- Add P4 as highest-value PBT target note
- Add CI impact warning for extended conformance (~400 parameterized cases)
- Confirm ruff TRY/PGH don't cover silent-swallow pattern; BLE001 only flags Exception, not IOError
- Add maintenance risk note for AST script; defer behind conformance error fidelity tests
- Add risk column to priority table
- Update BK-139 deliverable order to match

https://claude.ai/code/session_01HRCdVGrFkWwX9S1XGBd8EG

haalfi

Review comments

Bug: BUG-159 description overstates the S3PyArrow leak pattern

sdd/BACKLOG.md lines added in this PR describe S3PyArrowBackend.read() (_s3_pyarrow.py:196-200) as having "the same issue" as BUG-142 (SFTP) and BUG-158 (Azure). The actual code is:

def read(self, path: str) -> BinaryIO:
    with self._pyarrow_errors(path):
        pa_file = self._pa_fs.open_input_file(self._pa_path(path))
        raw = _PyArrowBinaryIO(pa_file)
        return cast("BinaryIO", _ErrorMappingStream(raw, self._classify_error, path))

This is a single-layer wrap (no BufferedReader). The leaked resource on failure would be a PyArrow NativeFile (pa_file), not an s3fs file handle. The SFTP/Azure bugs involved a two-layer acquire-then-double-wrap pattern. The description says "same unprotected acquire-then-wrap pattern" — which is structurally true — but calling it the "same issue" and lumping it with the SFTP/Azure fixes is imprecise and will mislead whoever implements the fix. The reproduce instruction ("monkeypatch _ErrorMappingStream.__init__ to raise") is valid for detecting the leak, but the fix and the resource type differ. The BACKLOG entry should describe what actually leaks (a pa_file PyArrow NativeFile) and note it is single-wrap, not double-wrap.
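
The reproduce instruction mentioned above (force the wrapper constructor to raise, then check whether the underlying file was closed) can be sketched with stand-in classes. Nothing here is the project's code: `FakeNativeFile` and `ErrorMappingStream` are simplified substitutes for the PyArrow `NativeFile` and `_ErrorMappingStream`, and both `read_*` functions are hypothetical.

```python
from unittest import mock


class FakeNativeFile:
    """Stand-in for the PyArrow NativeFile returned by open_input_file."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class ErrorMappingStream:
    """Stand-in for the single wrapper layer."""

    def __init__(self, raw: FakeNativeFile) -> None:
        self.raw = raw


def read_unprotected(opened: list) -> ErrorMappingStream:
    pa_file = FakeNativeFile()
    opened.append(pa_file)
    return ErrorMappingStream(pa_file)  # a failure here leaks pa_file


def read_protected(opened: list) -> ErrorMappingStream:
    pa_file = FakeNativeFile()
    opened.append(pa_file)
    try:
        return ErrorMappingStream(pa_file)
    except BaseException:
        pa_file.close()  # single layer, so only one resource to release
        raise


def _boom(self, *args, **kwargs) -> None:
    raise RuntimeError("wrapper construction failed")


def leaks_on_wrap_failure(read) -> bool:
    """Make the wrapper constructor raise; report whether the file stayed open."""
    opened: list[FakeNativeFile] = []
    with mock.patch.object(ErrorMappingStream, "__init__", _boom):
        try:
            read(opened)
        except RuntimeError:
            pass
    return not opened[0].closed


print(leaks_on_wrap_failure(read_unprotected))
print(leaks_on_wrap_failure(read_protected))
```

Because there is only one wrapper layer, the fix needs to release a single resource, which is the distinction the comment asks the BACKLOG entry to record.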

Spec: Bug count is inconsistent — taxonomy sums to 22, narrative claims 21

The figure "21 bugs" appears in three places: the PR description, the Context paragraph of sdd/research/research-bug-prevention-beyond-testing.md ("The 0.21.1 patch release fixed 21 bugs"), and that document's taxonomy section header ("Categorizing the 21 bugs"). However, the taxonomy table lists the following bug IDs:

Cross-backend inconsistency: BUG-150, 151, 152, 155 (4)
Resource leak: BUG-142, 144, 156, 158 (4)
Error swallowing: BUG-145, 146, 147 (3)
Edge-case inputs: BUG-136, 139, 140, 141 (4)
Cache coherency: BUG-137, 138 (2)
Mutation of caller data: BUG-148 (1)
Edge-case behavior: BUG-143, 153, 154, 157 (4)

Total: 22 unique bug IDs. The actual BACKLOG-DONE count for v0.21.1 BUG- entries is 23, minus BUG-149 (investigated and closed as not-a-defect, correctly excluded) = 22 real fixes. The "21" figure is off by one throughout the document.
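
The tally above can be checked mechanically. The cluster contents are copied from the list in this comment; the assumption that the v0.21.1 entries form the contiguous range BUG-136 to BUG-158 is mine (it matches the stated count of 23 but is not spelled out in the review).

```python
# Bug IDs per cluster, as listed in the review comment.
TAXONOMY = {
    "Cross-backend inconsistency": [150, 151, 152, 155],
    "Resource leak": [142, 144, 156, 158],
    "Error swallowing": [145, 146, 147],
    "Edge-case inputs": [136, 139, 140, 141],
    "Cache coherency": [137, 138],
    "Mutation of caller data": [148],
    "Edge-case behavior": [143, 153, 154, 157],
}

all_ids = [bug for bugs in TAXONOMY.values() for bug in bugs]
total = len(all_ids)
unique = len(set(all_ids))

# Assumed ID range for the release; the review only states the count (23).
missing = sorted(set(range(136, 159)) - set(all_ids))

print(total, unique, missing)  # 22 22 [149]
```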

Consistency: BK-139 deliverable #4 contradicts the research doc's analysis of TRY rules

sdd/BACKLOG.md BK-139 deliverable #4 reads:

Enable ruff BLE + TRY rule sets (1-line config change)

But the research doc (§2.5 ruff gap analysis table) explicitly states:

TRY* (tryceratops) | Not enabled | No — TRY rules cover raise style, logging, else clauses. None flag silent returns.

The research doc's verdict for §2.5 is "One custom AST script + enable ruff BLE rules" — TRY is listed in the gap analysis as not catching the bug class of interest. There is no rationale in the document for including TRY in the backlog deliverable. If the intent is to enable TRY for a different reason (raise style, logging patterns), that reasoning should be stated in the backlog item. As written, the backlog item contradicts the research that motivated it.

Generated by Claude Code

- Fix bug count: 22 bugs (not 21), last cluster has 4 entries, not 3
- BUG-159: distinguish S3 double-layer wrap (s3fs handle) from S3PyArrow single-layer wrap (PyArrow NativeFile)
- Remove ruff TRY from priority table and BK-139: research explicitly concluded TRY rules don't catch the target pattern

https://claude.ai/code/session_01HRCdVGrFkWwX9S1XGBd8EG

claude added 2 commits April 3, 2026 19:34

haalfi commented Apr 3, 2026


claude added 2 commits April 4, 2026 08:50

haalfi commented Apr 4, 2026


haalfi merged commit 61207ae into master Apr 4, 2026
13 checks passed

haalfi deleted the claude/analyze-bug-prevention-W0LIe branch April 4, 2026 13:52

haalfi merged 5 commits into master from claude/analyze-bug-prevention-W0LIe
