BK-139: Research bug prevention beyond testing#373
Conversation
Analyze 20 bugs from 0.21.1, categorize by root cause, evaluate 6 prevention strategies: Design-by-Contract, Property-Based Testing (Hypothesis), extended conformance suite, resource safety patterns, static analysis for error handling, and machine-checked review rules. Key findings: adopt Hypothesis for 4 targets, add _safe_wrap helper (found latent S3 bug), build check_error_handling.py AST script, enable ruff BLE rules, extend conformance with ~58 edge-case tests. https://claude.ai/code/session_01HRCdVGrFkWwX9S1XGBd8EG
Seven prioritized deliverables from research-bug-prevention-beyond-testing.md: _safe_wrap helper, AST error-handling checker, ruff BLE rules, Hypothesis PBT, extended conformance suite, stateful backend model, ResourceWarning safety net. https://claude.ai/code/session_01HRCdVGrFkWwX9S1XGBd8EG
haalfi
left a comment
There was a problem hiding this comment.
Review of research doc and backlog entry. 3 findings posted as inline comments.
Research validation (user-requested):
- Latent S3 bug claim: CONFIRMED.
S3Backend.read()(_s3.py:130-134) acquires a file handle then wraps it in_ErrorMappingStream/BufferedReaderwithout try/except protection. SFTP (_sftp.py:303-308, BUG-142 fix) and Azure (_azure.py:316-321, BUG-158 fix) both have this protection. The pattern is identical. - DbC rejection: SOUND. Zero-runtime-dep constraint rules out icontract. The conformance suite already serves as the behavioral contract system. Industry precedent (fsspec, smart_open, SQLAlchemy) confirms no major storage lib uses DbC libraries.
- PBT adoption for P1-P4: WELL-JUSTIFIED. The 4 targets have clear oracles (roundtrip, no-corruption, idempotency, dict-equivalence) making them textbook PBT candidates. Dev-only Hypothesis dependency has zero runtime impact.
_safe_wraphelper: CORRECT PATTERN. Exactly generalizes the individual fixes applied to SFTP (BUG-142) and Azure (BUG-158). The proposed signature matches what both backends already do inline.- Extended conformance suite: REASONABLE. The gap analysis (parameter combos, edge inputs, error fidelity, metadata, resource cleanup, operational consistency) maps directly to the 0.21.1 bug classes.
- Static analysis approach: REASONABLE. ruff
BLErules + custom AST script is proportionate. The AST script targets the semantic gap (broadexceptwith silent return and noerrnocheck) that ruff cannot detect.
Generated by Claude Code
haalfi
left a comment
There was a problem hiding this comment.
Research and decisions review -- 3 inline comments posted. See first review comment for full validation of research findings and decisions (all confirmed sound).
Generated by Claude Code
- Add missing Status/Scope/Related header fields to research doc - Fix cluster count (seven, not six), bug count (21, not 20), add BUG-157 - File BUG-159 for latent S3/S3PyArrow read() stream-wrapping leak https://claude.ai/code/session_01HRCdVGrFkWwX9S1XGBd8EG
- Reorder priority: resource safety + PBT first (highest ROI) - Add P4 as highest-value PBT target note - Add CI impact warning for extended conformance (~400 parameterized cases) - Confirm ruff TRY/PGH don't cover silent-swallow pattern; BLE001 only flags Exception not IOError - Add maintenance risk note for AST script; defer behind conformance error fidelity tests - Add risk column to priority table - Update BK-139 deliverable order to match https://claude.ai/code/session_01HRCdVGrFkWwX9S1XGBd8EG
haalfi
left a comment
There was a problem hiding this comment.
Review comments
Bug: BUG-159 description overstates the S3PyArrow leak pattern
sdd/BACKLOG.md lines added in this PR describe S3PyArrowBackend.read() (_s3_pyarrow.py:196-200) as having "the same issue" as BUG-142 (SFTP) and BUG-158 (Azure). The actual code is:
def read(self, path: str) -> BinaryIO:
with self._pyarrow_errors(path):
pa_file = self._pa_fs.open_input_file(self._pa_path(path))
raw = _PyArrowBinaryIO(pa_file)
return cast("BinaryIO", _ErrorMappingStream(raw, self._classify_error, path))This is a single-layer wrap (no BufferedReader). The leaked resource on failure would be a PyArrow NativeFile (pa_file), not an s3fs file handle. The SFTP/Azure bugs involved a two-layer acquire-then-double-wrap pattern. The description says "same unprotected acquire-then-wrap pattern" — which is structurally true — but calling it the "same issue" and lumping it with the SFTP/Azure fixes is imprecise and will mislead whoever implements the fix. The reproduce instruction ("monkeypatch _ErrorMappingStream.__init__ to raise") is valid for detecting the leak, but the fix and the resource type differ. The BACKLOG entry should describe what actually leaks (a pa_file PyArrow NativeFile) and note it is single-wrap, not double-wrap.
Spec: Bug count is inconsistent — taxonomy sums to 22, narrative claims 21
sdd/research/research-bug-prevention-beyond-testing.md states "21 bugs" in two places: the PR description, the Context paragraph ("The 0.21.1 patch release fixed 21 bugs"), and the taxonomy section header ("Categorizing the 21 bugs"). However, the taxonomy table lists the following bug IDs:
- Cross-backend inconsistency: BUG-150, 151, 152, 155 (4)
- Resource leak: BUG-142, 144, 156, 158 (4)
- Error swallowing: BUG-145, 146, 147 (3)
- Edge-case inputs: BUG-136, 139, 140, 141 (4)
- Cache coherency: BUG-137, 138 (2)
- Mutation of caller data: BUG-148 (1)
- Edge-case behavior: BUG-143, 153, 154, 157 (4)
Total: 22 unique bug IDs. The actual BACKLOG-DONE count for v0.21.1 BUG- entries is 23, minus BUG-149 (investigated and closed as not-a-defect, correctly excluded) = 22 real fixes. The "21" figure is off by one throughout the document.
Consistency: BK-139 deliverable #4 contradicts the research doc's analysis of TRY rules
sdd/BACKLOG.md BK-139 deliverable #4 reads:
Enable ruff
BLE+TRYrule sets (1-line config change)
But the research doc (§2.5 ruff gap analysis table) explicitly states:
TRY* (tryceratops) | Not enabled | No — TRY rules cover raise style, logging, else clauses. None flag silent returns.
The research doc's verdict for §2.5 is "One custom AST script + enable ruff BLE rules" — TRY is listed in the gap analysis as not catching the bug class of interest. There is no rationale in the document for including TRY in the backlog deliverable. If the intent is to enable TRY for a different reason (raise style, logging patterns), that reasoning should be stated in the backlog item. As written, the backlog item contradicts the research that motivated it.
Generated by Claude Code
- Fix bug count: 22 bugs (not 21), last cluster has 4 entries not 3 - BUG-159: distinguish S3 double-layer wrap (s3fs handle) from S3PyArrow single-layer wrap (PyArrow NativeFile) - Remove ruff TRY from priority table and BK-139 — research explicitly concluded TRY rules don't catch the target pattern https://claude.ai/code/session_01HRCdVGrFkWwX9S1XGBd8EG
Summary
_safe_wraphelper (+ latent S3 bug fix),check_error_handling.pyAST script, ruffBLErules, Hypothesis PBT for partition/config/path, extended conformance suite (~58 tests), stateful backend model,ResourceWarningsafety net.Test plan
sdd/research/research-*.md)https://claude.ai/code/session_01HRCdVGrFkWwX9S1XGBd8EG