Add flaw tracer for root-cause analysis of plan reports by neoneye · Pull Request #534 · PlanExeOrg/PlanExe

neoneye · 2026-04-05T22:00:40Z

Summary

Adds worker_plan_internal/flaw_tracer/ package — a CLI tool that traces flaws in PlanExe reports upstream through the pipeline DAG to find where they originated
Static registry maps all 70 pipeline stages with their output files, upstream dependencies, and source code paths
Three-phase recursive algorithm: (1) identify flaws via LLM, (2) trace each upstream with dedup, (3) analyze source code at origin
Produces both JSON and markdown reports, sorted by trace depth (deepest root cause first)
46 new tests, all passing. No regressions in the broader test suite (314 passed)

Usage

python -m worker_plan_internal.flaw_tracer \
    --dir /path/to/output \
    --file 030-report.html \
    --flaw "The budget appears unvalidated..." \
    --verbose

Test plan

All 46 flaw_tracer tests pass
All 314 worker_plan tests pass (no regressions)
CLI --help works
Module is importable
Manual test with a real output directory and LLM (post-merge)

🤖 Generated with Claude Code

Static DAG registry mapping all 48 PlanExe pipeline stages to their output files, upstream dependencies, and source code paths. Includes lookup functions (find_stage_by_filename, get_upstream_files, get_source_code_paths) and 14 passing tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
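
A minimal sketch of what such a static registry could look like. Only the three lookup function names and the file name 030-report.html come from this PR; the `StageInfo` fields, the function signatures, and the three example stages are assumptions made for illustration, not the actual registry contents.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class StageInfo:
    """One pipeline stage (field names are hypothetical)."""
    name: str
    output_files: Tuple[str, ...]
    upstream_stages: Tuple[str, ...]
    source_code_files: Tuple[str, ...]


# Hypothetical three-stage excerpt, not the real registry.
STAGES: Tuple[StageInfo, ...] = (
    StageInfo("identify_purpose", ("001-identify_purpose.md",), (),
              ("assume/identify_purpose.py",)),
    StageInfo("budget", ("020-budget.md",), ("identify_purpose",),
              ("stages/budget.py",)),
    StageInfo("report", ("030-report.html",), ("budget",),
              ("stages/report.py",)),
)
_BY_NAME = {stage.name: stage for stage in STAGES}


def find_stage_by_filename(filename: str) -> Optional[StageInfo]:
    """Return the stage that produces `filename`, or None if unknown."""
    for stage in STAGES:
        if filename in stage.output_files:
            return stage
    return None


def get_upstream_files(stage_name: str) -> List[str]:
    """Collect the output files of every direct upstream stage."""
    files: List[str] = []
    for upstream in _BY_NAME[stage_name].upstream_stages:
        files.extend(_BY_NAME[upstream].output_files)
    return files


def get_source_code_paths(stage_name: str) -> Tuple[str, ...]:
    """Return the source files that implement the stage."""
    return _BY_NAME[stage_name].source_code_files
```

Tuples keep the registry immutable, so a lookup cannot accidentally change the DAG at runtime.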

FlawTracer orchestrates three-phase flaw tracing through the pipeline DAG:
- Phase 1: LLM-based flaw identification in starting file
- Phase 2: Recursive upstream tracing with deduplication and max depth
- Phase 3: Source code analysis at flaw origin stages

Tests mock the LLM-calling methods to verify tracing logic, deduplication, depth limits, multi-flaw handling, and depth-sorted output.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
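
The Phase 2 recursion might be sketched roughly as follows. This is an illustration under stated assumptions, not the FlawTracer code: the `UPSTREAM` graph, the `trace_upstream` signature, and the `has_flaw` callback (standing in for the LLM check) are all hypothetical.

```python
from typing import Callable, Dict, Optional, Set, Tuple

# Hypothetical DAG: stage name -> direct upstream stage names.
UPSTREAM: Dict[str, Tuple[str, ...]] = {
    "report": ("budget", "schedule"),
    "budget": ("assumptions",),
    "schedule": ("assumptions",),
    "assumptions": (),
}


def trace_upstream(
    stage: str,
    has_flaw: Callable[[str], bool],
    max_depth: int = 10,
    depth: int = 0,
    visited: Optional[Set[str]] = None,
) -> Tuple[str, int]:
    """Follow a flaw upstream and return (origin_stage, depth).

    `has_flaw` stands in for the LLM call that decides whether the flaw
    is already present in a stage's output.
    """
    if visited is None:
        visited = set()
    visited.add(stage)
    if depth >= max_depth:
        return stage, depth  # depth limit reached, stop here
    for parent in UPSTREAM[stage]:
        if parent in visited:
            continue  # dedup: never check the same stage twice
        visited.add(parent)
        if has_flaw(parent):
            # First match wins: follow only the first flawed parent.
            return trace_upstream(parent, has_flaw, max_depth, depth + 1, visited)
    return stage, depth  # no upstream stage shows the flaw: this is the origin
```

With a stub such as `lambda s: s in {"budget", "assumptions"}`, tracing from `"report"` ends at `("assumptions", 2)`.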

Previously, _analyze_source_code was only called when no upstream origin was found (the fallback path). When _trace_upstream successfully identified a deeper origin, Phase 3 was skipped entirely. Now Phase 3 runs whenever an origin stage is known, regardless of how it was determined. Also removes unused imports (json in tracer.py, MagicMock and json in test_tracer.py) and adds a test verifying Phase 3 is called at a deep upstream origin.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ze types
- Remove unused `field` import from dataclasses
- Remove unused `source_code_base` parameter from FlawTracer.__init__() (registry handles source code path resolution via its own _SOURCE_BASE)
- Replace `Optional[X]` with `X | None` using `from __future__ import annotations`
- Add clarifying comments for dedup strategy and first-match-wins logic
- Remove dead `mock_analysis` variable and unused `SourceCodeAnalysisResult` import from test_tracer.py
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Appends a JSONL line for each significant event during tracing: phase1_start/done, trace_flaw_start/done, upstream_check, upstream_found, origin_found, phase3_start, trace_complete.
Monitor progress with: tail -f events.jsonl
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
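
As a rough illustration of the event log, the sketch below appends one JSON object per line. The event names and the events.jsonl file name come from this PR, and the timestamp follows the compact UTC format adopted in a later commit; the `log_event` helper and the `timestamp`/`event` field names are assumptions.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_event(path: Path, event: str, **data) -> None:
    """Append one JSON object per line (JSONL) describing a tracing event.

    The function name and the record field names are hypothetical.
    """
    record = {
        # Compact UTC timestamp without subseconds, e.g. 2026-04-05T23:40:03Z
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "event": event,
        **data,
    }
    # Append mode keeps earlier events, so `tail -f` sees each new line.
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    log_event(Path("events.jsonl"), "phase1_start")
    log_event(Path("events.jsonl"), "upstream_check", stage="budget")
```

Writing each event as a complete line means a partially finished run still leaves a readable log.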

Phase 1 prompt now requires the user's specific flaw as the first result, with additional flaws limited to the same problem family. Phase 2 prompt now requires causal mechanism (not just topical overlap) and limits evidence quotes to 200 characters.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Phase 3 always blames the prompt, but some flaws are inherent domain complexity. Future improvement: classify root causes into prompt-fixable, domain complexity, and missing input data.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…xity, missing_input
Phase 3 now categorizes each root cause so suggestions are honest:
- prompt_fixable: the prompt has a gap that can be edited
- domain_complexity: inherently uncertain/contentious, no prompt change resolves it
- missing_input: the user's plan didn't provide enough detail
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
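
One way to model the three categories is a string-valued enum, sketched below. The three category values come from the commit message; the `RootCauseCategory` class and the `parse_category` helper are hypothetical and may differ from the Pydantic models actually used in the package.

```python
from enum import Enum


class RootCauseCategory(str, Enum):
    """Root-cause categories named in the commit message."""
    PROMPT_FIXABLE = "prompt_fixable"        # the prompt has a gap that can be edited
    DOMAIN_COMPLEXITY = "domain_complexity"  # inherently uncertain, no prompt change resolves it
    MISSING_INPUT = "missing_input"          # the user's plan lacked detail


def parse_category(raw: str) -> RootCauseCategory:
    """Normalize a raw category string (hypothetical helper).

    Raises ValueError for anything outside the three known values, so an
    unexpected LLM answer fails loudly instead of being silently accepted.
    """
    return RootCauseCategory(raw.strip().lower())
```

For example, `parse_category(" Missing_Input ")` returns `RootCauseCategory.MISSING_INPUT`.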

…esults
README: document category field, events.jsonl, updated examples and typical run stats.
AGENTS: move Phase 3 to fixed, add India census v3 results, update what-works-well.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

README: add Tips section (start from self_audit, trust chains over suggestions, check category, results are non-deterministic) and Limitations section (LLM subjectivity, first-match-wins, static registry, text-only, diagnostic not prescriptive).
AGENTS: add non-determinism and registry drift as MEDIUM open issues, add honest assessment section with guidance on what to trust.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

neoneye and others added 24 commits April 5, 2026 22:52

refactor: use tuples and modern type syntax in flaw_tracer registry

6525dca

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: add flaw_tracer Pydantic models and prompt builders

2fa4de3

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: add flaw_tracer JSON and markdown report generation

5c7dd82

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: sort flaws by depth in markdown report output

bdacb19
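
Sorting by trace depth, deepest root cause first, could look like the sketch below. The `TracedFlaw` fields and the `sort_deepest_first` helper are assumptions for illustration; the real report models may be shaped differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TracedFlaw:
    """Hypothetical result record for one traced flaw."""
    description: str
    origin_stage: str
    depth: int  # number of upstream hops from the starting file


def sort_deepest_first(flaws: List[TracedFlaw]) -> List[TracedFlaw]:
    """Return a new list with the deepest root causes first.

    sorted() is stable even with reverse=True, so flaws of equal depth
    keep their original relative order.
    """
    return sorted(flaws, key=lambda flaw: flaw.depth, reverse=True)
```

Returning a new list leaves the input order untouched, so the JSON and markdown writers could share one sorted copy.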

feat: add flaw_tracer CLI entry point

e479283
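
A minimal argparse sketch for an entry point that accepts the four flags shown in the Usage section. The flag names come from this PR; which flags are required, the help texts, and the `build_parser` function are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown in the Usage section.

    Marking --dir, --file and --flaw as required is an assumption.
    """
    parser = argparse.ArgumentParser(
        prog="python -m worker_plan_internal.flaw_tracer",
        description="Trace a flaw in a report upstream through the pipeline DAG.",
    )
    parser.add_argument("--dir", required=True, help="output directory of a plan run")
    parser.add_argument("--file", required=True, help="file to start from, e.g. 030-report.html")
    parser.add_argument("--flaw", required=True, help="description of the flaw to trace")
    parser.add_argument("--verbose", action="store_true", help="print progress details")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.dir, args.file, args.flaw, args.verbose)
```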

docs: add flaw tracer design spec and implementation plan

831ea6b

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge remote-tracking branch 'origin/main' into feature/flaw-tracer

83f9488

docs: add flaw_tracer README with usage instructions

6cb35c8

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: add flaw_tracer README with usage instructions

8b2e6ff

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: shorten event timestamp to HH:MM:SS

92936a4

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: use compact UTC timestamp without subseconds

ffff673

Format: 2026-04-05T23:40:03Z
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: add AGENTS.md with flaw tracer status and known issues

c5c7c15

Documents what works, what needs fixing (Phase 1 anchoring, loose upstream checks, long evidence quotes), and test run results.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: disambiguate source code filenames with parent directory

2c9b401

Shows "stages/identify_purpose.py" and "assume/identify_purpose.py" instead of "identify_purpose.py" twice.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
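
The disambiguation can be sketched with pathlib by keeping the immediate parent directory in the displayed name. The `display_name` helper and the longer example paths are hypothetical; only the two displayed results match the commit message.

```python
from pathlib import PurePosixPath


def display_name(path: str) -> str:
    """Return 'parent/filename' so same-named files stay distinguishable.

    Falls back to the bare filename when the path has no parent directory.
    """
    p = PurePosixPath(path)
    parent = p.parent.name  # empty string for a bare filename
    return f"{parent}/{p.name}" if parent else p.name


if __name__ == "__main__":
    # Hypothetical full paths that share the same file name.
    print(display_name("worker_plan_internal/stages/identify_purpose.py"))
    print(display_name("worker_plan_internal/assume/identify_purpose.py"))
```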

docs: update AGENTS.md — mark fixed issues, add test run v2 results

8e20e9e

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

neoneye wants to merge 24 commits into main from feature/flaw-tracer
