Skip to content

fix: LLM-based comment resolution + cap commit fetch#21

Merged
mthooyavan merged 2 commits intomainfrom
fix/resolver-llm-verification
Mar 16, 2026
Merged

fix: LLM-based comment resolution + cap commit fetch#21
mthooyavan merged 2 commits intomainfrom
fix/resolver-llm-verification

Conversation

@mthooyavan
Copy link
Copy Markdown
Owner

Summary

  • Resolver rewrite: Replace heuristic matching (location + title prefix vs new_findings) with LLM verification that reads the actual current code and confirms whether each old finding was fixed
  • Cap commit fetch: Limit get_pr_commits() to 50 most recent commits to prevent unbounded API calls and prompt inflation
  • Add .gitignore: Exclude __pycache__/, .pyc, build artifacts

Problem

PR #20 exposed a flaw: DiffFox flagged "unbounded paginated commit fetch" (valid finding), but a formatting-only follow-up push caused the resolver to mark it "Addressed" because:

  1. The dedup filter removes already-posted findings from new_findings (by design)
  2. The resolver saw the old finding missing from new_findings → assumed it was fixed
  3. In reality, the issue was unchanged — only whitespace formatting changed

How it works now

For each old DiffFox comment:

  1. Check if already replied to (skip if so)
  2. Fetch the current file content at head_sha
  3. Extract a ±30 line window around the comment's location
  4. Ask LLM: "Was this specific issue fixed in the current code?"
  5. Only reply "Addressed" if LLM confirms verdict: "fixed" with reasoning
  6. If file was deleted → automatically mark as fixed
  7. If LLM says not fixed or uncertain → leave the comment open

Test plan

  • All 23 tests pass
  • Ruff lint + format clean
  • Verify on next real PR review that resolution is accurate

🤖 Generated with Claude Code

Replace heuristic-based resolver (location/title matching against
new_findings) with LLM verification that reads the current code and
confirms whether each old finding was actually fixed. The old approach
falsely marked findings as "Addressed" when they were simply not
re-flagged (e.g., due to dedup filtering or formatting-only changes).

Also cap get_pr_commits() at 50 most recent commits to avoid unbounded
API calls and prompt inflation on long-lived PRs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DiffFox — 3 Findings

🟡 Warning: 3

Files with findings:

  • src/diff_fox/review/resolver.py (2)
  • src/diff_fox/scm/github.py (1)

DiffFox

Comment on lines +95 to 103
logger.debug("LLM resolution check failed for comment %d", comment["id"])
return ResolutionVerdict(verdict="uncertain", reasoning="LLM check failed")


async def resolve_addressed_comments(
new_findings: list[Finding],
repo: str,
pr_number: int,
head_sha: str,
scm: SCMProvider,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Issue — worth fixing: LLM call per old comment, no token budget on context

Each old DiffFox comment triggers a separate LLM call with up to 61 lines of code context plus the comment body. With max_tokens=512 set for the response, the input token cost is unbounded — a large file with many old comments could send thousands of input tokens per call, multiplied by the number of unresolved comments per PR review.

Fix: Add a hard cap on the number of comments checked per PR (e.g., MAX_COMMENTS_TO_CHECK = 20) and log a warning when skipping. Also consider caching verdicts by (comment_id, head_sha) so re-runs of the same review don't re-spend tokens.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — this issue is no longer detected in the latest review.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added MAX_COMMENTS_TO_CHECK = 20 cap with a warning log when exceeded.

Comment on lines +143 to +146
# Run LLM verification concurrently (capped)
semaphore = asyncio.Semaphore(MAX_CONCURRENT_CHECKS)

async def check_with_limit(c: DiffFoxComment) -> tuple[DiffFoxComment, ResolutionVerdict]:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Issue — worth fixing: Already-resolved check uses pre-filtered candidates list

The 'already resolved' check (lines 143-146) is done before building candidates, so it correctly skips resolved comments. However, the all_reply_text check looks for 'Addressed' or 'Acknowledged' in bot replies — but the bot's new reply text uses em-dash () not a hyphen, and the check is purely substring-based. This is fine for the new text, but if a user happens to write 'Addressed' in their own reply, the comment will be permanently skipped even if the issue was never actually resolved by the bot.

Fix: Consider checking only bot-authored replies (e.g., filter all_replies by author) rather than all reply text when determining if a comment is already resolved.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — this issue is no longer detected in the latest review.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — now matching exact bot reply prefix (✅ **Addressed** / ✅ **Acknowledged**) instead of bare substring matching.

Comment on lines 134 to 135
"""Fetch commits for a pull request (capped at 50 most recent)."""
data = await self._get_paginated(f"/repos/{repo}/pulls/{pr_number}/commits")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Issue — worth fixing: Cap applied after full paginated fetch, not before

get_pr_commits() still fetches ALL pages via _get_paginated() and only slices to the last 50 after the fact. For PRs with hundreds of commits this still makes unbounded API calls and incurs the full network/rate-limit cost — the cap only reduces prompt inflation, not API call count.

Fix: Pass a per_page=50 parameter and fetch only the last page, or add an early-exit in _get_paginated once a page limit is reached. Alternatively, document that this only caps prompt size, not API calls.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — this issue is no longer detected in the latest review.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — switched from _get_paginated() to a single _get() call with per_page=50, so only one API request is made.

- Cap resolution checks at 20 comments per PR to bound LLM token cost
- Match exact bot reply prefix for already-resolved check instead of
  bare substring matching (prevents false skip if user writes "Addressed")
- Use single-page API call (per_page=50) instead of full paginated fetch
  to avoid unbounded API calls on PRs with hundreds of commits

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

✅ DiffFox — No Issues Found

Reviewed PR #21 in mthooyavan/diff-fox. No issues detected.


DiffFox

@mthooyavan mthooyavan merged commit a618429 into main Mar 16, 2026
8 checks passed
@mthooyavan mthooyavan deleted the fix/resolver-llm-verification branch March 16, 2026 04:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant