Skip to content

Richer core dumps: structured errors, sync_steps, LLM trace, ANSI strip#712

Open
vishalramvelu wants to merge 9 commits intopromptdriven:mainfrom
vishalramvelu:feature/711
Open

Richer core dumps: structured errors, sync_steps, LLM trace, ANSI strip#712
vishalramvelu wants to merge 9 commits intopromptdriven:mainfrom
vishalramvelu:feature/711

Conversation

@vishalramvelu
Copy link
Copy Markdown
Contributor

Summary

Improves PDD debug snapshots (core dumps written to .pdd/core_dumps/pdd-core-*.json) so failures are easier to diagnose without log spelunking.

  • Structured errors: Non-exception failures can be recorded via record_core_dump_error and appear under errors in the dump.
  • Dump schema: Bumps to schema_version 2; aligns steps with invoked subcommands and defaults missing model to "unknown".
  • Context in the JSON: Auto-includes relevant .pdd/meta/*_sync.log / *_run.json content, derives sync_steps from sync logs (skips noise where appropriate), and keeps file_contents for tracked / meta paths.
  • Terminal output: Strips CSI + OSC (and related) sequences so terminal_output is readable plain text.
  • Sync failures: On relevant failure paths, attaches details.llm_trace (last prompt/response, redacted/truncated) and test_output_excerpt where applicable.
  • Budget exhaustion: Surfaces as a structured error suitable for the dump.

Test Results

Test_Core_Dump - Passed
Sync_Orchestration - Passed
Test_Core_Errors - Passed
Test_Cli - Passed

Manual testing

  • From a small fixture repo (or this repo), run a command that fails during sync (e.g. intentional test failure on fix) with core dumps enabled (default on unless --no-core-dump).
  • Open the newest .pdd/core_dumps/pdd-core-*.json and confirm:
    schema_version is 2, errors includes expected structured entries when no Python traceback exists, steps length/order matches the run; missing models show "unknown", terminal_output has no obvious ANSI/OSC garbage, sync_steps present when meta sync logs exist; entries look like the failing operations, on LLM-using failure paths, details.llm_trace appears when designed
  • Run with --no-core-dump and confirm no new dump file (or no write, per existing behavior).

Checklist

  • Code changes limited to PDD CLI / sync / dump behavior (no PDD_CAP prompt files in this PR).
  • uv run pytest passes for the suites/commands above (or full CI-equivalent run).
  • No secrets or raw API keys in llm_trace / file snapshots (redaction/truncation behavior reviewed).
  • Manual spot-check of a real .pdd/core_dumps/*.json after a failed sync.
  • Prompts submitted to PDD_CAP Repo as PR

Fixes #710

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves PDD “core dump” debug snapshots to make sync failures diagnosable from the JSON alone (structured non-exception errors, per-operation sync steps derived from meta logs, failure-only LLM/test traces, and ANSI/OSC stripping for captured terminal output).

Changes:

  • Add structured core-dump errors via record_core_dump_error and record logical/non-exception failure paths in sync.
  • Bump core dump schema to v2 and enrich dumps with auto-included meta artifacts plus derived sync_steps.
  • Capture failure-only debugging context (ANSI/OSC-clean terminal output, truncated test output excerpts, and last LLM prompt/response pair).

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
pdd/core/errors.py Adds structured core-dump error recording API.
pdd/core/dump.py Bumps schema to v2; auto-includes meta sync/run files; derives sync_steps; ensures steps[*].model defaults to "unknown".
pdd/core/cli.py Expands ANSI stripping to cover CSI + OSC sequences.
pdd/core/llm_trace.py Introduces lightweight, redacted/truncated LLM prompt/response trace capture by operation.
pdd/llm_invoke.py Records best-effort LLM traces for cloud + LiteLLM paths.
pdd/sync_orchestration.py Records logical failures as structured core-dump errors; captures truncated test output excerpts; attaches failure-only LLM traces to operation log entries.
pdd/sync_main.py Records budget exhaustion as a structured core-dump error.
tests/test_core_dump.py Updates expectations for schema v2; adds tests for meta sync/run auto-inclusion, derived sync_steps, and model defaulting.
tests/core/test_cli.py Adds tests covering OSC/cursor-sequence stripping.
tests/test_core_errors.py Adds unit test for structured error recording.
tests/test_sync_orchestration.py Adds tests validating logical-failure error recording, test output excerpt truncation, and LLM trace attachment.
pdd/simple_math.py Adds a new module (appears unrelated to the PR’s stated scope).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@gltanaka
Copy link
Copy Markdown
Contributor

target 3/24

@vishalramvelu
Copy link
Copy Markdown
Contributor Author

Units Tests failing because TestConvergencePromptRequirements expects literal substrings in agentic_e2e_fix_orchestrator_python.prompt, but updating that prompt caused a merge conflict with main on the same file. Leaving the prompt as-is for now. We can align prompt text with Issue #903 tests in a follow-up once the branch is rebased cleanly or the file is de-conflicted upstream.

@gltanaka
Copy link
Copy Markdown
Contributor

Hey @vishalramvelu — CI is failing because the PR's version of pdd/prompts/agentic_e2e_fix_orchestrator_python.prompt is missing the Requirement 5a/5b convergence sections ("empty dev-units short-circuit" and "per-cycle file-hash comparison") that the updated tests assert on.

The two failing tests:

  • test_orchestrator_prompt_contains_empty_dev_units_requirement
  • test_orchestrator_prompt_contains_file_hash_comparison_requirement

These requirements exist in upstream main but weren't picked up by the fork. Could you rebase onto the latest upstream main? That should bring in the prompt content and resolve the merge conflicts as well.

@gltanaka
Copy link
Copy Markdown
Contributor

target 3/31

Made-with: Cursor

# Conflicts:
#	pdd/prompts/agentic_bug_step11_e2e_test_LLM.prompt
#	pdd/prompts/agentic_e2e_fix_orchestrator_python.prompt
#	tests/test_issue_633_reproduction.py
Copy link
Copy Markdown
Contributor

@gltanaka gltanaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @vishalramvelu — the core changes here look solid and well-tested. Just 3 files that need to be removed before we can merge:

  1. pdd/core_dump_smoke.py — PDD-generated test file that from solution import add and from z3 import .... Neither exists in the package, so this will break imports. Same issue Copilot flagged for simple_math.py (which you removed) — this one slipped through.

  2. context/simple_math_example.py — Another PDD-generated artifact, unrelated to this PR's scope.

  3. uv.lock — The project doesn't use uv (no references in Makefile, CI, or pyproject.toml). This looks like it was committed from your local setup. 3,637 lines we don't need.

Once those are removed, this is good to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve core dumps: per-op sync steps, structured failures, and failure-only LLM/test traces

3 participants