
Heal Prompt Drift#728

Open
vishalramvelu wants to merge 8 commits into promptdriven:main from vishalramvelu:feature/701_sync

Conversation

@vishalramvelu (Contributor)

Summary

  • Improves pdd update repo-wide mode to be a reliable drift-healing workflow with explicit UX and safer controls.
  • Adds repo-wide flags --all, --dry-run, and --budget, plus validation so repo-only options are enforced in repo mode.
  • Implements dependency-ordered processing of changed modules so high-impact updates are prioritized when budget-limited.
  • Adds post-update maintenance for successful repo updates:
    regenerate included markdown docs referenced by tags, regenerate module examples, upload refreshed dev units to PDD Cloud grounding.
  • Refines behavior discovered during validation:
    dry-run is non-mutating during scan/pair resolution; the example/upload pipeline uses the currently processed pair's prompt/code paths as the source of truth, avoiding path/context mismatches.

Tests

  • Added/updated CLI tests in tests/test_update_command.py for:
    --all behavior and validation, --dry-run pass-through and repo-only enforcement, --budget pass-through, positivity validation, and repo-only enforcement.
  • Added/updated repo-mode tests in tests/test_update_main.py for:
    flat dry-run cost estimate, included-doc aggregation in the drift report, budget-cap stopping behavior, dependency ordering under budget, included markdown doc regeneration after successful updates, example regeneration + _auto_submit_example submission, dry-run skipping side-effecting work, dry-run scan not creating missing prompts, and the example submit path using the current pair's prompt/code paths.

Manual Tests

  • Ran uv run pdd update --all --dry-run
    Verified repository drift report output (counts, cost range, drifted mappings, included docs).
    Verified no LLM/prompt write/arch/PRD update path execution.
    Confirmed no “Created missing prompt file” spam during dry-run scan.

  • Ran uv run pdd --force update --all --simple --budget 0.031
    Verified real prompt update flow executed.
    Verified example regeneration executed successfully.
    Verified dev unit grounding submission executed successfully.
    Verified budget cap message printed and run stopped early.
    Observed expected slight overshoot behavior (cap checked between items, not mid-item).
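The slight overshoot observed above follows directly from where the cap is checked. A minimal sketch of that loop shape (hypothetical names, not the actual update_main code):

```python
def run_with_budget(items, process, budget):
    """Process items in order, stopping once cumulative cost reaches the cap.

    The cap is checked *between* items, so the item in flight may push
    spend slightly past the cap -- matching the manual test observation.
    """
    total = 0.0
    processed = []
    for item in items:
        if budget is not None and total >= budget:
            print(f"Budget cap ${budget:.2f} reached; stopping early.")
            break
        total += process(item)  # may overshoot the cap mid-item
        processed.append(item)
    return processed, total

# With a $0.031 cap and $0.02 per item: "a" and "b" run, "c" is skipped,
# and total spend (0.04) overshoots the cap.
done, spent = run_with_budget(["a", "b", "c"], lambda _: 0.02, budget=0.031)
```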

Checklist

  • Add explicit --all for repo-wide update mode
  • Add --dry-run repo drift report mode
  • Use flat dry-run estimate $0.50–$1.00 per drifted pair
  • Add repo-wide --budget cap and stop-on-cap behavior
  • Process changed items in dependency-aware order
  • Regenerate included markdown docs after successful updates
  • Regenerate examples + upload dev units to PDD Cloud grounding
  • Keep post-update doc/example/cloud steps best-effort (non-fatal)
  • Align prompt spec (modify_python.prompt) with implemented behavior
  • Add automated regression coverage for new behavior and follow-up fixes
  • [x] Submit PR for changed prompts in PDD_CAP
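The flat dry-run estimate in the checklist reduces to simple per-pair arithmetic; a sketch (the helper name is illustrative):

```python
def estimate_dry_run_cost(drifted_pairs: int) -> tuple[float, float]:
    """Flat estimate: $0.50 low / $1.00 high per drifted prompt/code pair."""
    return drifted_pairs * 0.50, drifted_pairs * 1.00

low, high = estimate_dry_run_cost(7)  # 7 drifted pairs -> (3.5, 7.0)
```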

Fixes #727


Copilot AI left a comment


Pull request overview

This PR enhances pdd update repository-wide mode into an explicit, safer “drift healing” workflow by adding repo-wide flags, non-mutating dry-run reporting, budget-capped execution, dependency-aware ordering, and best-effort post-update maintenance (doc/example regeneration + grounding submission).

Changes:

  • Add --all, --dry-run, and --budget to pdd update, with repo-only enforcement and argument validation.
  • Implement repo-mode dry-run drift reporting, dependency-ordered processing of changed pairs, and a budget cap that stops between items.
  • After successful repo updates, regenerate included markdown docs, regenerate examples, and auto-submit refreshed dev units for grounding (best-effort).

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| pdd/commands/modify.py | Adds CLI flags (--all, --dry-run, --budget) and enforces repo-only options/modes. |
| pdd/update_main.py | Implements dry-run drift report, dependency ordering, budget cap, and post-update doc/example/grounding maintenance. |
| tests/test_update_command.py | Adds CLI tests for --all, --dry-run, and --budget pass-through + validation. |
| tests/test_update_main.py | Adds repo-mode tests for dry-run non-mutation, cost estimation, dependency ordering, budget stopping, doc regen, example regen, and auto-submit. |
| tests/test_sync_code_main.py | Updates an assertion to match the new find_and_resolve_all_pairs(..., create_missing_prompts=...) call signature. |
| pdd/prompts/commands/pdd/commands/modify_python.prompt | Updates the prompt-spec contract to match new CLI behavior and constraints. |
| pdd/prompts/agentic_e2e_fix_orchestrator_python.prompt | Adds convergence-guard requirements to the orchestrator prompt. |
| project_dependencies.csv | Adds a CSV snapshot of dependency metadata (appears to be auto-deps output). |
| pdd/core_dump_smoke.py | Adds a standalone "add()" test plan/test module (appears unrelated to drift-healing). |
| pdd/prompts/core_dump_smoke_python.prompt | Adds a generic "write add(a, b)" smoke prompt (appears unrelated). |
| pdd/prompts/core_dump_requirements_LLM.prompt | Adds a large core-dump requirements spec prompt (appears unrelated). |
| context/simple_math_example.py | Adds a standalone pytest-running example snippet (appears unrelated). |



@gltanaka gltanaka left a comment


Great work on this PR! The core drift-healing feature is really well thought out — the --dry-run, --budget, --all, dependency ordering, and post-update doc/example regeneration are all solid additions that fill real gaps in the pdd update workflow. The test coverage for the new logic is thorough too. Nice job.

A few things to address before we merge:


Prompts location

modify_python.prompt and core_dump_smoke_python.prompt should be submitted to pdd_cap per .sync-config.yml and docs/ONBOARDING.md. (Also heads up — the path pdd/prompts/commands/pdd/commands/modify_python.prompt has a double-nesting that looks unintentional.)

project_dependencies.csv — needs a full scan

Love that this is being committed as a cache — but right now it only has 11 root-level config files (.env.example through Makefile). It looks like the pdd auto-deps run was interrupted after scanning just the top of the alphabetical git ls-files output. No Python source modules made it in. Could you re-run it scoped to at least the pdd/ source tree (~161 Python files) and commit the complete result?

context/simple_math_example.py

This looks like it was accidentally included — mind removing it?

Question: pdd/core_dump_smoke.py

Is this intended to be part of this PR? It's a z3 formal verification suite for an add() function that imports from a solution module — seems like it might belong with a separate feature. Totally fine if there's a reason it's here, just want to make sure.

Small cleanup

A couple things the linter would catch:

  • _read_text_byte_len() in update_main.py is defined but never called — remove or wire it in?
  • Counter is imported but unused
  • Minor: .mdx files are detected in the drift report but excluded from doc regeneration (.md only) — is that intentional?

Validation

The public CI unit tests pass, which is great. Two things that would increase confidence before merge:

  1. Could we trigger the private repo full test suite (Test Public PR workflow)? It hasn't been run for this PR yet, and it would catch any integration issues with the real prompts/symlink setup.
  2. The post-update side effects (doc regen, example regen, Cloud grounding upload) are well-tested via mocks, but haven't been verified end-to-end. The manual tests in the description cover --dry-run and --budget nicely — could you also confirm the doc/example/upload pipeline works against a real repo?

Also — pdd/prompts/core_dump_requirements_LLM.prompt looks useful, just want to confirm it's intentionally scoped to this PR.


Really appreciate the effort here — the feature design, the manual testing writeup, and the comprehensive checklist in the description all make this easy to review. Looking forward to getting this merged! 🙏

@vishalramvelu (Contributor, Author)


Thanks for the review.

  • Removed modify_python.prompt and core_dump_smoke_python.prompt.
  • Regenerated project_dependencies.csv with pdd auto-deps --force-scan scoped to pdd/ (~161 Python files) and committed the complete result.
  • Removed context/simple_math_example.py, core_dump_requirements_LLM.prompt, and the core_dump_smoke* artifacts from this PR.
  • update_main.py cleanup: dropped the unused _read_text_byte_len and the unused Counter import.
  • .mdx is intentionally omitted from the drift "will regen" list, since post-update regeneration is .md only.
  • I'll trigger the private Test Public PR workflow, and I've manually verified example regen, included-doc regen, and cloud upload against a real repo.


@gltanaka gltanaka left a comment


Thanks for the follow-up commit — most of the review items are addressed and the core feature is solid. Two things I'd like fixed before we merge:

Budget cap doesn't account for post-update LLM costs

The --budget check runs at the top of the loop, but example regeneration and doc regeneration call context_generator_main (which makes LLM calls) after the budget check. A user passing --budget 5.00 could end up spending well past that with no warning. Could you either:

  • Check the budget again after the post-update steps, or
  • Skip example regen / doc regen when the budget is already close to exhausted

Silent except Exception: pass blocks

The post-update pipeline has 5 bare except Exception: pass blocks. If example regen, cloud upload, or doc regen fails, there's no trace — no log line, no warning. This makes debugging really hard, especially combined with the budget issue above (money spent with no record of what happened). Could you swap the bare pass for logging.warning(...) or console.print(...) so failures are at least visible?
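One way to make those failures visible while keeping the steps best-effort would be a small wrapper like the sketch below (hypothetical helper and logger name; the real fix may simply inline the logging in each except block):

```python
import logging

logger = logging.getLogger("pdd.update")  # assumed logger name

def run_best_effort_step(name, step, quiet=True):
    """Run one post-update step; log failures instead of swallowing them."""
    try:
        step()
        return True
    except Exception:
        logger.warning("Post-update step %r failed", name, exc_info=True)
        if not quiet:
            print(f"Warning: post-update step {name!r} failed; see logs.")
        return False

# A failing step no longer vanishes silently -- it logs and returns False:
ok = run_best_effort_step("example regen", lambda: 1 / 0)
```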

Everything else looks good — the dependency ordering, dry-run report, and test coverage are all well done. Looking forward to getting this in!

@vishalramvelu (Contributor, Author)


Thanks for the follow-up. I've addressed both items:

  • --budget now applies to post-update work: before example regeneration, cloud submit, and each included-doc regeneration, cumulative spend is checked against the cap and those steps are skipped once the budget is reached (with a short user-visible message when not in quiet mode). We also warn if cumulative cost ends up above the cap after post-update LLM steps.
  • The bare except: pass blocks in that pipeline are replaced with logging.warning(..., exc_info=True) plus console warnings when not quiet, so example/regen/submit failures are visible in logs and the console.

Tests in tests/test_update_main.py were updated for the new budget behavior. Let me know if any other changes are needed.

@gltanaka (Contributor)

gltanaka commented Mar 30, 2026

target 3/31

Made-with: Cursor

# Conflicts:
#	pdd/prompts/agentic_e2e_fix_orchestrator_python.prompt
#	pdd/update_main.py
#	tests/test_update_main.py

@gltanaka gltanaka left a comment


Re-review (Round 3)

CI is green, R1/R2 feedback on budget enforcement and error logging in the post-update pipeline (example regen, cloud submit, doc regen) is addressed — those except Exception: pass blocks now properly log warnings, and budget is checked before each post-update LLM call. Nice work.

Two blocking issues and a few smaller items before we can merge:


Blocking

1. Key collision in _order_changed_items_by_dependency silently drops items

update_main.py around line 897:

key_to_item[key] = (idx, item)

If two changed items produce the same _dependency_order_key (same basename::language), the dict overwrite silently discards the first item from the output. This is a data-loss bug — a changed file disappears from the update run with no warning.

Fix: Either detect the collision and bail out (return the original list, like the key is None case), or accumulate into a list-valued dict.
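The second suggested fix (a list-valued dict) might look like the sketch below, with `sorted()` standing in for the real dependency-graph ordering; names are illustrative, not the actual update_main code:

```python
from collections import defaultdict

def order_items_by_key(items, key_fn):
    """Order items by dependency key without dropping key collisions."""
    key_to_items = defaultdict(list)
    for item in items:
        key = key_fn(item)
        if key is None:
            return list(items)  # unknown key: bail out to original order
        key_to_items[key].append(item)
    ordered = []
    for key in sorted(key_to_items):  # stand-in for the dependency order
        ordered.extend(key_to_items[key])
    return ordered

# Two files sharing a basename::language key are both kept:
result = order_items_by_key(
    ["x/a.py", "y/a.py", "b.py"],
    key_fn=lambda p: p.rsplit("/", 1)[-1] + "::python",
)
```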

2. PRD sync LLM call not gated by --budget

The PRD sync block (around line 1465) calls run_agentic_task() after the main update loop with no budget check. A user passing --budget 5.00 could hit the cap during file updates, then still incur an unbounded LLM call for PRD sync. Since this PR is adding --budget as a contract with the user, all LLM calls in the repo-wide path should respect it.

Fix: Add a _budget_allows_post_update_llm(budget, total_repo_cost) guard before the run_agentic_task call in the PRD sync block.


Non-blocking (address if easy, otherwise fine for follow-up)

3. _estimate_dry_run_cost_range has 3 unused parameters

ctx, repo_obj, and simple are declared but never used. Remove or prefix with _.

4. _dependency_order_key swallows all exceptions silently

except Exception:
    return None

At minimum logger.debug(...) so failures can be diagnosed. Returning None causes the entire ordering to bail out to the original list, which is safe but invisible.

5. core_dump_requirements_LLM.prompt — scope question (carried over from R1)

This 110-line prompt spec for debug snapshots is unrelated to drift-healing. Still in the PR. Could you either remove it (separate PR) or confirm it's intentionally scoped here?

6. uv.lock — 3,637 new lines

The project's CLAUDE.md references pip install and requirements.txt. Is the project migrating to uv, or was this accidentally committed? If uv is intended, that's fine — just want to make sure we're not introducing a parallel lock file that drifts.


Test coverage notes (non-blocking, good areas for follow-up)

  • No test for dependency ordering with cycles (the production code has a fallback at line ~954 but it's untested)
  • No test for post-update error logging — the logger.warning() calls added per R2 feedback are correct but have no test asserting they fire on exception
  • No test for _budget_allows_post_update_llm in isolation (cheap to add, catches off-by-one)
  • No test for updated_doc_paths deduplication (two prompts referencing the same .md doc)
  • Budget cost assertions (e.g. result[1] == pytest.approx(1.3)) are fragile — they encode internal cost-accumulation ordering. Consider asserting behavioral invariants instead (e.g. "total cost > budget", "fewer than all items processed")

What looks good

  • Budget enforcement in the per-file post-update pipeline is thorough — re-checked before example regen, cloud submit, and each included-doc regen
  • Kahn's algorithm for dependency ordering is correct (modulo the collision issue above)
  • Dry-run is properly non-mutating: create_missing_prompts=not dry_run, no LLM calls, no prompt writes
  • _included_docs_for_drift_report filtering logic is correct
  • The merge from main (4/3) resolved cleanly — no artifacts
  • project_dependencies.csv now has 258 entries — looks like a complete scan
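For reference, Kahn's algorithm mentioned above, including the cycle fallback the review calls out, can be sketched as a standalone function (hypothetical version, not the update_main implementation):

```python
from collections import deque

def kahn_order(nodes, edges):
    """Topological order of nodes given (dependency, dependent) edges.

    Returns None when a cycle prevents a complete ordering, so the
    caller can fall back to the original list.
    """
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for dep, dependent in edges:
        children[dep].append(dependent)
        indegree[dependent] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for child in children[n]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return order if len(order) == len(nodes) else None
```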

Once #1 and #2 are addressed, I'm happy to approve. Everything else can be follow-ups.

Development

Successfully merging this pull request may close these issues.

Heal Prompt Drift (code + tests)
