Conversation
Pull request overview
This PR turns the repository-wide mode of pdd update into an explicit, safer "drift healing" workflow by adding repo-wide flags, non-mutating dry-run reporting, budget-capped execution, dependency-aware ordering, and best-effort post-update maintenance (doc/example regeneration plus grounding submission).
Changes:
- Add `--all`, `--dry-run`, and `--budget` to `pdd update`, with repo-only enforcement and argument validation.
- Implement repo-mode dry-run drift reporting, dependency-ordered processing of changed pairs, and a budget cap that stops between items.
- After successful repo updates, regenerate included markdown docs, regenerate examples, and auto-submit refreshed dev units for grounding (best-effort).
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| `pdd/commands/modify.py` | Adds CLI flags (`--all`, `--dry-run`, `--budget`) and enforces repo-only options/modes. |
| `pdd/update_main.py` | Implements dry-run drift report, dependency ordering, budget cap, and post-update doc/example/grounding maintenance. |
| `tests/test_update_command.py` | Adds CLI tests for `--all`, `--dry-run`, and `--budget` pass-through and validation. |
| `tests/test_update_main.py` | Adds repo-mode tests for dry-run non-mutation, cost estimation, dependency ordering, budget stopping, doc regen, example regen, and auto-submit. |
| `tests/test_sync_code_main.py` | Updates an assertion to match the new `find_and_resolve_all_pairs(..., create_missing_prompts=...)` call signature. |
| `pdd/prompts/commands/pdd/commands/modify_python.prompt` | Updates the prompt-spec contract to match new CLI behavior and constraints. |
| `pdd/prompts/agentic_e2e_fix_orchestrator_python.prompt` | Adds convergence-guard requirements to the orchestrator prompt. |
| `project_dependencies.csv` | Adds a CSV snapshot of dependency metadata (appears to be auto-deps output). |
| `pdd/core_dump_smoke.py` | Adds a standalone `add()` test plan/test module (appears unrelated to drift-healing). |
| `pdd/prompts/core_dump_smoke_python.prompt` | Adds a generic "write `add(a, b)`" smoke prompt (appears unrelated). |
| `pdd/prompts/core_dump_requirements_LLM.prompt` | Adds a large core-dump requirements spec prompt (appears unrelated). |
| `context/simple_math_example.py` | Adds a standalone pytest-running example snippet (appears unrelated). |
gltanaka
left a comment
Great work on this PR! The core drift-healing feature is really well thought out — the --dry-run, --budget, --all, dependency ordering, and post-update doc/example regeneration are all solid additions that fill real gaps in the pdd update workflow. The test coverage for the new logic is thorough too. Nice job.
A few things to address before we merge:
Prompts location
modify_python.prompt and core_dump_smoke_python.prompt should be submitted to pdd_cap per .sync-config.yml and docs/ONBOARDING.md. (Also heads up — the path pdd/prompts/commands/pdd/commands/modify_python.prompt has a double-nesting that looks unintentional.)
project_dependencies.csv — needs a full scan
Love that this is being committed as a cache — but right now it only has 11 root-level config files (.env.example through Makefile). It looks like the pdd auto-deps run was interrupted after scanning just the top of the alphabetical git ls-files output. No Python source modules made it in. Could you re-run it scoped to at least the pdd/ source tree (~161 Python files) and commit the complete result?
context/simple_math_example.py
This looks like it was accidentally included — mind removing it?
Question: pdd/core_dump_smoke.py
Is this intended to be part of this PR? It's a z3 formal verification suite for an add() function that imports from a solution module — seems like it might belong with a separate feature. Totally fine if there's a reason it's here, just want to make sure.
Small cleanup
A couple things the linter would catch:
- `_read_text_byte_len()` in `update_main.py` is defined but never called — remove or wire it in?
- `Counter` is imported but unused
- Minor: `.mdx` files are detected in the drift report but excluded from doc regeneration (`.md` only) — is that intentional?
Validation
The public CI unit tests pass, which is great. Two things that would increase confidence before merge:
- Could we trigger the private repo full test suite (`Test Public PR` workflow)? It hasn't been run for this PR yet, and it would catch any integration issues with the real prompts/symlink setup.
- The post-update side effects (doc regen, example regen, Cloud grounding upload) are well-tested via mocks, but haven't been verified end-to-end. The manual tests in the description cover `--dry-run` and `--budget` nicely — could you also confirm the doc/example/upload pipeline works against a real repo?
Also — pdd/prompts/core_dump_requirements_LLM.prompt looks useful, just want to confirm it's intentionally scoped to this PR.
Really appreciate the effort here — the feature design, the manual testing writeup, and the comprehensive checklist in the description all make this easy to review. Looking forward to getting this merged! 🙏
Thanks for the review. I removed `modify_python.prompt` and `core_dump_smoke_python.prompt`. `project_dependencies.csv` was regenerated with `pdd auto-deps --force-scan` scoped to `pdd/` (~161 Python files). `context/simple_math_example.py`, `core_dump_requirements_LLM.prompt`, and the `core_dump_smoke*` artifacts are removed from this PR. `update_main.py` cleanup: dropped the unused `_read_text_byte_len` and `Counter`. `.mdx` is omitted from the drift "will regen" list on purpose, since post-update regen is `.md` only. I'll trigger the private Test Public PR workflow, and I've manually verified example and included-doc regeneration plus cloud upload on a real repo.
gltanaka
left a comment
Thanks for the follow-up commit — most of the review items are addressed and the core feature is solid. Two things I'd like fixed before we merge:
Budget cap doesn't account for post-update LLM costs
The --budget check runs at the top of the loop, but example regeneration and doc regeneration call context_generator_main (which makes LLM calls) after the budget check. A user passing --budget 5.00 could end up spending well past that with no warning. Could you either:
- Check the budget again after the post-update steps, or
- Skip example regen / doc regen when the budget is already close to exhausted
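The second option could be implemented with a small guard helper, roughly like this (the function name, signature, and `reserve` parameter are illustrative sketches, not code from the PR):

```python
from typing import Optional


def budget_allows_more_work(budget: Optional[float], spent: float,
                            reserve: float = 0.0) -> bool:
    """Return True if cumulative spend is still under the cap.

    `budget` is None when the user did not pass --budget; `reserve`
    lets callers keep headroom for an upcoming LLM call.
    (Hypothetical helper for illustration only.)
    """
    if budget is None:
        return True  # no cap requested
    return (spent + reserve) < budget
```

Calling this before example regen and doc regen (and again after, to warn on overshoot) would keep the `--budget 5.00` contract honest for the whole run, not just the main loop.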
Silent except Exception: pass blocks
The post-update pipeline has 5 bare except Exception: pass blocks. If example regen, cloud upload, or doc regen fails, there's no trace — no log line, no warning. This makes debugging really hard, especially combined with the budget issue above (money spent with no record of what happened). Could you swap the bare pass for logging.warning(...) or console.print(...) so failures are at least visible?
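A minimal pattern for surfacing those failures, assuming the standard `logging` module (the wrapper function and its name are hypothetical, shown only to illustrate the suggestion):

```python
import logging

logger = logging.getLogger(__name__)


def run_post_update_step(name, step, quiet=False):
    """Run one best-effort post-update step, logging failures
    instead of swallowing them (illustrative wrapper, not the PR's code)."""
    try:
        return step()
    except Exception:
        # exc_info=True preserves the traceback in the log record
        logger.warning("post-update step %r failed", name, exc_info=True)
        if not quiet:
            print(f"Warning: post-update step {name!r} failed; see logs.")
        return None
```

The step still stays best-effort (the run continues), but there is now a trace of what failed and why.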
Everything else looks good — the dependency ordering, dry-run report, and test coverage are all well done. Looking forward to getting this in!
Thanks for the follow-up. I've addressed both items: `--budget` now applies to post-update work. Before example regeneration, cloud submit, and each included-doc regeneration we check cumulative spend against the cap and skip those steps when the budget is already reached (with a short user-visible message when not in quiet mode), and we warn if cumulative cost ends up above the cap after post-update LLM steps. I also replaced the bare `except: pass` blocks in that pipeline with `logging.warning(..., exc_info=True)` plus printed warnings when not quiet, so example-regen/submit failures are visible in logs and the console. Tests in `tests/test_update_main.py` were updated for the new budget behavior. Let me know if any other changes are needed.
target 3/31
Made-with: Cursor
Merge commit resolving conflicts in: `pdd/prompts/agentic_e2e_fix_orchestrator_python.prompt`, `pdd/update_main.py`, `tests/test_update_main.py`
gltanaka
left a comment
Re-review (Round 3)
CI is green, R1/R2 feedback on budget enforcement and error logging in the post-update pipeline (example regen, cloud submit, doc regen) is addressed — those except Exception: pass blocks now properly log warnings, and budget is checked before each post-update LLM call. Nice work.
Two blocking issues and a few smaller items before we can merge:
Blocking
1. Key collision in _order_changed_items_by_dependency silently drops items
update_main.py around line 897:
```python
key_to_item[key] = (idx, item)
```

If two changed items produce the same `_dependency_order_key` (same `basename::language`), the dict overwrite silently discards the first item from the output. This is a data-loss bug — a changed file disappears from the update run with no warning.
Fix: Either detect the collision and bail out (return the original list, like the key is None case), or accumulate into a list-valued dict.
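The collision-detecting variant of the fix might look like this (a sketch only; `key_fn` stands in for `_dependency_order_key`, and the function name is illustrative):

```python
def safe_key_map(items, key_fn):
    """Build a key -> (index, item) map, bailing out on collisions.

    Mirrors the first suggested fix: if two items share a dependency
    key, return None so the caller falls back to the original order
    instead of silently dropping an item.
    """
    key_to_item = {}
    for idx, item in enumerate(items):
        key = key_fn(item)
        if key is None or key in key_to_item:
            return None  # unknown key or collision: keep original order
        key_to_item[key] = (idx, item)
    return key_to_item
```

The list-valued-dict alternative (`defaultdict(list)` with `append`) would instead keep all colliding items in the ordering; either way, nothing is dropped.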
2. PRD sync LLM call not gated by --budget
The PRD sync block (around line 1465) calls run_agentic_task() after the main update loop with no budget check. A user passing --budget 5.00 could hit the cap during file updates, then still incur an unbounded LLM call for PRD sync. Since this PR is adding --budget as a contract with the user, all LLM calls in the repo-wide path should respect it.
Fix: Add a _budget_allows_post_update_llm(budget, total_repo_cost) guard before the run_agentic_task call in the PRD sync block.
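A testable shape for that guard, with the task callable injected so the gate can be exercised in isolation (names and signature are suggestions, not the PR's actual code):

```python
def maybe_run_prd_sync(run_agentic_task, budget, total_repo_cost, quiet=False):
    """Gate the PRD-sync LLM call on the remaining budget.

    Sketch of the suggested _budget_allows_post_update_llm guard:
    skip the call entirely once cumulative repo cost has hit the cap.
    """
    if budget is not None and total_repo_cost >= budget:
        if not quiet:
            print("Skipping PRD sync: budget already reached "
                  f"(${total_repo_cost:.2f} of ${budget:.2f}).")
        return None
    return run_agentic_task()
```

Passing the task in as a callable also makes the off-by-one behavior at exactly `total_repo_cost == budget` cheap to pin down in a unit test.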
Non-blocking (address if easy, otherwise fine for follow-up)
3. _estimate_dry_run_cost_range has 3 unused parameters
ctx, repo_obj, and simple are declared but never used. Remove or prefix with _.
4. _dependency_order_key swallows all exceptions silently
```python
except Exception:
    return None
```

At minimum `logger.debug(...)` so failures can be diagnosed. Returning `None` causes the entire ordering to bail out to the original list, which is safe but invisible.
5. core_dump_requirements_LLM.prompt — scope question (carried over from R1)
This 110-line prompt spec for debug snapshots is unrelated to drift-healing. Still in the PR. Could you either remove it (separate PR) or confirm it's intentionally scoped here?
6. uv.lock — 3,637 new lines
The project's CLAUDE.md references pip install and requirements.txt. Is the project migrating to uv, or was this accidentally committed? If uv is intended, that's fine — just want to make sure we're not introducing a parallel lock file that drifts.
Test coverage notes (non-blocking, good areas for follow-up)
- No test for dependency ordering with cycles (the production code has a fallback at line ~954 but it's untested)
- No test for post-update error logging — the `logger.warning()` calls added per R2 feedback are correct but have no test asserting they fire on exception
- No test for `_budget_allows_post_update_llm` in isolation (cheap to add, catches off-by-one)
- No test for `updated_doc_paths` deduplication (two prompts referencing the same `.md` doc)
- Budget cost assertions (e.g. `result[1] == pytest.approx(1.3)`) are fragile — they encode internal cost-accumulation ordering. Consider asserting behavioral invariants instead (e.g. "total cost > budget", "fewer than all items processed")
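The error-logging gap could be covered with pytest's `caplog` fixture along these lines (the step runner and logger name are stand-ins for the real pipeline, not copied from it):

```python
import logging

logger = logging.getLogger("pdd.update_main")


def run_step_logged(step):
    """Best-effort step runner that logs instead of silently passing
    (stand-in for the pipeline's R2 error handling)."""
    try:
        return step()
    except Exception:
        logger.warning("post-update step failed", exc_info=True)
        return None


def test_warning_fires_on_exception(caplog):
    def failing_step():
        raise RuntimeError("regen failed")

    with caplog.at_level(logging.WARNING, logger="pdd.update_main"):
        assert run_step_logged(failing_step) is None
    assert any("post-update step failed" in r.message for r in caplog.records)
```

Asserting on `caplog.records` rather than stdout keeps the test independent of the quiet-mode console printing.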
What looks good
- Budget enforcement in the per-file post-update pipeline is thorough — re-checked before example regen, cloud submit, and each included-doc regen
- Kahn's algorithm for dependency ordering is correct (modulo the collision issue above)
- Dry-run is properly non-mutating: `create_missing_prompts=not dry_run`, no LLM calls, no prompt writes
- `_included_docs_for_drift_report` filtering logic is correct
- The merge from main (4/3) resolved cleanly — no artifacts
- `project_dependencies.csv` now has 258 entries — looks like a complete scan
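For reference, the Kahn's-algorithm shape being praised above, in minimal form (a generic sketch, not the PR's implementation; it returns `None` on a cycle so the caller can fall back to the original order, mirroring the fallback described in the review):

```python
from collections import deque


def kahn_order(nodes, edges):
    """Topologically sort `nodes` given (dep, dependent) `edges`.

    Returns the sorted list, or None when a cycle prevents a full
    ordering.
    """
    indegree = {n: 0 for n in nodes}
    adjacency = {n: [] for n in nodes}
    for dep, dependent in edges:
        adjacency[dep].append(dependent)
        indegree[dependent] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in adjacency[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    # Fewer emitted nodes than inputs means a cycle was present
    return order if len(order) == len(nodes) else None
```

The untested cycle fallback mentioned in the coverage notes corresponds to the `None` branch here.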
Once #1 and #2 are addressed, I'm happy to approve. Everything else can be follow-ups.
Summary
- After successful repo updates: regenerate included markdown docs referenced by tags, regenerate module examples, and upload refreshed dev units to PDD Cloud grounding.
- Dry-run is non-mutating during scan/pair resolution; the example/upload pipeline uses the current processed pair's prompt/code paths as the source of truth to avoid path/context mismatch issues.
Tests
- CLI: `--all` behavior and validation, `--dry-run` pass-through and repo-only enforcement, `--budget` pass-through, positivity validation, and repo-only enforcement.
- Repo mode: flat dry-run cost estimate, included-doc aggregation in the drift report, budget-cap stopping behavior, dependency ordering under budget, included markdown doc regeneration after successful updates, example regeneration and `_auto_submit_example` submission, dry-run skipping side-effecting work, dry-run scan not creating missing prompts, and the example submit path using current pair prompt/code paths.
Manual Tests
Ran `uv run pdd update --all --dry-run`:
- Verified repository drift report output (counts, cost range, drifted mappings, included docs).
- Verified no LLM call, prompt write, architecture update, or PRD update path executed.
- Confirmed no "Created missing prompt file" spam during the dry-run scan.

Ran `uv run pdd --force update --all --simple --budget 0.031`:
- Verified the real prompt update flow executed.
- Verified example regeneration executed successfully.
- Verified dev unit grounding submission executed successfully.
- Verified the budget cap message printed and the run stopped early.
- Observed the expected slight overshoot (the cap is checked between items, not mid-item).
Checklist
Fixes #727