feat: Pr 4159 fixes #4417
base: main
Conversation
Remove incomplete model directories and non-Kubernetes configurations to streamline the recipes directory for production Kubernetes deployments.

Changes:
- Remove 5 incomplete model directories (deepseek-r1-distill-llama-8b, gemma3, llama4, qwen2-vl-7b-instruct, qwen3) that lack proper Kubernetes deployment manifests
- Delete run.sh script (non-Kubernetes automation tool)
- Remove standalone engine config YAMLs from deepseek-r1/trtllm that were not wrapped in Kubernetes manifests
- Document incomplete gpt-oss-120b disagg recipe with README explaining missing components

README improvements:
- Restructure Available Recipes table with 'Deployment' and 'Benchmark Recipe' columns to clarify that perf.yaml files are tools for users to run benchmarks, not published performance results
- Add comprehensive quick start guide with prerequisites
- Link to correct Kubernetes deployment guides
- Add troubleshooting section
- Remove extraneous links (docs.nvidia.com, license section)

Result: 4 models with 10 complete deployment recipes (7 with benchmark scripts), focused exclusively on Kubernetes deployments.

Signed-off-by: Ben Hamm <ben.hamm@gmail.com>
Address feedback to make the README less AI-generated looking by removing decorative emojis from section headings while keeping status indicators (✅ ❌) in tables and content. Signed-off-by: Ben Hamm <ben.hamm@gmail.com>
Co-authored-by: Anant Sharma <anants@nvidia.com> Signed-off-by: Tanmay Verma <tanmay2592@gmail.com>
Signed-off-by: Anna Tchernych <atchernych@nvidia.com>
Walkthrough
This pull request consolidates TensorRT-LLM engine configuration file paths from scattered `recipes/` locations into the centralized `examples/backends/trtllm/engine_configs/` directory.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 6
🧹 Nitpick comments (3)
recipes/gpt-oss-120b/trtllm/disagg/README.md (1)
1-25: Documentation clarity and warnings. The warning on line 3 effectively communicates the incomplete status, and the structure is clear. However, the document should explicitly reference where the centralized configs are located (e.g., `examples/backends/trtllm/engine_configs/gpt-oss-120b/`) rather than implying they exist in the recipes directory, to prevent confusion during the repo's transition to the new structure. Consider adding a clarification section:
```diff
 This directory contains TensorRT-LLM engine configurations for disaggregated serving:
-- `decode.yaml` - Decode worker engine configuration
-- `prefill.yaml` - Prefill worker engine configuration
+- `decode.yaml` - Decode worker engine configuration (see also: `examples/backends/trtllm/engine_configs/gpt-oss-120b/`)
+- `prefill.yaml` - Prefill worker engine configuration (see also: `examples/backends/trtllm/engine_configs/gpt-oss-120b/`)
```

Or, if the files no longer exist here:
```diff
 This directory contains TensorRT-LLM engine configurations for disaggregated serving:
-- `decode.yaml` - Decode worker engine configuration
-- `prefill.yaml` - Prefill worker engine configuration
+Engine configuration files are now centralized in `examples/backends/trtllm/engine_configs/gpt-oss-120b/`:
+- `decode.yaml` - Decode worker engine configuration
+- `prefill.yaml` - Prefill worker engine configuration
```

recipes/README.md (2)
31-31: Specify language for fenced code block. Code blocks should declare their language for syntax highlighting. Since this shows a directory structure, use `tree` or `text` as the fence language.

````diff
-```
+```tree
````
47-47: Replace bold emphasis with proper markdown headings. Lines using bold text for section titles (e.g., `**1. Dynamo Platform Installed**`, `**Step 1: Download Model**`) should be converted to markdown headings (`###`) for proper document structure and accessibility. Examples:

```diff
-**1. Dynamo Platform Installed**
+### 1. Dynamo Platform Installed
```

```diff
-**Step 1: Download Model**
+### Step 1: Download Model
```

Apply similar changes to all 9 instances flagged by the linter.
Also applies to: 54-54, 61-61, 75-75, 90-90, 103-103, 118-118, 137-137, 150-150
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (35)
- benchmarks/router/run_engines.sh (2 hunks)
- docs/backends/trtllm/README.md (1 hunks)
- docs/backends/trtllm/gemma3_sliding_window_attention.md (4 hunks)
- docs/backends/trtllm/gpt-oss.md (3 hunks)
- docs/backends/trtllm/llama4_plus_eagle.md (2 hunks)
- docs/backends/trtllm/multimodal_support.md (2 hunks)
- docs/backends/trtllm/multinode/multinode-examples.md (4 hunks)
- docs/backends/trtllm/multinode/multinode-multimodal-example.md (5 hunks)
- docs/kubernetes/README.md (1 hunks)
- examples/backends/trtllm/deploy/agg-with-config.yaml (1 hunks)
- examples/backends/trtllm/deploy/agg.yaml (1 hunks)
- examples/backends/trtllm/deploy/agg_router.yaml (1 hunks)
- examples/backends/trtllm/deploy/disagg.yaml (2 hunks)
- examples/backends/trtllm/deploy/disagg_planner.yaml (2 hunks)
- examples/backends/trtllm/deploy/disagg_router.yaml (2 hunks)
- examples/backends/trtllm/engine_configs/README.md (1 hunks)
- examples/backends/trtllm/engine_configs/deepseek-r1/agg/wide_ep/wide_ep_agg.yaml (1 hunks)
- examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/wide_ep_decode.yaml (1 hunks)
- examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/wide_ep_prefill.yaml (1 hunks)
- examples/backends/trtllm/engine_configs/gpt-oss-120b/decode.yaml (1 hunks)
- examples/backends/trtllm/engine_configs/gpt-oss-120b/prefill.yaml (1 hunks)
- examples/backends/trtllm/launch/agg.sh (1 hunks)
- examples/backends/trtllm/launch/agg_metrics.sh (1 hunks)
- examples/backends/trtllm/launch/agg_router.sh (1 hunks)
- examples/backends/trtllm/launch/disagg.sh (1 hunks)
- examples/backends/trtllm/launch/disagg_router.sh (1 hunks)
- examples/backends/trtllm/launch/disagg_same_gpu.sh (1 hunks)
- examples/backends/trtllm/launch/epd_disagg.sh (1 hunks)
- examples/backends/trtllm/launch/gpt_oss_disagg.sh (1 hunks)
- examples/basics/multinode/trtllm/README.md (1 hunks)
- examples/basics/multinode/trtllm/srun_aggregated.sh (1 hunks)
- examples/basics/multinode/trtllm/srun_disaggregated.sh (1 hunks)
- recipes/README.md (1 hunks)
- recipes/gpt-oss-120b/trtllm/disagg/README.md (1 hunks)
- recipes/run.sh (0 hunks)
💤 Files with no reviewable changes (1)
- recipes/run.sh
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 3858
File: recipes/deepseek-r1/model-cache/model-download.yaml:18-32
Timestamp: 2025-10-24T04:21:08.751Z
Learning: In the recipes directory structure, model-specific recipes (e.g., recipes/deepseek-r1/, recipes/llama-3-70b/) contain hardcoded model names and revisions in their Kubernetes manifests (like model-download.yaml). Each recipe directory is deployment-specific and self-contained, so hardcoding model-specific values is the intended design pattern.
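To make the pattern concrete, here is a minimal sketch of what such a hardcoded manifest fragment could look like (every name and value below is illustrative, not taken from the actual recipe):

```yaml
# Hypothetical model-download.yaml fragment; the model name and revision are
# deliberately hardcoded because each recipe directory is self-contained.
apiVersion: batch/v1
kind: Job
metadata:
  name: model-download
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: download
          image: model-downloader:latest  # assumed image shipping huggingface-cli
          command: ["huggingface-cli", "download", "deepseek-ai/DeepSeek-R1", "--revision", "main"]
```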
📚 Learning: 2025-10-24T04:21:08.751Z
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 3858
File: recipes/deepseek-r1/model-cache/model-download.yaml:18-32
Timestamp: 2025-10-24T04:21:08.751Z
Learning: In the recipes directory structure, model-specific recipes (e.g., recipes/deepseek-r1/, recipes/llama-3-70b/) contain hardcoded model names and revisions in their Kubernetes manifests (like model-download.yaml). Each recipe directory is deployment-specific and self-contained, so hardcoding model-specific values is the intended design pattern.
Applied to files:
- recipes/gpt-oss-120b/trtllm/disagg/README.md
- examples/backends/trtllm/deploy/agg-with-config.yaml
- benchmarks/router/run_engines.sh
- docs/backends/trtllm/multinode/multinode-multimodal-example.md
- examples/backends/trtllm/deploy/disagg.yaml
- examples/backends/trtllm/deploy/disagg_planner.yaml
- examples/backends/trtllm/deploy/disagg_router.yaml
- recipes/README.md
- examples/backends/trtllm/deploy/agg.yaml
📚 Learning: 2025-07-02T13:20:28.800Z
Learnt from: fsaady
Repo: ai-dynamo/dynamo PR: 1730
File: examples/sglang/slurm_jobs/job_script_template.j2:59-59
Timestamp: 2025-07-02T13:20:28.800Z
Learning: In the SLURM job script template at examples/sglang/slurm_jobs/job_script_template.j2, the `--total_nodes` parameter represents the total nodes per worker type (prefill or decode), not the total nodes in the entire cluster. Each worker type needs to know its own group size for distributed coordination.
Applied to files:
- examples/basics/multinode/trtllm/README.md
- docs/backends/trtllm/multinode/multinode-examples.md
- docs/backends/trtllm/llama4_plus_eagle.md
- docs/backends/trtllm/multinode/multinode-multimodal-example.md
- examples/basics/multinode/trtllm/srun_disaggregated.sh
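A sketch of the per-worker-type semantics this learning describes (only `--total_nodes` comes from the quoted learning; the script invocation and node counts are hypothetical):

```bash
# 4 prefill nodes + 8 decode nodes: each group passes its own group size,
# not the 12-node cluster total.
python scripts/worker_setup.py --total_nodes 4   # run on each prefill node
python scripts/worker_setup.py --total_nodes 8   # run on each decode node
```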
📚 Learning: 2025-07-03T10:14:30.570Z
Learnt from: fsaady
Repo: ai-dynamo/dynamo PR: 1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
Applied to files:
- examples/basics/multinode/trtllm/README.md
- docs/backends/trtllm/multinode/multinode-examples.md
- docs/backends/trtllm/multinode/multinode-multimodal-example.md
📚 Learning: 2025-07-03T09:44:41.470Z
Learnt from: fsaady
Repo: ai-dynamo/dynamo PR: 1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:113-116
Timestamp: 2025-07-03T09:44:41.470Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, logging the full configuration file content is acceptable because the config file is public, contains only placeholder replacements (no sensitive data), and provides debugging benefits for users who may want to create configurations based on the logged output.
Applied to files:
examples/basics/multinode/trtllm/README.md
📚 Learning: 2025-06-05T01:46:15.509Z
Learnt from: GuanLuo
Repo: ai-dynamo/dynamo PR: 1371
File: examples/llm/benchmarks/vllm_multinode_setup.sh:18-25
Timestamp: 2025-06-05T01:46:15.509Z
Learning: In multi-node setups with head/worker architecture, the head node typically doesn't need environment variables pointing to its own services (like NATS_SERVER, ETCD_ENDPOINTS) because local processes can access them via localhost. Only worker nodes need these environment variables to connect to the head node's external IP address.
Applied to files:
- docs/backends/trtllm/multinode/multinode-examples.md
- examples/basics/multinode/trtllm/srun_aggregated.sh
- examples/basics/multinode/trtllm/srun_disaggregated.sh
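A minimal sketch of the head/worker pattern the learning describes (the variable names come from the learning; the IP address and ports are illustrative):

```bash
# Head node: nats-server and etcd bind locally, so no pointers to itself are needed.
nats-server &
etcd &

# Worker nodes: point at the head node's external IP.
export NATS_SERVER="nats://10.0.0.1:4222"
export ETCD_ENDPOINTS="http://10.0.0.1:2379"
```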
📚 Learning: 2025-07-31T11:26:48.422Z
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 2217
File: components/backends/trtllm/engine_configs/deepseek_r1/wide_ep/wide_ep_prefill.yaml:18-0
Timestamp: 2025-07-31T11:26:48.422Z
Learning: TRTLLM LLM-API expects all caps for backend field names in configuration files. When migrating TRTLLM configurations, backend values like "WideEP" should be changed to "WIDEEP" to comply with the API requirements.
Applied to files:
- examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/wide_ep_decode.yaml
- examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/wide_ep_prefill.yaml
- docs/backends/trtllm/multinode/multinode-multimodal-example.md
- examples/backends/trtllm/engine_configs/README.md
- examples/backends/trtllm/deploy/disagg.yaml
- docs/backends/trtllm/README.md
- examples/backends/trtllm/engine_configs/deepseek-r1/agg/wide_ep/wide_ep_agg.yaml
- examples/backends/trtllm/launch/epd_disagg.sh
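A one-line illustration of the casing rule in this learning (the surrounding key is an assumption; only the all-caps backend value is confirmed by the quoted learning):

```yaml
moe_config:
  backend: WIDEEP   # TRTLLM LLM-API accepts this; "WideEP" would not comply
```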
🪛 markdownlint-cli2 (0.18.1)
recipes/README.md
13-13 to 21-21: Table column count
Expected: 8; Actual: 7; Too few cells, row will be missing data
(MD056, table-column-count) — reported once for each of lines 13–21
31-31: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
47-47, 54-54, 61-61, 75-75, 90-90, 103-103, 118-118, 137-137, 150-150: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: trtllm (amd64)
- GitHub Check: trtllm (arm64)
- GitHub Check: sglang (arm64)
- GitHub Check: operator (arm64)
- GitHub Check: sglang (amd64)
- GitHub Check: vllm (amd64)
- GitHub Check: operator (amd64)
- GitHub Check: vllm (arm64)
🔇 Additional comments (37)
recipes/gpt-oss-120b/trtllm/disagg/README.md (2)
24-24: Relative path is valid—no issues found. The file `recipes/CONTRIBUTING.md` exists at the referenced location, confirming that the relative path `../../../CONTRIBUTING.md` from `recipes/gpt-oss-120b/trtllm/disagg/` resolves correctly. The link will not break.

8-9: The original review comment's concern was based on an incorrect assumption. Verification confirms that `decode.yaml` and `prefill.yaml` do exist in `recipes/gpt-oss-120b/trtllm/disagg/` (alongside copies in the centralized location at `examples/backends/trtllm/engine_configs/gpt-oss-120b/`). The README documentation at lines 8-9 is accurate—these files are present in the directory as stated. No changes are needed. Likely an incorrect or invalid review comment.
docs/backends/trtllm/multinode/multinode-examples.md (4)
170-171: Disaggregated engine config paths verified as correct. Both `wide_ep_prefill.yaml` and `wide_ep_decode.yaml` exist at the expected locations in the repository under `examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/`. The commented paths are accurate and the migration from recipes/ has been completed successfully.
141-141: Path verification confirms the commented example is correct. The file `examples/backends/trtllm/engine_configs/deepseek-r1/agg/wide_ep/wide_ep_agg.yaml` exists and the path structure is consistent with the consolidation from the `recipes/` structure. The `/mnt/examples/` prefix in the commented example is appropriate for containerized deployments where this path would be mounted.
20-20: Path reference is valid; no issues found. All paths referenced in the documentation have been verified:

- The script directory `examples/basics/multinode/trtllm/` exists and contains the referenced scripts (`srun_aggregated.sh` and `srun_disaggregated.sh`)
- The engine config directory `examples/backends/trtllm/engine_configs/` exists with all referenced configuration files
111-112: MOUNTS path configuration is correct and provides full access to all referenced engine config files. All three engine config files exist at their expected locations in the repository and are accessible via the MOUNTS configuration:

- `examples/backends/trtllm/engine_configs/deepseek-r1/agg/wide_ep/wide_ep_agg.yaml` (line 141)
- `examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/wide_ep_prefill.yaml` (line 170)
- `examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/wide_ep_decode.yaml` (line 171)

The MOUNTS configuration `${PWD}/../../../../:/mnt` correctly mounts the repository root to `/mnt` in the container, making all referenced paths accessible.
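For readers unfamiliar with the convention, a short sketch of how this mount maps host paths into the container (the MOUNTS value is quoted from the comment above; the path comments are illustrative):

```bash
# Repository root is bind-mounted at /mnt inside the container:
export MOUNTS="${PWD}/../../../../:/mnt"
# Host:      <repo-root>/examples/backends/trtllm/engine_configs/...
# Container: /mnt/examples/backends/trtllm/engine_configs/...
```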
examples/backends/trtllm/deploy/agg-with-config.yaml (1)

54-70: Verify path resolution relative to working directory. The working directory is `/workspace/examples/backends/trtllm` (line 54), and the `--extra-engine-args` path on line 70 uses `./examples/backends/...`. This resolves to `/workspace/examples/backends/trtllm/./examples/backends/trtllm/engine_configs/qwen3/agg.yaml`, which appears to have a redundant path segment. Confirm whether the intended path is:
- `examples/backends/trtllm/engine_configs/qwen3/agg.yaml` (relative to `/workspace`), or
- `../../../examples/backends/trtllm/engine_configs/qwen3/agg.yaml` (relative to `/workspace/examples/backends/trtllm`), or
- Another variant.
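One quick way to see which variant is intended is to resolve the path from the stated working directory (a diagnostic sketch; the paths are the ones quoted in the comment above):

```bash
cd /workspace/examples/backends/trtllm
# If this prints a path containing a doubled ".../trtllm/examples/backends/trtllm/..."
# segment, the relative prefix in the manifest is wrong:
realpath -m ./examples/backends/trtllm/engine_configs/qwen3/agg.yaml
```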
examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/wide_ep_prefill.yaml (1)
18-20: Verify path format consistency across configuration and deployment files. The `load_balancer` path uses an absolute `/mnt/` prefix on line 20, whereas deployment files use relative paths (`./examples/...`). Confirm this difference is intentional and that:

- The `/mnt/` path is valid within the TensorRT-LLM engine configuration context
- The path is consistent with how the `load_balancer` field is resolved by TensorRT-LLM

Additionally, verify that the referenced `eplb.yaml` file exists or will be created at this location.
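For reference, a minimal sketch of the field under discussion (only the `load_balancer` key, the `/mnt/` prefix, and the `eplb.yaml` filename come from the comment; the nesting under `moe_config` is an assumption):

```yaml
moe_config:
  backend: WIDEEP
  load_balancer: /mnt/examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/eplb.yaml
```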
examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/wide_ep_decode.yaml (1)

18-20: Verify path format consistency across configuration and deployment files. Same concern as `wide_ep_prefill.yaml`: the absolute `/mnt/` prefix on line 20 differs from relative paths in deployment files. Verify this path format is valid in the TensorRT-LLM engine configuration context and that the referenced `eplb.yaml` file exists at the expected location.

examples/backends/trtllm/deploy/disagg.yaml (2)
29-29: Verify path resolution for prefill worker configuration (line 40). The working directory is `/workspace/` (line 29), and the path on line 40 uses `./examples/backends/trtllm/engine_configs/qwen3/prefill.yaml`. Verify this resolves correctly to the intended file location. Also applies to: 40-40
55-55: Verify path resolution for decode worker configuration (line 66). The working directory is `/workspace/` (line 55), and the path on line 66 uses `./examples/backends/trtllm/engine_configs/qwen3/decode.yaml`. Verify this resolves correctly to the intended file location. Also applies to: 66-66
examples/backends/trtllm/engine_configs/deepseek-r1/agg/wide_ep/wide_ep_agg.yaml (1)
6-14: Verify path format consistency across configuration and deployment files. The `load_balancer` path on line 14 uses the absolute `/mnt/` prefix, consistent with other `wide_ep` configurations. Verify that this path format is valid in the TensorRT-LLM engine configuration context and that the referenced `eplb.yaml` file exists at this location.

docs/kubernetes/README.md (1)
201-207: Clarify path format in documentation example. The documentation example on line 206 uses `/workspace/examples/backends/trtllm/engine_configs/deepseek-r1-distill-llama-8b/agg.yaml` with an absolute `/workspace` prefix. This differs from:

- Deployment files using `./examples/backends/...`
- Engine config files using `/mnt/examples/backends/...`

Confirm whether `/workspace` is the correct prefix for a documentation example or if it should match one of the other patterns (relative path or `/mnt/`). Additionally, verify that the file path uses dashes (`deepseek-r1-distill-llama-8b`) and not underscores.

examples/backends/trtllm/deploy/agg_router.yaml (1)
31-31: Verify path resolution for worker configuration (line 42). The working directory is `/workspace/` (line 31), and the path on line 42 uses `./examples/backends/trtllm/engine_configs/qwen3/agg.yaml`. Verify this resolves correctly to the intended file location. Also applies to: 42-42
examples/backends/trtllm/deploy/disagg_router.yaml (2)
31-31: Verify path resolution for prefill worker configuration (line 42). The working directory is `/workspace/` (line 31), and the path on line 42 uses `./examples/backends/trtllm/engine_configs/qwen3/prefill.yaml`. Verify this resolves correctly to the intended file location. Also applies to: 42-42
57-57: Verify path resolution for decode worker configuration (line 68). The working directory is `/workspace/` (line 57), and the path on line 68 uses `./examples/backends/trtllm/engine_configs/qwen3/decode.yaml`. Verify this resolves correctly to the intended file location. Also applies to: 68-68
examples/backends/trtllm/launch/agg_metrics.sh (1)
9-9: Path consolidation approved. The default `AGG_ENGINE_ARGS` path has been correctly updated to the new centralized engine_configs location. This aligns with the PR's objective to consolidate configuration files.

examples/basics/multinode/trtllm/srun_aggregated.sh (1)
21-21: Container path correctly updated for multinode deployment. The `ENGINE_CONFIG` path has been updated to the new location with the appropriate `/mnt` container mount prefix. The full subdirectory hierarchy (`agg/wide_ep/`) is preserved in the new engine_configs location.

examples/basics/multinode/trtllm/README.md (1)
1-20: New reference README is appropriate. Minimal reference documentation with proper licensing. The link structure allows readers to find the detailed setup guide.
examples/backends/trtllm/launch/epd_disagg.sh (1)
9-11: EPD disaggregated configuration paths properly consolidated. All three engine configuration paths for encode, prefill, and decode workers have been updated to the new engine_configs location. The separation of encode functionality for multimodal models is appropriate for this disaggregated setup.
docs/backends/trtllm/multimodal_support.md (1)
30-30: Path consolidation with standard extensions approved. All engine configuration paths for both aggregated and disaggregated multimodal deployments have been updated to the new engine_configs location with standard `.yaml` extensions. Also applies to: 82-83
docs/backends/trtllm/multinode/multinode-multimodal-example.md (1)
20-20: Multinode multimodal example properly updated with important fixes. Multiple improvements here:
- Line 20: Helpful note pointing to example script locations
- Line 39: Important sed command fixing backend field case (DEFAULT → default) aligns with TensorRT-LLM API requirements (a sketch of this kind of fix appears after this list)
- Lines 105-106: Path consolidation with standard `.yaml` extensions
- Line 128: Formatting fix for output enumeration
These changes support the path consolidation objective while addressing necessary runtime compatibility fixes.
Also applies to: 39-39, 105-106, 128-128
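As promised above, a sketch of the kind of in-place substitution the second bullet describes (the exact sed pattern and target file are assumptions, not quoted from the PR):

```bash
# Lower-case the backend field before launching the worker:
sed -i 's/backend: DEFAULT/backend: default/' "$ENGINE_CONFIG"
```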
docs/backends/trtllm/gemma3_sliding_window_attention.md (1)
33-33: Gemma3 VSWA configuration paths comprehensively updated. All engine configuration paths have been properly consolidated to the new engine_configs location across all serving modes (aggregated, disaggregated, with/without KV routing). Standard `.yaml` extensions and `$DYNAMO_HOME` prefix used consistently. Also applies to: 42-42, 51-52, 61-62
docs/backends/trtllm/llama4_plus_eagle.md (1)
55-55: Fix broken file references to actual config filenames. Lines 63 and 65 reference config files with `.yml` extension, but the actual files use `.yaml`. This will cause runtime failures when the scripts attempt to load these configurations. Line 55 is correct as-is (`eagle_agg.yml` exists). Only lines 63 and 65 need correction:
```diff
-export PREFILL_ENGINE_CONFIG="/mnt/examples/backends/trtllm/engine_configs/llama4/eagle/eagle_prefill.yml"
+export PREFILL_ENGINE_CONFIG="/mnt/examples/backends/trtllm/engine_configs/llama4/eagle/eagle_prefill.yaml"
```

```diff
-export DECODE_ENGINE_CONFIG="/mnt/examples/backends/trtllm/engine_configs/llama4/eagle/eagle_decode.yml"
+export DECODE_ENGINE_CONFIG="/mnt/examples/backends/trtllm/engine_configs/llama4/eagle/eagle_decode.yaml"
```

⛔ Skipped due to learnings
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 2217
File: components/backends/trtllm/engine_configs/deepseek_r1/wide_ep/wide_ep_prefill.yaml:18-0
Timestamp: 2025-07-31T11:26:48.422Z
Learning: TRTLLM LLM-API expects all caps for backend field names in configuration files. When migrating TRTLLM configurations, backend values like "WideEP" should be changed to "WIDEEP" to comply with the API requirements.

examples/basics/multinode/trtllm/srun_disaggregated.sh (1)
20-20: Engine config paths consolidated correctly; verify YAML files exist at new locations. The path updates from `recipes/deepseek-r1/trtllm/disagg/` to `examples/backends/trtllm/engine_configs/deepseek-r1/disagg/` are consistent with the PR objectives. Variables are properly used downstream at lines 65 and 85. Please verify that the actual engine configuration YAML files (`wide_ep_prefill.yaml` and `wide_ep_decode.yaml`) exist at the new centralized locations. Also applies to: 24-24
examples/backends/trtllm/deploy/agg.yaml (1)
39-39: Path relocated to centralized engine configs; verify YAML file exists. The engine config path has been properly updated to reference the centralized location. Verify that `examples/backends/trtllm/engine_configs/qwen3/agg.yaml` exists.

examples/backends/trtllm/launch/agg_router.sh (1)
9-9: Engine config path consolidated consistently. The default AGG_ENGINE_ARGS path has been updated correctly to the centralized location. This is consistent with similar updates in other launch scripts.
Verify that `$DYNAMO_HOME/examples/backends/trtllm/engine_configs/qwen3/agg.yaml` exists at deployment time.
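A one-liner of the sort a launch script could use for this existence check (illustrative sketch; the path is the one quoted above):

```bash
test -f "$DYNAMO_HOME/examples/backends/trtllm/engine_configs/qwen3/agg.yaml" \
  || { echo "missing engine config" >&2; exit 1; }
```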
examples/backends/trtllm/launch/disagg_router.sh (1)

9-10: Disaggregated config paths consolidated correctly. Both PREFILL_ENGINE_ARGS and DECODE_ENGINE_ARGS have been updated to reference the centralized engine_configs location, maintaining the separate prefill/decode configuration pattern.
Verify that both `prefill.yaml` and `decode.yaml` exist at the new paths under `examples/backends/trtllm/engine_configs/qwen3/`.

examples/backends/trtllm/engine_configs/README.md (1)
1-44: Documentation for centralized engine_configs directory is clear and comprehensive. The new README provides users with effective guidance on using the engine configuration files, including usage examples, configuration categories, key parameters, and important notes on WideEP and dtype matching.
docs/backends/trtllm/README.md (1)
161-161: Documentation path updated correctly for MTP example. The AGG_ENGINE_ARGS path in the DeepSeek R1 MTP example has been properly updated to the centralized engine_configs location. The relative path format is appropriate for documentation showing commands to run from `examples/backends/trtllm`.

examples/backends/trtllm/launch/agg.sh (1)
9-9: Engine config path consolidated to centralized location. The default AGG_ENGINE_ARGS path has been updated consistently with other launch scripts to reference the centralized engine_configs directory.
Verify that `$DYNAMO_HOME/examples/backends/trtllm/engine_configs/qwen3/agg.yaml` exists.

examples/backends/trtllm/deploy/disagg_planner.yaml (1)
100-100: Disaggregated deployment paths consolidated correctly for both workers. The `--extra-engine-args` paths for both TRTLLMDecodeWorker and TRTLLMPrefillWorker have been updated to reference the centralized engine_configs location. The disaggregation modes are properly maintained.
Verify that the engine config files (`decode.yaml` and `prefill.yaml`) exist at `./examples/backends/trtllm/engine_configs/qwen3/` relative to the Kubernetes working directory. Also applies to: 127-127
docs/backends/trtllm/gpt-oss.md (1)
93-93: Documentation path references updated correctly. All path references to engine configuration files have been updated consistently to the new centralized location (`examples/backends/trtllm/engine_configs/`). The documentation accurately reflects the actual file locations and will guide users correctly. Also applies to: 100-100, 148-148, 164-164
benchmarks/router/run_engines.sh (2)
87-101: Engine config path construction is sound. The conditional logic correctly constructs paths to agg.yaml, prefill.yaml, or decode.yaml based on the MODE variable. The default path correctly points to the new centralized engine_configs directory.
10-10: Migration of RECIPE_PATH to ENGINE_CONFIG_PATH is complete and verified. Verification confirms:
- No remaining references to `RECIPE_PATH` exist elsewhere in the codebase
- `ENGINE_CONFIG_PATH` is properly introduced and consistently used across all five locations in the file (definition + four uses in config path construction)
- The variable is scoped to this file only, with no cross-file dependencies
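A sketch of how that check could be reproduced locally (commands illustrative, not quoted from the review):

```bash
# Expect no hits for the old name, and exactly five for the new one:
grep -rn "RECIPE_PATH" . --include="*.sh"                        # should print nothing
grep -c "ENGINE_CONFIG_PATH" benchmarks/router/run_engines.sh    # expect 5
```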
examples/backends/trtllm/launch/gpt_oss_disagg.sh (1)
9-10: Launch script paths updated correctly for GPT-OSS. The PREFILL_ENGINE_ARGS and DECODE_ENGINE_ARGS variables now reference the correct paths in the new centralized engine_configs directory. These will resolve to the prefill.yaml and decode.yaml files created in this PR.
examples/backends/trtllm/launch/disagg.sh (1)
9-10: Launch script paths updated correctly for Qwen3. The PREFILL_ENGINE_ARGS and DECODE_ENGINE_ARGS variables now reference the new centralized engine_configs directory paths for the qwen3 backend.
```yaml
cache_transceiver_config:
  backend: UCX
  max_tokens_in_buffer: 65536
```
Fix YAML indentation inconsistency in cache_transceiver_config.
Lines 22-23 use 2-space indentation while other nested configs (moe_config, cuda_graph_config) use 4-space indentation. Standardize to 4-space indentation for consistency.
```diff
 cache_transceiver_config:
-  backend: UCX
-  max_tokens_in_buffer: 65536
+    backend: UCX
+    max_tokens_in_buffer: 65536
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
cache_transceiver_config:
    backend: UCX
    max_tokens_in_buffer: 65536
```
🤖 Prompt for AI Agents
In examples/backends/trtllm/engine_configs/gpt-oss-120b/decode.yaml around lines
21 to 23, the cache_transceiver_config block uses 2-space indentation for its
nested keys while other nested blocks use 4-space indentation; update the nested
lines (backend and max_tokens_in_buffer) to use 4-space indentation so the YAML
nesting matches the rest of the file and remains consistent.
```yaml
cache_transceiver_config:
  backend: UCX
  max_tokens_in_buffer: 65536
```
Fix YAML indentation inconsistency in cache_transceiver_config.
Lines 24-25 use 2-space indentation while other nested configs (moe_config, cuda_graph_config) use 4-space indentation. Standardize to 4-space indentation for consistency.
```diff
 cache_transceiver_config:
-  backend: UCX
-  max_tokens_in_buffer: 65536
+    backend: UCX
+    max_tokens_in_buffer: 65536
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
cache_transceiver_config:
    backend: UCX
    max_tokens_in_buffer: 65536
```
🤖 Prompt for AI Agents
In examples/backends/trtllm/engine_configs/gpt-oss-120b/prefill.yaml around
lines 23 to 25, the cache_transceiver_config block uses 2-space indentation for
its nested keys while other nested configs use 4 spaces; update the indentation
of the keys under cache_transceiver_config (backend and max_tokens_in_buffer) to
use 4 spaces so the file uses a consistent 4-space indentation for nested
configuration blocks.
```bash
export PREFILL_ENGINE_ARGS=${PREFILL_ENGINE_ARGS:-"$DYNAMO_HOME/tests/serve/trtllm/engine_configs/qwen3/prefill.yaml"}
export DECODE_ENGINE_ARGS=${DECODE_ENGINE_ARGS:-"$DYNAMO_HOME/tests/serve/trtllm/engine_configs/qwen3/decode.yaml"}
```
Fix path inconsistency: disagg_same_gpu.sh uses wrong engine_configs location.
Lines 35-36 reference `tests/serve/trtllm/engine_configs` while all other launch scripts in this PR (gpt_oss_disagg.sh, disagg.sh) reference `examples/backends/trtllm/engine_configs`. This breaks consistency with the PR objective to consolidate configs under `examples/backends/trtllm/engine_configs/`.
```diff
-export PREFILL_ENGINE_ARGS=${PREFILL_ENGINE_ARGS:-"$DYNAMO_HOME/tests/serve/trtllm/engine_configs/qwen3/prefill.yaml"}
-export DECODE_ENGINE_ARGS=${DECODE_ENGINE_ARGS:-"$DYNAMO_HOME/tests/serve/trtllm/engine_configs/qwen3/decode.yaml"}
+export PREFILL_ENGINE_ARGS=${PREFILL_ENGINE_ARGS:-"$DYNAMO_HOME/examples/backends/trtllm/engine_configs/qwen3/prefill.yaml"}
+export DECODE_ENGINE_ARGS=${DECODE_ENGINE_ARGS:-"$DYNAMO_HOME/examples/backends/trtllm/engine_configs/qwen3/decode.yaml"}
```

📝 Committable suggestion
+export DECODE_ENGINE_ARGS=${DECODE_ENGINE_ARGS:-"$DYNAMO_HOME/examples/backends/trtllm/engine_configs/qwen3/decode.yaml"}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```bash
export PREFILL_ENGINE_ARGS=${PREFILL_ENGINE_ARGS:-"$DYNAMO_HOME/examples/backends/trtllm/engine_configs/qwen3/prefill.yaml"}
export DECODE_ENGINE_ARGS=${DECODE_ENGINE_ARGS:-"$DYNAMO_HOME/examples/backends/trtllm/engine_configs/qwen3/decode.yaml"}
```
🤖 Prompt for AI Agents
In examples/backends/trtllm/launch/disagg_same_gpu.sh around lines 35 to 36, the
PREFILL_ENGINE_ARGS and DECODE_ENGINE_ARGS default paths point to
tests/serve/trtllm/engine_configs/... which is inconsistent with other launch
scripts; update both defaults to use examples/backends/trtllm/engine_configs/...
(i.e., replace the tests/serve/trtllm path prefix with examples/backends/trtllm)
so the script references the consolidated engine_configs location used across
the PR.
```markdown
| Model | Framework | Mode | GPUs | Deployment | Benchmark Recipe | Notes | GAIE integration |
|-------|-----------|------|------|------------|------------------|-------|------------------|
| **[Llama-3-70B](llama-3-70b/vllm/agg/)** | vLLM | Aggregated | 4x H100/H200 | ✅ | ✅ | FP8 dynamic quantization | ✅ |
| **[Llama-3-70B](llama-3-70b/vllm/disagg-single-node/)** | vLLM | Disagg (Single-Node) | 8x H100/H200 | ✅ | ✅ | Prefill + Decode separation |
| **[Llama-3-70B](llama-3-70b/vllm/disagg-multi-node/)** | vLLM | Disagg (Multi-Node) | 16x H100/H200 | ✅ | ✅ | 2 nodes, 8 GPUs each |
| **[Qwen3-32B-FP8](qwen3-32b-fp8/trtllm/agg/)** | TensorRT-LLM | Aggregated | 4x GPU | ✅ | ✅ | FP8 quantization |
| **[Qwen3-32B-FP8](qwen3-32b-fp8/trtllm/disagg/)** | TensorRT-LLM | Disaggregated | 8x GPU | ✅ | ✅ | Prefill + Decode separation |
| **[GPT-OSS-120B](gpt-oss-120b/trtllm/agg/)** | TensorRT-LLM | Aggregated | 4x GB200 | ✅ | ✅ | Blackwell only, WideEP |
| **[GPT-OSS-120B](gpt-oss-120b/trtllm/disagg/)** | TensorRT-LLM | Disaggregated | TBD | ❌ | ❌ | Engine configs only, no K8s manifest |
| **[DeepSeek-R1](deepseek-r1/sglang/disagg-8gpu/)** | SGLang | Disagg WideEP | 8x H200 | ✅ | ❌ | Benchmark recipe pending |
| **[DeepSeek-R1](deepseek-r1/sglang/disagg-16gpu/)** | SGLang | Disagg WideEP | 16x H200 | ✅ | ❌ | Benchmark recipe pending |
| **[DeepSeek-R1](deepseek-r1/trtllm/disagg/wide_ep/gb200/)** | TensorRT-LLM | Disagg WideEP (GB200) | 32+4 GB200 | ✅ | ✅ | Multi-node: 8 decode + 1 prefill nodes |
```
Fix table column alignment—most rows missing GAIE integration column.
The table header declares 8 columns (including "GAIE integration"), but rows 13–21 only have 7 cells. Either add empty cells (| |) or remove the GAIE column header to match the data rows. Most models do not have GAIE integration values; consider whether that column is necessary or if placeholder content should be added consistently.
```diff
-| **[Llama-3-70B](llama-3-70b/vllm/disagg-single-node/)** | vLLM | Disagg (Single-Node) | 8x H100/H200 | ✅ | ✅ | Prefill + Decode separation |
+| **[Llama-3-70B](llama-3-70b/vllm/disagg-single-node/)** | vLLM | Disagg (Single-Node) | 8x H100/H200 | ✅ | ✅ | Prefill + Decode separation | |
```

Apply the same fix (add a trailing `| |`) to lines 14–21.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```markdown
| Model | Framework | Mode | GPUs | Deployment | Benchmark Recipe | Notes | GAIE integration |
|-------|-----------|------|------|------------|------------------|-------|------------------|
| **[Llama-3-70B](llama-3-70b/vllm/agg/)** | vLLM | Aggregated | 4x H100/H200 | ✅ | ✅ | FP8 dynamic quantization | ✅ |
| **[Llama-3-70B](llama-3-70b/vllm/disagg-single-node/)** | vLLM | Disagg (Single-Node) | 8x H100/H200 | ✅ | ✅ | Prefill + Decode separation | |
| **[Llama-3-70B](llama-3-70b/vllm/disagg-multi-node/)** | vLLM | Disagg (Multi-Node) | 16x H100/H200 | ✅ | ✅ | 2 nodes, 8 GPUs each | |
| **[Qwen3-32B-FP8](qwen3-32b-fp8/trtllm/agg/)** | TensorRT-LLM | Aggregated | 4x GPU | ✅ | ✅ | FP8 quantization | |
| **[Qwen3-32B-FP8](qwen3-32b-fp8/trtllm/disagg/)** | TensorRT-LLM | Disaggregated | 8x GPU | ✅ | ✅ | Prefill + Decode separation | |
| **[GPT-OSS-120B](gpt-oss-120b/trtllm/agg/)** | TensorRT-LLM | Aggregated | 4x GB200 | ✅ | ✅ | Blackwell only, WideEP | |
| **[GPT-OSS-120B](gpt-oss-120b/trtllm/disagg/)** | TensorRT-LLM | Disaggregated | TBD | ❌ | ❌ | Engine configs only, no K8s manifest | |
| **[DeepSeek-R1](deepseek-r1/sglang/disagg-8gpu/)** | SGLang | Disagg WideEP | 8x H200 | ✅ | ❌ | Benchmark recipe pending | |
| **[DeepSeek-R1](deepseek-r1/sglang/disagg-16gpu/)** | SGLang | Disagg WideEP | 16x H200 | ✅ | ❌ | Benchmark recipe pending | |
| **[DeepSeek-R1](deepseek-r1/trtllm/disagg/wide_ep/gb200/)** | TensorRT-LLM | Disagg WideEP (GB200) | 32+4 GB200 | ✅ | ✅ | Multi-node: 8 decode + 1 prefill nodes | |
```
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
13-13 to 21-21: Table column count — Expected: 8; Actual: 7; Too few cells, row will be missing data (MD056, table-column-count)
🤖 Prompt for AI Agents
recipes/README.md around lines 10 to 21: the markdown table header declares 8
columns including "GAIE integration" but the data rows on lines 14–21 only have
7 cells, so add a trailing empty cell to each of those rows; edit lines 14–21 to
append " | |" (i.e., add a final empty column cell) to ensure every row has the
same 8 columns as the header.
```bash
# Deploy GPT-OSS-120B with TensorRT-LLM
./run.sh --model gpt-oss-120b --framework trtllm --deployment agg
```

**Step 4: Run Benchmark (Optional)**
Correct step numbering—both are labeled "Step 4".
Line 137 is "Step 4: Run Benchmark" and line 150 is "Step 4: GAIE Integration". The GAIE section should be "Step 5" to maintain sequential numbering.
```diff
-**Step 4: GAIE Integration (Optional)**
+**Step 5: GAIE Integration (Optional)**
```

Also applies to: 150-150
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
137-137: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🤖 Prompt for AI Agents
In recipes/README.md around lines 137 and 150, the heading "Step 4: Run
Benchmark (Optional)" and "Step 4: GAIE Integration" use the same step number;
update the GAIE Integration heading on line 150 to "Step 5: GAIE Integration" so
the steps are sequential and consistent, and scan nearby headings to ensure
subsequent step numbers (if any) are adjusted accordingly.
For Llama-3-70B with vLLM (Aggregated), an example of integration with the Inference Gateway is provided.

2. Apply manifests by running a script.
Follow to Follow [Deploy Inference Gateway Section 2](../deploy/inference-gateway/README.md#2-deploy-inference-gateway) to install GAIE. Then apply manifests.
Fix duplicate word "Follow".
Line 154 contains "Follow to Follow"—remove the duplicate.
```diff
-Follow to Follow [Deploy Inference Gateway Section 2](../deploy/inference-gateway/README.md#2-deploy-inference-gateway) to install GAIE.
+Follow [Deploy Inference Gateway Section 2](../deploy/inference-gateway/README.md#2-deploy-inference-gateway) to install GAIE.
```

🤖 Prompt for AI Agents
In recipes/README.md around line 154, the sentence contains the duplicated word sequence "Follow to Follow"; remove the extra words so the line reads "Follow [Deploy Inference Gateway Section 2](../deploy/inference-gateway/README.md#2-deploy-inference-gateway) to install GAIE.", ensuring grammar and the link remain intact.
Add detail to the title and description please.
Overview:
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
Release Notes
Documentation
Chores