feat: Pr 4159 fixes #4417
base: main
Conversation
Remove incomplete model directories and non-Kubernetes configurations to streamline the recipes directory for production Kubernetes deployments.

Changes:
- Remove 5 incomplete model directories (deepseek-r1-distill-llama-8b, gemma3, llama4, qwen2-vl-7b-instruct, qwen3) that lack proper Kubernetes deployment manifests
- Delete run.sh script (non-Kubernetes automation tool)
- Remove standalone engine config YAMLs from deepseek-r1/trtllm that were not wrapped in Kubernetes manifests
- Document incomplete gpt-oss-120b disagg recipe with README explaining missing components

README improvements:
- Restructure Available Recipes table with 'Deployment' and 'Benchmark Recipe' columns to clarify that perf.yaml files are tools for users to run benchmarks, not published performance results
- Add comprehensive quick start guide with prerequisites
- Link to correct Kubernetes deployment guides
- Add troubleshooting section
- Remove extraneous links (docs.nvidia.com, license section)

Result: 4 models with 10 complete deployment recipes (7 with benchmark scripts), focused exclusively on Kubernetes deployments.

Signed-off-by: Ben Hamm <ben.hamm@gmail.com>
Address feedback to make the README less AI-generated looking by removing decorative emojis from section headings while keeping status indicators (✅ ❌) in tables and content. Signed-off-by: Ben Hamm <ben.hamm@gmail.com>
Co-authored-by: Anant Sharma <anants@nvidia.com> Signed-off-by: Tanmay Verma <tanmay2592@gmail.com>
Signed-off-by: Anna Tchernych <atchernych@nvidia.com>
Walkthrough
This pull request consolidates TensorRT-LLM engine configuration file paths from scattered `recipes/` locations into the centralized `examples/backends/trtllm/engine_configs/` directory.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 6
🧹 Nitpick comments (3)
recipes/gpt-oss-120b/trtllm/disagg/README.md (1)
1-25: Documentation clarity and warnings. The warning on line 3 effectively communicates the incomplete status, and the structure is clear. However, the document should explicitly reference where the centralized configs are located (e.g., `examples/backends/trtllm/engine_configs/gpt-oss-120b/`) rather than implying they exist in the recipes directory, to prevent confusion during the repo's transition to the new structure. Consider adding a clarification section:
```diff
 This directory contains TensorRT-LLM engine configurations for disaggregated serving:
-- `decode.yaml` - Decode worker engine configuration
-- `prefill.yaml` - Prefill worker engine configuration
+- `decode.yaml` - Decode worker engine configuration (see also: `examples/backends/trtllm/engine_configs/gpt-oss-120b/`)
+- `prefill.yaml` - Prefill worker engine configuration (see also: `examples/backends/trtllm/engine_configs/gpt-oss-120b/`)
```

Or, if the files no longer exist here:
```diff
 This directory contains TensorRT-LLM engine configurations for disaggregated serving:
-- `decode.yaml` - Decode worker engine configuration
-- `prefill.yaml` - Prefill worker engine configuration
+Engine configuration files are now centralized in `examples/backends/trtllm/engine_configs/gpt-oss-120b/`:
+- `decode.yaml` - Decode worker engine configuration
+- `prefill.yaml` - Prefill worker engine configuration
```

recipes/README.md (2)
31-31: Specify language for fenced code block. Code blocks should declare their language for syntax highlighting. Since this shows a directory structure, use `tree` or `text` as the fence language.

````diff
-```
+```tree
````
47-47: Replace bold emphasis with proper markdown headings. Lines using bold text for section titles (e.g., `**1. Dynamo Platform Installed**`, `**Step 1: Download Model**`) should be converted to markdown headings (`###`) for proper document structure and accessibility. Examples:

```diff
-**1. Dynamo Platform Installed**
+### 1. Dynamo Platform Installed
```

```diff
-**Step 1: Download Model**
+### Step 1: Download Model
```

Apply similar changes to all 9 instances flagged by the linter.
Also applies to: 54-54, 61-61, 75-75, 90-90, 103-103, 118-118, 137-137, 150-150
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (35)
- benchmarks/router/run_engines.sh (2 hunks)
- docs/backends/trtllm/README.md (1 hunks)
- docs/backends/trtllm/gemma3_sliding_window_attention.md (4 hunks)
- docs/backends/trtllm/gpt-oss.md (3 hunks)
- docs/backends/trtllm/llama4_plus_eagle.md (2 hunks)
- docs/backends/trtllm/multimodal_support.md (2 hunks)
- docs/backends/trtllm/multinode/multinode-examples.md (4 hunks)
- docs/backends/trtllm/multinode/multinode-multimodal-example.md (5 hunks)
- docs/kubernetes/README.md (1 hunks)
- examples/backends/trtllm/deploy/agg-with-config.yaml (1 hunks)
- examples/backends/trtllm/deploy/agg.yaml (1 hunks)
- examples/backends/trtllm/deploy/agg_router.yaml (1 hunks)
- examples/backends/trtllm/deploy/disagg.yaml (2 hunks)
- examples/backends/trtllm/deploy/disagg_planner.yaml (2 hunks)
- examples/backends/trtllm/deploy/disagg_router.yaml (2 hunks)
- examples/backends/trtllm/engine_configs/README.md (1 hunks)
- examples/backends/trtllm/engine_configs/deepseek-r1/agg/wide_ep/wide_ep_agg.yaml (1 hunks)
- examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/wide_ep_decode.yaml (1 hunks)
- examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/wide_ep_prefill.yaml (1 hunks)
- examples/backends/trtllm/engine_configs/gpt-oss-120b/decode.yaml (1 hunks)
- examples/backends/trtllm/engine_configs/gpt-oss-120b/prefill.yaml (1 hunks)
- examples/backends/trtllm/launch/agg.sh (1 hunks)
- examples/backends/trtllm/launch/agg_metrics.sh (1 hunks)
- examples/backends/trtllm/launch/agg_router.sh (1 hunks)
- examples/backends/trtllm/launch/disagg.sh (1 hunks)
- examples/backends/trtllm/launch/disagg_router.sh (1 hunks)
- examples/backends/trtllm/launch/disagg_same_gpu.sh (1 hunks)
- examples/backends/trtllm/launch/epd_disagg.sh (1 hunks)
- examples/backends/trtllm/launch/gpt_oss_disagg.sh (1 hunks)
- examples/basics/multinode/trtllm/README.md (1 hunks)
- examples/basics/multinode/trtllm/srun_aggregated.sh (1 hunks)
- examples/basics/multinode/trtllm/srun_disaggregated.sh (1 hunks)
- recipes/README.md (1 hunks)
- recipes/gpt-oss-120b/trtllm/disagg/README.md (1 hunks)
- recipes/run.sh (0 hunks)
💤 Files with no reviewable changes (1)
- recipes/run.sh
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 3858
File: recipes/deepseek-r1/model-cache/model-download.yaml:18-32
Timestamp: 2025-10-24T04:21:08.751Z
Learning: In the recipes directory structure, model-specific recipes (e.g., recipes/deepseek-r1/, recipes/llama-3-70b/) contain hardcoded model names and revisions in their Kubernetes manifests (like model-download.yaml). Each recipe directory is deployment-specific and self-contained, so hardcoding model-specific values is the intended design pattern.
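To make the pattern concrete, here is a minimal sketch of what such a hardcoded manifest fragment could look like (every name and value below is illustrative, not taken from the actual recipe):

```yaml
# Hypothetical model-download.yaml fragment; the model name and revision are
# deliberately hardcoded because each recipe directory is self-contained.
apiVersion: batch/v1
kind: Job
metadata:
  name: model-download
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: download
          image: model-downloader:latest  # assumed image shipping huggingface-cli
          command: ["huggingface-cli", "download", "deepseek-ai/DeepSeek-R1", "--revision", "main"]
```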
📚 Learning: 2025-10-24T04:21:08.751Z
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 3858
File: recipes/deepseek-r1/model-cache/model-download.yaml:18-32
Timestamp: 2025-10-24T04:21:08.751Z
Learning: In the recipes directory structure, model-specific recipes (e.g., recipes/deepseek-r1/, recipes/llama-3-70b/) contain hardcoded model names and revisions in their Kubernetes manifests (like model-download.yaml). Each recipe directory is deployment-specific and self-contained, so hardcoding model-specific values is the intended design pattern.
Applied to files:
- recipes/gpt-oss-120b/trtllm/disagg/README.md
- examples/backends/trtllm/deploy/agg-with-config.yaml
- benchmarks/router/run_engines.sh
- docs/backends/trtllm/multinode/multinode-multimodal-example.md
- examples/backends/trtllm/deploy/disagg.yaml
- examples/backends/trtllm/deploy/disagg_planner.yaml
- examples/backends/trtllm/deploy/disagg_router.yaml
- recipes/README.md
- examples/backends/trtllm/deploy/agg.yaml
📚 Learning: 2025-07-02T13:20:28.800Z
Learnt from: fsaady
Repo: ai-dynamo/dynamo PR: 1730
File: examples/sglang/slurm_jobs/job_script_template.j2:59-59
Timestamp: 2025-07-02T13:20:28.800Z
Learning: In the SLURM job script template at examples/sglang/slurm_jobs/job_script_template.j2, the `--total_nodes` parameter represents the total nodes per worker type (prefill or decode), not the total nodes in the entire cluster. Each worker type needs to know its own group size for distributed coordination.
Applied to files:
- examples/basics/multinode/trtllm/README.md
- docs/backends/trtllm/multinode/multinode-examples.md
- docs/backends/trtllm/llama4_plus_eagle.md
- docs/backends/trtllm/multinode/multinode-multimodal-example.md
- examples/basics/multinode/trtllm/srun_disaggregated.sh
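A sketch of the per-worker-type semantics this learning describes (only `--total_nodes` comes from the quoted learning; the script invocation and node counts are hypothetical):

```bash
# 4 prefill nodes + 8 decode nodes: each group passes its own group size,
# not the 12-node cluster total.
python scripts/worker_setup.py --total_nodes 4   # run on each prefill node
python scripts/worker_setup.py --total_nodes 8   # run on each decode node
```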
📚 Learning: 2025-07-03T10:14:30.570Z
Learnt from: fsaady
Repo: ai-dynamo/dynamo PR: 1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
Applied to files:
- examples/basics/multinode/trtllm/README.md
- docs/backends/trtllm/multinode/multinode-examples.md
- docs/backends/trtllm/multinode/multinode-multimodal-example.md
📚 Learning: 2025-07-03T09:44:41.470Z
Learnt from: fsaady
Repo: ai-dynamo/dynamo PR: 1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:113-116
Timestamp: 2025-07-03T09:44:41.470Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, logging the full configuration file content is acceptable because the config file is public, contains only placeholder replacements (no sensitive data), and provides debugging benefits for users who may want to create configurations based on the logged output.
Applied to files:
examples/basics/multinode/trtllm/README.md
📚 Learning: 2025-06-05T01:46:15.509Z
Learnt from: GuanLuo
Repo: ai-dynamo/dynamo PR: 1371
File: examples/llm/benchmarks/vllm_multinode_setup.sh:18-25
Timestamp: 2025-06-05T01:46:15.509Z
Learning: In multi-node setups with head/worker architecture, the head node typically doesn't need environment variables pointing to its own services (like NATS_SERVER, ETCD_ENDPOINTS) because local processes can access them via localhost. Only worker nodes need these environment variables to connect to the head node's external IP address.
Applied to files:
- docs/backends/trtllm/multinode/multinode-examples.md
- examples/basics/multinode/trtllm/srun_aggregated.sh
- examples/basics/multinode/trtllm/srun_disaggregated.sh
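A minimal sketch of the head/worker pattern the learning describes (the variable names come from the learning; the IP address and ports are illustrative):

```bash
# Head node: nats-server and etcd bind locally, so no pointers to itself are needed.
nats-server &
etcd &

# Worker nodes: point at the head node's external IP.
export NATS_SERVER="nats://10.0.0.1:4222"
export ETCD_ENDPOINTS="http://10.0.0.1:2379"
```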
📚 Learning: 2025-07-31T11:26:48.422Z
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 2217
File: components/backends/trtllm/engine_configs/deepseek_r1/wide_ep/wide_ep_prefill.yaml:18-0
Timestamp: 2025-07-31T11:26:48.422Z
Learning: TRTLLM LLM-API expects all caps for backend field names in configuration files. When migrating TRTLLM configurations, backend values like "WideEP" should be changed to "WIDEEP" to comply with the API requirements.
Applied to files:
- examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/wide_ep_decode.yaml
- examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/wide_ep_prefill.yaml
- docs/backends/trtllm/multinode/multinode-multimodal-example.md
- examples/backends/trtllm/engine_configs/README.md
- examples/backends/trtllm/deploy/disagg.yaml
- docs/backends/trtllm/README.md
- examples/backends/trtllm/engine_configs/deepseek-r1/agg/wide_ep/wide_ep_agg.yaml
- examples/backends/trtllm/launch/epd_disagg.sh
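A one-line illustration of the casing rule in this learning (the surrounding key is an assumption; only the all-caps backend value is confirmed by the quoted learning):

```yaml
moe_config:
  backend: WIDEEP   # TRTLLM LLM-API accepts this; "WideEP" would not comply
```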
🪛 markdownlint-cli2 (0.18.1)
recipes/README.md
13-13 to 21-21: Table column count
Expected: 8; Actual: 7; Too few cells, row will be missing data
(MD056, table-column-count) — reported once for each of lines 13–21
31-31: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
47-47, 54-54, 61-61, 75-75, 90-90, 103-103, 118-118, 137-137, 150-150: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: trtllm (amd64)
- GitHub Check: trtllm (arm64)
- GitHub Check: sglang (arm64)
- GitHub Check: operator (arm64)
- GitHub Check: sglang (amd64)
- GitHub Check: vllm (amd64)
- GitHub Check: operator (amd64)
- GitHub Check: vllm (arm64)
🔇 Additional comments (37)
recipes/gpt-oss-120b/trtllm/disagg/README.md (2)
24-24: Relative path is valid—no issues found. The file `recipes/CONTRIBUTING.md` exists at the referenced location, confirming that the relative path `../../../CONTRIBUTING.md` from `recipes/gpt-oss-120b/trtllm/disagg/` resolves correctly. The link will not break.

8-9: The original review comment's concern was based on an incorrect assumption. Verification confirms that `decode.yaml` and `prefill.yaml` do exist in `recipes/gpt-oss-120b/trtllm/disagg/` (alongside copies in the centralized location at `examples/backends/trtllm/engine_configs/gpt-oss-120b/`). The README documentation at lines 8-9 is accurate—these files are present in the directory as stated. No changes are needed. Likely an incorrect or invalid review comment.
docs/backends/trtllm/multinode/multinode-examples.md (4)
170-171: Disaggregated engine config paths verified as correct. Both `wide_ep_prefill.yaml` and `wide_ep_decode.yaml` exist at the expected locations in the repository under `examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/`. The commented paths are accurate and the migration from recipes/ has been completed successfully.
141-141: Path verification confirms the commented example is correct. The file `examples/backends/trtllm/engine_configs/deepseek-r1/agg/wide_ep/wide_ep_agg.yaml` exists and the path structure is consistent with the consolidation from the `recipes/` structure. The `/mnt/examples/` prefix in the commented example is appropriate for containerized deployments where this path would be mounted.
20-20: Path reference is valid; no issues found. All paths referenced in the documentation have been verified:

- The script directory `examples/basics/multinode/trtllm/` exists and contains the referenced scripts (`srun_aggregated.sh` and `srun_disaggregated.sh`)
- The engine config directory `examples/backends/trtllm/engine_configs/` exists with all referenced configuration files
111-112: MOUNTS path configuration is correct and provides full access to all referenced engine config files. All three engine config files exist at their expected locations in the repository and are accessible via the MOUNTS configuration:

- `examples/backends/trtllm/engine_configs/deepseek-r1/agg/wide_ep/wide_ep_agg.yaml` (line 141)
- `examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/wide_ep_prefill.yaml` (line 170)
- `examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/wide_ep_decode.yaml` (line 171)

The MOUNTS configuration `${PWD}/../../../../:/mnt` correctly mounts the repository root to `/mnt` in the container, making all referenced paths accessible.
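For readers unfamiliar with the convention, a short sketch of how this mount maps host paths into the container (the MOUNTS value is quoted from the comment above; the path comments are illustrative):

```bash
# Repository root is bind-mounted at /mnt inside the container:
export MOUNTS="${PWD}/../../../../:/mnt"
# Host:      <repo-root>/examples/backends/trtllm/engine_configs/...
# Container: /mnt/examples/backends/trtllm/engine_configs/...
```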
examples/backends/trtllm/deploy/agg-with-config.yaml (1)

54-70: Verify path resolution relative to working directory. The working directory is `/workspace/examples/backends/trtllm` (line 54), and the `--extra-engine-args` path on line 70 uses `./examples/backends/...`. This resolves to `/workspace/examples/backends/trtllm/./examples/backends/trtllm/engine_configs/qwen3/agg.yaml`, which appears to have a redundant path segment. Confirm whether the intended path is:
- `examples/backends/trtllm/engine_configs/qwen3/agg.yaml` (relative to `/workspace`), or
- `../../../examples/backends/trtllm/engine_configs/qwen3/agg.yaml` (relative to `/workspace/examples/backends/trtllm`), or
- Another variant.
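One quick way to see which variant is intended is to resolve the path from the stated working directory (a diagnostic sketch; the paths are the ones quoted in the comment above):

```bash
cd /workspace/examples/backends/trtllm
# If this prints a path containing a doubled ".../trtllm/examples/backends/trtllm/..."
# segment, the relative prefix in the manifest is wrong:
realpath -m ./examples/backends/trtllm/engine_configs/qwen3/agg.yaml
```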
examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/wide_ep_prefill.yaml (1)
18-20: Verify path format consistency across configuration and deployment files. The `load_balancer` path uses an absolute `/mnt/` prefix on line 20, whereas deployment files use relative paths (`./examples/...`). Confirm this difference is intentional and that:

- The `/mnt/` path is valid within the TensorRT-LLM engine configuration context
- The path is consistent with how the `load_balancer` field is resolved by TensorRT-LLM

Additionally, verify that the referenced `eplb.yaml` file exists or will be created at this location.
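For reference, a minimal sketch of the field under discussion (only the `load_balancer` key, the `/mnt/` prefix, and the `eplb.yaml` filename come from the comment; the nesting under `moe_config` is an assumption):

```yaml
moe_config:
  backend: WIDEEP
  load_balancer: /mnt/examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/eplb.yaml
```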
examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/wide_ep_decode.yaml (1)

18-20: Verify path format consistency across configuration and deployment files. Same concern as `wide_ep_prefill.yaml`: the absolute `/mnt/` prefix on line 20 differs from relative paths in deployment files. Verify this path format is valid in the TensorRT-LLM engine configuration context and that the referenced `eplb.yaml` file exists at the expected location.

examples/backends/trtllm/deploy/disagg.yaml (2)
29-29: Verify path resolution for prefill worker configuration (line 40). The working directory is `/workspace/` (line 29), and the path on line 40 uses `./examples/backends/trtllm/engine_configs/qwen3/prefill.yaml`. Verify this resolves correctly to the intended file location. Also applies to: 40-40
55-55: Verify path resolution for decode worker configuration (line 66). The working directory is `/workspace/` (line 55), and the path on line 66 uses `./examples/backends/trtllm/engine_configs/qwen3/decode.yaml`. Verify this resolves correctly to the intended file location. Also applies to: 66-66
examples/backends/trtllm/engine_configs/deepseek-r1/agg/wide_ep/wide_ep_agg.yaml (1)
6-14: Verify path format consistency across configuration and deployment files. The `load_balancer` path on line 14 uses the absolute `/mnt/` prefix, consistent with other `wide_ep` configurations. Verify that this path format is valid in the TensorRT-LLM engine configuration context and that the referenced `eplb.yaml` file exists at this location.

docs/kubernetes/README.md (1)
201-207: Clarify path format in documentation example. The documentation example on line 206 uses `/workspace/examples/backends/trtllm/engine_configs/deepseek-r1-distill-llama-8b/agg.yaml` with an absolute `/workspace` prefix. This differs from:

- Deployment files using `./examples/backends/...`
- Engine config files using `/mnt/examples/backends/...`

Confirm whether `/workspace` is the correct prefix for a documentation example or if it should match one of the other patterns (relative path or `/mnt/`). Additionally, verify that the file path uses dashes (`deepseek-r1-distill-llama-8b`) and not underscores.

examples/backends/trtllm/deploy/agg_router.yaml (1)
31-31: Verify path resolution for worker configuration (line 42). The working directory is `/workspace/` (line 31), and the path on line 42 uses `./examples/backends/trtllm/engine_configs/qwen3/agg.yaml`. Verify this resolves correctly to the intended file location. Also applies to: 42-42
examples/backends/trtllm/deploy/disagg_router.yaml (2)
31-31: Verify path resolution for prefill worker configuration (line 42). The working directory is `/workspace/` (line 31), and the path on line 42 uses `./examples/backends/trtllm/engine_configs/qwen3/prefill.yaml`. Verify this resolves correctly to the intended file location. Also applies to: 42-42
57-57: Verify path resolution for decode worker configuration (line 68). The working directory is `/workspace/` (line 57), and the path on line 68 uses `./examples/backends/trtllm/engine_configs/qwen3/decode.yaml`. Verify this resolves correctly to the intended file location. Also applies to: 68-68
examples/backends/trtllm/launch/agg_metrics.sh (1)
9-9: Path consolidation approved. The default `AGG_ENGINE_ARGS` path has been correctly updated to the new centralized engine_configs location. This aligns with the PR's objective to consolidate configuration files.

examples/basics/multinode/trtllm/srun_aggregated.sh (1)
21-21: Container path correctly updated for multinode deployment. The `ENGINE_CONFIG` path has been updated to the new location with the appropriate `/mnt` container mount prefix. The full subdirectory hierarchy (`agg/wide_ep/`) is preserved in the new engine_configs location.

examples/basics/multinode/trtllm/README.md (1)
1-20: New reference README is appropriate. Minimal reference documentation with proper licensing. The link structure allows readers to find the detailed setup guide.
examples/backends/trtllm/launch/epd_disagg.sh (1)
9-11: EPD disaggregated configuration paths properly consolidated. All three engine configuration paths for encode, prefill, and decode workers have been updated to the new engine_configs location. The separation of encode functionality for multimodal models is appropriate for this disaggregated setup.
docs/backends/trtllm/multimodal_support.md (1)
30-30: Path consolidation with standard extensions approved. All engine configuration paths for both aggregated and disaggregated multimodal deployments have been updated to the new engine_configs location with standard `.yaml` extensions. Also applies to: 82-83
docs/backends/trtllm/multinode/multinode-multimodal-example.md (1)
20-20: Multinode multimodal example properly updated with important fixes. Multiple improvements here:
- Line 20: Helpful note pointing to example script locations
- Line 39: Important sed command fixing backend field case (DEFAULT → default) aligns with TensorRT-LLM API requirements (a sketch of this kind of fix appears after this list)
- Lines 105-106: Path consolidation with standard `.yaml` extensions
- Line 128: Formatting fix for output enumeration
These changes support the path consolidation objective while addressing necessary runtime compatibility fixes.
Also applies to: 39-39, 105-106, 128-128
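As promised above, a sketch of the kind of in-place substitution the second bullet describes (the exact sed pattern and target file are assumptions, not quoted from the PR):

```bash
# Lower-case the backend field before launching the worker:
sed -i 's/backend: DEFAULT/backend: default/' "$ENGINE_CONFIG"
```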
docs/backends/trtllm/gemma3_sliding_window_attention.md (1)
33-33: Gemma3 VSWA configuration paths comprehensively updated. All engine configuration paths have been properly consolidated to the new engine_configs location across all serving modes (aggregated, disaggregated, with/without KV routing). Standard `.yaml` extensions and `$DYNAMO_HOME` prefix used consistently. Also applies to: 42-42, 51-52, 61-62
docs/backends/trtllm/llama4_plus_eagle.md (1)
55-55: Fix broken file references to actual config filenames. Lines 63 and 65 reference config files with `.yml` extension, but the actual files use `.yaml`. This will cause runtime failures when the scripts attempt to load these configurations. Line 55 is correct as-is (`eagle_agg.yml` exists). Only lines 63 and 65 need correction:
```diff
-export PREFILL_ENGINE_CONFIG="/mnt/examples/backends/trtllm/engine_configs/llama4/eagle/eagle_prefill.yml"
+export PREFILL_ENGINE_CONFIG="/mnt/examples/backends/trtllm/engine_configs/llama4/eagle/eagle_prefill.yaml"
```

```diff
-export DECODE_ENGINE_CONFIG="/mnt/examples/backends/trtllm/engine_configs/llama4/eagle/eagle_decode.yml"
+export DECODE_ENGINE_CONFIG="/mnt/examples/backends/trtllm/engine_configs/llama4/eagle/eagle_decode.yaml"
```

⛔ Skipped due to learnings
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 2217
File: components/backends/trtllm/engine_configs/deepseek_r1/wide_ep/wide_ep_prefill.yaml:18-0
Timestamp: 2025-07-31T11:26:48.422Z
Learning: TRTLLM LLM-API expects all caps for backend field names in configuration files. When migrating TRTLLM configurations, backend values like "WideEP" should be changed to "WIDEEP" to comply with the API requirements.

examples/basics/multinode/trtllm/srun_disaggregated.sh (1)
20-20: Engine config paths consolidated correctly; verify YAML files exist at new locations. The path updates from `recipes/deepseek-r1/trtllm/disagg/` to `examples/backends/trtllm/engine_configs/deepseek-r1/disagg/` are consistent with the PR objectives. Variables are properly used downstream at lines 65 and 85. Please verify that the actual engine configuration YAML files (`wide_ep_prefill.yaml` and `wide_ep_decode.yaml`) exist at the new centralized locations. Also applies to: 24-24
examples/backends/trtllm/deploy/agg.yaml (1)
39-39: Path relocated to centralized engine configs; verify YAML file exists. The engine config path has been properly updated to reference the centralized location. Verify that `examples/backends/trtllm/engine_configs/qwen3/agg.yaml` exists.

examples/backends/trtllm/launch/agg_router.sh (1)
9-9: Engine config path consolidated consistently. The default AGG_ENGINE_ARGS path has been updated correctly to the centralized location. This is consistent with similar updates in other launch scripts.
Verify that `$DYNAMO_HOME/examples/backends/trtllm/engine_configs/qwen3/agg.yaml` exists at deployment time.
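A one-liner of the sort a launch script could use for this existence check (illustrative sketch; the path is the one quoted above):

```bash
test -f "$DYNAMO_HOME/examples/backends/trtllm/engine_configs/qwen3/agg.yaml" \
  || { echo "missing engine config" >&2; exit 1; }
```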
examples/backends/trtllm/launch/disagg_router.sh (1)

9-10: Disaggregated config paths consolidated correctly. Both PREFILL_ENGINE_ARGS and DECODE_ENGINE_ARGS have been updated to reference the centralized engine_configs location, maintaining the separate prefill/decode configuration pattern.
Verify that both `prefill.yaml` and `decode.yaml` exist at the new paths under `examples/backends/trtllm/engine_configs/qwen3/`.

examples/backends/trtllm/engine_configs/README.md (1)
1-44: Documentation for centralized engine_configs directory is clear and comprehensive. The new README provides users with effective guidance on using the engine configuration files, including usage examples, configuration categories, key parameters, and important notes on WideEP and dtype matching.
docs/backends/trtllm/README.md (1)
161-161: Documentation path updated correctly for MTP example. The AGG_ENGINE_ARGS path in the DeepSeek R1 MTP example has been properly updated to the centralized engine_configs location. The relative path format is appropriate for documentation showing commands to run from `examples/backends/trtllm`.

examples/backends/trtllm/launch/agg.sh (1)
9-9: Engine config path consolidated to centralized location. The default AGG_ENGINE_ARGS path has been updated consistently with other launch scripts to reference the centralized engine_configs directory.
Verify that `$DYNAMO_HOME/examples/backends/trtllm/engine_configs/qwen3/agg.yaml` exists.

examples/backends/trtllm/deploy/disagg_planner.yaml (1)
100-100: Disaggregated deployment paths consolidated correctly for both workers. The `--extra-engine-args` paths for both TRTLLMDecodeWorker and TRTLLMPrefillWorker have been updated to reference the centralized engine_configs location. The disaggregation modes are properly maintained.
Verify that the engine config files (`decode.yaml` and `prefill.yaml`) exist at `./examples/backends/trtllm/engine_configs/qwen3/` relative to the Kubernetes working directory. Also applies to: 127-127
docs/backends/trtllm/gpt-oss.md (1)
93-93: Documentation path references updated correctly. All path references to engine configuration files have been updated consistently to the new centralized location (`examples/backends/trtllm/engine_configs/`). The documentation accurately reflects the actual file locations and will guide users correctly. Also applies to: 100-100, 148-148, 164-164
benchmarks/router/run_engines.sh (2)
87-101: Engine config path construction is sound. The conditional logic correctly constructs paths to agg.yaml, prefill.yaml, or decode.yaml based on the MODE variable. The default path correctly points to the new centralized engine_configs directory.
10-10: Migration of RECIPE_PATH to ENGINE_CONFIG_PATH is complete and verified. Verification confirms:
- No remaining references to `RECIPE_PATH` exist elsewhere in the codebase
- `ENGINE_CONFIG_PATH` is properly introduced and consistently used across all five locations in the file (definition + four uses in config path construction)
- The variable is scoped to this file only, with no cross-file dependencies
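A sketch of how that check could be reproduced locally (commands illustrative, not quoted from the review):

```bash
# Expect no hits for the old name, and exactly five for the new one:
grep -rn "RECIPE_PATH" . --include="*.sh"                        # should print nothing
grep -c "ENGINE_CONFIG_PATH" benchmarks/router/run_engines.sh    # expect 5
```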
examples/backends/trtllm/launch/gpt_oss_disagg.sh (1)
9-10: Launch script paths updated correctly for GPT-OSS. The PREFILL_ENGINE_ARGS and DECODE_ENGINE_ARGS variables now reference the correct paths in the new centralized engine_configs directory. These will resolve to the prefill.yaml and decode.yaml files created in this PR.
examples/backends/trtllm/launch/disagg.sh (1)
9-10: Launch script paths updated correctly for Qwen3. The PREFILL_ENGINE_ARGS and DECODE_ENGINE_ARGS variables now reference the new centralized engine_configs directory paths for the qwen3 backend.
```yaml
cache_transceiver_config:
  backend: UCX
  max_tokens_in_buffer: 65536
```
Fix YAML indentation inconsistency in cache_transceiver_config.
Lines 22-23 use 2-space indentation while other nested configs (moe_config, cuda_graph_config) use 4-space indentation. Standardize to 4-space indentation for consistency.
```diff
 cache_transceiver_config:
-  backend: UCX
-  max_tokens_in_buffer: 65536
+    backend: UCX
+    max_tokens_in_buffer: 65536
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
cache_transceiver_config:
    backend: UCX
    max_tokens_in_buffer: 65536
```
🤖 Prompt for AI Agents
In examples/backends/trtllm/engine_configs/gpt-oss-120b/decode.yaml around lines
21 to 23, the cache_transceiver_config block uses 2-space indentation for its
nested keys while other nested blocks use 4-space indentation; update the nested
lines (backend and max_tokens_in_buffer) to use 4-space indentation so the YAML
nesting matches the rest of the file and remains consistent.
```yaml
cache_transceiver_config:
  backend: UCX
  max_tokens_in_buffer: 65536
```
Fix YAML indentation inconsistency in cache_transceiver_config.
Lines 24-25 use 2-space indentation while other nested configs (moe_config, cuda_graph_config) use 4-space indentation. Standardize to 4-space indentation for consistency.
```diff
 cache_transceiver_config:
-  backend: UCX
-  max_tokens_in_buffer: 65536
+    backend: UCX
+    max_tokens_in_buffer: 65536
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
cache_transceiver_config:
    backend: UCX
    max_tokens_in_buffer: 65536
```
🤖 Prompt for AI Agents
In examples/backends/trtllm/engine_configs/gpt-oss-120b/prefill.yaml around
lines 23 to 25, the cache_transceiver_config block uses 2-space indentation for
its nested keys while other nested configs use 4 spaces; update the indentation
of the keys under cache_transceiver_config (backend and max_tokens_in_buffer) to
use 4 spaces so the file uses a consistent 4-space indentation for nested
configuration blocks.
```bash
export PREFILL_ENGINE_ARGS=${PREFILL_ENGINE_ARGS:-"$DYNAMO_HOME/tests/serve/trtllm/engine_configs/qwen3/prefill.yaml"}
export DECODE_ENGINE_ARGS=${DECODE_ENGINE_ARGS:-"$DYNAMO_HOME/tests/serve/trtllm/engine_configs/qwen3/decode.yaml"}
```
Fix path inconsistency: disagg_same_gpu.sh uses wrong engine_configs location.
Lines 35-36 reference `tests/serve/trtllm/engine_configs` while all other launch scripts in this PR (gpt_oss_disagg.sh, disagg.sh) reference `examples/backends/trtllm/engine_configs`. This breaks consistency with the PR objective to consolidate configs under `examples/backends/trtllm/engine_configs/`.
```diff
-export PREFILL_ENGINE_ARGS=${PREFILL_ENGINE_ARGS:-"$DYNAMO_HOME/tests/serve/trtllm/engine_configs/qwen3/prefill.yaml"}
-export DECODE_ENGINE_ARGS=${DECODE_ENGINE_ARGS:-"$DYNAMO_HOME/tests/serve/trtllm/engine_configs/qwen3/decode.yaml"}
+export PREFILL_ENGINE_ARGS=${PREFILL_ENGINE_ARGS:-"$DYNAMO_HOME/examples/backends/trtllm/engine_configs/qwen3/prefill.yaml"}
+export DECODE_ENGINE_ARGS=${DECODE_ENGINE_ARGS:-"$DYNAMO_HOME/examples/backends/trtllm/engine_configs/qwen3/decode.yaml"}
```

📝 Committable suggestion
+export DECODE_ENGINE_ARGS=${DECODE_ENGINE_ARGS:-"$DYNAMO_HOME/examples/backends/trtllm/engine_configs/qwen3/decode.yaml"}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```bash
export PREFILL_ENGINE_ARGS=${PREFILL_ENGINE_ARGS:-"$DYNAMO_HOME/examples/backends/trtllm/engine_configs/qwen3/prefill.yaml"}
export DECODE_ENGINE_ARGS=${DECODE_ENGINE_ARGS:-"$DYNAMO_HOME/examples/backends/trtllm/engine_configs/qwen3/decode.yaml"}
```
🤖 Prompt for AI Agents
In examples/backends/trtllm/launch/disagg_same_gpu.sh around lines 35 to 36, the
PREFILL_ENGINE_ARGS and DECODE_ENGINE_ARGS default paths point to
tests/serve/trtllm/engine_configs/... which is inconsistent with other launch
scripts; update both defaults to use examples/backends/trtllm/engine_configs/...
(i.e., replace the tests/serve/trtllm path prefix with examples/backends/trtllm)
so the script references the consolidated engine_configs location used across
the PR.
```markdown
| Model | Framework | Mode | GPUs | Deployment | Benchmark Recipe | Notes | GAIE integration |
|-------|-----------|------|------|------------|------------------|-------|------------------|
| **[Llama-3-70B](llama-3-70b/vllm/agg/)** | vLLM | Aggregated | 4x H100/H200 | ✅ | ✅ | FP8 dynamic quantization | ✅ |
| **[Llama-3-70B](llama-3-70b/vllm/disagg-single-node/)** | vLLM | Disagg (Single-Node) | 8x H100/H200 | ✅ | ✅ | Prefill + Decode separation |
| **[Llama-3-70B](llama-3-70b/vllm/disagg-multi-node/)** | vLLM | Disagg (Multi-Node) | 16x H100/H200 | ✅ | ✅ | 2 nodes, 8 GPUs each |
| **[Qwen3-32B-FP8](qwen3-32b-fp8/trtllm/agg/)** | TensorRT-LLM | Aggregated | 4x GPU | ✅ | ✅ | FP8 quantization |
| **[Qwen3-32B-FP8](qwen3-32b-fp8/trtllm/disagg/)** | TensorRT-LLM | Disaggregated | 8x GPU | ✅ | ✅ | Prefill + Decode separation |
| **[GPT-OSS-120B](gpt-oss-120b/trtllm/agg/)** | TensorRT-LLM | Aggregated | 4x GB200 | ✅ | ✅ | Blackwell only, WideEP |
| **[GPT-OSS-120B](gpt-oss-120b/trtllm/disagg/)** | TensorRT-LLM | Disaggregated | TBD | ❌ | ❌ | Engine configs only, no K8s manifest |
| **[DeepSeek-R1](deepseek-r1/sglang/disagg-8gpu/)** | SGLang | Disagg WideEP | 8x H200 | ✅ | ❌ | Benchmark recipe pending |
| **[DeepSeek-R1](deepseek-r1/sglang/disagg-16gpu/)** | SGLang | Disagg WideEP | 16x H200 | ✅ | ❌ | Benchmark recipe pending |
| **[DeepSeek-R1](deepseek-r1/trtllm/disagg/wide_ep/gb200/)** | TensorRT-LLM | Disagg WideEP (GB200) | 32+4 GB200 | ✅ | ✅ | Multi-node: 8 decode + 1 prefill nodes |
```
Fix table column alignment—most rows missing GAIE integration column.
The table header declares 8 columns (including "GAIE integration"), but rows 13–21 only have 7 cells. Either add empty cells (| |) or remove the GAIE column header to match the data rows. Most models do not have GAIE integration values; consider whether that column is necessary or if placeholder content should be added consistently.
```diff
-| **[Llama-3-70B](llama-3-70b/vllm/disagg-single-node/)** | vLLM | Disagg (Single-Node) | 8x H100/H200 | ✅ | ✅ | Prefill + Decode separation |
+| **[Llama-3-70B](llama-3-70b/vllm/disagg-single-node/)** | vLLM | Disagg (Single-Node) | 8x H100/H200 | ✅ | ✅ | Prefill + Decode separation | |
```

Apply the same fix (add a trailing `| |`) to lines 14–21.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```markdown
| Model | Framework | Mode | GPUs | Deployment | Benchmark Recipe | Notes | GAIE integration |
|-------|-----------|------|------|------------|------------------|-------|------------------|
| **[Llama-3-70B](llama-3-70b/vllm/agg/)** | vLLM | Aggregated | 4x H100/H200 | ✅ | ✅ | FP8 dynamic quantization | ✅ |
| **[Llama-3-70B](llama-3-70b/vllm/disagg-single-node/)** | vLLM | Disagg (Single-Node) | 8x H100/H200 | ✅ | ✅ | Prefill + Decode separation | |
| **[Llama-3-70B](llama-3-70b/vllm/disagg-multi-node/)** | vLLM | Disagg (Multi-Node) | 16x H100/H200 | ✅ | ✅ | 2 nodes, 8 GPUs each | |
| **[Qwen3-32B-FP8](qwen3-32b-fp8/trtllm/agg/)** | TensorRT-LLM | Aggregated | 4x GPU | ✅ | ✅ | FP8 quantization | |
| **[Qwen3-32B-FP8](qwen3-32b-fp8/trtllm/disagg/)** | TensorRT-LLM | Disaggregated | 8x GPU | ✅ | ✅ | Prefill + Decode separation | |
| **[GPT-OSS-120B](gpt-oss-120b/trtllm/agg/)** | TensorRT-LLM | Aggregated | 4x GB200 | ✅ | ✅ | Blackwell only, WideEP | |
| **[GPT-OSS-120B](gpt-oss-120b/trtllm/disagg/)** | TensorRT-LLM | Disaggregated | TBD | ❌ | ❌ | Engine configs only, no K8s manifest | |
| **[DeepSeek-R1](deepseek-r1/sglang/disagg-8gpu/)** | SGLang | Disagg WideEP | 8x H200 | ✅ | ❌ | Benchmark recipe pending | |
| **[DeepSeek-R1](deepseek-r1/sglang/disagg-16gpu/)** | SGLang | Disagg WideEP | 16x H200 | ✅ | ❌ | Benchmark recipe pending | |
| **[DeepSeek-R1](deepseek-r1/trtllm/disagg/wide_ep/gb200/)** | TensorRT-LLM | Disagg WideEP (GB200) | 32+4 GB200 | ✅ | ✅ | Multi-node: 8 decode + 1 prefill nodes | |
```
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
13-13 to 21-21: Table column count — Expected: 8; Actual: 7; Too few cells, row will be missing data (MD056, table-column-count)
🤖 Prompt for AI Agents
recipes/README.md around lines 10 to 21: the markdown table header declares 8
columns including "GAIE integration" but the data rows on lines 14–21 only have
7 cells, so add a trailing empty cell to each of those rows; edit lines 14–21 to
append " | |" (i.e., add a final empty column cell) to ensure every row has the
same 8 columns as the header.
```bash
# Deploy GPT-OSS-120B with TensorRT-LLM
./run.sh --model gpt-oss-120b --framework trtllm --deployment agg
```

**Step 4: Run Benchmark (Optional)**
Correct step numbering—both are labeled "Step 4".
Line 137 is "Step 4: Run Benchmark" and line 150 is "Step 4: GAIE Integration". The GAIE section should be "Step 5" to maintain sequential numbering.
```diff
-**Step 4: GAIE Integration (Optional)**
+**Step 5: GAIE Integration (Optional)**
```

Also applies to: 150-150
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
137-137: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🤖 Prompt for AI Agents
In recipes/README.md around lines 137 and 150, the heading "Step 4: Run
Benchmark (Optional)" and "Step 4: GAIE Integration" use the same step number;
update the GAIE Integration heading on line 150 to "Step 5: GAIE Integration" so
the steps are sequential and consistent, and scan nearby headings to ensure
subsequent step numbers (if any) are adjusted accordingly.
For Llama-3-70B with vLLM (Aggregated), an example of integration with the Inference Gateway is provided.

2. Apply manifests by running a script.
Follow to Follow [Deploy Inference Gateway Section 2](../deploy/inference-gateway/README.md#2-deploy-inference-gateway) to install GAIE. Then apply manifests.
Fix duplicate word "Follow".
Line 154 contains "Follow to Follow"—remove the duplicate.
```diff
-Follow to Follow [Deploy Inference Gateway Section 2](../deploy/inference-gateway/README.md#2-deploy-inference-gateway) to install GAIE.
+Follow [Deploy Inference Gateway Section 2](../deploy/inference-gateway/README.md#2-deploy-inference-gateway) to install GAIE.
```

🤖 Prompt for AI Agents
In recipes/README.md around line 154, the sentence contains the duplicated word sequence "Follow to Follow"; remove the extra words so the line reads "Follow [Deploy Inference Gateway Section 2](../deploy/inference-gateway/README.md#2-deploy-inference-gateway) to install GAIE.", ensuring grammar and the link remain intact.
Add detail to the title and description please.
Overview:
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
Release Notes
Documentation
Chores