Conversation

@atchernych atchernych commented Nov 17, 2025

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

Release Notes

  • Documentation

    • Reorganized TensorRT-LLM engine configuration files to a new centralized location within the examples directory for improved accessibility and consistency.
    • Updated all deployment guides, launch scripts, and Kubernetes examples to reference the new configuration paths.
    • Added comprehensive documentation for engine configurations and multi-node setups with clearer setup instructions.
  • Chores

    • Restructured recipe deployment workflow to emphasize Kubernetes-centric deployments with updated prerequisites and step-by-step guidance.

BenHamm and others added 11 commits November 6, 2025 09:20
Remove incomplete model directories and non-Kubernetes configurations to
streamline the recipes directory for production Kubernetes deployments.

Changes:
- Remove 5 incomplete model directories (deepseek-r1-distill-llama-8b,
  gemma3, llama4, qwen2-vl-7b-instruct, qwen3) that lack proper
  Kubernetes deployment manifests
- Delete run.sh script (non-Kubernetes automation tool)
- Remove standalone engine config YAMLs from deepseek-r1/trtllm that
  were not wrapped in Kubernetes manifests
- Document incomplete gpt-oss-120b disagg recipe with README explaining
  missing components

README improvements:
- Restructure Available Recipes table with 'Deployment' and 'Benchmark
  Recipe' columns to clarify that perf.yaml files are tools for users
  to run benchmarks, not published performance results
- Add comprehensive quick start guide with prerequisites
- Link to correct Kubernetes deployment guides
- Add troubleshooting section
- Remove extraneous links (docs.nvidia.com, license section)

Result: 4 models with 10 complete deployment recipes (7 with benchmark
scripts), focused exclusively on Kubernetes deployments.

Signed-off-by: Ben Hamm <ben.hamm@gmail.com>
Address feedback to make the README look less AI-generated by removing
decorative emojis from section headings while keeping status indicators
(✅ ❌) in tables and content.

Signed-off-by: Ben Hamm <ben.hamm@gmail.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>
Signed-off-by: Tanmay Verma <tanmay2592@gmail.com>
Signed-off-by: Anna Tchernych <atchernych@nvidia.com>
@atchernych atchernych requested review from a team as code owners November 17, 2025 23:44
@github-actions github-actions bot added the feat label Nov 17, 2025
coderabbitai bot commented Nov 17, 2025

Walkthrough

This pull request consolidates TensorRT-LLM engine configuration file paths from scattered recipes/*/trtllm/ directories to a centralized location under examples/backends/trtllm/engine_configs/. Updates path references across deployment scripts, launch scripts, YAML configs, and documentation. Deletes the legacy recipes/run.sh script and rebrands recipes documentation.

Changes

Cohort / File(s) Summary
Shell Launch Scripts
examples/backends/trtllm/launch/{agg,agg_metrics,agg_router,disagg,disagg_router,disagg_same_gpu,epd_disagg,gpt_oss_disagg}.sh
Updated default values for exported environment variables (AGG_ENGINE_ARGS, PREFILL_ENGINE_ARGS, DECODE_ENGINE_ARGS) to point from recipes/*/trtllm/ to examples/backends/trtllm/engine_configs/*/.
Deployment YAML Configs
examples/backends/trtllm/deploy/{agg,agg-with-config,agg_router,disagg,disagg_planner,disagg_router}.yaml
Updated --extra-engine-args paths from recipes/qwen3/trtllm/ or recipes/gpt-oss-120b/trtllm/disagg/ to examples/backends/trtllm/engine_configs/qwen3/ or examples/backends/trtllm/engine_configs/gpt-oss-120b/.
Multinode Launch Scripts
examples/basics/multinode/trtllm/{srun_aggregated,srun_disaggregated}.sh
Updated default ENGINE_CONFIG, PREFILL_ENGINE_CONFIG, and DECODE_ENGINE_CONFIG paths from /mnt/recipes/deepseek-r1/trtllm/ to /mnt/examples/backends/trtllm/engine_configs/deepseek-r1/.
Router Benchmark Script
benchmarks/router/run_engines.sh
Renamed variable RECIPE_PATH to ENGINE_CONFIG_PATH and updated all YAML config path lookups (agg.yaml, decode.yaml, prefill.yaml) from recipes/deepseek-r1-distill-llama-8b/trtllm to examples/backends/trtllm/engine_configs/deepseek-r1-distill-llama-8b.
New Engine Config Files
examples/backends/trtllm/engine_configs/{gpt-oss-120b/{decode,prefill}.yaml, deepseek-r1/{agg/wide_ep/wide_ep_agg,disagg/wide_ep/{wide_ep_decode,wide_ep_prefill}}.yaml}
Added or updated configuration files with optimizer settings, MOE backend configs, CUDA graph settings, and load balancer path references updated to new engine_configs locations.
TensorRT-LLM Backend Documentation
docs/backends/trtllm/{README,gemma3_sliding_window_attention,gpt-oss,llama4_plus_eagle,multimodal_support}.md
Updated engine configuration file paths in examples and instructions from recipes/*/trtllm/ to examples/backends/trtllm/engine_configs/*/.
Multinode Documentation
docs/backends/trtllm/multinode/{multinode-examples,multinode-multimodal-example}.md
Updated engine config paths and mount references from /recipes/ to /examples/backends/trtllm/engine_configs/. Added introductory notes and corrected heading numbering.
General Documentation
docs/kubernetes/README.md, examples/kubernetes/deploy/agg.yaml
Updated --extra-engine-args paths from recipes/deepseek-r1-distill-llama-8b/ to examples/backends/trtllm/engine_configs/deepseek-r1-distill-llama-8b/.
New Configuration Documentation
examples/backends/trtllm/engine_configs/README.md, examples/basics/multinode/trtllm/README.md
Added new README files documenting engine configuration structure, usage instructions, and multinode TRTLLM example setup.
Recipes Documentation
recipes/README.md
Rebranded from "Dynamo Model Serving Recipes" to "Dynamo Production-Ready Recipes". Restructured content to emphasize Kubernetes deployment, added recipe quality standards, and consolidated prerequisites and quick-start guidance.
Recipes Registry
recipes/gpt-oss-120b/trtllm/disagg/README.md
Added new README documenting GPT-OSS-120B disaggregated mode recipe status, listing engine configuration files and identifying missing deployment components.
Deleted
recipes/run.sh
Removed legacy Bash orchestration script that handled recipe deployment, model downloading, and benchmarking workflows via kubectl. Functionality consolidated into Kubernetes-centric documentation and manifests.
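
The launch-script change summarized above follows the usual shell default-value pattern. A minimal sketch, assuming the variable and path names from the summary (the exact defaults vary per script):

```shell
# Fall back to the new centralized engine_configs location unless the
# caller has already exported an override. Variable name and default path
# are taken from the walkthrough above; treat them as illustrative.
export AGG_ENGINE_ARGS="${AGG_ENGINE_ARGS:-examples/backends/trtllm/engine_configs/qwen3/agg.yaml}"
echo "engine args: ${AGG_ENGINE_ARGS}"
```

Overriding is then just a matter of exporting the variable before invoking the script, e.g. `AGG_ENGINE_ARGS=custom.yaml ./agg.sh`.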

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Areas requiring close attention:

  • Path consistency verification: Confirm all recipes/*/trtllm/ references have been replaced with examples/backends/trtllm/engine_configs/*/ across shell scripts and YAML files; spot-check a few launch scripts to ensure path construction logic is correct.
  • Engine config file references in YAML: Verify MOE load balancer and internal path references within engine config files (particularly wide_ep_agg.yaml, wide_ep_prefill.yaml, wide_ep_decode.yaml) point to correct relocated paths.
  • Documentation completeness: Ensure all documentation updates reflect the new paths accurately and that no stale references to old recipe paths remain in examples or guides.
  • Deployment YAML syntax: Validate YAML config files for syntax correctness and that indentation/structure remained intact during path updates.
  • Script deletion impact: Confirm recipes/run.sh removal does not break any external scripts or CI/CD pipelines that may have depended on it.
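
For the path-consistency check, a quick grep sketch (the pattern is an assumption based on the old layout; demonstrated here on a throwaway directory, but in practice run it from the repository root):

```shell
tmp=$(mktemp -d)
# One file still on the old layout, one already migrated.
printf 'ARGS=recipes/qwen3/trtllm/agg.yaml\n' > "$tmp/old.sh"
printf 'ARGS=examples/backends/trtllm/engine_configs/qwen3/agg.yaml\n' > "$tmp/new.sh"
# List files that still reference the old recipes/<model>/trtllm/ scheme.
stale=$(grep -rl 'recipes/[A-Za-z0-9._-]*/trtllm/' "$tmp")
echo "$stale"
rm -rf "$tmp"
```

Only `old.sh` should be listed; an empty result from the repo root would indicate the migration left no stale references behind.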

Poem

🐰 Paths once scattered through the recipes of old,
Now consolidated in engine_configs, bright and bold!
From trtllm depths to examples' gleaming shore,
The configs hop together—organized evermore!
One home for all, no more shall they roam,
In backends/trtllm, they've found their perfect home!

Pre-merge checks

❌ Failed checks (1 warning, 1 inconclusive)

  • Description check ⚠️ Warning — The pull request description is entirely a placeholder template; all sections remain empty with only template comments. Resolution: fill in the template with actual details about the changes: an overview of the path migration, the specific files affected, and related issue #4159.
  • Title check ❓ Inconclusive — The title 'feat: Pr 4159 fixes' is vague and generic, referring only to a PR number without conveying what was actually changed. Resolution: use a more descriptive title that summarizes the main change, such as 'refactor: migrate TensorRT-LLM config paths from recipes to engine_configs'.

✅ Passed checks (1 passed)

  • Docstring Coverage ✅ Passed — Docstring coverage is 100.00%, which meets the required threshold of 80.00%.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (3)
recipes/gpt-oss-120b/trtllm/disagg/README.md (1)

1-25: Documentation clarity and warnings.

The warning on line 3 effectively communicates the incomplete status, and the structure is clear. However, the document should explicitly reference where the centralized configs are located (e.g., examples/backends/trtllm/engine_configs/gpt-oss-120b/) rather than implying they exist in the recipes directory, to prevent confusion during the repo's transition to the new structure.

Consider adding a clarification section:

 This directory contains TensorRT-LLM engine configurations for disaggregated serving:
-- `decode.yaml` - Decode worker engine configuration
-- `prefill.yaml` - Prefill worker engine configuration
+- `decode.yaml` - Decode worker engine configuration (see also: `examples/backends/trtllm/engine_configs/gpt-oss-120b/`)
+- `prefill.yaml` - Prefill worker engine configuration (see also: `examples/backends/trtllm/engine_configs/gpt-oss-120b/`)

Or, if the files no longer exist here:

 This directory contains TensorRT-LLM engine configurations for disaggregated serving:
-- `decode.yaml` - Decode worker engine configuration
-- `prefill.yaml` - Prefill worker engine configuration
+Engine configuration files are now centralized in `examples/backends/trtllm/engine_configs/gpt-oss-120b/`:
+- `decode.yaml` - Decode worker engine configuration
+- `prefill.yaml` - Prefill worker engine configuration
recipes/README.md (2)

31-31: Specify language for fenced code block.

Code blocks should declare their language for syntax highlighting. Since this shows a directory structure, use ```tree or ```text.

-```
+```tree

47-47: Replace bold emphasis with proper markdown headings.

Lines using bold text for section titles (e.g., **1. Dynamo Platform Installed**, **Step 1: Download Model**) should be converted to markdown headings (###) for proper document structure and accessibility.

Examples:

-**1. Dynamo Platform Installed**
+### 1. Dynamo Platform Installed
-**Step 1: Download Model**
+### Step 1: Download Model

Apply similar changes to all 9 instances flagged by the linter.

Also applies to: 54-54, 61-61, 75-75, 90-90, 103-103, 118-118, 137-137, 150-150
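
The bold-to-heading conversion the linter asks for can be scripted; a hypothetical one-liner (not from the PR) applied to a single line:

```shell
# Rewrite a bold-only line into an h3 heading (markdownlint MD036 fix).
line='**Step 1: Download Model**'
fixed=$(printf '%s\n' "$line" | sed -E 's/^\*\*(.+)\*\*$/### \1/')
echo "$fixed"
# → ### Step 1: Download Model
```

The anchored `^\*\*...\*\*$` pattern only rewrites lines that are entirely bold, so inline emphasis elsewhere in the README is left untouched.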

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 09bb1c6 and 9410f19.

📒 Files selected for processing (35)
  • benchmarks/router/run_engines.sh (2 hunks)
  • docs/backends/trtllm/README.md (1 hunks)
  • docs/backends/trtllm/gemma3_sliding_window_attention.md (4 hunks)
  • docs/backends/trtllm/gpt-oss.md (3 hunks)
  • docs/backends/trtllm/llama4_plus_eagle.md (2 hunks)
  • docs/backends/trtllm/multimodal_support.md (2 hunks)
  • docs/backends/trtllm/multinode/multinode-examples.md (4 hunks)
  • docs/backends/trtllm/multinode/multinode-multimodal-example.md (5 hunks)
  • docs/kubernetes/README.md (1 hunks)
  • examples/backends/trtllm/deploy/agg-with-config.yaml (1 hunks)
  • examples/backends/trtllm/deploy/agg.yaml (1 hunks)
  • examples/backends/trtllm/deploy/agg_router.yaml (1 hunks)
  • examples/backends/trtllm/deploy/disagg.yaml (2 hunks)
  • examples/backends/trtllm/deploy/disagg_planner.yaml (2 hunks)
  • examples/backends/trtllm/deploy/disagg_router.yaml (2 hunks)
  • examples/backends/trtllm/engine_configs/README.md (1 hunks)
  • examples/backends/trtllm/engine_configs/deepseek-r1/agg/wide_ep/wide_ep_agg.yaml (1 hunks)
  • examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/wide_ep_decode.yaml (1 hunks)
  • examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/wide_ep_prefill.yaml (1 hunks)
  • examples/backends/trtllm/engine_configs/gpt-oss-120b/decode.yaml (1 hunks)
  • examples/backends/trtllm/engine_configs/gpt-oss-120b/prefill.yaml (1 hunks)
  • examples/backends/trtllm/launch/agg.sh (1 hunks)
  • examples/backends/trtllm/launch/agg_metrics.sh (1 hunks)
  • examples/backends/trtllm/launch/agg_router.sh (1 hunks)
  • examples/backends/trtllm/launch/disagg.sh (1 hunks)
  • examples/backends/trtllm/launch/disagg_router.sh (1 hunks)
  • examples/backends/trtllm/launch/disagg_same_gpu.sh (1 hunks)
  • examples/backends/trtllm/launch/epd_disagg.sh (1 hunks)
  • examples/backends/trtllm/launch/gpt_oss_disagg.sh (1 hunks)
  • examples/basics/multinode/trtllm/README.md (1 hunks)
  • examples/basics/multinode/trtllm/srun_aggregated.sh (1 hunks)
  • examples/basics/multinode/trtllm/srun_disaggregated.sh (1 hunks)
  • recipes/README.md (1 hunks)
  • recipes/gpt-oss-120b/trtllm/disagg/README.md (1 hunks)
  • recipes/run.sh (0 hunks)
💤 Files with no reviewable changes (1)
  • recipes/run.sh
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 3858
File: recipes/deepseek-r1/model-cache/model-download.yaml:18-32
Timestamp: 2025-10-24T04:21:08.751Z
Learning: In the recipes directory structure, model-specific recipes (e.g., recipes/deepseek-r1/, recipes/llama-3-70b/) contain hardcoded model names and revisions in their Kubernetes manifests (like model-download.yaml). Each recipe directory is deployment-specific and self-contained, so hardcoding model-specific values is the intended design pattern.
📚 Learning: 2025-10-24T04:21:08.751Z
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 3858
File: recipes/deepseek-r1/model-cache/model-download.yaml:18-32
Timestamp: 2025-10-24T04:21:08.751Z
Learning: In the recipes directory structure, model-specific recipes (e.g., recipes/deepseek-r1/, recipes/llama-3-70b/) contain hardcoded model names and revisions in their Kubernetes manifests (like model-download.yaml). Each recipe directory is deployment-specific and self-contained, so hardcoding model-specific values is the intended design pattern.

Applied to files:

  • recipes/gpt-oss-120b/trtllm/disagg/README.md
  • examples/backends/trtllm/deploy/agg-with-config.yaml
  • benchmarks/router/run_engines.sh
  • docs/backends/trtllm/multinode/multinode-multimodal-example.md
  • examples/backends/trtllm/deploy/disagg.yaml
  • examples/backends/trtllm/deploy/disagg_planner.yaml
  • examples/backends/trtllm/deploy/disagg_router.yaml
  • recipes/README.md
  • examples/backends/trtllm/deploy/agg.yaml
📚 Learning: 2025-07-02T13:20:28.800Z
Learnt from: fsaady
Repo: ai-dynamo/dynamo PR: 1730
File: examples/sglang/slurm_jobs/job_script_template.j2:59-59
Timestamp: 2025-07-02T13:20:28.800Z
Learning: In the SLURM job script template at examples/sglang/slurm_jobs/job_script_template.j2, the `--total_nodes` parameter represents the total nodes per worker type (prefill or decode), not the total nodes in the entire cluster. Each worker type needs to know its own group size for distributed coordination.

Applied to files:

  • examples/basics/multinode/trtllm/README.md
  • docs/backends/trtllm/multinode/multinode-examples.md
  • docs/backends/trtllm/llama4_plus_eagle.md
  • docs/backends/trtllm/multinode/multinode-multimodal-example.md
  • examples/basics/multinode/trtllm/srun_disaggregated.sh
📚 Learning: 2025-07-03T10:14:30.570Z
Learnt from: fsaady
Repo: ai-dynamo/dynamo PR: 1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

Applied to files:

  • examples/basics/multinode/trtllm/README.md
  • docs/backends/trtllm/multinode/multinode-examples.md
  • docs/backends/trtllm/multinode/multinode-multimodal-example.md
📚 Learning: 2025-07-03T09:44:41.470Z
Learnt from: fsaady
Repo: ai-dynamo/dynamo PR: 1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:113-116
Timestamp: 2025-07-03T09:44:41.470Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, logging the full configuration file content is acceptable because the config file is public, contains only placeholder replacements (no sensitive data), and provides debugging benefits for users who may want to create configurations based on the logged output.

Applied to files:

  • examples/basics/multinode/trtllm/README.md
📚 Learning: 2025-06-05T01:46:15.509Z
Learnt from: GuanLuo
Repo: ai-dynamo/dynamo PR: 1371
File: examples/llm/benchmarks/vllm_multinode_setup.sh:18-25
Timestamp: 2025-06-05T01:46:15.509Z
Learning: In multi-node setups with head/worker architecture, the head node typically doesn't need environment variables pointing to its own services (like NATS_SERVER, ETCD_ENDPOINTS) because local processes can access them via localhost. Only worker nodes need these environment variables to connect to the head node's external IP address.

Applied to files:

  • docs/backends/trtllm/multinode/multinode-examples.md
  • examples/basics/multinode/trtllm/srun_aggregated.sh
  • examples/basics/multinode/trtllm/srun_disaggregated.sh
📚 Learning: 2025-07-31T11:26:48.422Z
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 2217
File: components/backends/trtllm/engine_configs/deepseek_r1/wide_ep/wide_ep_prefill.yaml:18-0
Timestamp: 2025-07-31T11:26:48.422Z
Learning: TRTLLM LLM-API expects all caps for backend field names in configuration files. When migrating TRTLLM configurations, backend values like "WideEP" should be changed to "WIDEEP" to comply with the API requirements.

Applied to files:

  • examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/wide_ep_decode.yaml
  • examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/wide_ep_prefill.yaml
  • docs/backends/trtllm/multinode/multinode-multimodal-example.md
  • examples/backends/trtllm/engine_configs/README.md
  • examples/backends/trtllm/deploy/disagg.yaml
  • docs/backends/trtllm/README.md
  • examples/backends/trtllm/engine_configs/deepseek-r1/agg/wide_ep/wide_ep_agg.yaml
  • examples/backends/trtllm/launch/epd_disagg.sh
🪛 markdownlint-cli2 (0.18.1)
recipes/README.md

13-13: Table column count
Expected: 8; Actual: 7; Too few cells, row will be missing data

(MD056, table-column-count)


14-14: Table column count
Expected: 8; Actual: 7; Too few cells, row will be missing data

(MD056, table-column-count)


15-15: Table column count
Expected: 8; Actual: 7; Too few cells, row will be missing data

(MD056, table-column-count)


16-16: Table column count
Expected: 8; Actual: 7; Too few cells, row will be missing data

(MD056, table-column-count)


17-17: Table column count
Expected: 8; Actual: 7; Too few cells, row will be missing data

(MD056, table-column-count)


18-18: Table column count
Expected: 8; Actual: 7; Too few cells, row will be missing data

(MD056, table-column-count)


19-19: Table column count
Expected: 8; Actual: 7; Too few cells, row will be missing data

(MD056, table-column-count)


20-20: Table column count
Expected: 8; Actual: 7; Too few cells, row will be missing data

(MD056, table-column-count)


21-21: Table column count
Expected: 8; Actual: 7; Too few cells, row will be missing data

(MD056, table-column-count)


31-31: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


47-47: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


54-54: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


61-61: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


75-75: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


90-90: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


103-103: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


118-118: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


137-137: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


150-150: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: sglang (arm64)
  • GitHub Check: operator (arm64)
  • GitHub Check: sglang (amd64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: operator (amd64)
  • GitHub Check: vllm (arm64)
🔇 Additional comments (37)
recipes/gpt-oss-120b/trtllm/disagg/README.md (2)

24-24: Relative path is valid—no issues found.

The file recipes/CONTRIBUTING.md exists at the referenced location, confirming that the relative path ../../../CONTRIBUTING.md from recipes/gpt-oss-120b/trtllm/disagg/ resolves correctly. The link will not break.


8-9: The original review comment's concern was based on an incorrect assumption. Verification confirms that decode.yaml and prefill.yaml do exist in recipes/gpt-oss-120b/trtllm/disagg/ (alongside copies in the centralized location at examples/backends/trtllm/engine_configs/gpt-oss-120b/). The README documentation at lines 8-9 is accurate—these files are present in the directory as stated. No changes are needed.

Likely an incorrect or invalid review comment.

docs/backends/trtllm/multinode/multinode-examples.md (4)

170-171: Disaggregated engine config paths verified as correct.

Both wide_ep_prefill.yaml and wide_ep_decode.yaml exist at the expected locations in the repository under examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/. The commented paths are accurate and the migration from recipes/ has been completed successfully.


141-141: Path verification confirms the commented example is correct.

The file examples/backends/trtllm/engine_configs/deepseek-r1/agg/wide_ep/wide_ep_agg.yaml exists and the path structure is consistent with the consolidation from the recipes/ structure. The /mnt/examples/ prefix in the commented example is appropriate for containerized deployments where this path would be mounted.


20-20: Path reference is valid; no issues found.

All paths referenced in the documentation have been verified:

  • The script directory examples/basics/multinode/trtllm/ exists and contains the referenced scripts (srun_aggregated.sh and srun_disaggregated.sh)
  • The engine config directory examples/backends/trtllm/engine_configs/ exists with all referenced configuration files

111-112: MOUNTS path configuration is correct and provides full access to all referenced engine config files.

All three engine config files exist at their expected locations in the repository and are accessible via the MOUNTS configuration:

  • examples/backends/trtllm/engine_configs/deepseek-r1/agg/wide_ep/wide_ep_agg.yaml (line 141)
  • examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/wide_ep_prefill.yaml (line 170)
  • examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/wide_ep_decode.yaml (line 171)

The MOUNTS configuration ${PWD}/../../../../:/mnt correctly mounts the repository root to /mnt in the container, making all referenced paths accessible.
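
The host-to-container mapping implied by `MOUNTS=${PWD}/../../../../:/mnt` can be sketched as a simple prefix join (a hypothetical helper, not code from the repo):

```python
from pathlib import PurePosixPath

def container_path(repo_relative: str) -> str:
    """Map a repo-relative path to its in-container location, assuming
    the repository root is bind-mounted at /mnt."""
    return str(PurePosixPath("/mnt") / repo_relative)

print(container_path(
    "examples/backends/trtllm/engine_configs/deepseek-r1/agg/wide_ep/wide_ep_agg.yaml"
))
# → /mnt/examples/backends/trtllm/engine_configs/deepseek-r1/agg/wide_ep/wide_ep_agg.yaml
```

This is why the multinode scripts and the wide_ep configs can consistently use `/mnt/examples/...` paths: any repo-relative path simply gains the `/mnt` prefix inside the container.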

examples/backends/trtllm/deploy/agg-with-config.yaml (1)

54-70: Verify path resolution relative to working directory.

The working directory is /workspace/examples/backends/trtllm (line 54), and the --extra-engine-args path on line 70 uses ./examples/backends/.... This resolves to /workspace/examples/backends/trtllm/./examples/backends/trtllm/engine_configs/qwen3/agg.yaml, which appears to have a redundant path segment.

Confirm whether the intended path is:

  • examples/backends/trtllm/engine_configs/qwen3/agg.yaml (relative to /workspace), or
  • ../../../examples/backends/trtllm/engine_configs/qwen3/agg.yaml (relative to /workspace/examples/backends/trtllm), or
  • Another variant.
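
The suspected resolution can be reproduced with a quick normalization check, assuming the engine loader resolves the argument relative to the working directory:

```python
import os.path

workdir = "/workspace/examples/backends/trtllm"  # workingDir from the manifest
arg = "./examples/backends/trtllm/engine_configs/qwen3/agg.yaml"  # --extra-engine-args

# Join the ./-relative argument onto the working directory and normalize;
# the directory prefix ends up duplicated.
resolved = os.path.normpath(os.path.join(workdir, arg))
print(resolved)
# → /workspace/examples/backends/trtllm/examples/backends/trtllm/engine_configs/qwen3/agg.yaml
```

The duplicated `examples/backends/trtllm` segment makes the concern concrete; whichever of the three variants above is intended, the normalized result should contain the prefix exactly once.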
examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/wide_ep_prefill.yaml (1)

18-20: Verify path format consistency across configuration and deployment files.

The load_balancer path uses an absolute /mnt/ prefix on line 20, whereas deployment files use relative paths (./examples/...). Confirm this difference is intentional and that:

  1. The /mnt/ path is valid within the TensorRT-LLM engine configuration context
  2. The path is consistent with how the load_balancer field is resolved by TensorRT-LLM

Additionally, verify that the referenced eplb.yaml file exists or will be created at this location.

examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/wide_ep_decode.yaml (1)

18-20: Verify path format consistency across configuration and deployment files.

Same concern as wide_ep_prefill.yaml: the absolute /mnt/ prefix on line 20 differs from relative paths in deployment files. Verify this path format is valid in the TensorRT-LLM engine configuration context and that the referenced eplb.yaml file exists at the expected location.

examples/backends/trtllm/deploy/disagg.yaml (2)

29-29: Verify path resolution for prefill worker configuration (line 40).

The working directory is /workspace/ (line 29), and the path on line 40 uses ./examples/backends/trtllm/engine_configs/qwen3/prefill.yaml. Verify this resolves correctly to the intended file location.

Also applies to: 40-40


55-55: Verify path resolution for decode worker configuration (line 66).

The working directory is /workspace/ (line 55), and the path on line 66 uses ./examples/backends/trtllm/engine_configs/qwen3/decode.yaml. Verify this resolves correctly to the intended file location.

Also applies to: 66-66

examples/backends/trtllm/engine_configs/deepseek-r1/agg/wide_ep/wide_ep_agg.yaml (1)

6-14: Verify path format consistency across configuration and deployment files.

The load_balancer path on line 14 uses the absolute /mnt/ prefix, consistent with other wide_ep configurations. Verify that this path format is valid in the TensorRT-LLM engine configuration context and that the referenced eplb.yaml file exists at this location.
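
For reference, the shape under discussion is roughly the following. The key names and nesting are assumptions pieced together from the fields mentioned in this review and the all-caps backend convention noted in the learnings; consult the actual wide_ep_agg.yaml rather than copying this:

```yaml
# Illustrative only — not copied from the repository.
moe_config:
  backend: WIDEEP    # TRT-LLM LLM-API expects all-caps backend names
  load_balancer: /mnt/examples/backends/trtllm/engine_configs/deepseek-r1/eplb.yaml
```

The `/mnt/` prefix only resolves when the repository root is mounted at `/mnt` (as in the multinode MOUNTS setup), which is the crux of the consistency question raised for all three wide_ep files.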

docs/kubernetes/README.md (1)

201-207: Clarify path format in documentation example.

The documentation example on line 206 uses /workspace/examples/backends/trtllm/engine_configs/deepseek-r1-distill-llama-8b/agg.yaml with an absolute /workspace prefix. This differs from:

  • Deployment files using ./examples/backends/...
  • Engine config files using /mnt/examples/backends/...

Confirm whether /workspace is the correct prefix for a documentation example or if it should match one of the other patterns (relative path or /mnt/). Additionally, verify that the file path uses dashes (deepseek-r1-distill-llama-8b) and not underscores.

examples/backends/trtllm/deploy/agg_router.yaml (1)

31-31: Verify path resolution for worker configuration (line 42).

The working directory is /workspace/ (line 31), and the path on line 42 uses ./examples/backends/trtllm/engine_configs/qwen3/agg.yaml. Verify this resolves correctly to the intended file location.

Also applies to: 42-42

examples/backends/trtllm/deploy/disagg_router.yaml (2)

31-31: Verify path resolution for prefill worker configuration (line 42).

The working directory is /workspace/ (line 31), and the path on line 42 uses ./examples/backends/trtllm/engine_configs/qwen3/prefill.yaml. Verify this resolves correctly to the intended file location.

Also applies to: 42-42


57-57: Verify path resolution for decode worker configuration (line 68).

The working directory is /workspace/ (line 57), and the path on line 68 uses ./examples/backends/trtllm/engine_configs/qwen3/decode.yaml. Verify this resolves correctly to the intended file location.

Also applies to: 68-68

examples/backends/trtllm/launch/agg_metrics.sh (1)

9-9: Path consolidation approved.

The default AGG_ENGINE_ARGS path has been correctly updated to the new centralized engine_configs location. This aligns with the PR's objective to consolidate configuration files.

examples/basics/multinode/trtllm/srun_aggregated.sh (1)

21-21: Container path correctly updated for multinode deployment.

The ENGINE_CONFIG path has been updated to the new location with the appropriate /mnt container mount prefix. The full subdirectory hierarchy (agg/wide_ep/) is preserved in the new engine_configs location.

examples/basics/multinode/trtllm/README.md (1)

1-20: New reference README is appropriate.

Minimal reference documentation with proper licensing. The link structure allows readers to find the detailed setup guide.

examples/backends/trtllm/launch/epd_disagg.sh (1)

9-11: EPD disaggregated configuration paths properly consolidated.

All three engine configuration paths for encode, prefill, and decode workers have been updated to the new engine_configs location. The separation of encode functionality for multimodal models is appropriate for this disaggregated setup.

docs/backends/trtllm/multimodal_support.md (1)

30-30: Path consolidation with standard extensions approved.

All engine configuration paths for both aggregated and disaggregated multimodal deployments have been updated to the new engine_configs location with standard .yaml extensions.

Also applies to: 82-83

docs/backends/trtllm/multinode/multinode-multimodal-example.md (1)

20-20: Multinode multimodal example properly updated with important fixes.

Multiple improvements here:

  • Line 20: Helpful note pointing to example script locations
  • Line 39: Important sed command fixing backend field case (DEFAULT → default) aligns with TensorRT-LLM API requirements
  • Lines 105-106: Path consolidation with standard .yaml extensions
  • Line 128: Formatting fix for output enumeration

These changes support the path consolidation objective while addressing necessary runtime compatibility fixes.

Also applies to: 39-39, 105-106, 128-128

docs/backends/trtllm/gemma3_sliding_window_attention.md (1)

33-33: Gemma3 VSWA configuration paths comprehensively updated.

All engine configuration paths have been properly consolidated to the new engine_configs location across all serving modes (aggregated, disaggregated, with/without KV routing). Standard .yaml extensions and $DYNAMO_HOME prefix used consistently.

Also applies to: 42-42, 51-52, 61-62

docs/backends/trtllm/llama4_plus_eagle.md (1)

55-55: Fix broken file references to actual config filenames.

Lines 63 and 65 reference config files with .yml extension, but the actual files use .yaml. This will cause runtime failures when the scripts attempt to load these configurations.

Line 55 is correct as-is (eagle_agg.yml exists). Only lines 63 and 65 need correction:

-export PREFILL_ENGINE_CONFIG="/mnt/examples/backends/trtllm/engine_configs/llama4/eagle/eagle_prefill.yml"
+export PREFILL_ENGINE_CONFIG="/mnt/examples/backends/trtllm/engine_configs/llama4/eagle/eagle_prefill.yaml"
-export DECODE_ENGINE_CONFIG="/mnt/examples/backends/trtllm/engine_configs/llama4/eagle/eagle_decode.yml"
+export DECODE_ENGINE_CONFIG="/mnt/examples/backends/trtllm/engine_configs/llama4/eagle/eagle_decode.yaml"
⛔ Skipped due to learnings
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 2217
File: components/backends/trtllm/engine_configs/deepseek_r1/wide_ep/wide_ep_prefill.yaml:18-0
Timestamp: 2025-07-31T11:26:48.422Z
Learning: TRTLLM LLM-API expects all caps for backend field names in configuration files. When migrating TRTLLM configurations, backend values like "WideEP" should be changed to "WIDEEP" to comply with the API requirements.
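A quick way to catch extension mismatches like the `.yml`/`.yaml` one above is to test each referenced path against the filesystem. A hedged sketch against a scratch directory (filenames simulated; in the repo you would feed it the paths extracted from the docs):

```shell
#!/bin/sh
# Flag doc references that use .yml where the file on disk is .yaml.
# The directory and filenames are simulated for illustration.
d=$(mktemp -d)
touch "$d/eagle_prefill.yaml"                 # actual config uses .yaml
for ref in "$d/eagle_prefill.yml" "$d/eagle_prefill.yaml"; do
  if [ -f "$ref" ]; then
    echo "ok: $ref"
  else
    echo "broken reference: $ref"
  fi
done
```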
examples/basics/multinode/trtllm/srun_disaggregated.sh (1)

20-20: Engine config paths consolidated correctly; verify YAML files exist at new locations.

The path updates from recipes/deepseek-r1/trtllm/disagg/ to examples/backends/trtllm/engine_configs/deepseek-r1/disagg/ are consistent with the PR objectives. Variables are properly used downstream at lines 65 and 85.

Please verify that the actual engine configuration YAML files (wide_ep_prefill.yaml and wide_ep_decode.yaml) exist at the new centralized locations.

Also applies to: 24-24
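A sketch of the existence check the comment asks for. A scratch tree stands in for the repo checkout here; in CI you would set `REPO_ROOT` to the real checkout instead of simulating the files:

```shell
#!/bin/sh
# Verify the moved engine configs exist at the consolidated location.
REPO_ROOT=$(mktemp -d)
cfg_dir="$REPO_ROOT/examples/backends/trtllm/engine_configs/deepseek-r1/disagg"
mkdir -p "$cfg_dir"
touch "$cfg_dir/wide_ep_prefill.yaml" "$cfg_dir/wide_ep_decode.yaml"  # simulate the moved files

missing=0
for f in wide_ep_prefill.yaml wide_ep_decode.yaml; do
  if [ -f "$cfg_dir/$f" ]; then
    echo "found: $f"
  else
    echo "missing: $f"
    missing=1
  fi
done
echo "missing=$missing"
```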

examples/backends/trtllm/deploy/agg.yaml (1)

39-39: Path relocated to centralized engine configs; verify YAML file exists.

The engine config path has been properly updated to reference the centralized location. Verify that examples/backends/trtllm/engine_configs/qwen3/agg.yaml exists.

examples/backends/trtllm/launch/agg_router.sh (1)

9-9: Engine config path consolidated consistently.

The default AGG_ENGINE_ARGS path has been updated correctly to the centralized location. This is consistent with similar updates in other launch scripts.

Verify that $DYNAMO_HOME/examples/backends/trtllm/engine_configs/qwen3/agg.yaml exists at deployment time.

examples/backends/trtllm/launch/disagg_router.sh (1)

9-10: Disaggregated config paths consolidated correctly.

Both PREFILL_ENGINE_ARGS and DECODE_ENGINE_ARGS have been updated to reference the centralized engine_configs location, maintaining the separate prefill/decode configuration pattern.

Verify that both prefill.yaml and decode.yaml exist at the new paths under examples/backends/trtllm/engine_configs/qwen3/.

examples/backends/trtllm/engine_configs/README.md (1)

1-44: Documentation for centralized engine_configs directory is clear and comprehensive.

The new README provides users with effective guidance on using the engine configuration files, including usage examples, configuration categories, key parameters, and important notes on WideEP and dtype matching.

docs/backends/trtllm/README.md (1)

161-161: Documentation path updated correctly for MTP example.

The AGG_ENGINE_ARGS path in the DeepSeek R1 MTP example has been properly updated to the centralized engine_configs location. The relative path format is appropriate for documentation showing commands to run from examples/backends/trtllm.

examples/backends/trtllm/launch/agg.sh (1)

9-9: Engine config path consolidated to centralized location.

The default AGG_ENGINE_ARGS path has been updated consistently with other launch scripts to reference the centralized engine_configs directory.

Verify that $DYNAMO_HOME/examples/backends/trtllm/engine_configs/qwen3/agg.yaml exists.

examples/backends/trtllm/deploy/disagg_planner.yaml (1)

100-100: Disaggregated deployment paths consolidated correctly for both workers.

The --extra-engine-args paths for both TRTLLMDecodeWorker and TRTLLMPrefillWorker have been updated to reference the centralized engine_configs location. The disaggregation modes are properly maintained.

Verify that the engine config files (decode.yaml and prefill.yaml) exist at ./examples/backends/trtllm/engine_configs/qwen3/ relative to the Kubernetes working directory.

Also applies to: 127-127

docs/backends/trtllm/gpt-oss.md (1)

93-93: Documentation path references updated correctly.

All path references to engine configuration files have been updated consistently to the new centralized location (examples/backends/trtllm/engine_configs/). The documentation accurately reflects the actual file locations and will guide users correctly.

Also applies to: 100-100, 148-148, 164-164

benchmarks/router/run_engines.sh (2)

87-101: Engine config path construction is sound.

The conditional logic correctly constructs paths to agg.yaml, prefill.yaml, or decode.yaml based on the MODE variable. The default path correctly points to the new centralized engine_configs directory.


10-10: Migration of RECIPE_PATH to ENGINE_CONFIG_PATH is complete and verified.

Verification confirms:

  • No remaining references to RECIPE_PATH exist elsewhere in the codebase
  • ENGINE_CONFIG_PATH is properly introduced and consistently used across all five locations in the file (definition + four uses in config path construction)
  • The variable is scoped to this file only, with no cross-file dependencies
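The first bullet can be rechecked at any time with a recursive grep. A sketch against a one-file scratch tree (point the grep at the real checkout root in practice):

```shell
#!/bin/sh
# Confirm the RECIPE_PATH -> ENGINE_CONFIG_PATH rename left no stale references.
d=$(mktemp -d)
printf 'ENGINE_CONFIG_PATH="$DYNAMO_HOME/examples/backends/trtllm/engine_configs"\n' > "$d/run_engines.sh"
if grep -rn 'RECIPE_PATH' "$d"; then
  echo "stale references found"
else
  echo "clean"
fi
```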
examples/backends/trtllm/launch/gpt_oss_disagg.sh (1)

9-10: Launch script paths updated correctly for GPT-OSS.

The PREFILL_ENGINE_ARGS and DECODE_ENGINE_ARGS variables now reference the correct paths in the new centralized engine_configs directory. These will resolve to the prefill.yaml and decode.yaml files created in this PR.

examples/backends/trtllm/launch/disagg.sh (1)

9-10: Launch script paths updated correctly for Qwen3.

The PREFILL_ENGINE_ARGS and DECODE_ENGINE_ARGS variables now reference the new centralized engine_configs directory paths for the qwen3 backend.

Comment on lines +21 to +23
cache_transceiver_config:
backend: UCX
max_tokens_in_buffer: 65536

⚠️ Potential issue | 🟡 Minor

Fix YAML indentation inconsistency in cache_transceiver_config.

Lines 22-23 use 2-space indentation while other nested configs (moe_config, cuda_graph_config) use 4-space indentation. Standardize to 4-space indentation for consistency.

 cache_transceiver_config:
-  backend: UCX
-  max_tokens_in_buffer: 65536
+    backend: UCX
+    max_tokens_in_buffer: 65536
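Worth noting that YAML only requires indentation to be consistent within a single block, so the 2-space form parses correctly and the change is purely stylistic. A small sketch that flags 2-space-indented keys in a file intended to use 4 spaces (the sample config is illustrative):

```shell
#!/bin/sh
# Flag nested keys indented with exactly 2 spaces; the sample mirrors
# the mixed indentation called out above.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
cache_transceiver_config:
  backend: UCX
  max_tokens_in_buffer: 65536
moe_config:
    backend: CUTLASS
EOF
grep -nE '^  [^ ]' "$cfg"
```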

Comment on lines +23 to +25
cache_transceiver_config:
backend: UCX
max_tokens_in_buffer: 65536

⚠️ Potential issue | 🟡 Minor

Fix YAML indentation inconsistency in cache_transceiver_config.

Lines 24-25 use 2-space indentation while other nested configs (moe_config, cuda_graph_config) use 4-space indentation. Standardize to 4-space indentation for consistency.

 cache_transceiver_config:
-  backend: UCX
-  max_tokens_in_buffer: 65536
+    backend: UCX
+    max_tokens_in_buffer: 65536

Comment on lines +35 to +36
export PREFILL_ENGINE_ARGS=${PREFILL_ENGINE_ARGS:-"$DYNAMO_HOME/tests/serve/trtllm/engine_configs/qwen3/prefill.yaml"}
export DECODE_ENGINE_ARGS=${DECODE_ENGINE_ARGS:-"$DYNAMO_HOME/tests/serve/trtllm/engine_configs/qwen3/decode.yaml"}

⚠️ Potential issue | 🟠 Major

Fix path inconsistency: disagg_same_gpu.sh uses wrong engine_configs location.

Lines 35-36 reference tests/serve/trtllm/engine_configs while all other launch scripts in this PR (gpt_oss_disagg.sh, disagg.sh) reference examples/backends/trtllm/engine_configs. This breaks consistency with the PR objective to consolidate configs under examples/backends/trtllm/engine_configs/.

-export PREFILL_ENGINE_ARGS=${PREFILL_ENGINE_ARGS:-"$DYNAMO_HOME/tests/serve/trtllm/engine_configs/qwen3/prefill.yaml"}
-export DECODE_ENGINE_ARGS=${DECODE_ENGINE_ARGS:-"$DYNAMO_HOME/tests/serve/trtllm/engine_configs/qwen3/decode.yaml"}
+export PREFILL_ENGINE_ARGS=${PREFILL_ENGINE_ARGS:-"$DYNAMO_HOME/examples/backends/trtllm/engine_configs/qwen3/prefill.yaml"}
+export DECODE_ENGINE_ARGS=${DECODE_ENGINE_ARGS:-"$DYNAMO_HOME/examples/backends/trtllm/engine_configs/qwen3/decode.yaml"}
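The rewrite itself is mechanical. A hedged sed sketch, applied to a temp copy here rather than the real script (in the repo, target `launch/disagg_same_gpu.sh`):

```shell
#!/bin/sh
# Rewrite the stale tests/serve prefix to the consolidated location.
f=$(mktemp)
cat > "$f" <<'EOF'
export PREFILL_ENGINE_ARGS=${PREFILL_ENGINE_ARGS:-"$DYNAMO_HOME/tests/serve/trtllm/engine_configs/qwen3/prefill.yaml"}
export DECODE_ENGINE_ARGS=${DECODE_ENGINE_ARGS:-"$DYNAMO_HOME/tests/serve/trtllm/engine_configs/qwen3/decode.yaml"}
EOF
sed -i.bak 's#/tests/serve/trtllm/engine_configs/#/examples/backends/trtllm/engine_configs/#g' "$f"
grep -c 'examples/backends/trtllm/engine_configs' "$f"
```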

Comment on lines +10 to +21
| Model | Framework | Mode | GPUs | Deployment | Benchmark Recipe | Notes | GAIE integration |
|-------|-----------|------|------|------------|------------------|-------|------------------|
| **[Llama-3-70B](llama-3-70b/vllm/agg/)** | vLLM | Aggregated | 4x H100/H200 ||| FP8 dynamic quantization ||
| **[Llama-3-70B](llama-3-70b/vllm/disagg-single-node/)** | vLLM | Disagg (Single-Node) | 8x H100/H200 ||| Prefill + Decode separation |
| **[Llama-3-70B](llama-3-70b/vllm/disagg-multi-node/)** | vLLM | Disagg (Multi-Node) | 16x H100/H200 ||| 2 nodes, 8 GPUs each |
| **[Qwen3-32B-FP8](qwen3-32b-fp8/trtllm/agg/)** | TensorRT-LLM | Aggregated | 4x GPU ||| FP8 quantization |
| **[Qwen3-32B-FP8](qwen3-32b-fp8/trtllm/disagg/)** | TensorRT-LLM | Disaggregated | 8x GPU ||| Prefill + Decode separation |
| **[GPT-OSS-120B](gpt-oss-120b/trtllm/agg/)** | TensorRT-LLM | Aggregated | 4x GB200 ||| Blackwell only, WideEP |
| **[GPT-OSS-120B](gpt-oss-120b/trtllm/disagg/)** | TensorRT-LLM | Disaggregated | TBD ||| Engine configs only, no K8s manifest |
| **[DeepSeek-R1](deepseek-r1/sglang/disagg-8gpu/)** | SGLang | Disagg WideEP | 8x H200 ||| Benchmark recipe pending |
| **[DeepSeek-R1](deepseek-r1/sglang/disagg-16gpu/)** | SGLang | Disagg WideEP | 16x H200 ||| Benchmark recipe pending |
| **[DeepSeek-R1](deepseek-r1/trtllm/disagg/wide_ep/gb200/)** | TensorRT-LLM | Disagg WideEP (GB200) | 32+4 GB200 ||| Multi-node: 8 decode + 1 prefill nodes |

⚠️ Potential issue | 🟠 Major

Fix table column alignment—most rows missing GAIE integration column.

The table header declares 8 columns (including "GAIE integration"), but rows 13–21 only have 7 cells. Either add empty cells (| |) or remove the GAIE column header to match the data rows. Most models do not have GAIE integration values; consider whether that column is necessary or if placeholder content should be added consistently.

-| **[Llama-3-70B](llama-3-70b/vllm/disagg-single-node/)** | vLLM | Disagg (Single-Node) | 8x H100/H200 | ✅ | ✅ | Prefill + Decode separation |
+| **[Llama-3-70B](llama-3-70b/vllm/disagg-single-node/)** | vLLM | Disagg (Single-Node) | 8x H100/H200 | ✅ | ✅ | Prefill + Decode separation | |

Apply the same fix (add trailing | |) to lines 14–21.

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

13-21: Table column count
Expected: 8; Actual: 7; Too few cells, each of these rows will be missing data

(MD056, table-column-count)

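The MD056 findings can be reproduced without markdownlint. A small awk sketch that counts cells per table row so short rows stand out (the sample table is illustrative, not the one from the README):

```shell
#!/bin/sh
# Count cells in each markdown table row; rows shorter than the header
# reproduce the MD056 "too few cells" finding.
tbl=$(mktemp)
cat > "$tbl" <<'EOF'
| Model | Deployment | GAIE integration |
|-------|------------|------------------|
| A | yes | |
| B | yes |
EOF
awk -F'|' '/^\|/ { printf "row %d: %d cells\n", NR, NF - 2 }' "$tbl"
```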


# Deploy GPT-OSS-120B with TensorRT-LLM
./run.sh --model gpt-oss-120b --framework trtllm --deployment agg
**Step 4: Run Benchmark (Optional)**

⚠️ Potential issue | 🟡 Minor

Correct step numbering—both are labeled "Step 4".

Line 137 is "Step 4: Run Benchmark" and line 150 is "Step 4: GAIE Integration". The GAIE section should be "Step 5" to maintain sequential numbering.

-**Step 4: GAIE Integration (Optional)**
+**Step 5: GAIE Integration (Optional)**

Also applies to: 150-150

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

137-137: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


For Llama-3-70B with vLLM (Aggregated), an example of integration with the Inference Gateway is provided.

2. Apply manifests by running a script.
Follow to Follow [Deploy Inference Gateway Section 2](../deploy/inference-gateway/README.md#2-deploy-inference-gateway) to install GAIE. Then apply manifests.

⚠️ Potential issue | 🟡 Minor

Fix duplicate word "Follow".

Line 154 contains "Follow to Follow"—remove the duplicate.

-Follow to Follow [Deploy Inference Gateway Section 2](../deploy/inference-gateway/README.md#2-deploy-inference-gateway) to install GAIE.
+Follow [Deploy Inference Gateway Section 2](../deploy/inference-gateway/README.md#2-deploy-inference-gateway) to install GAIE.

@grahamking

Add detail to the title and description please.
