ci: Add support for max_inflight_requests parameter to prevent unbounded memory growth in ensemble models
#8458
Conversation
Pull Request Overview
This PR adds support for a new `max_ensemble_inflight_responses` parameter on ensemble models, preventing unbounded memory growth in scenarios with decoupled models and slow consumers.
- Implements backpressure mechanism to limit concurrent responses in ensemble pipelines
- Adds comprehensive test coverage including valid/invalid parameter validation
- Creates new test models for decoupled producer and slow consumer scenarios
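For illustration only (the parameter key and syntax below are assumptions inferred from the overview above and from Triton's usual `parameters` map convention, not confirmed by this thread), an ensemble `config.pbtxt` enabling the limit might contain:

```protobuf
# Hypothetical sketch -- key name and placement are assumptions,
# not the merged schema.
platform: "ensemble"
parameters [
  {
    key: "max_ensemble_inflight_responses"
    value: { string_value: "4" }  # cap concurrent inflight responses at 4
  }
]
```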
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
Summary per file:
| File | Description |
|---|---|
| qa/L0_simple_ensemble/test.sh | Adds backpressure testing logic and invalid parameter validation |
| qa/L0_simple_ensemble/models/slow_consumer/config.pbtxt | Configures Python backend model with intentional processing delay |
| qa/L0_simple_ensemble/models/slow_consumer/1/model.py | Implements model that adds 200ms delay per request to simulate slow processing |
| qa/L0_simple_ensemble/models/ensemble_enabled_max_inflight_responses/config.pbtxt | Ensemble configuration with backpressure parameter set to 4 |
| qa/L0_simple_ensemble/models/ensemble_disabled_max_inflight_responses/config.pbtxt | Baseline ensemble configuration without backpressure parameter |
| qa/L0_simple_ensemble/models/decoupled_producer/config.pbtxt | Configures decoupled Python model for multiple response generation |
| qa/L0_simple_ensemble/models/decoupled_producer/1/model.py | Implements decoupled model that produces N responses based on input value |
| qa/L0_simple_ensemble/ensemble_backpressure_test.py | Comprehensive test suite for backpressure functionality |
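The producer/consumer scenario in the table can be approximated outside Triton. The sketch below (plain Python, no Triton APIs; all names are illustrative) shows the pattern under test: a fast decoupled-style producer feeding a slow consumer through a bounded buffer, where the bound models the inflight-response limit and `maxsize=0` models the unbounded default.

```python
import queue
import threading
import time

def producer(out_q, n_responses):
    # Decoupled-style producer: emits many responses per request.
    # put() blocks when the queue is full, modeling backpressure.
    for i in range(n_responses):
        out_q.put(i)
    out_q.put(None)  # end-of-stream marker

def slow_consumer(in_q, results):
    # Consumer with a small delay per response, like the 200 ms
    # sleep in the slow_consumer test model (scaled down here).
    while True:
        item = in_q.get()
        if item is None:
            break
        time.sleep(0.001)
        results.append(item)

def run(max_inflight):
    # maxsize bounds the number of buffered (inflight) responses;
    # maxsize=0 means unbounded, mirroring the disabled default.
    q = queue.Queue(maxsize=max_inflight)
    results = []
    p = threading.Thread(target=producer, args=(q, 100))
    c = threading.Thread(target=slow_consumer, args=(q, results))
    p.start()
    c.start()
    p.join()
    c.join()
    return results

print(len(run(max_inflight=4)))  # all 100 responses still delivered
```

With the bound in place the producer never gets more than four responses ahead of the consumer, yet every response is still delivered.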
Pull Request Overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Resolved review threads:
- qa/L0_simple_ensemble/backpressure_test_models/decoupled_producer/1/model.py
- ...imple_ensemble/backpressure_test_models/ensemble_disabled_max_inflight_requests/config.pbtxt
Title changed from "max_inflight_responses parameter to prevent unbounded memory growth in ensemble models" to "max_inflight_requests parameter to prevent unbounded memory growth in ensemble models"
Co-authored-by: Yingge He <157551214+yinggeh@users.noreply.github.com>
* **Zero overhead when disabled**: If `max_inflight_requests: 0` (default), no synchronization overhead is incurred.
* **Minimal overhead when enabled**: Uses a blocking/wakeup mechanism per ensemble step, where upstream models are paused ("blocked") when the inflight-request limit is reached and resumed ("woken up") as downstream models complete processing them. This synchronization ensures memory usage stays within bounds, though it may increase latency.
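The blocking/wakeup mechanism described above can be sketched as a counter guarded by a condition variable (illustrative Python; the actual implementation lives in Triton's C++ ensemble scheduler, and all names here are assumptions):

```python
import threading

class InflightLimiter:
    """Counts inflight responses; blocks producers at the limit.

    limit=0 disables the mechanism entirely, matching the
    zero-overhead default described above.
    """

    def __init__(self, limit):
        self.limit = limit
        self.inflight = 0
        self.cv = threading.Condition()

    def acquire(self):
        # Called before an upstream model emits a response.
        if self.limit == 0:
            return  # disabled: no synchronization overhead
        with self.cv:
            while self.inflight >= self.limit:
                self.cv.wait()  # block the producer at the limit
            self.inflight += 1

    def release(self):
        # Called when a downstream model finishes a response.
        if self.limit == 0:
            return
        with self.cv:
            self.inflight -= 1
            self.cv.notify()  # wake one blocked producer
```

Each `release()` from a downstream step wakes at most one blocked upstream producer, which is what bounds memory without dropping or cancelling requests.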
Add a note here saying that this delay will not cause intermediate inflight requests to be cancelled or timed out internally, but the client should be aware of the added latency.
And in model_config.proto
Added note in both places.
Co-authored-by: Yingge He <157551214+yinggeh@users.noreply.github.com>
yinggeh left a comment:
Good work!
What does the PR do?
This PR adds testing for a new parameter `max_inflight_requests`. Added `ensemble_backpressure_test.py` with custom decoupled producer and slow consumer models to validate the new feature.
Checklist
`<commit_type>: <Title>`
Commit Type: check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
triton-inference-server/core#455
triton-inference-server/common#141
Where should the reviewer start?
Test plan:
Caveats:
Background
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)