
Conversation

@cehongwang
Collaborator

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)

Type of change

Please delete options that are not relevant and/or add your own.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that the relevant reviewers are notified

@meta-cla meta-cla bot added the cla signed label Oct 16, 2025
@github-actions github-actions bot added the component: tests, component: lowering, component: conversion, component: api [Python], and component: dynamo labels Oct 16, 2025
@github-actions github-actions bot requested a review from peri044 October 16, 2025 22:23
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Oct 29, 2025
@cehongwang cehongwang force-pushed the cpu-memory-optimization-rebase-main branch from b9b6aeb to 51f64f0 October 29, 2025 17:30
Collaborator

Make sure to add a link to the resource_management page in index.rst


.. code-block:: bash

    export TRIM_CPU_MEMORY=1
Collaborator

Let's prefix this with TORCHTRT. I think TORCHTRT_ENABLE_BUILDER_MALLOC_TRIM would be clearer.
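
The proposed name suggests the flag gates a glibc malloc_trim call after engine building. As a rough sketch of such an env-gated trim, with the variable and helper names assumed rather than taken from the PR:

    import ctypes
    import os

    def maybe_trim_builder_memory() -> None:
        # Only trim when the user opts in via the proposed env var (assumed name).
        if os.environ.get("TORCHTRT_ENABLE_BUILDER_MALLOC_TRIM") != "1":
            return
        try:
            libc = ctypes.CDLL("libc.so.6")
            libc.malloc_trim(0)  # ask glibc to return freed heap pages to the OS
        except OSError:
            pass  # non-glibc platforms have nothing to trim

malloc_trim helps here because engine building makes large transient host allocations that glibc would otherwise keep cached in the process heap.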

export TRIM_CPU_MEMORY=1
This reduces approximately **** of redundant model copies, limiting
total CPU memory usage to up to **** the model size.
Collaborator

3x
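
As an aside, one way to sanity-check the claimed CPU ceiling empirically is to watch resident set size around compilation; psutil and the placement of the compile call here are assumptions, not part of the PR:

    import os
    import psutil  # assumption: not a project dependency

    proc = psutil.Process(os.getpid())
    before = proc.memory_info().rss
    # ... run the Torch-TensorRT compilation here ...
    after = proc.memory_info().rss
    print(f"RSS grew by {(after - before) / 1024**3:.2f} GiB during compilation")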

offload_module_to_cpu = False
This removes another **** model copy, reducing peak CPU memory
usage to about **** the model size.
Collaborator

2x
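
For reference, a minimal sketch of where this setting is passed; the exact torch_tensorrt.compile signature and kwarg support depend on the version under review:

    import torch
    import torch_tensorrt

    model = torch.nn.Sequential(torch.nn.Linear(64, 64)).eval().cuda()
    inputs = [torch.randn(8, 64, device="cuda")]

    trt_model = torch_tensorrt.compile(
        model,
        ir="dynamo",
        inputs=inputs,
        offload_module_to_cpu=False,  # keep the extra module copy on GPU
    )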

GPU Memory
^^^^^^^^^^

By default, Torch-TensorRT may consume up to **** the model size in GPU memory.
Collaborator

2x

offload_module_to_cpu = True
This shifts one model copy from GPU to CPU memory.
As a result, peak GPU memory usage decreases to about ****
Collaborator

1x

This shifts one model copy from GPU to CPU memory.
As a result, peak GPU memory usage decreases to about ****
the model size, while CPU memory usage increases by roughly ****.
Collaborator

This is a bit confusing; can we say it increases to roughly **2x** the model size?
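
To make the GPU-side effect concrete, a sketch that measures peak device memory around compilation, reusing model and inputs from the earlier sketch; the kwarg is the setting under discussion, while the measurement code is an assumption:

    import torch
    import torch_tensorrt

    torch.cuda.reset_peak_memory_stats()
    trt_model = torch_tensorrt.compile(
        model,  # model/inputs as in the earlier sketch
        ir="dynamo",
        inputs=inputs,
        offload_module_to_cpu=True,  # shift the extra module copy to host memory
    )
    peak_gib = torch.cuda.max_memory_allocated() / 1024**3
    print(f"peak GPU memory during compile: {peak_gib:.2f} GiB")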

for attr in dir(module):
    if attr.startswith("_frozen_param"):
        delattr(module, attr)
release_memory()
Collaborator

Can we make this function name a little more specific?
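
One possible reading of this request, as a self-contained sketch; the name delete_frozen_params is hypothetical, and gc.collect() plus torch.cuda.empty_cache() stand in for the PR's release_memory():

    import gc
    import torch

    def delete_frozen_params(module: torch.fx.GraphModule) -> None:
        """Drop Dynamo's _frozen_param* constants so their storage can be freed."""
        for attr in list(dir(module)):
            if attr.startswith("_frozen_param"):
                delattr(module, attr)
        gc.collect()               # stand-in for the PR's release_memory()
        torch.cuda.empty_cache()   # also release cached device blocks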


@needs_refit  # type: ignore[misc]
-def _insert_engine_to_cache(self, hash_val: str, serialized_engine: bytes) -> None:
+def _insert_engine_to_cache(self, hash_val: str, engine: trt.ICudaEngine) -> None:
Collaborator

@zewenli98 When do these calls run? Will this conflict with the goal of keeping memory usage under 3x?
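
For context, the signature change implies serialization moves inside the cache insert, so the serialized blob (roughly another 1x model size on the host) can be scoped to the insertion itself. A sketch under that assumption; the engine_cache attribute and its insert method are hypothetical:

    import tensorrt as trt

    def _insert_engine_to_cache(self, hash_val: str, engine: trt.ICudaEngine) -> None:
        # Serialize only at insertion time so the host blob lives just long
        # enough to be written out, instead of being held by the caller.
        serialized = engine.serialize()  # trt.IHostMemory
        self.engine_cache.insert(hash_val, bytes(serialized))  # hypothetical cache API
        del serialized                   # drop the host buffer promptly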

Collaborator

Should we do the caching in a post-processing step?

Collaborator

For example, we could return the cache entry as one of the InterpreterResult fields.
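
A sketch of that idea, assuming a result type with hypothetical field names (the interpreter result actually used in the codebase may differ):

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class InterpreterResult:  # field names are hypothetical
        serialized_engine: bytes
        input_names: List[str]
        output_names: List[str]
        engine_cache_entry: Optional[bytes] = None  # written to the cache post-conversion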

        self.tag(subgraphs)
        return self.split()

    def calculate_num_of_break(self, subgraphs: List[Subgraph]) -> int:
Collaborator

calculate_num_breaks
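
For clarity, a self-contained sketch of what the renamed counter computes; Subgraph here is a minimal stand-in with an assumed is_supported flag rather than the splitter's real class:

    from typing import List

    class Subgraph:  # minimal stand-in for the splitter's Subgraph class
        def __init__(self, is_supported: bool):
            self.is_supported = is_supported

    def calculate_num_breaks(subgraphs: List[Subgraph]) -> int:
        # A break is every transition between TRT-supported and torch-fallback
        # subgraphs in the split order.
        return sum(
            prev.is_supported != curr.is_supported
            for prev, curr in zip(subgraphs, subgraphs[1:])
        )

    assert calculate_num_breaks(
        [Subgraph(True), Subgraph(False), Subgraph(True)]
    ) == 2  # TRT -> Torch -> TRT has two breaks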

Collaborator

Should this go in with the other malloc_trim changes, or in the graph-break PR?
