# Prevent double serialization inside Flask server #3653

tdene wants to merge 2 commits into NVIDIA:main from
## Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
Force-pushed from 97dd320 to beac7eb, then from beac7eb to 52e3d10.
@@ -223,7 +223,7 @@ async def run_coordinator_test(
    results = await asyncio.wait_for(asyncio.gather(*futures), timeout=10.0)
    for record in results:

Review comment: rename `record` -> `request`
@@ -120,7 +120,7 @@ async def chat_completions():
    request_idx = 0
    for record in batch_results:

Review comment: I believe this is a request now, so let's rename `record` -> `request`
    for record in results:
-       assert record[-1].status == Status.COMPLETED
+       assert record.status == Status.COMPLETED.name

Review comment: this file instantiates `InferenceClient` with the default `deserialize=True` as far as I can tell, so does this assert actually work? I would assume it should be `Status.COMPLETED` instead of `Status.COMPLETED.name` unless I'm missing something.

Author reply: Addressed: I changed the default and made sure the whole test file was solid.
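For context on this thread, here is a minimal sketch of why the two asserts differ. The `Status` enum below is a hypothetical stand-in (the real one lives in the server code): once a result has been serialized to a plain dict, `status` travels as the enum's name string, which never compares equal to the enum member itself.

```python
from enum import Enum

class Status(Enum):
    # Hypothetical member value; the real Status enum lives in the server code.
    COMPLETED = 1

# After a result is serialized to a plain dict, its status is the
# enum's *name* string, not the enum member.
serialized_status = Status.COMPLETED.name   # the string "COMPLETED"

assert serialized_status == Status.COMPLETED.name   # string vs string: passes
assert serialized_status != Status.COMPLETED        # string vs enum member: never equal
```

So whether the assert should use `Status.COMPLETED` or `Status.COMPLETED.name` depends entirely on whether the client deserialized the result back into typed objects.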
    """

-   def __init__(self, inference_coordinator_address: str):
+   def __init__(self, inference_coordinator_address: str, deserialize: bool = True):

Review comment: why are we defaulting to `True` instead of `False`, since the goal is to avoid double serialization?

Review comment: we might want to make sure we have test coverage for both `deserialize=True` and `deserialize=False`, especially since we use `isinstance(result, dict)` to check the object type.
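The kind of two-mode coverage suggested here can be sketched as follows. `ToyResult` and `fetch_result` are hypothetical stand-ins for the real `InferenceClient` and its result type; only the `isinstance(result, dict)` branching mirrors the code under review.

```python
# Hypothetical stand-ins for InferenceClient and its result type, sketching
# test coverage for both deserialize=True and deserialize=False.
from dataclasses import dataclass, asdict

@dataclass
class ToyResult:
    status: str
    generated_text: str

    def serialize(self) -> dict:
        return asdict(self)

def fetch_result(deserialize: bool):
    """Fake client call: returns a parsed object or the raw wire dict."""
    raw = {"status": "COMPLETED", "generated_text": "hello"}
    return ToyResult(**raw) if deserialize else raw

for deserialize in (True, False):
    result = fetch_result(deserialize)
    # The isinstance check covers both shapes without re-serializing a dict.
    request_dict = result if isinstance(result, dict) else result.serialize()
    assert request_dict["status"] == "COMPLETED"
```

Exercising both settings in one loop (or a parametrized test) ensures the `isinstance` type check is hit on both branches.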
@@ -134,7 +134,7 @@ async def main(
    throughputs = []
    for record in results:

@@ -158,7 +158,7 @@ async def main(
    print("Results:")
    unique_prompt_map = defaultdict(list)
    for record in results:
@@ -129,29 +129,27 @@ async def completions():
    request_idx = 0
    for record in batch_results:
        result = record.merge()
        full_text = result.generated_text or ""
        record_dict = record if isinstance(record, dict) else record.serialize()

Review comment: `record_dict` -> `request_dict`
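The failure mode named in the PR title can be illustrated in general terms. This sketch uses the standard `json` module, not the server's actual serializer: serializing an already-serialized payload produces a JSON string containing escaped JSON, which no longer decodes to an object.

```python
import json

payload = {"status": "COMPLETED", "generated_text": "hello"}

once = json.dumps(payload)   # proper JSON object text
twice = json.dumps(once)     # JSON *string* containing escaped JSON

assert json.loads(once) == payload              # round-trips back to the dict
assert json.loads(twice) == once                # decodes to a string, not a dict
assert not isinstance(json.loads(twice), dict)  # clients expecting an object break
```

Guarding with `isinstance(record, dict)` before calling `serialize()`, as the diff above does, is one way to make sure each payload is serialized exactly once.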
## What does this PR do?
## Contribution process

    flowchart LR
        A[Pre-checks] --> B[PR Tests]
        subgraph Code Review/Approval
            C1[Expert Review] --> C2[Final Review]
        end
        B --> C1
        C2 --> D[Merge]

### Pre-checks

Add the appropriate Milestone if you want this PR in a versioned release (e.g. Core 0.8).

### Code review

The following process is enforced via the CODEOWNERS file for changes into `megatron/core`. For changes outside of `megatron/core`, it is up to the PR author whether or not to tag the Final Reviewer team.

For MRs into the `main` branch:

Feel free to message or comment the @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!

(Step 1): Add the `Expert Review` PR label when your PR is ready for review.

(Step 2): Collect the expert reviewers' reviews. Final Review might get declined if these requirements are not fulfilled.

(Step 3): Add the `Final Review` label for Final Review.

(Optional Step 4): Cherry-pick into a release branch. If this PR also needs to be merged into `core_r*` release branches, after this PR has been merged, select `Cherry-pick` to open a new PR into the release branch.

For MRs into the `dev` branch:

The proposed review process for the `dev` branch is under active discussion. MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

### Merging your PR

Any member of `core-adlr` and `core-nemo` will be able to merge your PR.