# Prevent double serialization inside Flask server #3653

tdene wants to merge 2 commits into NVIDIA:main from
## Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
Force-pushed from 97dd320 to beac7eb, then from beac7eb to 52e3d10.
@@ -223,7 +223,7 @@ async def run_coordinator_test(
    results = await asyncio.wait_for(asyncio.gather(*futures), timeout=10.0)
    for record in results:

Review comment: rename `record` -> `request`
@@ -120,7 +120,7 @@ async def chat_completions():
    request_idx = 0
    for record in batch_results:

Review comment: I believe this is a request now, so let's rename `record` -> `request`
    for record in results:
-       assert record[-1].status == Status.COMPLETED
+       assert record.status == Status.COMPLETED.name

Review comment: this file instantiates `InferenceClient` with the default `deserialize=True` as far as I can tell, so does this assert actually work? I would assume it should be `Status.COMPLETED` instead of `Status.COMPLETED.name` unless I'm missing something.

Author reply: Addressed: I changed the default and made sure the whole test file was solid.
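For context on this thread, here is a minimal sketch of why the two asserts differ. The `Status` enum below is a hypothetical stand-in (the real one lives in the server code): once a result has been serialized to a plain dict, `status` travels as the enum's name string, which never compares equal to the enum member itself.

```python
from enum import Enum

class Status(Enum):
    # Hypothetical member value; the real Status enum lives in the server code.
    COMPLETED = 1

# After a result is serialized to a plain dict, its status is the
# enum's *name* string, not the enum member.
serialized_status = Status.COMPLETED.name   # the string "COMPLETED"

assert serialized_status == Status.COMPLETED.name   # string vs string: passes
assert serialized_status != Status.COMPLETED        # string vs enum member: never equal
```

So whether the assert should use `Status.COMPLETED` or `Status.COMPLETED.name` depends entirely on whether the client deserialized the result back into typed objects.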
    """

-   def __init__(self, inference_coordinator_address: str):
+   def __init__(self, inference_coordinator_address: str, deserialize: bool = True):

Review comment: why are we defaulting to `True` instead of `False`, since the goal is to avoid double serialization?

Review comment: we might want to make sure we have test coverage for both `deserialize=True` and `deserialize=False`, especially since we use `isinstance(result, dict)` to check the object type.
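The kind of two-mode coverage suggested here can be sketched as follows. `ToyResult` and `fetch_result` are hypothetical stand-ins for the real `InferenceClient` and its result type; only the `isinstance(result, dict)` branching mirrors the code under review.

```python
# Hypothetical stand-ins for InferenceClient and its result type, sketching
# test coverage for both deserialize=True and deserialize=False.
from dataclasses import dataclass, asdict

@dataclass
class ToyResult:
    status: str
    generated_text: str

    def serialize(self) -> dict:
        return asdict(self)

def fetch_result(deserialize: bool):
    """Fake client call: returns a parsed object or the raw wire dict."""
    raw = {"status": "COMPLETED", "generated_text": "hello"}
    return ToyResult(**raw) if deserialize else raw

for deserialize in (True, False):
    result = fetch_result(deserialize)
    # The isinstance check covers both shapes without re-serializing a dict.
    request_dict = result if isinstance(result, dict) else result.serialize()
    assert request_dict["status"] == "COMPLETED"
```

Exercising both settings in one loop (or a parametrized test) ensures the `isinstance` type check is hit on both branches.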
@@ -134,7 +134,7 @@ async def main(
    throughputs = []
    for record in results:

@@ -158,7 +158,7 @@ async def main(
    print("Results:")
    unique_prompt_map = defaultdict(list)
    for record in results:
@@ -129,29 +129,27 @@ async def completions():
    request_idx = 0
    for record in batch_results:
        result = record.merge()
        full_text = result.generated_text or ""
        record_dict = record if isinstance(record, dict) else record.serialize()

Review comment: `record_dict` -> `request_dict`
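The failure mode named in the PR title can be illustrated in general terms. This sketch uses the standard `json` module, not the server's actual serializer: serializing an already-serialized payload produces a JSON string containing escaped JSON, which no longer decodes to an object.

```python
import json

payload = {"status": "COMPLETED", "generated_text": "hello"}

once = json.dumps(payload)   # proper JSON object text
twice = json.dumps(once)     # JSON *string* containing escaped JSON

assert json.loads(once) == payload              # round-trips back to the dict
assert json.loads(twice) == once                # decodes to a string, not a dict
assert not isinstance(json.loads(twice), dict)  # clients expecting an object break
```

Guarding with `isinstance(record, dict)` before calling `serialize()`, as the diff above does, is one way to make sure each payload is serialized exactly once.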
## What does this PR do?
## Contribution process

    flowchart LR
        A[Pre-checks] --> B[PR Tests]
        subgraph Code Review/Approval
            C1[Expert Review] --> C2[Final Review]
        end
        B --> C1
        C2 --> D[Merge]

### Pre-checks

Add the appropriate Milestone if you want this PR in a versioned release (e.g. Core 0.8).

### Code review

The following process is enforced via the CODEOWNERS file for changes into `megatron/core`. For changes outside of `megatron/core`, it is up to the PR author whether or not to tag the Final Reviewer team.

For MRs into the `main` branch:

Feel free to message or comment the @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!

(Step 1): Add the `Expert Review` PR label when your PR is ready for review.

(Step 2): Collect the expert reviewers' reviews. Final Review might get declined if these requirements are not fulfilled.

(Step 3): Add the `Final Review` label for Final Review.

(Optional Step 4): Cherry-pick into a release branch. If this PR also needs to be merged into `core_r*` release branches, after this PR has been merged, select `Cherry-pick` to open a new PR into the release branch.

For MRs into the `dev` branch:

The proposed review process for the `dev` branch is under active discussion. MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

### Merging your PR

Any member of `core-adlr` and `core-nemo` will be able to merge your PR.