Python: Fix handoff workflow with store=False sending server-assigned item IDs #4368

Open
LEDazzio01 wants to merge 2 commits into microsoft:main from LEDazzio01:fix/4357-strip-item-ids-when-store-false

Conversation

@LEDazzio01

Motivation and Context

Fixes #4357

When store=False, server-assigned item IDs (rs_* for reasoning, fc_* for function calls) reference server-persisted objects that do not exist. During handoff workflows, these IDs are replayed in the input, causing "Item not found" API errors:

openai.BadRequestError: Error code: 400 - {
    'error': {
        'message': "Item 'rs_...' not found in the session's conversation.",
        'type': 'invalid_request_error',
        ...
    }
}

Description

This PR adds a post-processing step in _prepare_options() that strips the id field from reasoning and function_call input items when store=False. This ensures these items are replayed by value rather than by reference when server-side persistence is disabled.

Changes

_responses_client.py

  • Added ID-stripping logic at the end of _prepare_options(), right before returning run_options
  • Only activates when store is explicitly False
  • Targets only reasoning and function_call item types
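
The three points above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the helper name `strip_server_item_ids`, the `ID_BEARING_TYPES` constant, and the assumption that `run_options["input"]` holds a list of plain dicts are all hypothetical.

```python
from typing import Any

# Hypothetical constant: the item types whose server-assigned IDs get stripped.
ID_BEARING_TYPES = {"reasoning", "function_call"}


def strip_server_item_ids(run_options: dict[str, Any]) -> dict[str, Any]:
    """Drop `id` from reasoning/function_call input items when store is explicitly False."""
    # `is not False` leaves the unset (None/missing) and True cases untouched.
    if run_options.get("store") is not False:
        return run_options

    items = run_options.get("input")
    if not isinstance(items, list):
        return run_options

    for item in items:
        if isinstance(item, dict) and item.get("type") in ID_BEARING_TYPES:
            # Only the server-assigned `id` is removed; `call_id` and the
            # item payload stay, so the item is replayed by value.
            item.pop("id", None)
    return run_options
```

Checking `store is not False` rather than `not store` matters here: a missing or `None` value is treated as "not disabled", so IDs are kept.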

test_openai_responses_client.py

  • Added test_prepare_options_strips_reasoning_and_function_call_ids_when_store_false — verifies IDs are removed from reasoning and function_call items when store=False
  • Added test_prepare_options_preserves_reasoning_and_function_call_ids_when_store_true — verifies IDs are preserved when store=True

Both tests simulate a realistic handoff conversation with reasoning + function_call + function_result + text messages.

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR title follows the repo convention
  • Tests added to cover the change
  • All new and existing tests pass

…n store=False

When store is disabled, server-assigned item IDs (rs_*, fc_*) reference
non-existent server-persisted objects. During handoff workflows, these IDs
are replayed in the input, causing 'Item not found' API errors.

This fix adds a post-processing step in _prepare_options() that strips the
'id' field from reasoning and function_call input items when store=False,
so they are replayed by value rather than by reference.

Fixes microsoft#4357

Copilot AI left a comment

Pull request overview

Fixes handoff workflows when store=False by ensuring OpenAI Responses API input items are replayed by value (not by reference) to avoid “Item not found” errors caused by server-assigned item IDs (rs_*, fc_*) that were never persisted.

Changes:

  • Added a post-processing step in OpenAIResponsesClient._prepare_options() to remove id from reasoning and function_call input items when store is explicitly False.
  • Added unit tests validating that these IDs are stripped for store=False and preserved for store=True.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File: Description

  • python/packages/core/agent_framework/openai/_responses_client.py: Strips id fields from reasoning / function_call input items when store=False to prevent invalid references during replay.
  • python/packages/core/tests/openai/test_openai_responses_client.py: Adds coverage for ID stripping behavior under store=False and preservation under store=True.

Comments suppressed due to low confidence (1)

python/packages/core/tests/openai/test_openai_responses_client.py:2886

  • The file now introduces a third # region ... marker but still has only a single # endregion at the end of the file, which makes region folding ambiguous (the Store=False section ends up nested under the Background Response section). Consider adding a matching # endregion for this new region (and/or closing the previous region before opening this one) to keep region markers balanced like other test files.
# region Store=False ID Stripping Tests (Issue #4357)


@@ -852,6 +847,17 @@ async def _prepare_options(
        if response_format:
            run_options["text_format"] = response_format

        # When store=False, strip server-assigned IDs from reasoning and function_call
        # items. These IDs (rs_*, fc_*) reference server-persisted objects that don't exist
        # when store is disabled, causing "Item not found" API errors during handoff workflows.

nit: thank you for your contributions! We don't need to mention handoff here, which is likely to create more confusion.

@LEDazzio01 LEDazzio01 left a comment

Good point! Pushed commit eef02f1 — removed "handoff" mentions from the comment in _responses_client.py and the test docstring/comment in test_openai_responses_client.py. Now uses generic wording ("causing 'Item not found' API errors" and "Simulate a multi-turn conversation").

@moonbox3 moonbox3 left a comment

This works, but I'm worried about it living here as a post-processing step. If someone adds a new item type with a server-assigned ID down the road, they'd have to remember to update this loop too — and mcp_approval_request already has an id field that isn't covered here.

Could we thread store into _prepare_content_for_openai instead and just not emit the id at serialization time? Something like:

# reasoning case:
if content.id and store is not False:
    ret["id"] = content.id

That way each item type owns its own ID logic and new types get the right behavior by default.
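
To make the suggestion concrete, here is a minimal sketch of the reasoning case handled at serialization time. It is an assumption-laden illustration, not the framework's code: `ReasoningContent` is a hypothetical stand-in for the real content type, `prepare_reasoning_item` is a hypothetical name for the relevant branch of `_prepare_content_for_openai`, and the exact item payload is illustrative.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ReasoningContent:
    """Hypothetical stand-in for the framework's reasoning content type."""

    text: str
    id: Optional[str] = None


def prepare_reasoning_item(content: ReasoningContent, store: Optional[bool]) -> dict[str, Any]:
    """Serialize a reasoning item, deciding about the ID where the item is built."""
    ret: dict[str, Any] = {
        "type": "reasoning",
        "summary": [{"type": "summary_text", "text": content.text}],
    }
    # Emit the server-assigned ID only when the server can resolve it,
    # that is, when store has not been explicitly disabled.
    if content.id and store is not False:
        ret["id"] = content.id
    return ret
```

With this shape, no later pass over the input list is needed: a new item type that carries an ID would apply the same `store is not False` guard in its own branch.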
