fix(openai-agents): add support for realtime #3533
base: main
Conversation
📝 Walkthrough
Adds optional realtime OpenTelemetry instrumentation for OpenAI Agents SDK sessions: new prompt/response extraction helpers and GenAI-aligned attributes in hooks, session wrappers that manage nested workflow/agent/tool/audio/LLM spans, tests and examples for realtime flows, and a development dependency.
Sequence Diagram(s)
sequenceDiagram
participant App as Application
participant RT as RealtimeSession (wrapped)
participant State as RealtimeTracingState
participant Tracer as OT Tracer
participant Exporter as Span Exporter
Note over App,RT: Session enter (wrapper applied)
App->>RT: __aenter__()
RT->>State: init tracing state
State->>Tracer: start workflow span
Note over App,RT: Events routed via _put_event
App->>RT: _put_event(agent_start)
RT->>State: start_agent_span()
State->>Tracer: start agent span
App->>RT: _put_event(audio)
RT->>State: start_audio_span()
State->>Tracer: start audio span
App->>RT: _put_event(raw_model_event)
RT->>State: record_prompt()/record_completion()
State->>State: buffer prompt/completion
State->>Tracer: create LLM span (with buffered data)
Tracer->>Exporter: export LLM span
App->>RT: _put_event(agent_end)
RT->>State: end_agent_span()
State->>Tracer: end agent span
App->>RT: __aexit__()
RT->>State: cleanup()
State->>Tracer: end workflow span
Tracer->>Exporter: export spans
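The session lifecycle in the diagram can be sketched as a monkey-patching wrapper. This is a hedged, simplified illustration rather than the PR's actual code: the names `RealtimeTracingState` and `wrap_realtime_session` echo the PR, but this stripped-down version only manages the workflow span, and the tracer is assumed to be any object exposing `start_span(name)` returning a span with `end()`.

```python
import asyncio


class RealtimeTracingState:
    """Simplified stand-in for the PR's per-session tracing state;
    the real class also manages agent/tool/audio/LLM spans."""

    def __init__(self, tracer):
        self.tracer = tracer
        self.workflow_span = None

    def start_workflow(self, name):
        self.workflow_span = self.tracer.start_span(name)

    def cleanup(self):
        # End the workflow span when the session exits.
        if self.workflow_span is not None:
            self.workflow_span.end()
            self.workflow_span = None


def wrap_realtime_session(session_cls, tracer):
    """Patch __aenter__/__aexit__ so a workflow span brackets the session.
    Instrumentation errors are swallowed so they never break user code."""
    original_aenter = session_cls.__aenter__
    original_aexit = session_cls.__aexit__

    async def traced_aenter(self):
        result = await original_aenter(self)
        try:
            self._tracing_state = RealtimeTracingState(tracer)
            self._tracing_state.start_workflow("openai_agents.realtime.workflow")
        except Exception:
            pass  # defensive: tracing must not break the session
        return result

    async def traced_aexit(self, exc_type, exc, tb):
        try:
            state = getattr(self, "_tracing_state", None)
            if state is not None:
                state.cleanup()
        except Exception:
            pass
        return await original_aexit(self, exc_type, exc, tb)

    session_cls.__aenter__ = traced_aenter
    session_cls.__aexit__ = traced_aexit
```

Events routed through `_put_event` would hook in the same way, delegating to `start_agent_span()`, `record_prompt()`, and friends on the tracing state.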
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Important
Looks good to me! 👍
Reviewed everything up to 30f3bc4 in 2 minutes and 11 seconds.
- Reviewed 2199 lines of code in 8 files
- Skipped 2 files when reviewing.
- Skipped posting 5 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py:44
- Draft comment:
Realtime instrumentation errors are silently ignored. Consider logging exception details for better diagnostics instead of using bare 'except Exception: pass'.
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 50%. The comment is about code that was added in the diff (lines 49-51), so it is about changes. However, this is a code quality suggestion about error handling patterns. The existing code already uses the same silent exception handling pattern (lines 40-42), so this isn't introducing a new anti-pattern - it's following the established convention in this file. The comment doesn't point to a bug or clear issue; it's suggesting a different approach to error handling. According to the rules, "Comments that suggest code quality refactors are good! But only if they are actionable and clear." This is actionable and clear, but it's also inconsistent to only flag the new code when the existing code uses the same pattern. This feels like a style preference rather than a clear code issue. The comment is technically correct that logging would provide better diagnostics. The suggestion is actionable and clear. If this is an instrumentation library, having visibility into why instrumentation failed could be valuable for debugging. Just because the existing code uses the same pattern doesn't mean the pattern is correct. While the suggestion has merit, the rules state to avoid comments that are "obvious or unimportant" and to focus on clear code changes required. This is more of a general code quality improvement that could apply to multiple places in the file. The comment singles out only the new code while ignoring the identical pattern in existing code (lines 40-42), which suggests this isn't a critical issue. If it were truly important, it should apply to all instances. This is a reasonable code quality suggestion, but it's not a clear bug or required change. The new code follows the existing pattern in the file.
Since the comment only targets new code while ignoring identical existing patterns, and it's more of a style/quality preference than a clear issue, it should be deleted.
2. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:236
- Draft comment:
Realtime span creation for SpeechSpanData, TranscriptionSpanData, and SpeechGroupSpanData shares similar logic. Consider refactoring common code to reduce duplication.
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 35% vs. threshold = 50%. This is a code quality refactoring suggestion. The rules state that "Comments that suggest code quality refactors are good! But only if they are actionable and clear." The comment is actionable - the developer could extract a helper method. However, I need to consider: 1) Is this duplication significant enough to warrant refactoring? The blocks are relatively short and have meaningful differences. 2) Is this the right time for this comment? The PR is adding new functionality (realtime span support), and asking for refactoring during feature addition might not be appropriate. 3) The duplication is limited to about 3 blocks with subtle differences - this is borderline whether it's worth refactoring. The comment doesn't provide specific guidance on how to refactor, just points out the duplication exists. While there is some duplication, the blocks are relatively short and each has unique characteristics. Refactoring might actually make the code harder to read by abstracting away the differences. Also, this is a "nice to have" suggestion rather than identifying a bug or serious issue. The rules emphasize removing comments unless there's STRONG EVIDENCE they're correct and necessary. The comment is technically correct that there is duplication, and it is actionable. However, it's a minor code quality suggestion on newly added code, and the duplication isn't severe enough to clearly warrant immediate action. The PR author made a reasonable choice to keep the code explicit and readable for these three different span types. This falls into the category of "not clearly necessary" feedback. This is a borderline case, but I lean toward removing it. The comment is a minor code quality suggestion on new functionality, and the duplication isn't severe.
The author's approach of keeping the three handlers separate and explicit is reasonable and maintainable. This doesn't meet the bar of "STRONG EVIDENCE that the comment is correct and necessary."
3. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:316
- Draft comment:
Exceptions in traced_aenter are caught silently. It's advisable to log exception details to aid debugging.
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%. This is a code quality suggestion about error handling. The comment points out that exceptions are being caught and silently ignored, which could make debugging difficult. However, I need to consider: 1) The function is already decorated with @dont_throw which likely handles exceptions at a higher level, 2) This is instrumentation code that wraps external library code, so silent failures might be intentional to avoid breaking the instrumented application, 3) The comment is suggesting a code improvement (adding logging) rather than pointing out a bug. Looking at the pattern, all the wrapped functions (traced_aenter, traced_aexit, traced_put_event, traced_send_message) use the same pattern of @dont_throw decorator plus internal try-except with pass. This appears to be an intentional design pattern for defensive instrumentation. The comment is making a reasonable code quality suggestion about observability. However, this appears to be an intentional design pattern used consistently throughout the file for defensive instrumentation. The @dont_throw decorator plus silent exception handling is likely meant to ensure instrumentation never breaks the application. Without seeing the implementation of @dont_throw or understanding the broader instrumentation strategy, I can't be certain this is a problem. While the suggestion has merit for general code quality, this is instrumentation code where silent failures are often intentional. The consistent pattern across all wrapped methods suggests this is a deliberate design choice. The comment doesn't provide strong evidence that this is actually causing a problem or that the current approach is incorrect. This comment should be deleted.
It's a speculative code quality suggestion about adding logging, but the silent exception handling appears to be an intentional design pattern for defensive instrumentation. There's no evidence of an actual bug or problem, and the rules state not to make comments unless there's clearly a code change required.
4. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:327
- Draft comment:
Similarly, traced_aexit and traced_put_event suppress errors without logging. Consider adding logging to capture error details rather than silently passing.
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%. This is a new file being added (all lines are new). The comment is suggesting a code quality improvement - adding logging instead of silently suppressing errors. However, looking at the rules: 1) This is about code quality/observability, which could be good if actionable and clear. 2) The code uses both the @dont_throw decorator AND bare except blocks, suggesting intentional error suppression. 3) This is instrumentation code that wraps external library methods - silent error suppression is often intentional in instrumentation to avoid breaking the instrumented code. 4) The comment doesn't point to a bug or clear issue, just suggests "consider adding logging" which is speculative about what would be better. 5) Without seeing the dont_throw decorator implementation or understanding the broader error handling strategy, it's hard to say if this is definitely correct. The instrumentation code appears to intentionally suppress errors to avoid breaking the wrapped functionality. The @dont_throw decorator is explicitly imported and used, suggesting this is a deliberate design choice. The comment is speculative ("consider") rather than pointing to a definite issue. While the error suppression appears intentional for instrumentation code, the comment does raise a valid observability concern. However, without understanding the full error handling strategy (what dont_throw does) and whether this is a standard pattern in this codebase, I cannot be confident this is the right change. The comment is also not clearly actionable - it says "consider" which violates the rule about making definite suggestions. This comment should be deleted. It's a speculative suggestion ("consider") about code quality rather than identifying a clear bug.
The error suppression appears intentional for instrumentation code, and without more context about the codebase's error handling patterns, there's no strong evidence this change is needed.
5. packages/sample-app/sample_app/openai_agents_realtime_example.py:583
- Draft comment:
The check_requirements function relies on print statements for missing dependencies. Consider raising exceptions or logging warnings to improve error handling in production scenarios.
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%. This is a sample/demo application, not production code. The comment is suggesting a refactor for "production scenarios" but this is explicitly a demo/example file. The use of print statements for a CLI script is completely appropriate and standard. The comment is making speculative suggestions about production use cases that don't apply here. According to the rules, I should not keep comments that are speculative or suggest refactors that aren't clearly necessary. This comment also doesn't point to any actual bug or issue - it's just a style preference that doesn't apply to demo code. Could this be a legitimate concern about code quality even in a sample app? Maybe the author wants to demonstrate best practices even in examples. However, the comment specifically mentions "production scenarios" which suggests it acknowledges this isn't production code. While demonstrating best practices in examples can be valuable, the comment explicitly frames this as a concern for "production scenarios," acknowledging this isn't production code. For a CLI demo script, print statements are the standard and appropriate approach. The comment doesn't identify any actual problem with the current implementation for its intended use case. This comment should be deleted. It's suggesting a refactor for "production scenarios" on what is clearly a sample/demo application where print statements are appropriate for CLI usage. The comment is speculative and doesn't identify any actual issue with the code as written for its intended purpose.
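Draft comments 3 and 4 above debate the `@dont_throw` decorator without its definition appearing in this thread. A hypothetical sketch of what such a decorator typically looks like, combined with the debug-level logging the drafts propose, could be:

```python
import functools
import logging

logger = logging.getLogger(__name__)


def dont_throw(func):
    """Hypothetical sketch of a @dont_throw-style decorator: swallow any
    exception so instrumentation never breaks the wrapped application,
    while keeping a debug-level record for troubleshooting."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Silently absorb the error but leave a trail for debugging.
            logger.debug(
                "instrumentation error in %s: %s", func.__name__, exc, exc_info=True
            )
            return None

    return wrapper
```

This reconciles both positions: failures stay invisible to the instrumented application, yet remain discoverable when debug logging is enabled.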
Workflow ID: wflow_9pqYSMjSLLNalYks
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @packages/sample-app/sample_app/openai_agents_realtime_example.py:
- Around line 284-294: The calculate function currently uses eval(expression), which is unsafe; replace it with a safe parser/evaluator that uses Python's ast module to parse the expression and evaluate only allowed node types (numbers, BinOp with ast.Add, ast.Sub, ast.Mult, ast.Div, and parentheses) using a mapped operator table, explicitly disallow exponentiation (Pow) and other nodes, and catch/return errors (including overly deep ASTs) instead of calling eval; modify calculate to call this safe evaluator and keep the original input validation and error handling around it.
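The fix described above might look like the following sketch. The helper name `safe_eval` is hypothetical; the actual implementation is up to the PR author.

```python
import ast
import operator

# Only these binary operators are permitted; ast.Pow is deliberately excluded.
_ALLOWED_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def safe_eval(expression: str) -> float:
    """Evaluate a basic arithmetic expression without eval()."""

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_OPS:
            return _ALLOWED_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"disallowed expression element: {type(node).__name__}")

    # ast.parse raises SyntaxError/RecursionError on malformed or overly deep
    # input; the calculate tool's existing error handling should wrap this call.
    tree = ast.parse(expression, mode="eval")
    return _eval(tree)
```

Parentheses need no explicit handling: the parser folds them into the tree structure, so only the node types listed above can ever reach `_eval`.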
🧹 Nitpick comments (5)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)
672-714: LGTM! The realtime span data extraction correctly handles different data types:
- Audio bytes are stored by length (avoiding raw bytes in attributes)
- Text content is properly mapped to prompt/completion attributes
- The `first_content_at` attribute enables latency tracking
Note: Content is not truncated here, unlike the `[:4096]` limit applied in `_realtime_wrappers.py`. Consider adding truncation for consistency to prevent excessively large span attributes.
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (2)
203-216: Consider prefixing unused `role` parameters with an underscore. The `role` parameter in `record_prompt` and `record_completion` is not used in the method bodies (hardcoded to "user" and "assistant" respectively). If this is intentional, prefix it with an underscore to signal intent.
♻️ Suggested fix

```diff
-    def record_prompt(self, role: str, content: str):
+    def record_prompt(self, _role: str, content: str):
         """Record a prompt message - buffers it for the LLM span."""
     ...
-    def record_completion(self, role: str, content: str):
+    def record_completion(self, _role: str, content: str):
         """Record a completion message - creates an LLM span with prompt and completion."""
     ...
```
316-323: Consider adding debug-level logging for silently caught exceptions. The try/except/pass blocks (lines 321-322, 339-340, 444-445, 474-475) intentionally swallow exceptions to prevent instrumentation from breaking user code. While this is a valid pattern for instrumentation, it makes debugging difficult when issues occur. Consider adding optional debug-level logging to aid troubleshooting:
♻️ Example improvement

```python
import logging

logger = logging.getLogger(__name__)

# Then in the exception handlers:
except Exception as e:
    logger.debug("Realtime instrumentation error: %s", e, exc_info=True)
```

packages/sample-app/sample_app/openai_agents_realtime_example.py (1)
588-595: Remove unused `noqa` directives. Static analysis indicates the `# noqa: F401` comments on lines 588 and 593 are unnecessary. The imports are used for existence checks, but the linter isn't flagging them.
♻️ Suggested cleanup

```diff
-    import agents  # noqa: F401
+    import agents
 except ImportError:
     missing.append("openai-agents")
 try:
-    from agents.realtime import RealtimeAgent  # noqa: F401
+    from agents.realtime import RealtimeAgent
 except ImportError:
```

packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py (1)
4-4: Remove unused imports. `MagicMock`, `AsyncMock`, and `patch` are imported but not used in this test file.
♻️ Suggested fix

```diff
 import pytest
-from unittest.mock import MagicMock, AsyncMock, patch
 from opentelemetry.sdk.trace import TracerProvider
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (2)
- packages/opentelemetry-instrumentation-openai-agents/poetry.lock is excluded by `!**/*.lock`
- packages/sample-app/poetry.lock is excluded by `!**/*.lock`
📒 Files selected for processing (8)
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
- packages/opentelemetry-instrumentation-openai-agents/pyproject.toml
- packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
- packages/sample-app/pyproject.toml
- packages/sample-app/sample_app/openai_agents_realtime_example.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
- packages/sample-app/sample_app/openai_agents_realtime_example.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: duanyutong
Repo: traceloop/openllmetry PR: 3487
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:177-178
Timestamp: 2025-12-02T21:09:48.690Z
Learning: The opentelemetry-instrumentation-openai and opentelemetry-instrumentation-openai-agents packages must remain independent and not share code, so code duplication between them is acceptable.
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Instrumentation packages must leverage the semantic conventions package and emit OTel-compliant spans
📚 Learning: 2025-12-02T21:09:48.690Z
Learnt from: duanyutong
Repo: traceloop/openllmetry PR: 3487
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:177-178
Timestamp: 2025-12-02T21:09:48.690Z
Learning: The opentelemetry-instrumentation-openai and opentelemetry-instrumentation-openai-agents packages must remain independent and not share code, so code duplication between them is acceptable.
Applied to files:
- packages/opentelemetry-instrumentation-openai-agents/pyproject.toml
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py
🧬 Code graph analysis (3)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (2)
- SpanAttributes (64-245)
- TraceloopSpanKindValues (285-290)
packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py (4)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (3)
- export (45-51)
- InMemorySpanExporter (22-61)
- get_finished_spans (40-43)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
- SpanAttributes (64-245)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (4)
- OpenTelemetryTracingProcessor (24-778)
- on_trace_start (41-53)
- on_span_start (65-317)
- on_trace_end (56-62)
packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py (1)
- tracer (26-29)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (2)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (2)
- wrap_realtime_session (296-483)
- unwrap_realtime_session (486-502)
packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py (1)
tracer(26-29)
🪛 Flake8 (7.3.0)
packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
[error] 4-4: 'unittest.mock.MagicMock' imported but unused
(F401)
[error] 4-4: 'unittest.mock.AsyncMock' imported but unused
(F401)
[error] 4-4: 'unittest.mock.patch' imported but unused
(F401)
🪛 Ruff (0.14.10)
packages/sample-app/sample_app/openai_agents_realtime_example.py
231-231: Do not catch blind exception: Exception
(BLE001)
290-290: Use of possibly insecure function; consider using ast.literal_eval
(S307)
292-292: Consider moving this statement to an else block
(TRY300)
293-293: Do not catch blind exception: Exception
(BLE001)
421-421: Do not catch blind exception: Exception
(BLE001)
574-574: Do not catch blind exception: Exception
(BLE001)
588-588: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
593-593: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
203-203: Unused method argument: role
(ARG002)
212-212: Unused method argument: role
(ARG002)
321-322: try-except-pass detected, consider logging the exception
(S110)
321-321: Do not catch blind exception: Exception
(BLE001)
339-340: try-except-pass detected, consider logging the exception
(S110)
339-339: Do not catch blind exception: Exception
(BLE001)
444-445: try-except-pass detected, consider logging the exception
(S110)
444-444: Do not catch blind exception: Exception
(BLE001)
474-475: try-except-pass detected, consider logging the exception
(S110)
474-474: Do not catch blind exception: Exception
(BLE001)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py
49-51: try-except-pass detected, consider logging the exception
(S110)
49-49: Do not catch blind exception: Exception
(BLE001)
53-53: Unused method argument: kwargs
(ARG002)
59-60: try-except-pass detected, consider logging the exception
(S110)
59-59: Do not catch blind exception: Exception
(BLE001)
packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
37-37: Unpacked variable exporter is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
60-60: Unpacked variable exporter is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
85-85: Unpacked variable exporter is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
125-125: Unpacked variable exporter is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
149-149: Unpacked variable exporter is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
161-161: Unpacked variable exporter is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Lint
- GitHub Check: Build Packages (3.11)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.11)
🔇 Additional comments (13)
packages/sample-app/pyproject.toml (3)
30-30: No compatibility issues found. Both llama-index-embeddings-openai (0.5.1) and llama-index-llms-openai (0.6.12) are compatible with openai ^2.14.0.
26-26: Verify that sample app and instrumentation code support openai v2.14.0. The major version bump from v1.52.2 to v2.14.0 introduces significant breaking changes:
- Client instantiation pattern (requires the `OpenAI()` class instead of module-level functions)
- Namespaced API surface (`client.chat.*`, `client.audio.*`, etc.)
- Keyword-only arguments in method calls
- Response object structure changes
- Streaming API changes
Ensure that all sample app code and the opentelemetry-instrumentation-openai package have been updated to use v2 patterns and that integration tests verify compatibility with the new version.
58-58: Version 0.6.5 is appropriate and includes realtime support. The version bump from 0.2.7 to 0.6.5 is justified by the addition of realtime support in this version. Verification confirms that openai-agents 0.6.5 exists on PyPI and includes the RealtimeSession API for realtime model sessions (supporting WebRTC, WebSocket, and SIP transports). No documented breaking changes are present in the 0.6.5 patch release.
packages/opentelemetry-instrumentation-openai-agents/pyproject.toml (1)
34-34: LGTM! Adding `websockets` as a dev dependency is appropriate for testing the realtime session instrumentation. The version constraint `^15.0.1` is reasonable.
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (2)
69-78: LGTM! The guarded import pattern correctly handles optional realtime span types, matching the existing approach for `set_agent_name`. Setting placeholders to `None` ensures `isinstance` checks gracefully return `False` when realtime is unavailable.
235-312: LGTM! The realtime span handling correctly creates hierarchical spans with appropriate attributes following OTel semantic conventions. The span names (`openai.realtime.speech`, `openai.realtime.transcription`, `openai.realtime.speech_group`) and attributes (`LLM_REQUEST_TYPE`, `gen_ai.system`, `gen_ai.operation.name`) are well-aligned with existing patterns.
Minor note: The `has_realtime_spans and SpeechSpanData` checks are technically redundant since `has_realtime_spans` is only `True` when the imports succeed, but this defensive approach is acceptable for clarity.
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (3)
17-94: LGTM! The `RealtimeTracingState` class provides clean encapsulation of per-session tracing state. The span lifecycle management correctly:
- Creates hierarchical spans with proper parent contexts
- Tracks multiple concurrent spans (agents, tools, audio)
- Uses appropriate semantic conventions (`TraceloopSpanKindValues.WORKFLOW`, `AGENT`)
97-193: LGTM! The tool, handoff, and audio span methods implement proper context propagation with a sensible fallback chain. The handoff span correctly represents a point-in-time event by creating and immediately ending. The 4096-character truncation for tool output prevents excessively large span attributes.
248-254: Verify `SpanAttributes.LLM_SYSTEM` exists. The code references `SpanAttributes.LLM_SYSTEM` on line 250, but this attribute may not exist in the `SpanAttributes` class from opentelemetry-semantic-conventions-ai. If this attribute doesn't exist, it will raise an `AttributeError` at runtime.
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (1)
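The `SpanAttributes.LLM_SYSTEM` concern above could also be guarded defensively. A hedged sketch with a hypothetical fallback key and a stand-in class (the real fix is simply to confirm the attribute exists in the installed semconv package):

```python
# Stand-in for opentelemetry.semconv_ai.SpanAttributes, used here only to
# demonstrate the pattern; LLM_SYSTEM is intentionally absent.
class SpanAttributes:
    LLM_REQUEST_MODEL = "gen_ai.request.model"


# getattr with a literal fallback key degrades gracefully instead of
# raising AttributeError at runtime when the attribute is missing.
LLM_SYSTEM_KEY = getattr(SpanAttributes, "LLM_SYSTEM", "gen_ai.system")
```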
44-60: LGTM! The realtime session instrumentation integration follows the same pattern as the existing `add_trace_processor` block. The silent exception handling is appropriate for optional features that may not be available in all SDK versions.
The unused `kwargs` parameter in `_uninstrument` is a standard `BaseInstrumentor` interface requirement.
packages/sample-app/sample_app/openai_agents_realtime_example.py (1)
1-41: LGTM! The sample app has comprehensive documentation explaining the three demo modes (text, voice, handoff) and requirements. The Traceloop initialization is properly configured.
packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py (1)
1-481: LGTM! Comprehensive test coverage for realtime span instrumentation. The tests properly verify:
- Span creation for speech, transcription, and speech_group types
- Correct attribute population (model, format, operation name)
- Error status propagation
- Span hierarchy (realtime spans nested under workflow)
The mocking approach using `patch.dict("sys.modules", ...)` is appropriate given the WebSocket-based realtime API cannot be recorded with VCR.
packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py (1)
32-306: LGTM! Thorough test coverage for `RealtimeTracingState`:
- Span lifecycle (start/end for workflow, agent, tool, audio)
- Handoff span creation with correct attributes
- Prompt/completion buffering and LLM span generation
- Cleanup behavior ensuring all spans are terminated
- Proper parent-child span hierarchy
The tests marked with an unused `exporter` by static analysis only verify in-memory state rather than exported spans - this is intentional.
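The `patch.dict("sys.modules", ...)` technique praised in the review can be sketched with a stand-in module. The names here are illustrative, not the test suite's actual fixtures:

```python
import importlib
import sys
import types
from unittest.mock import patch

# Build a fake agents.realtime module so imports and isinstance checks
# resolve without opening a real WebSocket connection.
fake_realtime = types.ModuleType("agents.realtime")


class RealtimeAgent:
    """Minimal stand-in for agents.realtime.RealtimeAgent."""


fake_realtime.RealtimeAgent = RealtimeAgent

# Within the patch, any import of "agents.realtime" resolves to the fake;
# patch.dict restores sys.modules to its original state on exit.
with patch.dict(sys.modules, {"agents.realtime": fake_realtime}):
    mod = importlib.import_module("agents.realtime")
    assert mod.RealtimeAgent is RealtimeAgent
```

Because `importlib.import_module` consults `sys.modules` first, the fake is returned directly and no real package needs to be installed.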
Important
Looks good to me! 👍
Reviewed ac76869 in 57 seconds.
- Reviewed 87 lines of code in 1 file
- Skipped 0 files when reviewing.
- Skipped posting 6 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:66
- Draft comment:
Removed redundant comment for importing realtime span types. This cleanup is cosmetic.
- Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50%
2. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:232
- Draft comment:
Removed unnecessary comment before handling SpeechSpanData. Improves clarity without affecting functionality. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
3. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:244
- Draft comment:
Removed inline comments ('Add model if available' and 'Add output format if available') from the SpeechSpanData block. Code remains self-explanatory. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
4. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:273
- Draft comment:
Removed inline comments for the TranscriptionSpanData block. This cleanup helps maintain concise code. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
5. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:669
- Draft comment:
Removed redundant inline comments in the SpeechSpanData handling within on_span_end. This change is purely cosmetic. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
6. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:706
- Draft comment:
Removed redundant comment in the SpeechGroupSpanData block. The code remains clear without it. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
Workflow ID: wflow_hRvoqqSZH3ByYumM
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`:
- Around line 666-701: The on_span_end branch uses string-based type checks
(type(span_data).__name__ == '...') for SpeechSpanData, TranscriptionSpanData,
and SpeechGroupSpanData; change these to the same isinstance() checks used in
on_span_start (e.g., isinstance(span_data, SpeechSpanData),
isinstance(span_data, TranscriptionSpanData), isinstance(span_data,
SpeechGroupSpanData)) and move or add the conditional import of those span
classes to module scope (or use typing.TYPE_CHECKING guards) so the classes are
available for isinstance without causing circular-import issues; update the
checks in on_span_end to reference the actual classes (SpeechSpanData,
TranscriptionSpanData, SpeechGroupSpanData) instead of comparing type names.
- Around line 69-77: The conditional import block inside on_span_start
(currently creating has_realtime_spans and importing SpeechSpanData,
TranscriptionSpanData, SpeechGroupSpanData) should be moved to module level (or
performed once in class initialization) like set_agent_name does: attempt the
import once at import/initialization time, set has_realtime_spans True/False and
set the three symbols to None on ImportError, and then remove the try/except
import from on_span_start so the method uses the pre-initialized
has_realtime_spans and symbol bindings.
🧹 Nitpick comments (3)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (3)
234-234: Consider simplifying redundant condition checks.

Lines 234, 261, and 288 check both `SpeechSpanData` (truthiness) and `isinstance(span_data, SpeechSpanData)`. The truthiness check is redundant since `isinstance()` returns `False` when the class is `None`. While the current pattern is explicit and defensive, simplifying to `has_realtime_spans and isinstance(span_data, SpeechSpanData)` would be cleaner.

♻️ Optional simplification

```diff
- elif has_realtime_spans and SpeechSpanData and isinstance(span_data, SpeechSpanData):
+ elif has_realtime_spans and isinstance(span_data, SpeechSpanData):
```

Apply the same pattern to TranscriptionSpanData (line 261) and SpeechGroupSpanData (line 288).
Also applies to: 261-261, 288-288
234-305: Consider extracting common logic for realtime span creation.

The three realtime span creation blocks (SpeechSpanData, TranscriptionSpanData, SpeechGroupSpanData) share significant structural similarity:
- Finding current agent span
- Setting parent context
- Building base attributes with common keys
- Starting CLIENT spans with timestamps
While each has type-specific attributes, extracting the common pattern into a helper method would reduce duplication and maintenance burden.
💡 Example helper method approach

```python
def _create_realtime_span(self, span_data, span_name: str, operation: str, parent_context):
    """Helper to create realtime spans with common attributes."""
    attributes = {
        SpanAttributes.LLM_REQUEST_TYPE: "realtime",
        "gen_ai.system": "openai",
        "gen_ai.operation.name": operation,
    }
    model = getattr(span_data, 'model', None)
    if model:
        attributes[GenAIAttributes.GEN_AI_REQUEST_MODEL] = model
    return self.tracer.start_span(
        span_name,
        kind=SpanKind.CLIENT,
        context=parent_context,
        attributes=attributes,
        start_time=time.time_ns(),
    )
```

Then add format-specific attributes after creation if needed.
672-678: Consider using json.dumps() for non-bytes output data.

Lines 677 and 689 use `str()` to convert non-bytes output data. For complex objects, `str()` may produce unhelpful representations. The rest of the file uses `json.dumps()` for non-string content (e.g., lines 391, 434, 563), which provides consistent serialization.

♻️ Suggested change for consistency

```diff
 output_audio = getattr(span_data, 'output', None)
 if output_audio:
     if isinstance(output_audio, (bytes, bytearray)):
         otel_span.set_attribute("gen_ai.speech.output_bytes", len(output_audio))
     else:
-        otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content", str(output_audio))
+        content = output_audio if isinstance(output_audio, str) else json.dumps(output_audio)
+        otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content", content)
```

Apply a similar pattern to line 689 for TranscriptionSpanData input handling.
Also applies to: 684-690
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
SpanAttributes(64-245)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Lint
- GitHub Check: Build Packages (3.11)
- GitHub Check: Test Packages (3.12)
Important
Looks good to me! 👍
Reviewed b3ccbac in 2 minutes and 14 seconds. Click for details.
- Reviewed `244` lines of code in `3` files
- Skipped `0` files when reviewing
- Skipped posting `4` draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py:44
   - Draft comment: Clear invocation of wrap_realtime_session; removal of redundant comments streamlines the code.
   - Reason this comment was not posted: Comment did not seem useful. Confidence is useful = 10% <= threshold 50%. This comment seems to be purely informative, as it is stating that the invocation of `wrap_realtime_session` is clear and that removing redundant comments streamlines the code. It does not provide a suggestion, ask for confirmation, or point out a potential issue.
2. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:48
   - Draft comment: Good update: using GenAIAttributes.GEN_AI_SYSTEM instead of a literal improves consistency.
   - Reason this comment was not posted: Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. This comment is purely informative, as it only praises the update without providing any actionable feedback or suggestions. It doesn't ask for confirmation or suggest any changes.
3. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:715
   - Draft comment: Consider using a GenAIAttributes constant for the 'frequency_penalty' attribute for consistency if one is available.
   - Reason this comment was not posted: Comment was on unchanged code.
4. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:262
   - Draft comment: For consistency, consider using GenAIAttributes constants (e.g. GEN_AI_PROMPT/GEN_AI_COMPLETION) instead of literal strings for prompt and completion keys in create_llm_span.
   - Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
Workflow ID: wflow_KBU2Lf1RoTBOrzIP
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`:
- Around line 299-311: wrap_realtime_session currently unconditionally writes to
the module-level _original_methods, so repeated calls overwrite true originals;
change wrap_realtime_session to guard storing originals by only setting entries
in _original_methods if the key is not already present (e.g., check for
existence of '__aenter__', '__aexit__', '_put_event', and 'send_message' before
assigning), and ensure the guard uses the RealtimeSession attributes
(RealtimeSession.__aenter__, RealtimeSession.__aexit__,
RealtimeSession._put_event, RealtimeSession.send_message) so re-instrumentation
does not capture already-wrapped methods.
- Around line 275-296: The cleanup method fails to clear the llm_spans
dictionary; update the cleanup implementation in the cleanup method to iterate
over self.llm_spans (similar to tool_spans/audio_spans/agent_spans), ensure each
span has set_status(Status(StatusCode.OK)) called and end() invoked, then clear
self.llm_spans; reference the cleanup method and the llm_spans attribute (and
consider consistency with create_llm_span behavior).
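The re-instrumentation guard described in the first item above can be sketched in isolation. The class and method names below are illustrative stand-ins, not the real `RealtimeSession` API:

```python
# Registry of true originals; populated at most once per method name.
_original_methods: dict = {}


class FakeSession:
    """Illustrative stand-in for RealtimeSession."""
    def send_message(self, msg):
        return f"sent:{msg}"


def wrap_session(cls):
    # Store the unwrapped original only on the first call, so repeated
    # instrumentation never captures an already-wrapped method.
    if "send_message" not in _original_methods:
        _original_methods["send_message"] = cls.send_message
    original = _original_methods["send_message"]

    def traced_send_message(self, msg):
        # The real wrapper would start/end a span around this call.
        return "traced:" + original(self, msg)

    cls.send_message = traced_send_message


wrap_session(FakeSession)
wrap_session(FakeSession)  # idempotent: the wrapper is not stacked twice
```

Because `original` is always looked up from the registry, the second `wrap_session` call rebinds the same single-layer wrapper instead of wrapping the wrapper.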
🧹 Nitpick comments (3)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (1)
44-55: Consider adding debug-level logging for exception handling.

The silent `try-except-pass` blocks make it difficult to diagnose issues when realtime instrumentation fails to load or unload. While completely silent failures are acceptable for optional features, adding debug-level logging would aid troubleshooting without impacting normal operation.

♻️ Suggested improvement with logging

```diff
+import logging
+
+logger = logging.getLogger(__name__)
+
 ...
     try:
         from ._realtime_wrappers import wrap_realtime_session
         wrap_realtime_session(tracer)
     except Exception:
-        pass
+        logger.debug("Realtime session instrumentation not available", exc_info=True)

 def _uninstrument(self, **kwargs):
     try:
         from ._realtime_wrappers import unwrap_realtime_session
         unwrap_realtime_session()
     except Exception:
-        pass
+        logger.debug("Realtime session uninstrumentation skipped", exc_info=True)
```

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (2)
206-219: Unused `role` parameters could be utilized or removed.

Both `record_prompt` and `record_completion` accept a `role` parameter that is never used. The role is hardcoded to "user" and "assistant" respectively in `create_llm_span`. Consider either using the passed role or removing the parameter if it's not needed.

♻️ Option 1: Use the role parameter

```diff
 def create_llm_span(self, completion_content: str):
     """Create an LLM span that shows the prompt and completion."""
-    prompts = self.pending_prompts.copy()
+    prompts = self.pending_prompts.copy()  # would need to store (role, content) tuples
     ...
     for i, prompt in enumerate(prompts):
-        span.set_attribute(f"gen_ai.prompt.{i}.role", "user")
+        span.set_attribute(f"gen_ai.prompt.{i}.role", prompt[0])  # role from tuple
```

♻️ Option 2: Remove unused parameters

```diff
-    def record_prompt(self, role: str, content: str):
+    def record_prompt(self, content: str):
         """Record a prompt message - buffers it for the LLM span."""
         ...
-    def record_completion(self, role: str, content: str):
+    def record_completion(self, content: str):
         """Record a completion message - creates an LLM span with prompt and completion."""
         ...
```
251-257: Hardcoded model name may not reflect actual model used.

The model is hardcoded to "gpt-4o-realtime-preview", which may not always be accurate. Consider extracting the model from the session configuration or event data if available, with this as a fallback.

♻️ Suggested approach

```diff
+def create_llm_span(self, completion_content: str, model: Optional[str] = None):
     """Create an LLM span that shows the prompt and completion."""
     ...
     span = self.tracer.start_span(
         "openai.realtime",
         kind=SpanKind.CLIENT,
         context=parent_context,
         start_time=start_time,
         attributes={
             SpanAttributes.LLM_REQUEST_TYPE: "realtime",
             SpanAttributes.LLM_SYSTEM: "openai",
             GenAIAttributes.GEN_AI_SYSTEM: "openai",
-            GenAIAttributes.GEN_AI_REQUEST_MODEL: "gpt-4o-realtime-preview",
+            GenAIAttributes.GEN_AI_REQUEST_MODEL: model or "gpt-4o-realtime-preview",
         }
     )
```

Then pass the model from the event data in `traced_put_event` when calling `record_completion`.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
🧠 Learnings (1)
📚 Learning: 2025-12-02T21:09:48.690Z
Learnt from: duanyutong
Repo: traceloop/openllmetry PR: 3487
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:177-178
Timestamp: 2025-12-02T21:09:48.690Z
Learning: The opentelemetry-instrumentation-openai and opentelemetry-instrumentation-openai-agents packages must remain independent and not share code, so code duplication between them is acceptable.
Applied to files:
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
🧬 Code graph analysis (2)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (2)
SpanAttributes (64-245)
TraceloopSpanKindValues (285-290)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
SpanAttributes(64-245)
🪛 Ruff (0.14.10)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py
47-48: try-except-pass detected, consider logging the exception
(S110)
47-47: Do not catch blind exception: Exception
(BLE001)
50-50: Unused method argument: kwargs
(ARG002)
54-55: try-except-pass detected, consider logging the exception
(S110)
54-54: Do not catch blind exception: Exception
(BLE001)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
206-206: Unused method argument: role
(ARG002)
215-215: Unused method argument: role
(ARG002)
324-325: try-except-pass detected, consider logging the exception
(S110)
324-324: Do not catch blind exception: Exception
(BLE001)
342-343: try-except-pass detected, consider logging the exception
(S110)
342-342: Do not catch blind exception: Exception
(BLE001)
447-448: try-except-pass detected, consider logging the exception
(S110)
447-447: Do not catch blind exception: Exception
(BLE001)
477-478: try-except-pass detected, consider logging the exception
(S110)
477-477: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Lint
- GitHub Check: Build Packages (3.11)
🔇 Additional comments (2)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)
234-305: LGTM - Realtime span creation follows existing patterns.

The implementation correctly creates OpenTelemetry spans for the three realtime span types with appropriate attributes and parent context linking. The guard pattern `has_realtime_spans and SpeechSpanData and isinstance(span_data, SpeechSpanData)` is slightly redundant (if `has_realtime_spans` is True, `SpeechSpanData` is guaranteed to be non-None), but it's defensive and harmless.
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (1)

1-15: New realtime session instrumentation module looks well-structured.

The module introduces comprehensive tracing for OpenAI Agents SDK realtime sessions with proper span hierarchy management, context propagation, and cleanup. The `dont_throw` decorator ensures instrumentation failures don't break the application.
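A decorator of that kind is straightforward to sketch. This is a hedged illustration of the pattern, not the package's actual `dont_throw` implementation, which may differ in logging and signature handling:

```python
import logging


def dont_throw(func):
    """Swallow instrumentation errors so they never reach the host application."""
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Log at debug level instead of propagating; tracing must
            # never crash the instrumented application.
            logging.getLogger(__name__).debug("instrumentation error", exc_info=True)
            return None
    return wrapper


@dont_throw
def flaky_hook():
    raise RuntimeError("boom")
```

Calling `flaky_hook()` returns `None` instead of raising, which is exactly the failure mode you want for optional telemetry hooks.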
```python
def cleanup(self):
    """Clean up any remaining spans."""
    for span in list(self.tool_spans.values()):
        span.set_status(Status(StatusCode.OK))
        span.end()
    self.tool_spans.clear()

    for span in list(self.audio_spans.values()):
        span.set_status(Status(StatusCode.OK))
        span.end()
    self.audio_spans.clear()

    for span in list(self.agent_spans.values()):
        span.set_status(Status(StatusCode.OK))
        span.end()
    self.agent_spans.clear()
    self.agent_span_contexts.clear()

    if self.workflow_span:
        self.workflow_span.set_status(Status(StatusCode.OK))
        self.workflow_span.end()
        self.workflow_span = None
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`llm_spans` dictionary is not cleaned up.

The cleanup method clears `tool_spans`, `audio_spans`, and `agent_spans`, but `self.llm_spans` (initialized on line 29) is never cleaned up. While LLM spans are created and immediately ended in `create_llm_span`, the dictionary should still be cleared for consistency.
🔧 Suggested fix

```diff
     for span in list(self.agent_spans.values()):
         span.set_status(Status(StatusCode.OK))
         span.end()
     self.agent_spans.clear()
     self.agent_span_contexts.clear()
+    self.llm_spans.clear()

     if self.workflow_span:
```

🤖 Prompt for AI Agents
In `packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`
around lines 275 - 296, The cleanup method fails to clear the llm_spans
dictionary; update the cleanup implementation in the cleanup method to iterate
over self.llm_spans (similar to tool_spans/audio_spans/agent_spans), ensure each
span has set_status(Status(StatusCode.OK)) called and end() invoked, then clear
self.llm_spans; reference the cleanup method and the llm_spans attribute (and
consider consistency with create_llm_span behavior).
Important
Looks good to me! 👍
Reviewed 688aab4 in 2 minutes and 31 seconds. Click for details.
- Reviewed `642` lines of code in `2` files
- Skipped `0` files when reviewing
- Skipped posting `5` draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:92
   - Draft comment: Wrap `json.dumps` calls in try/except to handle potential serialization errors.
   - Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable; usefulness confidence = 10% vs. threshold = 50%. The diff is primarily a refactoring: code was extracted from an inline implementation into helper functions, and the `json.dumps` calls existed in the original code and were simply moved. The comment suggests defensive programming (adding try/except), which is a code-quality suggestion, but since the code was just moved rather than newly written, and the comment doesn't point to a specific new problem introduced by the refactoring, it concerns unchanged logic. The rules state "Do NOT comment unless there is clearly a code change required" and "Do NOT comment on anything that would be obviously caught by the build"; if `json.dumps` fails, it would likely be caught during testing. The comment could be valid if the serialized data is user-controlled or could contain non-serializable objects, and defensive error handling around serialization is often good practice in production code, but this isn't a new issue introduced by the PR: the original code had no try/except blocks either. The comment is also speculative about potential errors rather than pointing to a definite problem.
2. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:543
   - Draft comment: Good refactor using _extract_prompt_attributes and _extract_response_attributes; consider enhancing docstrings to specify expected input formats.
   - Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%
3. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:67
   - Draft comment: Clarify the return type in _get_parent_context's docstring (i.e. a context object or None).
   - Reason this comment was not posted: Confidence changes required: 50% <= threshold 50%
4. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:327
   - Draft comment: cleanup() already ends the workflow span; calling end_workflow_span() afterward may be redundant; verify whether both calls are needed.
   - Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:440
   - Draft comment: Consider logging exceptions in wrapper methods (e.g., in traced_put_event) rather than silently passing, to aid debugging in development.
   - Reason this comment was not posted: Confidence changes required: 33% <= threshold 50%
Workflow ID: wflow_0O5XGrnoAlqBxWrB
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Looks good to me! 👍
Reviewed fb63eab in 1 minute and 30 seconds. Click for details.
- Reviewed `18` lines of code in `1` file
- Skipped `0` files when reviewing
- Skipped posting `1` draft comment. View it below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:417
   - Draft comment: The code now records completions only for 'assistant' in history_added events, removing the branch for 'user'. Ensure this is intentional (e.g. to avoid duplicate prompt recording, since prompts are captured elsewhere).
   - Reason this comment was not posted: Comment looked like it was already resolved.
Workflow ID: wflow_TOgAnM43qqMcOGfx
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`:
- Line 29: Remove the unused instance attribute self.llm_spans from the class:
delete its initialization (the line "self.llm_spans: Dict[str, Span] = {}") and
any direct references to that name; ensure no other code relies on it since
spans are created and ended immediately in create_llm_span(), and remove any
now-unused imports (e.g., Dict or Span) only if they become unused after this
deletion.
- Line 248: The span currently hardcodes GenAIAttributes.GEN_AI_REQUEST_MODEL to
"gpt-4o-realtime-preview"; change the wrapper to capture the actual model from
the realtime configuration during initialization (e.g., read into self._model or
model_name from the incoming realtime config/params in the wrapper's __init__ or
setup) and modify calls to create_llm_span(...) to accept and forward that model
value so create_llm_span uses the real model string for
GenAIAttributes.GEN_AI_REQUEST_MODEL instead of the hardcoded literal.
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (1)
205-212: Consider removing unused `role` parameters.

The `role` parameters in both `record_prompt()` and `record_completion()` are unused. If these are intended for future use or API consistency, consider adding a comment explaining the design decision.

♻️ Proposed refactor

If the `role` parameter is not needed:

```diff
-    def record_prompt(self, role: str, content: str):
+    def record_prompt(self, content: str):
         """Record a prompt message - buffers it for the LLM span."""
         if not content:
             return
         if not self.pending_prompts:
             self.prompt_start_time = time.time_ns()
             self.prompt_agent_name = self.current_agent_name or self.starting_agent_name
         self.pending_prompts.append(content)

-    def record_completion(self, role: str, content: str):
+    def record_completion(self, content: str):
         """Record a completion message - creates an LLM span with prompt and completion."""
         if not content:
             return
         self.create_llm_span(content)
```

And update the call sites on lines 421, 436, and 455.
Also applies to: 214-218
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: duanyutong
Repo: traceloop/openllmetry PR: 3487
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:177-178
Timestamp: 2025-12-02T21:09:48.690Z
Learning: The opentelemetry-instrumentation-openai and opentelemetry-instrumentation-openai-agents packages must remain independent and not share code, so code duplication between them is acceptable.
📚 Learning: 2025-12-02T21:09:48.690Z
Learnt from: duanyutong
Repo: traceloop/openllmetry PR: 3487
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:177-178
Timestamp: 2025-12-02T21:09:48.690Z
Learning: The opentelemetry-instrumentation-openai and opentelemetry-instrumentation-openai-agents packages must remain independent and not share code, so code duplication between them is acceptable.
Applied to files:
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
- `TraceloopSpanKindValues` (285-290)
🪛 Ruff (0.14.10)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
205-205: Unused method argument: role
(ARG002)
214-214: Unused method argument: role
(ARG002)
317-318: try-except-pass detected, consider logging the exception
(S110)
317-317: Do not catch blind exception: Exception
(BLE001)
335-336: try-except-pass detected, consider logging the exception
(S110)
335-335: Do not catch blind exception: Exception
(BLE001)
437-438: try-except-pass detected, consider logging the exception
(S110)
437-437: Do not catch blind exception: Exception
(BLE001)
467-468: try-except-pass detected, consider logging the exception
(S110)
467-467: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Lint
- GitHub Check: Build Packages (3.11)
```python
            SpanAttributes.LLM_REQUEST_TYPE: "realtime",
            SpanAttributes.LLM_SYSTEM: "openai",
            GenAIAttributes.GEN_AI_SYSTEM: "openai",
            GenAIAttributes.GEN_AI_REQUEST_MODEL: "gpt-4o-realtime-preview",
```
🧩 Analysis chain
🏁 Script executed:
rg -n --type=py -C3 "gpt-4o-realtime|model.*realtime|realtime.*model" --glob '!**/test*.py'

Repository: traceloop/openllmetry
Length of output: 7720
🏁 Script executed:
# Check the context around line 248 in the realtime_wrappers.py file
sed -n '230,270p' packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py

Repository: traceloop/openllmetry
Length of output: 1554
🏁 Script executed:
# Look for how the model is determined elsewhere in the realtime wrappers
rg -n "GEN_AI_REQUEST_MODEL|gen_ai_request_model" --type=py packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/

Repository: traceloop/openllmetry
Length of output: 1062
🏁 Script executed:
# Check if there's any configuration or client model information available in the realtime session
rg -n "model\s*=" --type=py -A2 -B2 packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py

Repository: traceloop/openllmetry
Length of output: 47
🏁 Script executed:
# Find the class definition and initialization for the wrapper containing create_llm_span
rg -n "class.*Wrapper|def __init__" --type=py -A5 packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py | head -60

Repository: traceloop/openllmetry
Length of output: 341
🏁 Script executed:
# Check the full create_llm_span function signature and see what instance variables are available
sed -n '200,270p' packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py

Repository: traceloop/openllmetry
Length of output: 2809
🏁 Script executed:
# Look for where create_llm_span is called to understand what context is available
rg -n "create_llm_span" --type=py packages/opentelemetry-instrumentation-openai-agents/

Repository: traceloop/openllmetry
Length of output: 404
Hardcoded model name should be captured from the realtime configuration.
The model name "gpt-4o-realtime-preview" is hardcoded in the span attributes. The actual model can vary (as seen in examples using "gpt-4o-realtime-preview-2024-12-17" or other variants). Store the model information when the wrapper is initialized and pass it to create_llm_span() so the span attributes reflect the actual model being used.
🤖 Prompt for AI Agents
In
@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
at line 248, The span currently hardcodes GenAIAttributes.GEN_AI_REQUEST_MODEL
to "gpt-4o-realtime-preview"; change the wrapper to capture the actual model
from the realtime configuration during initialization (e.g., read into
self._model or model_name from the incoming realtime config/params in the
wrapper's __init__ or setup) and modify calls to create_llm_span(...) to accept
and forward that model value so create_llm_span uses the real model string for
GenAIAttributes.GEN_AI_REQUEST_MODEL instead of the hardcoded literal.
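The suggested fix can be sketched as follows. This is a hypothetical illustration, not the actual wrapper API: the class shape, the `model_config` dict, and the default value are assumptions; the real `RealtimeTracingState` would read the model from the session's realtime configuration.

```python
# Hypothetical sketch: capture the configured model when the tracing state is
# created, instead of hardcoding a literal in the span attributes.
DEFAULT_REALTIME_MODEL = "gpt-4o-realtime-preview"


class RealtimeTracingState:
    def __init__(self, model_config=None):
        config = model_config or {}
        # Fall back to the generic default only when no model was configured.
        self.model = config.get("model") or DEFAULT_REALTIME_MODEL

    def llm_span_attributes(self):
        # The span builder reads the captured model rather than a literal.
        return {"gen_ai.request.model": self.model}


state = RealtimeTracingState({"model": "gpt-4o-realtime-preview-2024-12-17"})
attrs = state.llm_span_attributes()
```

With this shape, spans created for a session configured with a dated model variant report that variant instead of the generic default.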
Important
Looks good to me! 👍
Reviewed 7d3c83d in 42 seconds. Click for details.
- Reviewed `12` lines of code in `1` files
- Skipped `1` files when reviewing
- Skipped posting `1` draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py:4
- Draft comment:
Removed unused unittest.mock import for MagicMock, AsyncMock, and patch. This cleanup improves code clarity.
- Reason this comment was not posted: Confidence changes required: `0%` <= threshold `50%`
Workflow ID: wflow_y47ImqsYJkHsj8jj
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 0
🧹 Nitpick comments (4)
packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py (4)
19-19: Consider moving import to module level.

The `SimpleSpanProcessor` import is inside the fixture function. For consistency with other imports, consider moving it to the top of the file.

♻️ Proposed refactor

Move the import to the module level:

```diff
 from opentelemetry.sdk.trace import TracerProvider
 from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter
+from opentelemetry.sdk.trace.export import SimpleSpanProcessor
 from opentelemetry.instrumentation.openai_agents._realtime_wrappers import (
     RealtimeTracingState,
     wrap_realtime_session,
     unwrap_realtime_session,
 )
```

And remove it from inside the fixture:

```diff
 @pytest.fixture
 def tracer_provider():
     """Create a tracer provider with in-memory exporter."""
     provider = TracerProvider()
     exporter = InMemorySpanExporter()
-    from opentelemetry.sdk.trace.export import SimpleSpanProcessor
     provider.add_span_processor(SimpleSpanProcessor(exporter))
     return provider, exporter
```
36-36: Prefix unused exporter variables with underscore.

Multiple tests unpack the `exporter` from the `tracer_provider` fixture but don't use it. Per Python convention and Ruff's suggestion, prefix these with an underscore to indicate they're intentionally unused.

♻️ Proposed fix

```diff
-    _, exporter = tracer_provider
+    _, _ = tracer_provider
```

Apply this change to lines 36, 59, 84, 124, 148, and 160.
Also applies to: 59-59, 84-84, 124-124, 148-148, 160-160
251-255: Consider adding integration tests for session wrapping.

The current test only verifies that `wrap_realtime_session` and `unwrap_realtime_session` don't raise exceptions. It doesn't verify that:
- Wrapping actually patches RealtimeSession methods
- The tracing state is properly attached to session instances
- Unwrapping restores original behavior
Consider adding integration tests that create mock RealtimeSession objects and verify the instrumentation behavior.
34-246: Comprehensive test coverage for RealtimeTracingState.

The test suite thoroughly covers the main lifecycle operations:
- Workflow, agent, tool, and audio span management
- Handoff span creation with proper attributes
- Error recording and prompt/completion buffering
- Multiple LLM span creation
- Cleanup functionality
The tests validate span names, attributes (e.g., `gen_ai.tool.output`, `gen_ai.handoff.*`), and proper export behavior.

Optionally, consider adding edge case tests for:
gen_ai.tool.output,gen_ai.handoff.*), and proper export behavior.Optionally, consider adding edge case tests for:
- Attempting to end spans that haven't been started
- Starting multiple workflow spans
- Recording completion without a prior prompt
- Handling empty/null values
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
`packages/sample-app/poetry.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (1)
packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: duanyutong
Repo: traceloop/openllmetry PR: 3487
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:177-178
Timestamp: 2025-12-02T21:09:48.690Z
Learning: The opentelemetry-instrumentation-openai and opentelemetry-instrumentation-openai-agents packages must remain independent and not share code, so code duplication between them is acceptable.
🪛 Ruff (0.14.10)
packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
36-36: Unpacked variable exporter is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
59-59: Unpacked variable exporter is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
84-84: Unpacked variable exporter is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
124-124: Unpacked variable exporter is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
148-148: Unpacked variable exporter is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
160-160: Unpacked variable exporter is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Lint
- GitHub Check: Test Packages (3.12)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Test Packages (3.11)
🔇 Additional comments (2)
packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py (2)
258-305: Well-designed span hierarchy validation.

The tests correctly verify parent-child relationships between spans:
- Tool spans properly parented under agent spans
- Audio spans properly parented under the current agent
- Proper use of OpenTelemetry's span context API for validation
The hierarchy tests ensure that spans are correctly nested within the tracing structure, which is critical for proper observability.
153-156: The test correctly accesses attributes on live spans. In OpenTelemetry's Python SDK, span attributes are directly accessible throughout the span's lifecycle via the `.attributes` property; they are immediately readable after `set_attribute()` is called, not just after the span is exported. This pattern is used consistently throughout the test suite and is the standard way to verify attributes in OpenTelemetry tests.

Likely an incorrect or invalid review comment.
Important
Looks good to me! 👍
Reviewed 89558df in 45 seconds. Click for details.
- Reviewed `354` lines of code in `3` files
- Skipped `0` files when reviewing
- Skipped posting `3` draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:24
- Draft comment:
Good improvement: caching realtime span type imports at module level (lines 24-31) avoids repeated costly lookups. Ensure all realtime span checks now consistently use the global `_has_realtime_spans` variable.
- Reason this comment was not posted: Confidence changes required: `0%` <= threshold `50%`
2. packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py:98
- Draft comment:
Nice refactor in tests: patching module-level variables directly (using `patch.object`) instead of modifying `sys.modules` clarifies the intent and minimizes side effects.
- Reason this comment was not posted: Confidence changes required: `0%` <= threshold `50%`
3. packages/sample-app/sample_app/openai_agents_realtime_example.py:284
- Draft comment:
Excellent update in the calculate() function: switching from unsafe eval to AST-based safe evaluation greatly improves security. Consider whether additional numeric operators or validations are needed for expanded use cases.
- Reason this comment was not posted: Confidence changes required: `0%` <= threshold `50%`
Workflow ID: wflow_T6RlIbTY7REsz3PH
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)
411-411: Import AgentSpanData in on_span_end for consistency and replace the string-based type check.

Line 625 uses `type(span_data).__name__ == 'AgentSpanData'` while the same type is checked with `isinstance()` in `on_span_start`. Import AgentSpanData at the top of the `on_span_end` method (where GenerationSpanData is already imported on line 52) and replace the string comparison with `isinstance(span_data, AgentSpanData)` for consistency.

Note: ResponseSpanData cannot be imported from the agents module and requires the string-based fallback.
Also applies to: 540 (where applicable)
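The difference between the two checks can be shown with a small illustration. `SpanDataA` below is a stand-in class, since the real `AgentSpanData` isn't available here; the point is that the string comparison misses subclasses while `isinstance()` handles them.

```python
# Stand-in for AgentSpanData to illustrate the two type-check styles.
class SpanDataA:
    pass


class SubSpanData(SpanDataA):
    pass


span_data = SpanDataA()
string_check = type(span_data).__name__ == "SpanDataA"   # works, but brittle
isinstance_check = isinstance(span_data, SpanDataA)      # equivalent here

sub = SubSpanData()
string_misses_subclass = type(sub).__name__ == "SpanDataA"   # False: name differs
isinstance_catches_subclass = isinstance(sub, SpanDataA)     # True: subclass matches
```

The string-based fallback remains necessary only where the class genuinely can't be imported, as with ResponseSpanData above.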
🤖 Fix all issues with AI agents
In @packages/sample-app/sample_app/openai_agents_realtime_example.py:
- Around line 629-636: Remove the unused "# noqa: F401" comments from the two
import statements so they no longer appear after "import agents" and "from
agents.realtime import RealtimeAgent"; simply delete the " # noqa: F401"
trailing each import to clean up the file and satisfy the linter.
🧹 Nitpick comments (3)
packages/sample-app/sample_app/openai_agents_realtime_example.py (3)
231-233: Broad exception handling is acceptable for demo code.

Catching `Exception` broadly is flagged by Ruff, but for a sample application, this provides a better user experience with a friendly error message. In production code, consider catching more specific exceptions (e.g., `OpenAIError`, `ConnectionError`) to avoid masking bugs.
462-464: Broad exception handling is acceptable for demo code.

Similar to the voice demo, catching `Exception` provides user-friendly error messages. This is reasonable for sample code, but production code should catch more specific exceptions.
615-616: Broad exception handling is acceptable for demo code.

Consistent with the other demo functions, this provides user-friendly error messages suitable for sample code.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
- packages/sample-app/sample_app/openai_agents_realtime_example.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
- packages/sample-app/sample_app/openai_agents_realtime_example.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: duanyutong
Repo: traceloop/openllmetry PR: 3487
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:177-178
Timestamp: 2025-12-02T21:09:48.690Z
Learning: The opentelemetry-instrumentation-openai and opentelemetry-instrumentation-openai-agents packages must remain independent and not share code, so code duplication between them is acceptable.
🧬 Code graph analysis (2)
packages/sample-app/sample_app/openai_agents_realtime_example.py (2)
packages/traceloop-sdk/traceloop/sdk/__init__.py (2)
- `Traceloop` (37-268)
- `init` (49-199)

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py (2)
- `session` (362-364)
- `item` (560-562)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
- `SpanAttributes` (64-245)
🪛 Ruff (0.14.10)
packages/sample-app/sample_app/openai_agents_realtime_example.py
231-231: Do not catch blind exception: Exception
(BLE001)
306-306: Avoid specifying long messages outside the exception class
(TRY003)
311-311: Avoid specifying long messages outside the exception class
(TRY003)
316-316: Avoid specifying long messages outside the exception class
(TRY003)
322-322: Avoid specifying long messages outside the exception class
(TRY003)
328-328: Prefer TypeError exception for invalid type
(TRY004)
328-328: Avoid specifying long messages outside the exception class
(TRY003)
333-333: Consider moving this statement to an else block
(TRY300)
462-462: Do not catch blind exception: Exception
(BLE001)
615-615: Do not catch blind exception: Exception
(BLE001)
629-629: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
634-634: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Lint
- GitHub Check: Build Packages (3.11)
- GitHub Check: Test Packages (3.11)
🔇 Additional comments (21)
packages/sample-app/sample_app/openai_agents_realtime_example.py (7)
1-41: LGTM: Clean module setup with proper documentation.

The module docstring is clear, environment variables are loaded properly, and Traceloop is initialized correctly. No hardcoded secrets detected.

66-108: LGTM: Well-structured tool functions and agent configuration.

The tool functions are simple, safe, and well-documented. The agent is configured with clear instructions and appropriate tools for the voice interface demo.

164-229: LGTM: Robust event handling with defensive programming.

The event loop uses `getattr` with defaults to safely handle different event types, and the `max_events` limit prevents infinite loops. The AttributeError handling provides graceful degradation for unexpected event structures.
284-335: Excellent: Security concern addressed with AST-based safe evaluation.

The `calculate` function now uses AST parsing instead of `eval()`, directly addressing the previous review's security concern. The implementation correctly:

- Excludes exponentiation (`Pow`) to prevent resource exhaustion
- Limits expression depth to 50 levels
- Only allows safe operators (Add, Sub, Mult, Div, unary +/-)
- Handles both Python 3.7 (`ast.Num`) and 3.8+ (`ast.Constant`) nodes

This is a secure, production-ready approach for a sample application.
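The approach described above can be sketched with the standard library alone. This is an illustrative minimal version, not the sample app's actual `calculate` function: the operator set and depth limit mirror the bullets above, and only `ast.Constant` is handled (the 3.7 `ast.Num` branch is omitted for brevity).

```python
import ast
import operator

# Only these operators are allowed; ast.Pow is deliberately excluded to
# prevent resource exhaustion from huge exponentiations.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
    ast.UAdd: operator.pos,
}


def safe_eval(expression: str, max_depth: int = 50) -> float:
    """Evaluate a basic arithmetic expression without eval()."""
    def _eval(node, depth):
        if depth > max_depth:
            raise ValueError("expression too deep")
        if isinstance(node, ast.Expression):
            return _eval(node.body, depth + 1)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](
                _eval(node.left, depth + 1), _eval(node.right, depth + 1)
            )
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand, depth + 1))
        raise ValueError(f"unsupported node: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval"), 0)
```

Because `2 ** 3` parses to a `BinOp` with `ast.Pow`, which is not in `_OPS`, the evaluator rejects it with `ValueError` rather than computing it.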
Optional: Minor static analysis refinements
Ruff suggests some minor style improvements:
- Line 328: Consider `TypeError` instead of `ValueError` for unsupported node types
- Lines 306, 311, 316, 322, 328: Long error messages could be moved to custom exception classes
- Line 333: The `return` statement could move to an `else` block

These are nitpicks and don't affect functionality for a demo app.
492-524: LGTM: Clean agent handoff configuration.

The agent hierarchy is well-structured with specialized agents and appropriate handoff descriptions. This effectively demonstrates the realtime handoff capabilities.
645-648: LGTM: Proper environment variable check.

Checking for `OPENAI_API_KEY` in environment variables follows best practices and aligns with the coding guideline to store API keys only in environment variables or secure vaults.
653-684: LGTM: Clean main block with good user guidance.

The command-line argument parsing and routing logic is clear and includes helpful error messages. The requirement check before execution prevents confusing errors.
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (7)
23-32: LGTM! Module-level import pattern implemented correctly.

The conditional import has been moved to module level, following the same pattern as `set_agent_name`. This addresses the previous review comment and avoids repeated expensive path lookups on every `on_span_start` invocation.

34-150: Well-structured helper for prompt extraction.

The function properly handles both OpenAI chat format and Agents SDK format, with appropriate conversion logic and nested structure support. The unified dict-based approach is clean.

152-245: Well-structured helper for response extraction.

The function properly extracts model settings, completions, and usage data with defensive attribute checks. The handling of different output types (text messages, tool calls, direct text) and token naming variations is thorough. Returning model_settings for parent span propagation is a good design choice.

273-273: Good migration to GenAIAttributes constants.

Replacing hardcoded strings with semantic convention constants improves maintainability and ensures consistency with OpenTelemetry standards.
Also applies to: 323-323, 366-366, 391-392, 418-419, 437-438, 636-636
449-520: Realtime span creation logic is well-structured.

The three realtime span types (Speech, Transcription, SpeechGroup) follow a consistent pattern with appropriate attribute setting, parent context management, and defensive checks using the module-level `_has_realtime_spans` flag.

543-545: Good use of extraction helpers.

Replacing inline extraction logic with the new helper functions reduces duplication and improves maintainability.
Also applies to: 576-576, 582-582, 586-586
589-624: Realtime span end handling is well-implemented.

The end handling for realtime spans properly extracts input/output data and sets appropriate attributes. The use of `isinstance()` checks is consistent with best practices.

packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py (7)
1-59: Well-designed mock classes for testing.

The mock classes provide a clean abstraction for testing realtime instrumentation without requiring actual WebSocket connections. The use of dynamic attribute setting via kwargs in MockSpanData is flexible and appropriate for testing.

61-68: LGTM on the test fixture.

The fixture properly sets up an in-memory exporter with a simple span processor, which is the standard pattern for testing OpenTelemetry instrumentation.

70-167: Comprehensive speech span tests.

The tests properly verify span creation, attribute setting, and model/format capture for speech synthesis. The patching strategy using module-level variables is correct and aligns with the implementation.

169-265: Comprehensive transcription span tests.

The tests properly verify span creation and attribute capture for transcription operations, following the same solid pattern as the speech tests.

267-314: LGTM on speech group span test.

The test properly verifies span creation and attributes for speech group operations.

316-361: Good error handling test.

The test properly verifies that errors are recorded on realtime spans with the correct status code.

363-424: Thorough span hierarchy verification.

The test properly verifies that realtime spans are correctly nested under the workflow root span, ensuring proper parent-child relationships in the span tree.
Important
Looks good to me! 👍
Reviewed f07996d in 1 minute and 30 seconds. Click for details.
- Reviewed `23` lines of code in `1` files
- Skipped `0` files when reviewing
- Skipped posting `2` draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:26
- Draft comment:
Removed 'self.llm_spans' from the tracing state. Confirm this field is unused and its removal is intentional.
- Reason this comment was not posted: Comment looked like it was already resolved.
2. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:291
- Draft comment:
Added guard to prevent multiple wrapping by checking '_original_methods'. Ensure this approach meets concurrent usage requirements if applicable.
- Reason this comment was not posted: Comment did not seem useful. Confidence is useful = `0%` <= threshold `50%`. The comment is asking the author to ensure that the approach meets concurrent usage requirements, which is against the rules. It is not making a specific suggestion or pointing out a specific issue with the code.
Workflow ID: wflow_XCeiGXzkr8Jzwz35
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In
@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:
- Line 308: The _tracing_states dict currently created inside
wrap_realtime_session() can leak across wrap/unwrap cycles; move its declaration
out to module scope alongside _original_methods (name: _tracing_states:
Dict[int, RealtimeTracingState]) so all wrappers share the same mapping, then
update unwrap_realtime_session() to explicitly clear _tracing_states (e.g.,
_tracing_states.clear()) when restoring original methods to avoid leftover
references; adjust any references inside wrap_realtime_session() and the wrapper
functions to use the module-level _tracing_states.
- Line 29: The llm_spans dictionary is initialized but never used; remove the
unused attribute and any references to it to avoid dead state: delete the
self.llm_spans: Dict[str, Any] = {} initialization and ensure create_llm_span
(and any other methods referencing llm_spans) do not rely on it since spans are
created and ended immediately in create_llm_span; if tracking is desired
instead, implement proper population and use of llm_spans within create_llm_span
and related logic, otherwise remove the unused symbol entirely.
In @packages/sample-app/sample_app/openai_agents_realtime_example.py:
- Around line 629-636: Remove the unnecessary noqa directives on the two import
lines that reference agents and agents.realtime (the lines importing agents and
from agents.realtime import RealtimeAgent); simply delete the trailing "# noqa:
F401" comments so the imports remain unchanged but without the unused linter
suppression.
🧹 Nitpick comments (3)
packages/sample-app/sample_app/openai_agents_realtime_example.py (1)
268-281: Consider extracting duplicated tool definitions (optional).

The `get_weather` tool is defined in both `run_realtime_demo` (lines 66-79) and `run_text_only_demo` (lines 268-281). While the duplication is minor and each demo remains self-contained for clarity, you could optionally extract common tools to module level if you prefer DRY principles. However, for sample/demo code, the current self-contained approach may be more pedagogically clear for readers.
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (1)
204-211: Unused `role` parameters in `record_prompt` and `record_completion`.

Both methods accept a `role` parameter but never use it. If these parameters are reserved for future functionality, consider documenting that intent. Otherwise, remove them to simplify the API.

♻️ Proposed fix (if role is not needed)

```diff
-    def record_prompt(self, role: str, content: str):
-        """Record a prompt message - buffers it for the LLM span."""
+    def record_prompt(self, content: str):
+        """Record a prompt message - buffers it for the LLM span."""
         if not content:
             return
         if not self.pending_prompts:
             self.prompt_start_time = time.time_ns()
             self.prompt_agent_name = self.current_agent_name or self.starting_agent_name
         self.pending_prompts.append(content)

-    def record_completion(self, role: str, content: str):
-        """Record a completion message - creates an LLM span with prompt and completion."""
+    def record_completion(self, content: str):
+        """Record a completion message - creates an LLM span with prompt and completion."""
         if not content:
             return
         self.create_llm_span(content)
```

Then update callers at lines 424, 439, 458, 463.
Also applies to: 213-217
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)
34-150: Consider adding size limits to prompt attribute values.

The `_extract_prompt_attributes` function sets content and arguments attributes without size limits (lines 105, 149). Unlike other parts of the codebase that truncate to 4096 characters (e.g., line 256 in `_realtime_wrappers.py`), these attributes could become extremely large if prompts contain substantial text or encoded data.

Add truncation to prevent excessive attribute sizes that could impact telemetry backend performance.

♻️ Proposed fix

```diff
 # Set content attribute
 if content is not None:
     if not isinstance(content, str):
         content = json.dumps(content)
-    otel_span.set_attribute(f"{prefix}.content", content)
+    otel_span.set_attribute(f"{prefix}.content", content[:4096])

 # Set tool_call_id for tool result messages
 if tool_call_id:
     otel_span.set_attribute(f"{prefix}.tool_call_id", tool_call_id)

 # Set tool_calls for assistant messages with tool calls
 if tool_calls:
     for j, tool_call in enumerate(tool_calls):
         # ... existing code ...
         if tool_call.get('arguments'):
             args = tool_call['arguments']
             if not isinstance(args, str):
                 args = json.dumps(args)
-            otel_span.set_attribute(f"{prefix}.tool_calls.{j}.arguments", args)
+            otel_span.set_attribute(f"{prefix}.tool_calls.{j}.arguments", args[:4096])
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
- packages/sample-app/sample_app/openai_agents_realtime_example.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
- packages/sample-app/sample_app/openai_agents_realtime_example.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: duanyutong
Repo: traceloop/openllmetry PR: 3487
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:177-178
Timestamp: 2025-12-02T21:09:48.690Z
Learning: The opentelemetry-instrumentation-openai and opentelemetry-instrumentation-openai-agents packages must remain independent and not share code, so code duplication between them is acceptable.
📚 Learning: 2025-12-02T21:09:48.690Z
Learnt from: duanyutong
Repo: traceloop/openllmetry PR: 3487
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:177-178
Timestamp: 2025-12-02T21:09:48.690Z
Learning: The opentelemetry-instrumentation-openai and opentelemetry-instrumentation-openai-agents packages must remain independent and not share code, so code duplication between them is acceptable.
Applied to files:
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
🧬 Code graph analysis (1)
packages/sample-app/sample_app/openai_agents_realtime_example.py (2)
packages/traceloop-sdk/traceloop/sdk/__init__.py (2)
Traceloop (37-268), init (49-199)

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py (2)

session (362-364), item (560-562)
🪛 Ruff (0.14.10)
packages/sample-app/sample_app/openai_agents_realtime_example.py
231-231: Do not catch blind exception: Exception
(BLE001)
306-306: Avoid specifying long messages outside the exception class
(TRY003)
311-311: Avoid specifying long messages outside the exception class
(TRY003)
316-316: Avoid specifying long messages outside the exception class
(TRY003)
322-322: Avoid specifying long messages outside the exception class
(TRY003)
328-328: Prefer TypeError exception for invalid type
(TRY004)
328-328: Avoid specifying long messages outside the exception class
(TRY003)
333-333: Consider moving this statement to an else block
(TRY300)
462-462: Do not catch blind exception: Exception
(BLE001)
615-615: Do not catch blind exception: Exception
(BLE001)
629-629: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
634-634: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
204-204: Unused method argument: role
(ARG002)
213-213: Unused method argument: role
(ARG002)
320-321: try-except-pass detected, consider logging the exception
(S110)
320-320: Do not catch blind exception: Exception
(BLE001)
338-339: try-except-pass detected, consider logging the exception
(S110)
338-338: Do not catch blind exception: Exception
(BLE001)
440-441: try-except-pass detected, consider logging the exception
(S110)
440-440: Do not catch blind exception: Exception
(BLE001)
470-471: try-except-pass detected, consider logging the exception
(S110)
470-470: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Lint
🔇 Additional comments (6)
packages/sample-app/sample_app/openai_agents_realtime_example.py (5)
43-243: LGTM! Well-structured realtime demo.

The function demonstrates realtime capabilities effectively with:

- Proper tool definitions using the @function_tool decorator
- Comprehensive agent and runner configuration
- Safe event processing with getattr patterns to handle varying event structures
- An event limit to prevent infinite loops in the demo context
The broad exception catch on line 231 is acceptable for a sample application as it provides user-friendly error messaging.
283-335: Excellent secure implementation of safe expression evaluator.

The AST-based evaluation properly addresses the previous security concern by:

- Using an allowlist of safe operators (no exponentiation, to prevent 9**9**9 attacks)
- Implementing depth limits to prevent deeply nested expressions
- Handling both Python 3.7 (ast.Num) and 3.8+ (ast.Constant) for compatibility
- Providing clear error messages for unsupported operations

The static analysis hints about long error messages and using TypeError are pedantic in this context; the implementation is appropriate for a sample/demo tool.
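For readers unfamiliar with the pattern, an allowlist-plus-depth-limit AST evaluator can be sketched roughly as follows. This is a minimal illustration, not the sample app's actual code: the name safe_eval is hypothetical, and the Python 3.7 ast.Num compatibility shim is deliberately omitted for brevity.

```python
import ast
import operator

# Allowlist of operators; exponentiation is deliberately excluded so
# inputs like "9**9**9" cannot be used to exhaust CPU or memory.
_SAFE_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

_MAX_DEPTH = 10  # reject deeply nested expressions


def safe_eval(expression: str) -> float:
    """Evaluate a simple arithmetic expression without eval/exec."""
    tree = ast.parse(expression, mode="eval")

    def _eval(node: ast.AST, depth: int = 0) -> float:
        if depth > _MAX_DEPTH:
            raise ValueError("expression too deeply nested")
        if isinstance(node, ast.Expression):
            return _eval(node.body, depth + 1)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _SAFE_OPS:
            return _SAFE_OPS[type(node.op)](
                _eval(node.left, depth + 1), _eval(node.right, depth + 1)
            )
        if isinstance(node, ast.UnaryOp) and type(node.op) in _SAFE_OPS:
            return _SAFE_OPS[type(node.op)](_eval(node.operand, depth + 1))
        raise ValueError("unsupported expression")

    return _eval(tree)
```

Because `ast.Pow` is simply absent from the allowlist, `9**9` raises ValueError before any arithmetic happens, which is the core of the security property described above.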
377-461: LGTM! Sequential message handling is well-implemented.

The event loop correctly:
- Sends demo messages sequentially after agent completion
- Uses brief pauses between messages for natural pacing
- Filters verbose events to keep output readable
- Includes safeguards (max_events limit) to prevent runaway loops
475-622: LGTM! Handoff demo clearly illustrates multi-agent coordination.

The function effectively demonstrates:
- Agent hierarchy with specialized agents (Weather Expert, Time Expert)
- Proper use of realtime_handoff with descriptive tool descriptions
- Handoff event tracking with emphasized logging (*** markers)
- Graceful session termination after sufficient events
653-684: LGTM! Main block provides clear entry points.

The main execution block:
- Validates requirements before running demos
- Provides clear command-line interface with three demo options
- Includes helpful guidance messages for users
- Properly handles async execution with asyncio.run()

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (1)
247-247: Hardcoded model name doesn't reflect the actual realtime model.

The model is hardcoded as "gpt-4o-realtime-preview", but realtime sessions can use different model variants (e.g., "gpt-4o-realtime-preview-2024-12-17"). This results in inaccurate telemetry data. Capture the actual model from the realtime session configuration during initialization and use it when creating LLM spans.
Verify what model information is available from the RealtimeSession object:
#!/bin/bash
# Search for model configuration in RealtimeSession and related classes
rg -n --type=py -C5 "class RealtimeSession" packages/

Likely an incorrect or invalid review comment.
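If the verification does find a model field, one hedged way to capture it is a best-effort probe with the current hardcoded value as the fallback. The attribute paths below ('model', 'config', 'options') and the name resolve_model_name are assumptions for illustration, not a confirmed agents SDK API:

```python
from typing import Any

DEFAULT_REALTIME_MODEL = "gpt-4o-realtime-preview"  # the current hardcoded fallback


def resolve_model_name(session: Any) -> str:
    """Best-effort lookup of the realtime model name on a session object.

    The paths probed here are guesses about where the SDK might expose the
    model; adjust them to whatever the installed version actually provides.
    """
    for path in ("model", "config", "options"):
        obj = getattr(session, path, None)
        if isinstance(obj, str) and obj:
            return obj
        if obj is not None:
            # The holder may be a dict or a plain object, so try both accessors.
            model = obj.get("model") if isinstance(obj, dict) else getattr(obj, "model", None)
            if isinstance(model, str) and model:
                return model
    return DEFAULT_REALTIME_MODEL
```

The resolved name would then be stored on the tracing state at session start and used in place of the constant when creating LLM spans.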
| elif span_data: | ||
| # Extract prompt data from input and add to response span (legacy support) | ||
| input_data = getattr(span_data, 'input', []) | ||
| if input_data: | ||
| for i, message in enumerate(input_data): | ||
| if hasattr(message, 'role') and hasattr(message, 'content'): | ||
| otel_span.set_attribute(f"gen_ai.prompt.{i}.role", message.role) | ||
| content = message.content | ||
| if isinstance(content, dict): | ||
| content = json.dumps(content) | ||
| otel_span.set_attribute(f"gen_ai.prompt.{i}.content", content) | ||
| elif isinstance(message, dict): | ||
| if 'role' in message and 'content' in message: | ||
| otel_span.set_attribute(f"gen_ai.prompt.{i}.role", message['role']) | ||
| content = message['content'] | ||
| if isinstance(content, dict): | ||
| content = json.dumps(content) | ||
| otel_span.set_attribute(f"gen_ai.prompt.{i}.content", content) | ||
| _extract_prompt_attributes(otel_span, input_data) | ||
|
|
||
| response = getattr(span_data, 'response', None) | ||
| if response: | ||
|
|
||
| # Extract model settings from the response | ||
| model_settings = {} | ||
|
|
||
| if hasattr(response, 'temperature') and response.temperature is not None: | ||
| model_settings['temperature'] = response.temperature | ||
| otel_span.set_attribute(GenAIAttributes.GEN_AI_REQUEST_TEMPERATURE, response.temperature) | ||
|
|
||
| if hasattr(response, 'max_output_tokens') and response.max_output_tokens is not None: | ||
| model_settings['max_tokens'] = response.max_output_tokens | ||
| otel_span.set_attribute(GenAIAttributes.GEN_AI_REQUEST_MAX_TOKENS, response.max_output_tokens) | ||
|
|
||
| if hasattr(response, 'top_p') and response.top_p is not None: | ||
| model_settings['top_p'] = response.top_p | ||
| otel_span.set_attribute(GenAIAttributes.GEN_AI_REQUEST_TOP_P, response.top_p) | ||
|
|
||
| if hasattr(response, 'model') and response.model: | ||
| model_settings['model'] = response.model | ||
| otel_span.set_attribute("gen_ai.request.model", response.model) | ||
|
|
||
| # Extract completions and add directly to response span using OpenAI semantic conventions | ||
| if hasattr(response, 'output') and response.output: | ||
| for i, output in enumerate(response.output): | ||
| # Handle different output types | ||
| if hasattr(output, 'content') and output.content: | ||
| # Text message with content array (ResponseOutputMessage) | ||
| content_text = "" | ||
| for content_item in output.content: | ||
| if hasattr(content_item, 'text'): | ||
| content_text += content_item.text | ||
|
|
||
| if content_text: | ||
| otel_span.set_attribute( | ||
| f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.content", content_text) | ||
| otel_span.set_attribute( | ||
| f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.role", getattr( | ||
| output, 'role', 'assistant')) | ||
|
|
||
| elif hasattr(output, 'name'): | ||
| # Function/tool call (ResponseFunctionToolCall) - use OpenAI tool call format | ||
| tool_name = getattr(output, 'name', 'unknown_tool') | ||
| arguments = getattr(output, 'arguments', '{}') | ||
| tool_call_id = getattr(output, 'call_id', f"call_{i}") | ||
|
|
||
| # Set completion with tool call following OpenAI format | ||
| otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.role", "assistant") | ||
| otel_span.set_attribute( | ||
| f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.finish_reason", "tool_calls") | ||
| otel_span.set_attribute( | ||
| f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.tool_calls.0.name", tool_name) | ||
| otel_span.set_attribute( | ||
| f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.tool_calls.0.arguments", arguments) | ||
| otel_span.set_attribute( | ||
| f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.tool_calls.0.id", tool_call_id) | ||
|
|
||
| elif hasattr(output, 'text'): | ||
| # Direct text content | ||
| otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.content", output.text) | ||
| otel_span.set_attribute( | ||
| f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.role", getattr( | ||
| output, 'role', 'assistant')) | ||
|
|
||
| # Add finish reason if available (for non-tool-call cases) | ||
| if hasattr(response, 'finish_reason') and not hasattr(output, 'name'): | ||
| otel_span.set_attribute( | ||
| f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.finish_reason", response.finish_reason) | ||
|
|
||
| # Extract usage data and add directly to response span | ||
| if hasattr(response, 'usage') and response.usage: | ||
| usage = response.usage | ||
| # Try both naming conventions: input_tokens/output_tokens and prompt_tokens/completion_tokens | ||
| if hasattr(usage, 'input_tokens') and usage.input_tokens is not None: | ||
| otel_span.set_attribute(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, usage.input_tokens) | ||
| elif hasattr(usage, 'prompt_tokens') and usage.prompt_tokens is not None: | ||
| otel_span.set_attribute(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, usage.prompt_tokens) | ||
|
|
||
| if hasattr(usage, 'output_tokens') and usage.output_tokens is not None: | ||
| otel_span.set_attribute(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, usage.output_tokens) | ||
| elif hasattr(usage, 'completion_tokens') and usage.completion_tokens is not None: | ||
| otel_span.set_attribute(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, usage.completion_tokens) | ||
|
|
||
| if hasattr(usage, 'total_tokens') and usage.total_tokens is not None: | ||
| otel_span.set_attribute(SpanAttributes.LLM_USAGE_TOTAL_TOKENS, usage.total_tokens) | ||
|
|
||
| # Check for frequency_penalty | ||
| if hasattr(response, 'frequency_penalty') and response.frequency_penalty is not None: | ||
| model_settings['frequency_penalty'] = response.frequency_penalty | ||
|
|
||
| # Store model settings to add to the agent span (but NOT prompts/completions) | ||
| model_settings = _extract_response_attributes(otel_span, response) | ||
| self._last_model_settings = model_settings | ||
|
|
||
| elif _has_realtime_spans and SpeechSpanData and isinstance(span_data, SpeechSpanData): | ||
| input_text = getattr(span_data, 'input', None) | ||
| if input_text: | ||
| otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_PROMPT}.0.content", input_text) | ||
| otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_PROMPT}.0.role", "user") | ||
|
|
||
| output_audio = getattr(span_data, 'output', None) | ||
| if output_audio: | ||
| if isinstance(output_audio, (bytes, bytearray)): | ||
| otel_span.set_attribute("gen_ai.speech.output_bytes", len(output_audio)) | ||
| else: | ||
| otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content", str(output_audio)) | ||
|
|
||
| first_content_at = getattr(span_data, 'first_content_at', None) | ||
| if first_content_at: | ||
| otel_span.set_attribute("gen_ai.speech.first_content_at", first_content_at) | ||
|
|
||
| elif _has_realtime_spans and TranscriptionSpanData and isinstance(span_data, TranscriptionSpanData): | ||
| input_audio = getattr(span_data, 'input', None) | ||
| if input_audio: | ||
| if isinstance(input_audio, (bytes, bytearray)): | ||
| otel_span.set_attribute("gen_ai.transcription.input_bytes", len(input_audio)) | ||
| else: | ||
| otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_PROMPT}.0.content", str(input_audio)) | ||
|
|
||
| output_text = getattr(span_data, 'output', None) | ||
| if output_text: | ||
| otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content", output_text) | ||
| otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.0.role", "assistant") | ||
|
|
||
| elif _has_realtime_spans and SpeechGroupSpanData and isinstance(span_data, SpeechGroupSpanData): | ||
| input_text = getattr(span_data, 'input', None) | ||
| if input_text: | ||
| otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_PROMPT}.0.content", input_text) | ||
| otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_PROMPT}.0.role", "user") | ||
|
|
Realtime span handling is unreachable due to overly broad legacy fallback condition.
The legacy fallback at line 580 (elif span_data:) matches ANY span with data, preventing the realtime-specific handlers (lines 589-624) from ever executing. This means SpeechSpanData, TranscriptionSpanData, and SpeechGroupSpanData will be processed by the generic legacy code instead of their specialized handlers, resulting in missing or incorrect attributes.
Reorder the conditions so specific realtime checks come before the broad legacy fallback.
🔧 Proposed fix
if response:
model_settings = _extract_response_attributes(otel_span, response)
self._last_model_settings = model_settings
- # Legacy fallback for other span types
- elif span_data:
- input_data = getattr(span_data, 'input', [])
- _extract_prompt_attributes(otel_span, input_data)
-
- response = getattr(span_data, 'response', None)
- if response:
- model_settings = _extract_response_attributes(otel_span, response)
- self._last_model_settings = model_settings
-
- elif _has_realtime_spans and SpeechSpanData and isinstance(span_data, SpeechSpanData):
+ elif _has_realtime_spans and SpeechSpanData and isinstance(span_data, SpeechSpanData):
input_text = getattr(span_data, 'input', None)
if input_text:
otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_PROMPT}.0.content", input_text)
otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_PROMPT}.0.role", "user")
output_audio = getattr(span_data, 'output', None)
if output_audio:
if isinstance(output_audio, (bytes, bytearray)):
otel_span.set_attribute("gen_ai.speech.output_bytes", len(output_audio))
else:
otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content", str(output_audio))
first_content_at = getattr(span_data, 'first_content_at', None)
if first_content_at:
otel_span.set_attribute("gen_ai.speech.first_content_at", first_content_at)
elif _has_realtime_spans and TranscriptionSpanData and isinstance(span_data, TranscriptionSpanData):
input_audio = getattr(span_data, 'input', None)
if input_audio:
if isinstance(input_audio, (bytes, bytearray)):
otel_span.set_attribute("gen_ai.transcription.input_bytes", len(input_audio))
else:
otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_PROMPT}.0.content", str(input_audio))
output_text = getattr(span_data, 'output', None)
if output_text:
otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content", output_text)
otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.0.role", "assistant")
elif _has_realtime_spans and SpeechGroupSpanData and isinstance(span_data, SpeechGroupSpanData):
input_text = getattr(span_data, 'input', None)
if input_text:
otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_PROMPT}.0.content", input_text)
otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_PROMPT}.0.role", "user")
+ # Legacy fallback for other span types
+ elif span_data:
+ input_data = getattr(span_data, 'input', [])
+ _extract_prompt_attributes(otel_span, input_data)
+
+ response = getattr(span_data, 'response', None)
+ if response:
+ model_settings = _extract_response_attributes(otel_span, response)
+ self._last_model_settings = model_settings
+
 elif span_data and type(span_data).__name__ == 'AgentSpanData':

| self.agent_spans: Dict[str, Span] = {} | ||
| self.tool_spans: Dict[str, Span] = {} | ||
| self.audio_spans: Dict[str, Span] = {} | ||
| self.span_contexts: Dict[str, Any] = {} |
The llm_spans dictionary remains unused.
As previously noted, llm_spans is initialized but never populated or accessed anywhere in the code. LLM spans are created and immediately ended in create_llm_span() (line 238-262), so tracking is unnecessary.
🧹 Proposed fix
self.tool_spans: Dict[str, Span] = {}
self.audio_spans: Dict[str, Span] = {}
- self.llm_spans: Dict[str, Span] = {}
self.span_contexts: Dict[str, Any] = {}

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In
@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
at line 29: the llm_spans dictionary is initialized but never used; remove the
unused attribute and any references to it to avoid dead state: delete the
self.llm_spans: Dict[str, Any] = {} initialization and ensure create_llm_span
(and any other methods referencing llm_spans) do not rely on it since spans are
created and ended immediately in create_llm_span; if tracking is desired
instead, implement proper population and use of llm_spans within create_llm_span
and related logic, otherwise remove the unused symbol entirely.
| if hasattr(RealtimeSession, 'send_message'): | ||
| _original_methods['send_message'] = RealtimeSession.send_message | ||
|
|
||
| _tracing_states: Dict[int, RealtimeTracingState] = {} |
_tracing_states dictionary may leak memory across wrap/unwrap cycles.
The _tracing_states dictionary is defined inside wrap_realtime_session() and captured in the closure of the wrapper functions. If wrap_realtime_session() is called multiple times after unwrap_realtime_session(), each invocation creates a new _tracing_states dictionary, but the old ones may not be garbage collected if there are lingering references from previous wrappers.
Consider defining _tracing_states at module level (similar to _original_methods) and explicitly clearing it in unwrap_realtime_session().
♻️ Proposed fix
_original_methods: Dict[str, Any] = {}
+_tracing_states: Dict[int, Any] = {}
def wrap_realtime_session(tracer: Tracer):
"""Wrap the RealtimeSession class to add OpenTelemetry tracing."""
# Guard against multiple wrapping - don't overwrite originals with wrapped methods
if _original_methods:
return
try:
from agents.realtime.session import RealtimeSession
except ImportError:
return
_original_methods['__aenter__'] = RealtimeSession.__aenter__
_original_methods['__aexit__'] = RealtimeSession.__aexit__
_original_methods['_put_event'] = RealtimeSession._put_event
if hasattr(RealtimeSession, 'send_message'):
_original_methods['send_message'] = RealtimeSession.send_message
- _tracing_states: Dict[int, RealtimeTracingState] = {}
@dont_throw

Then in unwrap_realtime_session():
if 'send_message' in _original_methods:
RealtimeSession.send_message = _original_methods['send_message']
_original_methods.clear()
+ _tracing_states.clear()

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In
@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
at line 308, The _tracing_states dict currently created inside
wrap_realtime_session() can leak across wrap/unwrap cycles; move its declaration
out to module scope alongside _original_methods (name: _tracing_states:
Dict[int, RealtimeTracingState]) so all wrappers share the same mapping, then
update unwrap_realtime_session() to explicitly clear _tracing_states (e.g.,
_tracing_states.clear()) when restoring original methods to avoid leftover
references; adjust any references inside wrap_realtime_session() and the wrapper
functions to use the module-level _tracing_states.
| def test_speech_span_start_creates_otel_span(self, tracer_provider_and_exporter): | ||
| """Test that SpeechSpanData creates an OpenTelemetry span.""" | ||
| from opentelemetry.instrumentation.openai_agents._hooks import ( | ||
| OpenTelemetryTracingProcessor, | ||
| ) | ||
|
|
||
| provider, exporter = tracer_provider_and_exporter | ||
| tracer = provider.get_tracer("test") | ||
| processor = OpenTelemetryTracingProcessor(tracer) | ||
|
|
||
| # Create a mock trace first | ||
| mock_trace = MagicMock() | ||
| mock_trace.trace_id = "test-trace-123" | ||
| processor.on_trace_start(mock_trace) | ||
|
|
||
| # Mock the span data with type name matching | ||
| speech_data = MockSpeechSpanData( | ||
| model="tts-1", | ||
| output_format="mp3", | ||
| input="Hello, world!", | ||
| ) | ||
| speech_data.__class__.__name__ = "SpeechSpanData" | ||
|
|
||
| mock_span = MockAgentSpan(speech_data, trace_id="test-trace-123") | ||
|
|
||
| # Patch the module-level variables directly | ||
| import opentelemetry.instrumentation.openai_agents._hooks as hooks_module | ||
| with patch.object(hooks_module, '_has_realtime_spans', True), \ | ||
| patch.object(hooks_module, 'SpeechSpanData', MockSpeechSpanData), \ | ||
| patch.object(hooks_module, 'TranscriptionSpanData', MockTranscriptionSpanData), \ | ||
| patch.object(hooks_module, 'SpeechGroupSpanData', MockSpeechGroupSpanData): | ||
| processor.on_span_start(mock_span) | ||
| processor.on_span_end(mock_span) | ||
|
|
||
| # End the trace | ||
| processor.on_trace_end(mock_trace) | ||
|
|
||
| spans = exporter.get_finished_spans() | ||
| span_names = [s.name for s in spans] | ||
|
|
||
| assert "Agent Workflow" in span_names | ||
| assert "openai.realtime.speech" in span_names | ||
|
|
||
| speech_span = next(s for s in spans if s.name == "openai.realtime.speech") | ||
| assert speech_span.attributes[SpanAttributes.LLM_REQUEST_TYPE] == "realtime" | ||
| assert speech_span.attributes["gen_ai.system"] == "openai" | ||
| assert speech_span.attributes["gen_ai.operation.name"] == "speech" | ||
| assert speech_span.status.status_code == StatusCode.OK |
🛠️ Refactor suggestion | 🟠 Major
Tests may not detect the unreachable realtime span handling in on_span_end.
The test verifies span creation and attributes, but the critical bug in _hooks.py (line 580) means realtime-specific attribute extraction in on_span_end is unreachable. The tests pass because:
- Span creation and basic attributes are set in on_span_start (lines 449-520 in _hooks.py)
- The realtime-specific attributes that should be set in on_span_end (lines 589-624 in _hooks.py) are never checked
Add test assertions that verify realtime-specific attributes set during on_span_end, such as:
- gen_ai.prompt.0.content for input text (line 592 in _hooks.py)
- gen_ai.speech.output_bytes for audio output (line 598 in _hooks.py)
- gen_ai.speech.first_content_at for timing (line 604 in _hooks.py)
🧪 Proposed additional test assertions
speech_span = next(s for s in spans if s.name == "openai.realtime.speech")
assert speech_span.attributes[SpanAttributes.LLM_REQUEST_TYPE] == "realtime"
assert speech_span.attributes["gen_ai.system"] == "openai"
assert speech_span.attributes["gen_ai.operation.name"] == "speech"
assert speech_span.status.status_code == StatusCode.OK
+ # Verify on_span_end set these attributes
+ assert speech_span.attributes.get(f"{GenAIAttributes.GEN_AI_PROMPT}.0.content") == "Hello, world!"
+ assert speech_span.attributes.get(f"{GenAIAttributes.GEN_AI_PROMPT}.0.role") == "user"There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed 5325e3f in 2 minutes and 38 seconds. Click for details.
- Reviewed 65 lines of code in 2 files
- Skipped 0 files when reviewing.
- Skipped posting 5 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:40
- Draft comment: Document the default 'model_name' value and its override mechanism to ensure consistency with session configuration.
- Reason this comment was not posted: Confidence changes required: 33% <= threshold 50%
2. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:244
- Draft comment: Good improvement—using 'self.model_name' for the GEN_AI_REQUEST_MODEL attribute allows configuration overrides.
- Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%
3. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:322
- Draft comment: Consider handling cases where 'config' is a dict rather than an object; using getattr may fail. Using config.get('model_settings') (or similar) might be more robust.
- Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%. The comment raises a valid point about dict vs object handling. However, I need to consider: 1) The entire block is wrapped in a try-except that catches all exceptions, so if config is a dict and getattr fails, it will be caught silently. 2) This is defensive code trying to extract model name from various possible locations. 3) The comment suggests a code improvement but the current code won't crash due to the exception handler. 4) The comment is speculative ("may fail", "might be more robust") rather than pointing to a definite bug. 5) According to the rules, speculative comments like "If X, then Y is an issue" should be removed unless it's definitely an issue. The code is already protected by a try-except block that catches all exceptions, so even if config is a dict and getattr fails, the code won't crash. The comment is suggesting a code quality improvement rather than pointing out a definite bug. This might be considered speculative since it says "may fail" and "might be more robust". While the try-except does prevent crashes, the comment is suggesting a more robust approach that would actually work for both dicts and objects. However, the comment uses speculative language ("may fail", "might be more robust") and doesn't demonstrate that this is definitely a problem. Without evidence that config is actually a dict in practice, this is a speculative suggestion for a refactor. The comment is speculative and suggests a code quality improvement rather than identifying a definite bug. The existing try-except block already handles any potential failures. According to the rules, speculative comments should be removed unless there's strong evidence of a definite issue.
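For reference, the dict-or-object handling the draft comment describes can be sketched as a small generic helper. This is not code from the PR; the name cfg_get is hypothetical:

```python
from typing import Any


def cfg_get(config: Any, key: str, default: Any = None) -> Any:
    """Read a setting from a config that may be a dict or a plain object.

    getattr alone raises nothing for dicts but returns the default without
    ever finding the key, and .get does not exist on plain objects, so the
    helper dispatches on the concrete type and tries the right accessor.
    """
    if isinstance(config, dict):
        return config.get(key, default)
    return getattr(config, key, default)
```

With this, call sites like `cfg_get(config, "model_settings")` work regardless of whether the session's config surface is a mapping or an attribute-bearing object.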
4. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:339
- Draft comment: Consider logging exception details in the traced_aenter block instead of silently swallowing errors; this would aid in debugging potential issues with model name extraction.
- Reason this comment was not posted: Comment was on unchanged code.
5. packages/sample-app/sample_app/openai_agents_realtime_example.py:54
- Draft comment: Removing the unnecessary '# noqa' comments on import statements cleans up the code.
- Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%
Workflow ID: wflow_ro6nmpVV0WODpPT2
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Looks good to me! 👍
Reviewed b3bc328 in 38 seconds. Click for details.
- Reviewed 19 lines of code in 1 file
- Skipped 0 files when reviewing.
- Skipped posting 2 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/sample-app/sample_app/openai_agents_realtime_example.py:629
- Draft comment: Add '# noqa: F401' to suppress unused import warnings; consider adding a brief comment explaining that this import is meant solely for dependency checking.
- Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%
2. packages/sample-app/sample_app/openai_agents_realtime_example.py:634
- Draft comment: Using '# noqa: F401' suppresses unused import warnings; a short inline note on its purpose (dependency check) would be helpful.
- Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%
Workflow ID: wflow_o0N4A4OiprupApfl
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In
@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:
- Line 309: _tracing_states is currently defined inside wrap_realtime_session()
causing a closure and potential memory leaks; move the _tracing_states:
Dict[int, RealtimeTracingState] = {} declaration to module level alongside
_original_methods, remove its local declaration inside wrap_realtime_session(),
and update unwrap_realtime_session() to explicitly clear _tracing_states (e.g.,
_tracing_states.clear()) when restoring originals so old state dictionaries can
be garbage collected.
🧹 Nitpick comments (2)
packages/sample-app/sample_app/openai_agents_realtime_example.py (1)
624-650: Consider re-adding linter suppressions for intentional import-only checks. The imports on lines 629 and 634 are intentionally unused; they exist solely to verify package availability via try-except. While the previous review requested removing the `# noqa: F401` directives, Flake8 now correctly flags the imports as unused (F401).

For clarity and to document intent, consider re-adding the suppressions:
📝 Suggested clarification
```diff
 try:
-    import agents
+    import agents  # noqa: F401 - import check only
 except ImportError:
     missing.append("openai-agents")

 try:
-    from agents.realtime import RealtimeAgent
+    from agents.realtime import RealtimeAgent  # noqa: F401 - import check only
 except ImportError:
     missing.append("openai-agents (realtime module)")
```

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (1)
205-218: Optional: Consider prefixing unused `role` parameters. The `role` parameter in `record_prompt()` and `record_completion()` is unused (flagged by ARG002). While this is likely intentional for API consistency or future extensibility, consider prefixing it with an underscore to document the intent:

📝 Optional cleanup

```diff
-    def record_prompt(self, role: str, content: str):
+    def record_prompt(self, _role: str, content: str):
         """Record a prompt message - buffers it for the LLM span."""

-    def record_completion(self, role: str, content: str):
+    def record_completion(self, _role: str, content: str):
         """Record a completion message - creates an LLM span with prompt and completion."""
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
- packages/sample-app/sample_app/openai_agents_realtime_example.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
- packages/sample-app/sample_app/openai_agents_realtime_example.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: duanyutong
Repo: traceloop/openllmetry PR: 3487
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:177-178
Timestamp: 2025-12-02T21:09:48.690Z
Learning: The opentelemetry-instrumentation-openai and opentelemetry-instrumentation-openai-agents packages must remain independent and not share code, so code duplication between them is acceptable.
📚 Learning: 2025-12-02T21:09:48.690Z
Learnt from: duanyutong
Repo: traceloop/openllmetry PR: 3487
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:177-178
Timestamp: 2025-12-02T21:09:48.690Z
Learning: The opentelemetry-instrumentation-openai and opentelemetry-instrumentation-openai-agents packages must remain independent and not share code, so code duplication between them is acceptable.
Applied to files:
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
🪛 Flake8 (7.3.0)
packages/sample-app/sample_app/openai_agents_realtime_example.py
[error] 629-629: 'agents' imported but unused
(F401)
[error] 634-634: 'agents.realtime.RealtimeAgent' imported but unused
(F401)
🪛 Ruff (0.14.10)
packages/sample-app/sample_app/openai_agents_realtime_example.py
231-231: Do not catch blind exception: Exception
(BLE001)
306-306: Avoid specifying long messages outside the exception class
(TRY003)
311-311: Avoid specifying long messages outside the exception class
(TRY003)
316-316: Avoid specifying long messages outside the exception class
(TRY003)
322-322: Avoid specifying long messages outside the exception class
(TRY003)
328-328: Prefer TypeError exception for invalid type
(TRY004)
328-328: Avoid specifying long messages outside the exception class
(TRY003)
333-333: Consider moving this statement to an else block
(TRY300)
462-462: Do not catch blind exception: Exception
(BLE001)
615-615: Do not catch blind exception: Exception
(BLE001)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
205-205: Unused method argument: role
(ARG002)
214-214: Unused method argument: role
(ARG002)
339-340: try-except-pass detected, consider logging the exception
(S110)
339-339: Do not catch blind exception: Exception
(BLE001)
357-358: try-except-pass detected, consider logging the exception
(S110)
357-357: Do not catch blind exception: Exception
(BLE001)
459-460: try-except-pass detected, consider logging the exception
(S110)
459-459: Do not catch blind exception: Exception
(BLE001)
489-490: try-except-pass detected, consider logging the exception
(S110)
489-489: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Lint
- GitHub Check: Build Packages (3.11)
🔇 Additional comments (15)
packages/sample-app/sample_app/openai_agents_realtime_example.py (5)
1-41: LGTM! The module documentation is comprehensive, imports are appropriate, and Traceloop initialization follows standard patterns for OpenTelemetry instrumentation.
43-243: LGTM! The voice demo implementation is well-structured with clear tool definitions, comprehensive event handling, and appropriate error management for sample code. The broad exception catching on line 231 is acceptable in a demonstration context.
284-335: Excellent implementation of safe expression evaluation! The `calculate()` function properly addresses the previous security concern by using AST parsing with:
- Operator allowlist (excluding exponentiation to prevent resource exhaustion)
- Depth limit to prevent deeply nested expressions
- Proper node type validation
The implementation is secure and appropriate for the sample app.
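As a rough sketch of the pattern being praised here (the function name, operator set, and depth limit below are illustrative assumptions, not the PR's actual code), an AST-based evaluator with an operator allowlist and a nesting limit can look like:

```python
import ast
import operator

# Allowlist of binary operators; exponentiation is deliberately excluded
# to avoid resource exhaustion from huge intermediate values.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

_MAX_DEPTH = 20  # reject pathologically nested expressions


def safe_calculate(expression: str) -> float:
    """Evaluate a basic arithmetic expression without using eval()."""
    tree = ast.parse(expression, mode="eval")

    def _eval(node: ast.AST, depth: int = 0) -> float:
        if depth > _MAX_DEPTH:
            raise ValueError("expression too deeply nested")
        if isinstance(node, ast.Expression):
            return _eval(node.body, depth + 1)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](
                _eval(node.left, depth + 1), _eval(node.right, depth + 1)
            )
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand, depth + 1)
        raise TypeError(f"Unsupported expression type: {type(node).__name__}")

    return _eval(tree)
```

Anything outside the allowlist (names, calls, attribute access, `**`) falls through to the final `raise`, so injection attempts fail rather than execute.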
475-621: LGTM! The handoff demo effectively demonstrates multi-agent workflows with clear agent hierarchy and handoff routing. The implementation is appropriate for a sample application.
653-684: LGTM! The main block provides a clean CLI interface with clear usage guidance and appropriate error handling.
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (10)
1-18: LGTM! The module documentation clearly explains the wrapper approach, and the setup follows proper patterns with `_original_methods` at module level for tracking original implementations.
20-106: LGTM! The `RealtimeTracingState` initialization and workflow/agent span management are well-structured. The default model name on line 40 is appropriately documented as being updated from session configuration.
108-195: LGTM! Tool, handoff, and audio span management properly handles span lifecycle, parent context propagation, and includes appropriate safeguards like content truncation to 4096 characters.
268-290: LGTM! The `cleanup()` method comprehensively handles all remaining spans, properly setting status and clearing all tracking dictionaries. The previous concern about `llm_spans` cleanup has been resolved.
292-307: LGTM on the wrapping guard! The guard on lines 295-296 properly prevents multiple wrapping from overwriting original methods. This addresses the previous concern about re-instrumentation.
312-342: LGTM on session initialization and model capture! The `traced_aenter` wrapper properly starts the workflow span and attempts to capture the actual model name from the session configuration (lines 322-338) with multiple fallback paths. This addresses the previous concern about hardcoded model names. The try-except-pass pattern (lines 339-340) is appropriate for instrumentation code to prevent failures from breaking the application.
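The PR appears to rely on a `@dont_throw` decorator for this resilience (it is referenced in a proposed fix below); a hypothetical implementation of that idea, where the real helper's name, signature, and logging behavior may differ, looks like:

```python
import functools
import logging

logger = logging.getLogger(__name__)


def dont_throw(func):
    """Run func, but never let instrumentation errors reach the application.

    Exceptions are logged at debug level instead of propagating, so a bug in
    span creation or attribute extraction cannot break the traced program.
    """

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            logger.debug("instrumentation error in %s", func.__name__, exc_info=True)
            return None

    return wrapper


@dont_throw
def record_event(payload):
    # A deliberately fragile extraction path: raises for non-dict payloads.
    return payload["type"]
```

With this pattern the caller sees `None` instead of an exception when extraction fails, which is usually the right trade-off for telemetry code.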
344-360: LGTM on session cleanup! The `traced_aexit` wrapper properly performs cleanup and ends the workflow span with appropriate error status. The try-except-pass pattern is suitable for instrumentation resilience.
362-462: LGTM on comprehensive event handling! The `traced_put_event` wrapper handles a wide variety of event types (agent_start/end, tool_start/end, handoff, audio, error, history_added, raw_model_event) with proper span lifecycle management and attribute capture. The try-except-pass pattern is appropriate for non-breaking instrumentation.
464-492: LGTM on user input capture! The `traced_send_message` wrapper properly captures user input for prompt recording, handling both string and structured message formats. The try-except-pass pattern maintains instrumentation resilience.
501-517: LGTM on method restoration! The `unwrap_realtime_session()` function properly restores all original methods and clears the `_original_methods` dictionary. Note that this doesn't clear `_tracing_states` (which is in the closure), but that will be resolved when the module-level refactoring is applied.
```python
    if hasattr(RealtimeSession, 'send_message'):
        _original_methods['send_message'] = RealtimeSession.send_message

    _tracing_states: Dict[int, RealtimeTracingState] = {}
```
Memory leak risk: _tracing_states should be at module level.
The _tracing_states dictionary is defined inside wrap_realtime_session() and captured in the closure. Across wrap/unwrap/wrap cycles, old dictionaries may not be garbage collected if lingering references exist from previous wrappers.
Move _tracing_states to module level (like _original_methods) and explicitly clear it in unwrap_realtime_session().
♻️ Proposed fix
At module level (after line 17):

```diff
 _original_methods: Dict[str, Any] = {}
+_tracing_states: Dict[int, 'RealtimeTracingState'] = {}
```

Remove from wrap_realtime_session() (line 309):

```diff
     if hasattr(RealtimeSession, 'send_message'):
         _original_methods['send_message'] = RealtimeSession.send_message

-    _tracing_states: Dict[int, RealtimeTracingState] = {}
-
     @dont_throw
```

Add cleanup in unwrap_realtime_session():

```diff
     if 'send_message' in _original_methods:
         RealtimeSession.send_message = _original_methods['send_message']
     _original_methods.clear()
+    _tracing_states.clear()
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In
@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
at line 309, _tracing_states is currently defined inside wrap_realtime_session()
causing a closure and potential memory leaks; move the _tracing_states:
Dict[int, RealtimeTracingState] = {} declaration to module level alongside
_original_methods, remove its local declaration inside wrap_realtime_session(),
and update unwrap_realtime_session() to explicitly clear _tracing_states (e.g.,
_tracing_states.clear()) when restoring originals so old state dictionaries can
be garbage collected.
Actionable comments posted: 0
🧹 Nitpick comments (1)
packages/sample-app/sample_app/openai_agents_realtime_example.py (1)
284-336: Safe AST-based evaluation implementation is excellent. The calculate function correctly implements AST-based parsing to safely evaluate math expressions, preventing the security issues of `eval()`. The depth limit and operator restrictions effectively prevent resource exhaustion attacks.

💡 Optional: Use TypeError for unsupported expression types

For line 328, consider using `TypeError` instead of `ValueError` for unsupported expression types, as suggested by Ruff:

```diff
-        else:
-            raise ValueError(f"Unsupported expression type: {type(node).__name__}")
+        else:
+            raise TypeError(f"Unsupported expression type: {type(node).__name__}")
```

This more accurately reflects that the error is about an inappropriate type rather than an inappropriate value.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/sample-app/sample_app/openai_agents_realtime_example.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/sample-app/sample_app/openai_agents_realtime_example.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: duanyutong
Repo: traceloop/openllmetry PR: 3487
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:177-178
Timestamp: 2025-12-02T21:09:48.690Z
Learning: The opentelemetry-instrumentation-openai and opentelemetry-instrumentation-openai-agents packages must remain independent and not share code, so code duplication between them is acceptable.
🪛 Ruff (0.14.10)
packages/sample-app/sample_app/openai_agents_realtime_example.py
231-231: Do not catch blind exception: Exception
(BLE001)
306-306: Avoid specifying long messages outside the exception class
(TRY003)
311-311: Avoid specifying long messages outside the exception class
(TRY003)
316-316: Avoid specifying long messages outside the exception class
(TRY003)
322-322: Avoid specifying long messages outside the exception class
(TRY003)
328-328: Prefer TypeError exception for invalid type
(TRY004)
328-328: Avoid specifying long messages outside the exception class
(TRY003)
333-333: Consider moving this statement to an else block
(TRY300)
462-462: Do not catch blind exception: Exception
(BLE001)
615-615: Do not catch blind exception: Exception
(BLE001)
629-629: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
634-634: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Lint
🔇 Additional comments (4)
packages/sample-app/sample_app/openai_agents_realtime_example.py (4)
1-41: LGTM! The module documentation is comprehensive, imports are appropriate, and initialization follows best practices. API key is correctly loaded from environment variables rather than hardcoded.
43-243: LGTM! The voice demo implementation is well-structured with proper async context management, defensive event handling using `getattr()`, and appropriate error messages. The broad exception catch at line 231 (flagged by Ruff) is acceptable for a sample application where user-friendly error messages are prioritized over granular error handling.
475-622: LGTM! The handoff demo effectively demonstrates agent handoffs in realtime sessions. The agent hierarchy is well-structured with specialized agents and proper handoff configuration using `realtime_handoff()`. Event handling appropriately captures handoff events.
653-684: LGTM! The main execution block provides a clean command-line interface with clear help messages, requirement checking, and appropriate error handling. The demo type selection is well-structured and user-friendly.
```python
    if tool_name in self.tool_spans:
        span = self.tool_spans[tool_name]
        if output is not None:
            span.set_attribute("gen_ai.tool.output", str(output)[:4096])
```
Don't we have a const for this?
```python
    "gen_ai.handoff.from_agent": from_agent,
    "gen_ai.handoff.to_agent": to_agent,
```
Let's create consts for this.
doronkopit5
left a comment
I guess you didn't succeed with cassettes because of the sockets?
Important
Adds real-time OpenTelemetry tracing support for OpenAI Agents, including new span types and a sample app demonstrating text, voice, and handoff workflows.
- Adds realtime span types in `__init__.py` and `_hooks.py`, supporting speech, transcription, and speech_group spans.
- Introduces `RealtimeTracingState` in `_realtime_wrappers.py` to manage spans for real-time sessions.
- Adds `openai_agents_realtime_example.py` demonstrating text, voice, and handoff workflows.
- Adds `test_realtime.py` and `test_realtime_session.py` to validate real-time span creation, attributes, and error handling.
- Updates `pyproject.toml` for `opentelemetry-instrumentation-openai-agents` and `sample-app` to support new features.

This description was created by Ellipsis for b3bc328. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit
New Features
Improvements
Tests
Chores
✏️ Tip: You can customize this high-level summary in your review settings.