@galkleinman (Contributor) commented Jan 7, 2026

Important

Adds real-time OpenTelemetry tracing support for OpenAI Agents, including new span types and a sample app demonstrating text, voice, and handoff workflows.

  • Behavior:
    • Adds real-time OpenTelemetry tracing for agent sessions in __init__.py and _hooks.py, supporting speech, transcription, and speech_group spans.
    • Introduces RealtimeTracingState in _realtime_wrappers.py to manage spans for real-time sessions.
    • Adds a sample app openai_agents_realtime_example.py demonstrating text, voice, and handoff workflows.
  • Tests:
    • Adds tests in test_realtime.py and test_realtime_session.py to validate real-time span creation, attributes, and error handling.
  • Dependencies:
    • Updates dependencies in pyproject.toml for opentelemetry-instrumentation-openai-agents and sample-app to support new features.

This description was created by Ellipsis for b3bc328.


Summary by CodeRabbit

  • New Features

    • Optional realtime OpenTelemetry tracing for agent sessions (speech, transcription, speech_group) with per-session lifecycle, nested spans, and LLM prompt/completion capture.
    • New realtime sample app demo showcasing text, voice/audio, and agent handoff workflows.
  • Improvements

    • Unified extraction/propagation of prompts, responses, and model settings; attribute keys aligned with GenAI conventions and resilient error handling.
  • Tests

    • Comprehensive realtime test coverage for spans, hierarchies, attributes, and error scenarios.
  • Chores

    • Sample app dependency updates and added development websocket dependency.


@coderabbitai bot commented Jan 7, 2026

📝 Walkthrough

Walkthrough

Adds optional realtime OpenTelemetry instrumentation for OpenAI Agents SDK sessions, new prompt/response extraction helpers and GenAI-aligned attributes in hooks, session wrappers that manage nested workflow/agent/tool/audio/LLM spans, tests and examples for realtime flows, and a development dependency.

Changes

  • Instrumentation entrypoint (packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py): Calls wrap_realtime_session(tracer) in _instrument and unwrap_realtime_session() in _uninstrument, each wrapped in try/except to tolerate an absent realtime runtime (see the sketch after this list).
  • Hooks & GenAI attributes (packages/opentelemetry-instrumentation-openai-agents/.../_hooks.py): Adds _extract_prompt_attributes and _extract_response_attributes; guards the realtime span-type imports; adds start/end handling for SpeechSpanData, TranscriptionSpanData, and SpeechGroupSpanData; migrates attribute keys to GenAIAttributes.* and propagates model settings to agent/parent spans.
  • Realtime wrappers & tracing state (packages/opentelemetry-instrumentation-openai-agents/.../_realtime_wrappers.py): New module adding RealtimeTracingState, wrap_realtime_session(tracer) to patch the RealtimeSession lifecycle and _put_event, and unwrap_realtime_session() to restore the originals; manages per-session state, buffers prompts/completions, and creates nested spans (workflow/agent/tool/audio/LLM) with error-tolerant guards.
  • Realtime span tests (packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py): New tests validating realtime span creation (openai.realtime.speech, openai.realtime.transcription, openai.realtime.speech_group), GenAI attributes, model/format capture, error handling, and parent-child relationships under an Agent workflow root.
  • Realtime session tests (packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py): New tests exercising RealtimeTracingState, wrap/unwrap behavior, buffering and LLM span creation, the lifecycle of workflow/agent/tool/audio spans, handoffs, error recording, and cleanup.
  • Dev dependency (packages/opentelemetry-instrumentation-openai-agents/pyproject.toml): Adds the development dependency websockets = "^15.0.1".
  • Sample app example (packages/sample-app/sample_app/openai_agents_realtime_example.py): New example script demonstrating realtime demos (text-only, voice/audio, handoffs), session setup, event handling, and logging; includes requirement checks and demo runners.
  • Sample app dependencies (packages/sample-app/pyproject.toml): Bumps multiple dependency versions (notably openai, openai-agents, llama-index, and related packages).
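
A minimal sketch of the guarded enable/disable pattern described in the instrumentation entrypoint item above. It is illustrative, not the package's actual _instrument code, and it shows the debug logging the reviewers suggest further down rather than the bare pass used in the PR:

# Sketch only: the real code lives inside the instrumentor class and imports
# the wrappers relatively (from ._realtime_wrappers import ...).
import logging

logger = logging.getLogger(__name__)


def instrument_realtime(tracer):
    """Enable realtime tracing if the wrappers module and SDK support it."""
    try:
        from opentelemetry.instrumentation.openai_agents._realtime_wrappers import (
            wrap_realtime_session,
        )
        wrap_realtime_session(tracer)
    except Exception:
        # Realtime support is optional; older agents SDKs simply skip it.
        logger.debug("Realtime session instrumentation not available", exc_info=True)


def uninstrument_realtime():
    """Restore the original RealtimeSession methods if they were wrapped."""
    try:
        from opentelemetry.instrumentation.openai_agents._realtime_wrappers import (
            unwrap_realtime_session,
        )
        unwrap_realtime_session()
    except Exception:
        logger.debug("Realtime session uninstrumentation skipped", exc_info=True)

The same guard runs on uninstrument, so uninstrumenting against an SDK without realtime support is also a no-op.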

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant RT as RealtimeSession (wrapped)
    participant State as RealtimeTracingState
    participant Tracer as OT Tracer
    participant Exporter as Span Exporter

    Note over App,RT: Session enter (wrapper applied)
    App->>RT: __aenter__()
    RT->>State: init tracing state
    State->>Tracer: start workflow span

    Note over App,RT: Events routed via _put_event
    App->>RT: _put_event(agent_start)
    RT->>State: start_agent_span()
    State->>Tracer: start agent span

    App->>RT: _put_event(audio)
    RT->>State: start_audio_span()
    State->>Tracer: start audio span

    App->>RT: _put_event(raw_model_event)
    RT->>State: record_prompt()/record_completion()
    State->>State: buffer prompt/completion
    State->>Tracer: create LLM span (with buffered data)
    Tracer->>Exporter: export LLM span

    App->>RT: _put_event(agent_end)
    RT->>State: end_agent_span()
    State->>Tracer: end agent span

    App->>RT: __aexit__()
    RT->>State: cleanup()
    State->>Tracer: end workflow span
    Tracer->>Exporter: export spans

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through realtime events, nimble and spry,

I buffered prompts and completions on the fly,
I stitched workflow, agent, tool, audio in a line,
Spans hum the tale — each prompt, error, and sign,
Carrot-coded telemetry, glowing fine.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately describes the main change: adding realtime support to the OpenAI Agents instrumentation package, which is the primary objective of the PR.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 87.50%, which is sufficient. The required threshold is 80.00%.



@galkleinman marked this pull request as ready for review on January 7, 2026, 21:19
@ellipsis-dev bot (Contributor) left a comment


Important

Looks good to me! 👍

Reviewed everything up to 30f3bc4 in 2 minutes and 11 seconds.
  • Reviewed 2199 lines of code in 8 files
  • Skipped 2 files when reviewing.
  • Skipped posting 5 draft comments. View those below.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py:44
  • Draft comment:
    Realtime instrumentation errors are silently ignored. Consider logging exception details for better diagnostics instead of using bare 'except Exception: pass'.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 50% The comment is about code that was added in the diff (lines 49-51), so it is about changes. However, this is a code quality suggestion about error handling patterns. The existing code already uses the same silent exception handling pattern (lines 40-42), so this isn't introducing a new anti-pattern - it's following the established convention in this file. The comment doesn't point to a bug or clear issue, it's suggesting a different approach to error handling. According to the rules, "Comments that suggest code quality refactors are good! But only if they are actionable and clear." This is actionable and clear, but it's also inconsistent to only flag the new code when the existing code uses the same pattern. This feels like a style preference rather than a clear code issue. The comment is technically correct that logging would provide better diagnostics. The suggestion is actionable and clear. If this is an instrumentation library, having visibility into why instrumentation failed could be valuable for debugging. Just because the existing code uses the same pattern doesn't mean the pattern is correct. While the suggestion has merit, the rules state to avoid comments that are "obvious or unimportant" and to focus on clear code changes required. This is more of a general code quality improvement that could apply to multiple places in the file. The comment singles out only the new code while ignoring the identical pattern in existing code (lines 40-42), which suggests this isn't a critical issue. If it were truly important, it should apply to all instances. This is a reasonable code quality suggestion, but it's not a clear bug or required change. The new code follows the existing pattern in the file. Since the comment only targets new code while ignoring identical existing patterns, and it's more of a style/quality preference than a clear issue, it should be deleted.
2. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:236
  • Draft comment:
    Realtime span creation for SpeechSpanData, TranscriptionSpanData, and SpeechGroupSpanData share similar logic. Consider refactoring common code to reduce duplication.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 35% vs. threshold = 50% This is a code quality refactoring suggestion. The rules state that "Comments that suggest code quality refactors are good! But only if they are actionable and clear." The comment is actionable - the developer could extract a helper method. However, I need to consider: 1) Is this duplication significant enough to warrant refactoring? The blocks are relatively short and have meaningful differences. 2) Is this the right time for this comment? The PR is adding new functionality (realtime span support), and asking for refactoring during feature addition might not be appropriate. 3) The duplication is limited to about 3 blocks with subtle differences - this is borderline whether it's worth refactoring. The comment doesn't provide specific guidance on how to refactor, just points out the duplication exists. While there is some duplication, the blocks are relatively short and each has unique characteristics. Refactoring might actually make the code harder to read by abstracting away the differences. Also, this is a "nice to have" suggestion rather than identifying a bug or serious issue. The rules emphasize removing comments unless there's STRONG EVIDENCE they're correct and necessary. The comment is technically correct that there is duplication, and it is actionable. However, it's a minor code quality suggestion on newly added code, and the duplication isn't severe enough to clearly warrant immediate action. The PR author made a reasonable choice to keep the code explicit and readable for these three different span types. This falls into the category of "not clearly necessary" feedback. This is a borderline case, but I lean toward removing it. The comment is a minor code quality suggestion on new functionality, and the duplication isn't severe. The author's approach of keeping the three handlers separate and explicit is reasonable and maintainable. This doesn't meet the bar of "STRONG EVIDENCE that the comment is correct and necessary."
3. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:316
  • Draft comment:
    Exceptions in traced_aenter are caught silently. It’s advisable to log exception details to aid debugging.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% This is a code quality suggestion about error handling. The comment points out that exceptions are being caught and silently ignored, which could make debugging difficult. However, I need to consider: 1) The function is already decorated with @dont_throw which likely handles exceptions at a higher level, 2) This is instrumentation code that wraps external library code, so silent failures might be intentional to avoid breaking the instrumented application, 3) The comment is suggesting a code improvement (adding logging) rather than pointing out a bug. Looking at the pattern, all the wrapped functions (traced_aenter, traced_aexit, traced_put_event, traced_send_message) use the same pattern of @dont_throw decorator plus internal try-except with pass. This appears to be an intentional design pattern for defensive instrumentation. The comment is making a reasonable code quality suggestion about observability. However, this appears to be an intentional design pattern used consistently throughout the file for defensive instrumentation. The @dont_throw decorator plus silent exception handling is likely meant to ensure instrumentation never breaks the application. Without seeing the implementation of @dont_throw or understanding the broader instrumentation strategy, I can't be certain this is a problem. While the suggestion has merit for general code quality, this is instrumentation code where silent failures are often intentional. The consistent pattern across all wrapped methods suggests this is a deliberate design choice. The comment doesn't provide strong evidence that this is actually causing a problem or that the current approach is incorrect. This comment should be deleted. It's a speculative code quality suggestion about adding logging, but the silent exception handling appears to be an intentional design pattern for defensive instrumentation. There's no evidence of an actual bug or problem, and the rules state not to make comments unless there's clearly a code change required.
4. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:327
  • Draft comment:
    Similarly, traced_aexit and traced_put_event suppress errors without logging. Consider adding logging to capture error details rather than silently passing.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% This is a new file being added (all lines are new). The comment is suggesting a code quality improvement - adding logging instead of silently suppressing errors. However, looking at the rules: 1) This is about code quality/observability, which could be good if actionable and clear. 2) The code uses both @dont_throw decorator AND bare except blocks, suggesting intentional error suppression. 3) This is instrumentation code that wraps external library methods - silent error suppression is often intentional in instrumentation to avoid breaking the instrumented code. 4) The comment doesn't point to a bug or clear issue, just suggests "consider adding logging" which is speculative about what would be better. 5) Without seeing the dont_throw decorator implementation or understanding the broader error handling strategy, it's hard to say if this is definitely correct. The instrumentation code appears to intentionally suppress errors to avoid breaking the wrapped functionality. The @dont_throw decorator is explicitly imported and used, suggesting this is a deliberate design choice. The comment is speculative ("consider") rather than pointing to a definite issue. While the error suppression appears intentional for instrumentation code, the comment does raise a valid observability concern. However, without understanding the full error handling strategy (what dont_throw does) and whether this is a standard pattern in this codebase, I cannot be confident this is the right change. The comment is also not clearly actionable - it says "consider" which violates the rule about making definite suggestions. This comment should be deleted. It's a speculative suggestion ("consider") about code quality rather than identifying a clear bug. The error suppression appears intentional for instrumentation code, and without more context about the codebase's error handling patterns, there's no strong evidence this change is needed.
5. packages/sample-app/sample_app/openai_agents_realtime_example.py:583
  • Draft comment:
    The check_requirements function relies on print statements for missing dependencies. Consider raising exceptions or logging warnings to improve error handling in production scenarios.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This is a sample/demo application, not production code. The comment is suggesting a refactor for "production scenarios" but this is explicitly a demo/example file. The use of print statements for a CLI script is completely appropriate and standard. The comment is making speculative suggestions about production use cases that don't apply here. According to the rules, I should not keep comments that are speculative or suggest refactors that aren't clearly necessary. This comment also doesn't point to any actual bug or issue - it's just a style preference that doesn't apply to demo code. Could this be a legitimate concern about code quality even in a sample app? Maybe the author wants to demonstrate best practices even in examples. However, the comment specifically mentions "production scenarios" which suggests it acknowledges this isn't production code. While demonstrating best practices in examples can be valuable, the comment explicitly frames this as a concern for "production scenarios," acknowledging this isn't production code. For a CLI demo script, print statements are the standard and appropriate approach. The comment doesn't identify any actual problem with the current implementation for its intended use case. This comment should be deleted. It's suggesting a refactor for "production scenarios" on what is clearly a sample/demo application where print statements are appropriate for CLI usage. The comment is speculative and doesn't identify any actual issue with the code as written for its intended purpose.

Workflow ID: wflow_9pqYSMjSLLNalYks


@coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @packages/sample-app/sample_app/openai_agents_realtime_example.py:
- Around line 284-294: The calculate function currently uses eval(expression)
which is unsafe; replace it with a safe parser/evaluator that uses Python's ast
module to parse the expression and evaluates only allowed node types (numbers,
BinOp with ast.Add, ast.Sub, ast.Mult, ast.Div, and parenthesis) using a mapped
operator table, explicitly disallow exponentiation (Pow) and other nodes, and
catch/return errors (including overly deep ASTs) instead of calling eval; modify
calculate to call this safe evaluator and keep the original input validation and
error handling around it.
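
A sketch of what that ast-based evaluator could look like; safe_calculate and _eval_node are hypothetical names, not the sample app's actual code:

# Safe arithmetic evaluator sketch: parses with ast and walks only allowed
# node types, so names, calls, attribute access, and Pow are all rejected.
import ast
import operator

_ALLOWED_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval_node(node):
    if isinstance(node, ast.Expression):
        return _eval_node(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, (ast.UAdd, ast.USub)):
        value = _eval_node(node.operand)
        return value if isinstance(node.op, ast.UAdd) else -value
    if isinstance(node, ast.BinOp):
        op_type = type(node.op)
        if op_type not in _ALLOWED_BINOPS:
            raise ValueError(f"Operator not allowed: {op_type.__name__}")
        return _ALLOWED_BINOPS[op_type](_eval_node(node.left), _eval_node(node.right))
    raise ValueError(f"Unsupported expression element: {type(node).__name__}")


def safe_calculate(expression: str) -> str:
    """Evaluate a simple arithmetic expression without calling eval()."""
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_eval_node(tree))
    except (SyntaxError, ValueError, ZeroDivisionError, RecursionError) as exc:
        return f"Error: {exc}"

For example, safe_calculate("2 * (3 + 4)") returns "14", while safe_calculate("2 ** 10") and safe_calculate("__import__('os')") are rejected with an error string.
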
🧹 Nitpick comments (5)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)

672-714: LGTM!

The realtime span data extraction correctly handles different data types:

  • Audio bytes are stored by length (avoiding raw bytes in attributes)
  • Text content is properly mapped to prompt/completion attributes
  • The first_content_at attribute enables latency tracking

Note: Content is not truncated here, unlike the [:4096] limit applied in _realtime_wrappers.py. Consider adding truncation for consistency to prevent excessively large span attributes.
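
A hypothetical helper illustrating the suggested truncation; the 4096 limit mirrors _realtime_wrappers.py, and the helper name is not real code from the PR:

MAX_CONTENT_LENGTH = 4096


def truncate_content(value: str, limit: int = MAX_CONTENT_LENGTH) -> str:
    """Cap span attribute values so exported spans stay reasonably small."""
    return value if len(value) <= limit else value[:limit]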

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (2)

203-216: Consider prefixing unused role parameters with underscore.

The role parameter in record_prompt and record_completion is not used in the method bodies (hardcoded to "user" and "assistant" respectively). If this is intentional, prefix with underscore to signal intent.

♻️ Suggested fix
-    def record_prompt(self, role: str, content: str):
+    def record_prompt(self, _role: str, content: str):
         """Record a prompt message - buffers it for the LLM span."""
         ...

-    def record_completion(self, role: str, content: str):
+    def record_completion(self, _role: str, content: str):
         """Record a completion message - creates an LLM span with prompt and completion."""
         ...

316-323: Consider adding debug-level logging for silently caught exceptions.

The try/except/pass blocks (lines 321-322, 339-340, 444-445, 474-475) intentionally swallow exceptions to prevent instrumentation from breaking user code. While this is a valid pattern for instrumentation, it makes debugging difficult when issues occur.

Consider adding optional debug-level logging to aid troubleshooting:

♻️ Example improvement
import logging
logger = logging.getLogger(__name__)

# Then in the exception handlers:
except Exception as e:
    logger.debug("Realtime instrumentation error: %s", e, exc_info=True)
packages/sample-app/sample_app/openai_agents_realtime_example.py (1)

588-595: Remove unused noqa directives.

Static analysis indicates the # noqa: F401 comments on lines 588 and 593 are unnecessary. The imports are used for existence checks, but the linter isn't flagging them.

♻️ Suggested cleanup
-        import agents  # noqa: F401
+        import agents
     except ImportError:
         missing.append("openai-agents")

     try:
-        from agents.realtime import RealtimeAgent  # noqa: F401
+        from agents.realtime import RealtimeAgent
     except ImportError:
packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py (1)

4-4: Remove unused imports.

MagicMock, AsyncMock, and patch are imported but not used in this test file.

♻️ Suggested fix
 import pytest
-from unittest.mock import MagicMock, AsyncMock, patch
 from opentelemetry.sdk.trace import TracerProvider
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories


📥 Commits

Reviewing files that changed from the base of the PR and between a528699 and 30f3bc4.

⛔ Files ignored due to path filters (2)
  • packages/opentelemetry-instrumentation-openai-agents/poetry.lock is excluded by !**/*.lock
  • packages/sample-app/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
  • packages/opentelemetry-instrumentation-openai-agents/pyproject.toml
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
  • packages/sample-app/pyproject.toml
  • packages/sample-app/sample_app/openai_agents_realtime_example.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/sample-app/sample_app/openai_agents_realtime_example.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: duanyutong
Repo: traceloop/openllmetry PR: 3487
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:177-178
Timestamp: 2025-12-02T21:09:48.690Z
Learning: The opentelemetry-instrumentation-openai and opentelemetry-instrumentation-openai-agents packages must remain independent and not share code, so code duplication between them is acceptable.
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Instrumentation packages must leverage the semantic conventions package and emit OTel-compliant spans
📚 Learning: 2025-12-02T21:09:48.690Z
Learnt from: duanyutong
Repo: traceloop/openllmetry PR: 3487
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:177-178
Timestamp: 2025-12-02T21:09:48.690Z
Learning: The opentelemetry-instrumentation-openai and opentelemetry-instrumentation-openai-agents packages must remain independent and not share code, so code duplication between them is acceptable.

Applied to files:

  • packages/opentelemetry-instrumentation-openai-agents/pyproject.toml
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py
🧬 Code graph analysis (3)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (2)
  • SpanAttributes (64-245)
  • TraceloopSpanKindValues (285-290)
packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py (4)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (3)
  • export (45-51)
  • InMemorySpanExporter (22-61)
  • get_finished_spans (40-43)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-245)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (4)
  • OpenTelemetryTracingProcessor (24-778)
  • on_trace_start (41-53)
  • on_span_start (65-317)
  • on_trace_end (56-62)
packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py (1)
  • tracer (26-29)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (2)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (2)
  • wrap_realtime_session (296-483)
  • unwrap_realtime_session (486-502)
packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py (1)
  • tracer (26-29)
🪛 Flake8 (7.3.0)
packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py

[error] 4-4: 'unittest.mock.MagicMock' imported but unused

(F401)


[error] 4-4: 'unittest.mock.AsyncMock' imported but unused

(F401)


[error] 4-4: 'unittest.mock.patch' imported but unused

(F401)

🪛 Ruff (0.14.10)
packages/sample-app/sample_app/openai_agents_realtime_example.py

231-231: Do not catch blind exception: Exception

(BLE001)


290-290: Use of possibly insecure function; consider using ast.literal_eval

(S307)


292-292: Consider moving this statement to an else block

(TRY300)


293-293: Do not catch blind exception: Exception

(BLE001)


421-421: Do not catch blind exception: Exception

(BLE001)


574-574: Do not catch blind exception: Exception

(BLE001)


588-588: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


593-593: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py

203-203: Unused method argument: role

(ARG002)


212-212: Unused method argument: role

(ARG002)


321-322: try-except-pass detected, consider logging the exception

(S110)


321-321: Do not catch blind exception: Exception

(BLE001)


339-340: try-except-pass detected, consider logging the exception

(S110)


339-339: Do not catch blind exception: Exception

(BLE001)


444-445: try-except-pass detected, consider logging the exception

(S110)


444-444: Do not catch blind exception: Exception

(BLE001)


474-475: try-except-pass detected, consider logging the exception

(S110)


474-474: Do not catch blind exception: Exception

(BLE001)

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py

49-51: try-except-pass detected, consider logging the exception

(S110)


49-49: Do not catch blind exception: Exception

(BLE001)


53-53: Unused method argument: kwargs

(ARG002)


59-60: try-except-pass detected, consider logging the exception

(S110)


59-59: Do not catch blind exception: Exception

(BLE001)

packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py

37-37: Unpacked variable exporter is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


60-60: Unpacked variable exporter is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


85-85: Unpacked variable exporter is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


125-125: Unpacked variable exporter is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


149-149: Unpacked variable exporter is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


161-161: Unpacked variable exporter is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Lint
  • GitHub Check: Build Packages (3.11)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Test Packages (3.11)
🔇 Additional comments (13)
packages/sample-app/pyproject.toml (3)

30-30: No compatibility issues found. Both llama-index-embeddings-openai (0.5.1) and llama-index-llms-openai (0.6.12) are compatible with openai ^2.14.0.


26-26: Verify that sample app and instrumentation code support openai v2.14.0.

The major version bump from v1.52.2 to v2.14.0 introduces significant breaking changes:

  • Client instantiation pattern (requires OpenAI() class instead of module-level functions)
  • Namespaced API surface (client.chat.*, client.audio.*, etc.)
  • Keyword-only arguments in method calls
  • Response object structure changes
  • Streaming API changes

Ensure that all sample app code and the opentelemetry-instrumentation-openai package have been updated to use v2 patterns and that integration tests verify compatibility with the new version.
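
For readers unfamiliar with it, a minimal sketch of the client-based pattern this comment refers to; the model name and message are placeholders and this is not code from the sample app:

# Client-style OpenAI usage: instantiate a client instead of calling
# module-level functions; the API key is read from OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)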


58-58: Version 0.6.5 is appropriate and includes realtime support.

The version bump from 0.2.7 to 0.6.5 is justified by the addition of realtime support in this version. Verification confirms that openai-agents 0.6.5 exists on PyPI and includes the RealtimeSession API for realtime model sessions (supporting WebRTC, WebSocket, and SIP transports). No documented breaking changes are present in the 0.6.5 patch release.

packages/opentelemetry-instrumentation-openai-agents/pyproject.toml (1)

34-34: LGTM!

Adding websockets as a dev dependency is appropriate for testing the realtime session instrumentation. The version constraint ^15.0.1 is reasonable.

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (2)

69-78: LGTM!

The guarded import pattern correctly handles optional realtime span types, matching the existing approach for set_agent_name. Setting placeholders to None ensures isinstance checks gracefully return False when realtime is unavailable.


235-312: LGTM!

The realtime span handling correctly creates hierarchical spans with appropriate attributes following OTel semantic conventions. The span names (openai.realtime.speech, openai.realtime.transcription, openai.realtime.speech_group) and attributes (LLM_REQUEST_TYPE, gen_ai.system, gen_ai.operation.name) are well-aligned with existing patterns.

Minor note: The has_realtime_spans and SpeechSpanData checks are technically redundant since has_realtime_spans is only True when the imports succeed, but this defensive approach is acceptable for clarity.

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (3)

17-94: LGTM!

The RealtimeTracingState class provides clean encapsulation of per-session tracing state. The span lifecycle management correctly:

  • Creates hierarchical spans with proper parent contexts
  • Tracks multiple concurrent spans (agents, tools, audio)
  • Uses appropriate semantic conventions (TraceloopSpanKindValues.WORKFLOW, AGENT)

97-193: LGTM!

The tool, handoff, and audio span methods implement proper context propagation with a sensible fallback chain. The handoff span correctly represents a point-in-time event by creating and immediately ending. The 4096-character truncation for tool output prevents excessively large span attributes.
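
A minimal sketch of that create-and-immediately-end pattern for a point-in-time handoff event; the helper name and attribute keys are illustrative assumptions, not the wrapper module's actual code:

# Point-in-time span sketch: parent the span explicitly, set attributes,
# and end it right away so it records an instant rather than a duration.
from opentelemetry import trace

tracer = trace.get_tracer("sketch.realtime.handoff")


def record_handoff(parent_span, from_agent: str, to_agent: str) -> None:
    """Emit a handoff span nested under the given parent span."""
    parent_ctx = trace.set_span_in_context(parent_span)
    span = tracer.start_span("agent_handoff", context=parent_ctx)
    span.set_attribute("gen_ai.agent.handoff.from", from_agent)
    span.set_attribute("gen_ai.agent.handoff.to", to_agent)
    span.end()  # end immediately: the handoff is an event, not an operation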


248-254: Verify SpanAttributes.LLM_SYSTEM exists.

The code references SpanAttributes.LLM_SYSTEM on line 250, but this attribute may not exist in the SpanAttributes class from opentelemetry-semantic-conventions-ai. If this attribute doesn't exist, it will raise an AttributeError at runtime.
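
One quick way to verify, assuming the semantic-conventions package from this repo is installed (a check sketch, not part of the PR):

# Prints the attribute key if it exists, "<missing>" otherwise.
from opentelemetry.semconv_ai import SpanAttributes

print(getattr(SpanAttributes, "LLM_SYSTEM", "<missing>"))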

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (1)

44-60: LGTM!

The realtime session instrumentation integration follows the same pattern as the existing add_trace_processor block. The silent exception handling is appropriate for optional features that may not be available in all SDK versions.

The unused kwargs parameter in _uninstrument is a standard BaseInstrumentor interface requirement.

packages/sample-app/sample_app/openai_agents_realtime_example.py (1)

1-41: LGTM!

The sample app has comprehensive documentation explaining the three demo modes (text, voice, handoff) and requirements. The Traceloop initialization is properly configured.

packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py (1)

1-481: LGTM!

Comprehensive test coverage for realtime span instrumentation. The tests properly verify:

  • Span creation for speech, transcription, and speech_group types
  • Correct attribute population (model, format, operation name)
  • Error status propagation
  • Span hierarchy (realtime spans nested under workflow)

The mocking approach using patch.dict("sys.modules", ...) is appropriate given WebSocket-based realtime API cannot be recorded with VCR.
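
For context, a minimal sketch of that sys.modules patching technique; the patched module path and the fake class are placeholders rather than the actual test code:

# Stub out the agents realtime module so the test never opens a WebSocket.
from unittest.mock import MagicMock, patch


def test_with_fake_realtime_module():
    fake_realtime = MagicMock()
    fake_realtime.RealtimeAgent = MagicMock(name="RealtimeAgent")

    # Patch both the parent package and the submodule for the test's duration.
    with patch.dict("sys.modules", {"agents": MagicMock(), "agents.realtime": fake_realtime}):
        from agents.realtime import RealtimeAgent  # resolves to the fake

        agent = RealtimeAgent(name="assistant")
        assert agent is not None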

packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py (1)

32-306: LGTM!

Thorough test coverage for RealtimeTracingState:

  • Span lifecycle (start/end for workflow, agent, tool, audio)
  • Handoff span creation with correct attributes
  • Prompt/completion buffering and LLM span generation
  • Cleanup behavior ensuring all spans are terminated
  • Proper parent-child span hierarchy

The tests marked with unused exporter by static analysis are tests that only verify in-memory state rather than exported spans - this is intentional.

@ellipsis-dev bot (Contributor) left a comment


Important

Looks good to me! 👍

Reviewed ac76869 in 57 seconds.
  • Reviewed 87 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 6 draft comments. View those below.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:66
  • Draft comment:
    Removed redundant comment for importing realtime span types. This cleanup is cosmetic.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
2. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:232
  • Draft comment:
    Removed unnecessary comment before handling SpeechSpanData. Improves clarity without affecting functionality.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
3. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:244
  • Draft comment:
    Removed inline comments ('Add model if available' and 'Add output format if available') from the SpeechSpanData block. Code remains self-explanatory.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
4. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:273
  • Draft comment:
    Removed inline comments for the TranscriptionSpanData block. This cleanup helps maintain concise code.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
5. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:669
  • Draft comment:
    Removed redundant inline comments in the SpeechSpanData handling within on_span_end. This change is purely cosmetic.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
6. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:706
  • Draft comment:
    Removed redundant comment in the SpeechGroupSpanData block. The code remains clear without it.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_hRvoqqSZH3ByYumM


@coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In
@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:
- Around line 666-701: The on_span_end branch uses string-based type checks
(type(span_data).__name__ == '...') for SpeechSpanData, TranscriptionSpanData,
and SpeechGroupSpanData; change these to the same isinstance() checks used in
on_span_start (e.g., isinstance(span_data, SpeechSpanData),
isinstance(span_data, TranscriptionSpanData), isinstance(span_data,
SpeechGroupSpanData)) and move or add the conditional import of those span
classes to module scope (or use typing.TYPE_CHECKING guards) so the classes are
available for isinstance without causing circular-import issues; update the
checks in on_span_end to reference the actual classes (SpeechSpanData,
TranscriptionSpanData, SpeechGroupSpanData) instead of comparing type names.
- Around line 69-77: The conditional import block inside on_span_start
(currently creating has_realtime_spans and importing SpeechSpanData,
TranscriptionSpanData, SpeechGroupSpanData) should be moved to module level (or
performed once in class initialization) like set_agent_name does: attempt the
import once at import/initialization time, set has_realtime_spans True/False and
set the three symbols to None on ImportError, and then remove the try/except
import from on_span_start so the method uses the pre-initialized
has_realtime_spans and symbol bindings.
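
A sketch of what the two fixes above could look like together, a module-level guarded import plus isinstance dispatch; the import path and the helper name are assumptions, not the package's actual code:

from typing import Optional

try:
    # Assumed location of the span-data classes named in this PR; the real
    # agents SDK may expose them elsewhere.
    from agents.tracing.span_data import (
        SpeechGroupSpanData,
        SpeechSpanData,
        TranscriptionSpanData,
    )
    has_realtime_spans = True
except ImportError:
    SpeechSpanData = TranscriptionSpanData = SpeechGroupSpanData = None
    has_realtime_spans = False


def realtime_span_name(span_data) -> Optional[str]:
    """Map a realtime span-data object to its OTel span name, if supported."""
    if not has_realtime_spans:
        return None
    if isinstance(span_data, SpeechSpanData):
        return "openai.realtime.speech"
    if isinstance(span_data, TranscriptionSpanData):
        return "openai.realtime.transcription"
    if isinstance(span_data, SpeechGroupSpanData):
        return "openai.realtime.speech_group"
    return None
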
🧹 Nitpick comments (3)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (3)

234-234: Consider simplifying redundant condition checks.

Lines 234, 261, and 288 check both SpeechSpanData (truthiness) and isinstance(span_data, SpeechSpanData). The truthiness check is redundant since isinstance() returns False when the class is None.

While the current pattern is explicit and defensive, simplifying to has_realtime_spans and isinstance(span_data, SpeechSpanData) would be cleaner.

♻️ Optional simplification
-        elif has_realtime_spans and SpeechSpanData and isinstance(span_data, SpeechSpanData):
+        elif has_realtime_spans and isinstance(span_data, SpeechSpanData):

Apply the same pattern to TranscriptionSpanData (line 261) and SpeechGroupSpanData (line 288).

Also applies to: 261-261, 288-288


234-305: Consider extracting common logic for realtime span creation.

The three realtime span creation blocks (SpeechSpanData, TranscriptionSpanData, SpeechGroupSpanData) share significant structural similarity:

  • Finding current agent span
  • Setting parent context
  • Building base attributes with common keys
  • Starting CLIENT spans with timestamps

While each has type-specific attributes, extracting the common pattern into a helper method would reduce duplication and maintenance burden.

💡 Example helper method approach
def _create_realtime_span(self, span_data, span_name: str, operation: str, parent_context):
    """Helper to create realtime spans with common attributes."""
    attributes = {
        SpanAttributes.LLM_REQUEST_TYPE: "realtime",
        "gen_ai.system": "openai",
        "gen_ai.operation.name": operation,
    }
    
    model = getattr(span_data, 'model', None)
    if model:
        attributes[GenAIAttributes.GEN_AI_REQUEST_MODEL] = model
    
    return self.tracer.start_span(
        span_name,
        kind=SpanKind.CLIENT,
        context=parent_context,
        attributes=attributes,
        start_time=time.time_ns()
    )

Then add format-specific attributes after creation if needed.


672-678: Consider using json.dumps() for non-bytes output data.

Lines 677 and 689 use str() to convert non-bytes output data. For complex objects, str() may produce unhelpful representations. The rest of the file uses json.dumps() for non-string content (e.g., lines 391, 434, 563), which provides consistent serialization.

♻️ Suggested change for consistency
                 output_audio = getattr(span_data, 'output', None)
                 if output_audio:
                     if isinstance(output_audio, (bytes, bytearray)):
                         otel_span.set_attribute("gen_ai.speech.output_bytes", len(output_audio))
                     else:
-                        otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content", str(output_audio))
+                        content = output_audio if isinstance(output_audio, str) else json.dumps(output_audio)
+                        otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content", content)

Apply similar pattern to line 689 for TranscriptionSpanData input handling.

Also applies to: 684-690

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories


📥 Commits

Reviewing files that changed from the base of the PR and between 30f3bc4 and ac76869.

📒 Files selected for processing (1)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-245)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Lint
  • GitHub Check: Build Packages (3.11)
  • GitHub Check: Test Packages (3.12)

@ellipsis-dev bot (Contributor) left a comment


Important

Looks good to me! 👍

Reviewed b3ccbac in 2 minutes and 14 seconds.
  • Reviewed 244 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 4 draft comments. View those below.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py:44
  • Draft comment:
    Clear invocation of wrap_realtime_session; removal of redundant comments streamlines the code.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 10% <= threshold 50% This comment seems to be purely informative, as it is stating that the invocation of wrap_realtime_session is clear and that removing redundant comments streamlines the code. It does not provide a suggestion, ask for confirmation, or point out a potential issue.
2. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:48
  • Draft comment:
    Good update: using GenAIAttributes.GEN_AI_SYSTEM instead of a literal improves consistency.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is purely informative, as it only praises the update without providing any actionable feedback or suggestions. It doesn't ask for confirmation or suggest any changes.
3. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:715
  • Draft comment:
    Consider using a GenAIAttributes constant for the 'frequency_penalty' attribute for consistency if one is available.
  • Reason this comment was not posted:
    Comment was on unchanged code.
4. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:262
  • Draft comment:
    For consistency, consider using GenAIAttributes constants (e.g. GEN_AI_PROMPT/GEN_AI_COMPLETION) instead of literal strings for prompt and completion keys in create_llm_span.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_KBU2Lf1RoTBOrzIP


@coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In
@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:
- Around line 299-311: wrap_realtime_session currently unconditionally writes to
the module-level _original_methods, so repeated calls overwrite true originals;
change wrap_realtime_session to guard storing originals by only setting entries
in _original_methods if the key is not already present (e.g., check for
existence of '__aenter__', '__aexit__', '_put_event', and 'send_message' before
assigning), and ensure the guard uses the RealtimeSession attributes
(RealtimeSession.__aenter__, RealtimeSession.__aexit__,
RealtimeSession._put_event, RealtimeSession.send_message) so re-instrumentation
does not capture already-wrapped methods.
- Around line 275-296: The cleanup method fails to clear the llm_spans
dictionary; update the cleanup implementation in the cleanup method to iterate
over self.llm_spans (similar to tool_spans/audio_spans/agent_spans), ensure each
span has set_status(Status(StatusCode.OK)) called and end() invoked, then clear
self.llm_spans; reference the cleanup method and the llm_spans attribute (and
consider consistency with create_llm_span behavior).
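
A sketch of the idempotent wrapping guard described in the first fix above; the RealtimeSession import path is an assumption and the traced wrapper bodies are elided, so this is illustrative only:

# Only capture the true originals once; repeated instrumentation calls must
# not overwrite them with already-wrapped methods.
_original_methods = {}


def wrap_realtime_session(tracer):
    from agents.realtime import RealtimeSession  # assumed export location

    _original_methods.setdefault("__aenter__", RealtimeSession.__aenter__)
    _original_methods.setdefault("__aexit__", RealtimeSession.__aexit__)
    _original_methods.setdefault("_put_event", RealtimeSession._put_event)
    _original_methods.setdefault("send_message", RealtimeSession.send_message)
    # ... install the traced wrappers here, closing over tracer ...


def unwrap_realtime_session():
    if not _original_methods:
        return
    from agents.realtime import RealtimeSession

    for name, original in _original_methods.items():
        setattr(RealtimeSession, name, original)
    _original_methods.clear()
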
🧹 Nitpick comments (3)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (1)

44-55: Consider adding debug-level logging for exception handling.

The silent try-except-pass blocks make it difficult to diagnose issues when realtime instrumentation fails to load or unload. While completely silent failures are acceptable for optional features, adding debug-level logging would aid troubleshooting without impacting normal operation.

♻️ Suggested improvement with logging
+import logging
+
+logger = logging.getLogger(__name__)
+
 ...
 
         try:
             from ._realtime_wrappers import wrap_realtime_session
             wrap_realtime_session(tracer)
         except Exception:
-            pass
+            logger.debug("Realtime session instrumentation not available", exc_info=True)

     def _uninstrument(self, **kwargs):
         try:
             from ._realtime_wrappers import unwrap_realtime_session
             unwrap_realtime_session()
         except Exception:
-            pass
+            logger.debug("Realtime session uninstrumentation skipped", exc_info=True)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (2)

206-219: Unused role parameters could be utilized or removed.

Both record_prompt and record_completion accept a role parameter that is never used. The role is hardcoded to "user" and "assistant" respectively in create_llm_span. Consider either using the passed role or removing the parameter if it's not needed.

♻️ Option 1: Use the role parameter
     def create_llm_span(self, completion_content: str):
         """Create an LLM span that shows the prompt and completion."""
-        prompts = self.pending_prompts.copy()
+        prompts = self.pending_prompts.copy()  # This would need to store (role, content) tuples
         ...
         for i, prompt in enumerate(prompts):
-            span.set_attribute(f"gen_ai.prompt.{i}.role", "user")
+            span.set_attribute(f"gen_ai.prompt.{i}.role", prompt[0])  # role from tuple
♻️ Option 2: Remove unused parameters
-    def record_prompt(self, role: str, content: str):
+    def record_prompt(self, content: str):
         """Record a prompt message - buffers it for the LLM span."""
         ...

-    def record_completion(self, role: str, content: str):
+    def record_completion(self, content: str):
         """Record a completion message - creates an LLM span with prompt and completion."""
         ...

251-257: Hardcoded model name may not reflect actual model used.

The model is hardcoded to "gpt-4o-realtime-preview" which may not always be accurate. Consider extracting the model from the session configuration or event data if available, with this as a fallback.

♻️ Suggested approach
+    def create_llm_span(self, completion_content: str, model: Optional[str] = None):
         """Create an LLM span that shows the prompt and completion."""
         ...
         span = self.tracer.start_span(
             "openai.realtime",
             kind=SpanKind.CLIENT,
             context=parent_context,
             start_time=start_time,
             attributes={
                 SpanAttributes.LLM_REQUEST_TYPE: "realtime",
                 SpanAttributes.LLM_SYSTEM: "openai",
                 GenAIAttributes.GEN_AI_SYSTEM: "openai",
-                GenAIAttributes.GEN_AI_REQUEST_MODEL: "gpt-4o-realtime-preview",
+                GenAIAttributes.GEN_AI_REQUEST_MODEL: model or "gpt-4o-realtime-preview",
             }
         )

Then pass the model from the event data in traced_put_event when calling record_completion.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories


📥 Commits

Reviewing files that changed from the base of the PR and between ac76869 and b3ccbac.

📒 Files selected for processing (3)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
🧠 Learnings (1)
📚 Learning: 2025-12-02T21:09:48.690Z
Learnt from: duanyutong
Repo: traceloop/openllmetry PR: 3487
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:177-178
Timestamp: 2025-12-02T21:09:48.690Z
Learning: The opentelemetry-instrumentation-openai and opentelemetry-instrumentation-openai-agents packages must remain independent and not share code, so code duplication between them is acceptable.

Applied to files:

  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
🧬 Code graph analysis (2)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (2)
  • SpanAttributes (64-245)
  • TraceloopSpanKindValues (285-290)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-245)
🪛 Ruff (0.14.10)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py

47-48: try-except-pass detected, consider logging the exception

(S110)


47-47: Do not catch blind exception: Exception

(BLE001)


50-50: Unused method argument: kwargs

(ARG002)


54-55: try-except-pass detected, consider logging the exception

(S110)


54-54: Do not catch blind exception: Exception

(BLE001)

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py

206-206: Unused method argument: role

(ARG002)


215-215: Unused method argument: role

(ARG002)


324-325: try-except-pass detected, consider logging the exception

(S110)


324-324: Do not catch blind exception: Exception

(BLE001)


342-343: try-except-pass detected, consider logging the exception

(S110)


342-342: Do not catch blind exception: Exception

(BLE001)


447-448: try-except-pass detected, consider logging the exception

(S110)


447-447: Do not catch blind exception: Exception

(BLE001)


477-478: try-except-pass detected, consider logging the exception

(S110)


477-477: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Lint
  • GitHub Check: Build Packages (3.11)
🔇 Additional comments (2)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)

234-305: LGTM - Realtime span creation follows existing patterns.

The implementation correctly creates OpenTelemetry spans for the three realtime span types with appropriate attributes and parent context linking. The guard pattern has_realtime_spans and SpeechSpanData and isinstance(span_data, SpeechSpanData) is slightly redundant (if has_realtime_spans is True, SpeechSpanData is guaranteed to be non-None), but it's defensive and harmless.

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (1)

1-15: New realtime session instrumentation module looks well-structured.

The module introduces comprehensive tracing for OpenAI Agents SDK realtime sessions with proper span hierarchy management, context propagation, and cleanup. The dont_throw decorator ensures instrumentation failures don't break the application.
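For readers unfamiliar with the pattern, a minimal sketch of a dont_throw-style guard, assuming it simply logs and swallows exceptions (the package's actual decorator may differ in its logging details):

import logging
from functools import wraps

logger = logging.getLogger(__name__)


def dont_throw(func):
    """Run func, but log and swallow any exception so instrumentation never breaks the app."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            logger.debug("Instrumentation failure in %s", func.__name__, exc_info=True)
            return None
    return wrapper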

Comment on lines +275 to +296
    def cleanup(self):
        """Clean up any remaining spans."""
        for span in list(self.tool_spans.values()):
            span.set_status(Status(StatusCode.OK))
            span.end()
        self.tool_spans.clear()

        for span in list(self.audio_spans.values()):
            span.set_status(Status(StatusCode.OK))
            span.end()
        self.audio_spans.clear()

        for span in list(self.agent_spans.values()):
            span.set_status(Status(StatusCode.OK))
            span.end()
        self.agent_spans.clear()
        self.agent_span_contexts.clear()

        if self.workflow_span:
            self.workflow_span.set_status(Status(StatusCode.OK))
            self.workflow_span.end()
            self.workflow_span = None

⚠️ Potential issue | 🟡 Minor

llm_spans dictionary is not cleaned up.

The cleanup method clears tool_spans, audio_spans, and agent_spans, but self.llm_spans (initialized on line 29) is never cleaned up. While LLM spans are created and immediately ended in create_llm_span, the dictionary should still be cleared for consistency.

🔧 Suggested fix
         for span in list(self.agent_spans.values()):
             span.set_status(Status(StatusCode.OK))
             span.end()
         self.agent_spans.clear()
         self.agent_span_contexts.clear()
+        self.llm_spans.clear()

         if self.workflow_span:
🤖 Prompt for AI Agents
In
@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
around lines 275 - 296, The cleanup method fails to clear the llm_spans
dictionary; update the cleanup implementation in the cleanup method to iterate
over self.llm_spans (similar to tool_spans/audio_spans/agent_spans), ensure each
span has set_status(Status(StatusCode.OK)) called and end() invoked, then clear
self.llm_spans; reference the cleanup method and the llm_spans attribute (and
consider consistency with create_llm_span behavior).

Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Important

Looks good to me! 👍

Reviewed 688aab4 in 2 minutes and 31 seconds. Click for details.
  • Reviewed 642 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 5 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:92
  • Draft comment:
    Wrap json.dumps calls in try/except to handle potential serialization errors.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The diff shows that this is primarily a refactoring - code was extracted from inline implementation into helper functions. The json.dumps calls existed in the original code and were simply moved to the new helper functions. The comment is suggesting defensive programming (adding try/except), which is a code quality suggestion. However, since this code was just moved and not newly written, and the comment doesn't point to a specific new problem introduced by the refactoring, this appears to be a comment about unchanged logic. The rules state "Do NOT comment unless there is clearly a code change required" and "Do NOT comment on anything that would be obviously caught by the build". If json.dumps fails, it would likely be caught during testing. This is more of a speculative improvement suggestion rather than a clear bug. The comment could be valid if the data being serialized is user-controlled or could contain non-serializable objects. In production code, defensive error handling around serialization is often a good practice. However, I need to consider whether this represents a clear issue that was introduced or changed in this PR. While defensive error handling is good practice, this comment is about code that was simply moved during refactoring, not newly introduced logic. The original code didn't have try/except blocks either. According to the rules, I should only keep comments about changes made in the diff, and this is just relocated code with the same behavior. Additionally, this is a speculative suggestion about what "could" go wrong rather than a definite issue. This comment should be deleted because it's about code that was simply moved during refactoring, not changed. The json.dumps calls existed in the original code without try/except blocks, so this isn't a new issue introduced by the PR. The comment is also speculative about potential errors rather than pointing to a definite problem.
2. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:543
  • Draft comment:
    Good refactor using _extract_prompt_attributes and _extract_response_attributes; consider enhancing docstrings to specify expected input formats.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
3. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:67
  • Draft comment:
    Clarify the return type in _get_parent_context's docstring (i.e. a context object or None).
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50% None
4. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:327
  • Draft comment:
    Cleanup() already ends the workflow span; calling end_workflow_span() afterward may be redundant—verify if both calls are needed.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:440
  • Draft comment:
    Consider logging exceptions in wrapper methods (e.g., in traced_put_event) rather than silently passing, to aid debugging in development.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None

Workflow ID: wflow_0O5XGrnoAlqBxWrB

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Important

Looks good to me! 👍

Reviewed fb63eab in 1 minute and 30 seconds. Click for details.
  • Reviewed 18 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:417
  • Draft comment:
    The code now records completions only for 'assistant' in history_added events, removing the branch for 'user'. Ensure this is intentional (e.g. to avoid duplicate prompt recording, since prompts are captured elsewhere).
  • Reason this comment was not posted:
    Comment looked like it was already resolved.

Workflow ID: wflow_TOgAnM43qqMcOGfx

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In
@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:
- Line 29: Remove the unused instance attribute self.llm_spans from the class:
delete its initialization (the line "self.llm_spans: Dict[str, Span] = {}") and
any direct references to that name; ensure no other code relies on it since
spans are created and ended immediately in create_llm_span(), and remove any
now-unused imports (e.g., Dict or Span) only if they become unused after this
deletion.
- Line 248: The span currently hardcodes GenAIAttributes.GEN_AI_REQUEST_MODEL to
"gpt-4o-realtime-preview"; change the wrapper to capture the actual model from
the realtime configuration during initialization (e.g., read into self._model or
model_name from the incoming realtime config/params in the wrapper's __init__ or
setup) and modify calls to create_llm_span(...) to accept and forward that model
value so create_llm_span uses the real model string for
GenAIAttributes.GEN_AI_REQUEST_MODEL instead of the hardcoded literal.
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (1)

205-212: Consider removing unused role parameters.

The role parameters in both record_prompt() and record_completion() are unused. If these are intended for future use or API consistency, consider adding a comment explaining the design decision.

♻️ Proposed refactor

If the role parameter is not needed:

-    def record_prompt(self, role: str, content: str):
+    def record_prompt(self, content: str):
         """Record a prompt message - buffers it for the LLM span."""
         if not content:
             return
         if not self.pending_prompts:
             self.prompt_start_time = time.time_ns()
             self.prompt_agent_name = self.current_agent_name or self.starting_agent_name
         self.pending_prompts.append(content)

-    def record_completion(self, role: str, content: str):
+    def record_completion(self, content: str):
         """Record a completion message - creates an LLM span with prompt and completion."""
         if not content:
             return
         self.create_llm_span(content)

And update the call sites on lines 421, 436, and 455.

Also applies to: 214-218

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 688aab4 and fb63eab.

📒 Files selected for processing (1)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: duanyutong
Repo: traceloop/openllmetry PR: 3487
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:177-178
Timestamp: 2025-12-02T21:09:48.690Z
Learning: The opentelemetry-instrumentation-openai and opentelemetry-instrumentation-openai-agents packages must remain independent and not share code, so code duplication between them is acceptable.
📚 Learning: 2025-12-02T21:09:48.690Z
Learnt from: duanyutong
Repo: traceloop/openllmetry PR: 3487
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:177-178
Timestamp: 2025-12-02T21:09:48.690Z
Learning: The opentelemetry-instrumentation-openai and opentelemetry-instrumentation-openai-agents packages must remain independent and not share code, so code duplication between them is acceptable.

Applied to files:

  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • TraceloopSpanKindValues (285-290)
🪛 Ruff (0.14.10)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py

205-205: Unused method argument: role

(ARG002)


214-214: Unused method argument: role

(ARG002)


317-318: try-except-pass detected, consider logging the exception

(S110)


317-317: Do not catch blind exception: Exception

(BLE001)


335-336: try-except-pass detected, consider logging the exception

(S110)


335-335: Do not catch blind exception: Exception

(BLE001)


437-438: try-except-pass detected, consider logging the exception

(S110)


437-437: Do not catch blind exception: Exception

(BLE001)


467-468: try-except-pass detected, consider logging the exception

(S110)


467-467: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Lint
  • GitHub Check: Build Packages (3.11)

SpanAttributes.LLM_REQUEST_TYPE: "realtime",
SpanAttributes.LLM_SYSTEM: "openai",
GenAIAttributes.GEN_AI_SYSTEM: "openai",
GenAIAttributes.GEN_AI_REQUEST_MODEL: "gpt-4o-realtime-preview",

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

rg -n --type=py -C3 "gpt-4o-realtime|model.*realtime|realtime.*model" --glob '!**/test*.py'

Repository: traceloop/openllmetry

Length of output: 7720


🏁 Script executed:

# Check the context around line 248 in the realtime_wrappers.py file
sed -n '230,270p' packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py

Repository: traceloop/openllmetry

Length of output: 1554


🏁 Script executed:

# Look for how the model is determined elsewhere in the realtime wrappers
rg -n "GEN_AI_REQUEST_MODEL|gen_ai_request_model" --type=py packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/

Repository: traceloop/openllmetry

Length of output: 1062


🏁 Script executed:

# Check if there's any configuration or client model information available in the realtime session
rg -n "model\s*=" --type=py -A2 -B2 packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py

Repository: traceloop/openllmetry

Length of output: 47


🏁 Script executed:

# Find the class definition and initialization for the wrapper containing create_llm_span
rg -n "class.*Wrapper|def __init__" --type=py -A5 packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py | head -60

Repository: traceloop/openllmetry

Length of output: 341


🏁 Script executed:

# Check the full create_llm_span function signature and see what instance variables are available
sed -n '200,270p' packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py

Repository: traceloop/openllmetry

Length of output: 2809


🏁 Script executed:

# Look for where create_llm_span is called to understand what context is available
rg -n "create_llm_span" --type=py packages/opentelemetry-instrumentation-openai-agents/

Repository: traceloop/openllmetry

Length of output: 404


Hardcoded model name should be captured from the realtime configuration.

The model name "gpt-4o-realtime-preview" is hardcoded in the span attributes. The actual model can vary (as seen in examples using "gpt-4o-realtime-preview-2024-12-17" or other variants). Store the model information when the wrapper is initialized and pass it to create_llm_span() so the span attributes reflect the actual model being used.

🤖 Prompt for AI Agents
In
@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
at line 248, The span currently hardcodes GenAIAttributes.GEN_AI_REQUEST_MODEL
to "gpt-4o-realtime-preview"; change the wrapper to capture the actual model
from the realtime configuration during initialization (e.g., read into
self._model or model_name from the incoming realtime config/params in the
wrapper's __init__ or setup) and modify calls to create_llm_span(...) to accept
and forward that model value so create_llm_span uses the real model string for
GenAIAttributes.GEN_AI_REQUEST_MODEL instead of the hardcoded literal.
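A minimal sketch of that shape, assuming the configured model is known when the tracing state is set up; the class, parameter names, span name, and the default fallback are illustrative rather than the package's actual API:

from opentelemetry import trace


class RealtimeLLMSpanFactory:
    DEFAULT_MODEL = "gpt-4o-realtime-preview"

    def __init__(self, tracer, model=None):
        self.tracer = tracer
        self.model = model  # e.g. captured from the realtime run config at session start

    def create_llm_span(self, completion_text, event_model=None):
        span = self.tracer.start_span("openai.realtime.response")
        # GenAIAttributes.GEN_AI_REQUEST_MODEL resolves to the "gen_ai.request.model" key
        span.set_attribute("gen_ai.request.model", event_model or self.model or self.DEFAULT_MODEL)
        span.set_attribute("gen_ai.completion.0.content", completion_text)
        span.end()
        return span


factory = RealtimeLLMSpanFactory(trace.get_tracer(__name__), model="gpt-4o-realtime-preview-2024-12-17")
factory.create_llm_span("Hello from the realtime demo")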

Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Important

Looks good to me! 👍

Reviewed 7d3c83d in 42 seconds. Click for details.
  • Reviewed 12 lines of code in 1 files
  • Skipped 1 files when reviewing.
  • Skipped posting 1 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py:4
  • Draft comment:
    Removed unused unittest.mock import for MagicMock, AsyncMock, and patch. This cleanup improves code clarity.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_y47ImqsYJkHsj8jj

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py (4)

19-19: Consider moving import to module level.

The SimpleSpanProcessor import is inside the fixture function. For consistency with other imports, consider moving it to the top of the file.

♻️ Proposed refactor

Move the import to the module level:

 from opentelemetry.sdk.trace import TracerProvider
 from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter
+from opentelemetry.sdk.trace.export import SimpleSpanProcessor

 from opentelemetry.instrumentation.openai_agents._realtime_wrappers import (
     RealtimeTracingState,
     wrap_realtime_session,
     unwrap_realtime_session,
 )

And remove it from inside the fixture:

 @pytest.fixture
 def tracer_provider():
     """Create a tracer provider with in-memory exporter."""
     provider = TracerProvider()
     exporter = InMemorySpanExporter()
-    from opentelemetry.sdk.trace.export import SimpleSpanProcessor
     provider.add_span_processor(SimpleSpanProcessor(exporter))
     return provider, exporter

36-36: Prefix unused exporter variables with underscore.

Multiple tests unpack the exporter from the tracer_provider fixture but don't use it. Per Python convention and Ruff's suggestion, prefix these with an underscore to indicate they're intentionally unused.

♻️ Proposed fix
-        _, exporter = tracer_provider
+        _, _ = tracer_provider

Apply this change to lines 36, 59, 84, 124, 148, and 160.

Also applies to: 59-59, 84-84, 124-124, 148-148, 160-160


251-255: Consider adding integration tests for session wrapping.

The current test only verifies that wrap_realtime_session and unwrap_realtime_session don't raise exceptions. It doesn't verify that:

  • Wrapping actually patches RealtimeSession methods
  • The tracing state is properly attached to session instances
  • Unwrapping restores original behavior

Consider adding integration tests that create mock RealtimeSession objects and verify the instrumentation behavior.
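One possible shape for such a test, assuming the module-level imports and the tracer_provider fixture already defined in test_realtime_session.py, and assuming the realtime session class is importable at the path shown (the import path and the class-level _put_event patching are assumptions):

import pytest


def test_wrap_patches_and_unwrap_restores(tracer_provider):
    provider, _ = tracer_provider
    try:
        from agents.realtime.session import RealtimeSession  # import path assumed for illustration
    except ImportError:
        pytest.skip("openai-agents realtime support not installed")

    original = RealtimeSession._put_event
    wrap_realtime_session(provider.get_tracer(__name__))
    try:
        assert RealtimeSession._put_event is not original
    finally:
        unwrap_realtime_session()
    assert RealtimeSession._put_event is original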


34-246: Comprehensive test coverage for RealtimeTracingState.

The test suite thoroughly covers the main lifecycle operations:

  • Workflow, agent, tool, and audio span management
  • Handoff span creation with proper attributes
  • Error recording and prompt/completion buffering
  • Multiple LLM span creation
  • Cleanup functionality

The tests validate span names, attributes (e.g., gen_ai.tool.output, gen_ai.handoff.*), and proper export behavior.

Optionally, consider adding edge case tests for:

  • Attempting to end spans that haven't been started
  • Starting multiple workflow spans
  • Recording completion without a prior prompt
  • Handling empty/null values
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between fb63eab and 7d3c83d.

⛔ Files ignored due to path filters (1)
  • packages/sample-app/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: duanyutong
Repo: traceloop/openllmetry PR: 3487
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:177-178
Timestamp: 2025-12-02T21:09:48.690Z
Learning: The opentelemetry-instrumentation-openai and opentelemetry-instrumentation-openai-agents packages must remain independent and not share code, so code duplication between them is acceptable.
🪛 Ruff (0.14.10)
packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py

36-36: Unpacked variable exporter is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


59-59: Unpacked variable exporter is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


84-84: Unpacked variable exporter is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


124-124: Unpacked variable exporter is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


148-148: Unpacked variable exporter is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


160-160: Unpacked variable exporter is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Lint
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Build Packages (3.11)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Test Packages (3.11)
🔇 Additional comments (2)
packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py (2)

258-305: Well-designed span hierarchy validation.

The tests correctly verify parent-child relationships between spans:

  • Tool spans properly parented under agent spans
  • Audio spans properly parented under the current agent
  • Proper use of OpenTelemetry's span context API for validation

The hierarchy tests ensure that spans are correctly nested within the tracing structure, which is critical for proper observability.


153-156: The test correctly accesses attributes on live spans. In OpenTelemetry's Python SDK, span attributes are directly accessible throughout the span's lifecycle via the .attributes property—they are immediately readable after set_attribute() is called, not just after the span is exported. This pattern is used consistently throughout the test suite and is the standard way to verify attributes in OpenTelemetry tests.

Likely an incorrect or invalid review comment.
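A quick self-contained check of that SDK behavior:

from opentelemetry.sdk.trace import TracerProvider

tracer = TracerProvider().get_tracer(__name__)
span = tracer.start_span("demo")
span.set_attribute("gen_ai.tool.output", "42")
assert span.attributes["gen_ai.tool.output"] == "42"  # readable before the span ends or exports
span.end()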

Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Important

Looks good to me! 👍

Reviewed 89558df in 45 seconds. Click for details.
  • Reviewed 354 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:24
  • Draft comment:
    Good improvement: caching realtime span type imports at module level (lines 24-31) avoids repeated costly lookups. Ensure all realtime span checks now consistently use the global _has_realtime_spans variable.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
2. packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py:98
  • Draft comment:
    Nice refactor in tests: patching module-level variables directly (using patch.object) instead of modifying sys.modules clarifies the intent and minimizes side effects.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
3. packages/sample-app/sample_app/openai_agents_realtime_example.py:284
  • Draft comment:
    Excellent update in the calculate() function: switching from unsafe eval to AST-based safe evaluation greatly improves security. Consider whether additional numeric operators or validations are needed for expanded use cases.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_T6RlIbTY7REsz3PH

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)

411-411: Import AgentSpanData in on_span_end for consistency and replace the string-based type check.

Line 625 uses type(span_data).__name__ == 'AgentSpanData' while the same type is checked with isinstance() in on_span_start. Import AgentSpanData at the top of the on_span_end method (where GenerationSpanData is already imported on line 52) and replace the string comparison with isinstance(span_data, AgentSpanData) for consistency.

Note: ResponseSpanData cannot be imported from the agents module and requires the string-based fallback.

Also applies to line 540, where applicable.
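A hedged sketch of the isinstance-based check, with a fallback for environments where the import path differs (the agents.tracing.span_data path is an assumption about the Agents SDK layout):

def _is_agent_span(span_data):
    try:
        from agents.tracing.span_data import AgentSpanData  # path assumed for illustration
    except ImportError:
        # Fall back to the current string comparison if the SDK layout differs.
        return type(span_data).__name__ == "AgentSpanData"
    return isinstance(span_data, AgentSpanData)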

🤖 Fix all issues with AI agents
In @packages/sample-app/sample_app/openai_agents_realtime_example.py:
- Around line 629-636: Remove the unused "# noqa: F401" comments from the two
import statements so they no longer appear after "import agents" and "from
agents.realtime import RealtimeAgent"; simply delete the " # noqa: F401"
trailing each import to clean up the file and satisfy the linter.
🧹 Nitpick comments (3)
packages/sample-app/sample_app/openai_agents_realtime_example.py (3)

231-233: Broad exception handling is acceptable for demo code.

Catching Exception broadly is flagged by Ruff, but for a sample application, this provides better user experience with a friendly error message. In production code, consider catching more specific exceptions (e.g., OpenAIError, ConnectionError) to avoid masking bugs.


462-464: Broad exception handling is acceptable for demo code.

Similar to the voice demo, catching Exception provides user-friendly error messages. This is reasonable for sample code but production code should catch more specific exceptions.


615-616: Broad exception handling is acceptable for demo code.

Consistent with the other demo functions, this provides user-friendly error messages suitable for sample code.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 7d3c83d and 89558df.

📒 Files selected for processing (3)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
  • packages/sample-app/sample_app/openai_agents_realtime_example.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/sample-app/sample_app/openai_agents_realtime_example.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: duanyutong
Repo: traceloop/openllmetry PR: 3487
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:177-178
Timestamp: 2025-12-02T21:09:48.690Z
Learning: The opentelemetry-instrumentation-openai and opentelemetry-instrumentation-openai-agents packages must remain independent and not share code, so code duplication between them is acceptable.
🧬 Code graph analysis (2)
packages/sample-app/sample_app/openai_agents_realtime_example.py (2)
packages/traceloop-sdk/traceloop/sdk/__init__.py (2)
  • Traceloop (37-268)
  • init (49-199)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py (2)
  • session (362-364)
  • item (560-562)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-245)
🪛 Ruff (0.14.10)
packages/sample-app/sample_app/openai_agents_realtime_example.py

231-231: Do not catch blind exception: Exception

(BLE001)


306-306: Avoid specifying long messages outside the exception class

(TRY003)


311-311: Avoid specifying long messages outside the exception class

(TRY003)


316-316: Avoid specifying long messages outside the exception class

(TRY003)


322-322: Avoid specifying long messages outside the exception class

(TRY003)


328-328: Prefer TypeError exception for invalid type

(TRY004)


328-328: Avoid specifying long messages outside the exception class

(TRY003)


333-333: Consider moving this statement to an else block

(TRY300)


462-462: Do not catch blind exception: Exception

(BLE001)


615-615: Do not catch blind exception: Exception

(BLE001)


629-629: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


634-634: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Lint
  • GitHub Check: Build Packages (3.11)
  • GitHub Check: Test Packages (3.11)
🔇 Additional comments (21)
packages/sample-app/sample_app/openai_agents_realtime_example.py (7)

1-41: LGTM: Clean module setup with proper documentation.

The module docstring is clear, environment variables are loaded properly, and Traceloop is initialized correctly. No hardcoded secrets detected.


66-108: LGTM: Well-structured tool functions and agent configuration.

The tool functions are simple, safe, and well-documented. The agent is configured with clear instructions and appropriate tools for the voice interface demo.


164-229: LGTM: Robust event handling with defensive programming.

The event loop uses getattr with defaults to safely handle different event types, and the max_events limit prevents infinite loops. The AttributeError handling provides graceful degradation for unexpected event structures.


284-335: Excellent: Security concern addressed with AST-based safe evaluation.

The calculate function now uses AST parsing instead of eval(), directly addressing the previous review's security concern. The implementation correctly:

  • Excludes exponentiation (Pow) to prevent resource exhaustion
  • Limits expression depth to 50 levels
  • Only allows safe operators (Add, Sub, Mult, Div, unary +/-)
  • Handles both Python 3.7 (ast.Num) and 3.8+ (ast.Constant) nodes

This is a secure, production-ready approach for a sample application.

Optional: Minor static analysis refinements

Ruff suggests some minor style improvements:

  • Line 328: Consider TypeError instead of ValueError for unsupported node types
  • Lines 306, 311, 316, 322, 328: Long error messages could be moved to custom exception classes
  • Line 333: The return statement could move to an else block

These are nitpicks and don't affect functionality for a demo app.
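A compact, self-contained sketch of the AST-based approach described in this comment; it illustrates the operator allowlist, the excluded exponentiation, and the depth limit, but it is not the sample app's exact calculate() (the Python 3.7 ast.Num branch is omitted here):

import ast
import operator

_ALLOWED_BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub,
                   ast.Mult: operator.mul, ast.Div: operator.truediv}
_ALLOWED_UNARYOPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_MAX_DEPTH = 50


def safe_eval(expression: str) -> float:
    """Evaluate basic arithmetic without eval(): allowlisted operators, bounded depth, no Pow."""
    tree = ast.parse(expression, mode="eval")

    def _eval(node, depth=0):
        if depth > _MAX_DEPTH:
            raise ValueError("Expression is too deeply nested")
        if isinstance(node, ast.Expression):
            return _eval(node.body, depth + 1)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_BINOPS:
            return _ALLOWED_BINOPS[type(node.op)](_eval(node.left, depth + 1), _eval(node.right, depth + 1))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _ALLOWED_UNARYOPS:
            return _ALLOWED_UNARYOPS[type(node.op)](_eval(node.operand, depth + 1))
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    return _eval(tree)


assert safe_eval("2 + 3 * (4 - 1)") == 11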


492-524: LGTM: Clean agent handoff configuration.

The agent hierarchy is well-structured with specialized agents and appropriate handoff descriptions. This effectively demonstrates the realtime handoff capabilities.


645-648: LGTM: Proper environment variable check.

Checking for OPENAI_API_KEY in environment variables follows best practices and aligns with the coding guideline to store API keys only in environment variables or secure vaults.


653-684: LGTM: Clean main block with good user guidance.

The command-line argument parsing and routing logic is clear and includes helpful error messages. The requirement check before execution prevents confusing errors.

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (7)

23-32: LGTM! Module-level import pattern implemented correctly.

The conditional import has been moved to module level, following the same pattern as set_agent_name. This addresses the previous review comment and avoids repeated expensive path lookups on every on_span_start invocation.


34-150: Well-structured helper for prompt extraction.

The function properly handles both OpenAI chat format and Agents SDK format, with appropriate conversion logic and nested structure support. The unified dict-based approach is clean.


152-245: Well-structured helper for response extraction.

The function properly extracts model settings, completions, and usage data with defensive attribute checks. The handling of different output types (text messages, tool calls, direct text) and token naming variations is thorough. Returning model_settings for parent span propagation is a good design choice.
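As a small illustration of the dual token naming mentioned above (the real helper sets GenAI usage attributes on the span rather than returning a dict; extract_usage is a made-up name):

from types import SimpleNamespace


def extract_usage(usage):
    """Normalize usage objects that may expose either token naming convention."""
    input_tokens = getattr(usage, "input_tokens", None)
    if input_tokens is None:
        input_tokens = getattr(usage, "prompt_tokens", None)
    output_tokens = getattr(usage, "output_tokens", None)
    if output_tokens is None:
        output_tokens = getattr(usage, "completion_tokens", None)
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": getattr(usage, "total_tokens", None),
    }


assert extract_usage(SimpleNamespace(prompt_tokens=7, completion_tokens=3, total_tokens=10)) == {
    "input_tokens": 7, "output_tokens": 3, "total_tokens": 10,
}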


273-273: Good migration to GenAIAttributes constants.

Replacing hardcoded strings with semantic convention constants improves maintainability and ensures consistency with OpenTelemetry standards.

Also applies to: 323-323, 366-366, 391-392, 418-419, 437-438, 636-636


449-520: Realtime span creation logic is well-structured.

The three realtime span types (Speech, Transcription, SpeechGroup) follow a consistent pattern with appropriate attribute setting, parent context management, and defensive checks using the module-level _has_realtime_spans flag.


543-545: Good use of extraction helpers.

Replacing inline extraction logic with the new helper functions reduces duplication and improves maintainability.

Also applies to: 576-576, 582-582, 586-586


589-624: Realtime span end handling is well-implemented.

The end handling for realtime spans properly extracts input/output data and sets appropriate attributes. The use of isinstance() checks is consistent with best practices.

packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py (7)

1-59: Well-designed mock classes for testing.

The mock classes provide a clean abstraction for testing realtime instrumentation without requiring actual WebSocket connections. The use of dynamic attribute setting via kwargs in MockSpanData is flexible and appropriate for testing.


61-68: LGTM on the test fixture.

The fixture properly sets up an in-memory exporter with a simple span processor, which is the standard pattern for testing OpenTelemetry instrumentation.


70-167: Comprehensive speech span tests.

The tests properly verify span creation, attribute setting, and model/format capture for speech synthesis. The patching strategy using module-level variables is correct and aligns with the implementation.


169-265: Comprehensive transcription span tests.

The tests properly verify span creation and attribute capture for transcription operations, following the same solid pattern as the speech tests.


267-314: LGTM on speech group span test.

The test properly verifies span creation and attributes for speech group operations.


316-361: Good error handling test.

The test properly verifies that errors are recorded on realtime spans with the correct status code.


363-424: Thorough span hierarchy verification.

The test properly verifies that realtime spans are correctly nested under the workflow root span, ensuring proper parent-child relationships in the span tree.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Important

Looks good to me! 👍

Reviewed f07996d in 1 minute and 30 seconds. Click for details.
  • Reviewed 23 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:26
  • Draft comment:
    Removed 'self.llm_spans' from the tracing state. Confirm this field is unused and its removal is intentional.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
2. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:291
  • Draft comment:
    Added guard to prevent multiple wrapping by checking '_original_methods'. Ensure this approach meets concurrent usage requirements if applicable.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the author to ensure that the approach meets concurrent usage requirements, which is against the rules. It is not making a specific suggestion or pointing out a specific issue with the code.

Workflow ID: wflow_XCeiGXzkr8Jzwz35

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In
@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:
- Line 308: The _tracing_states dict currently created inside
wrap_realtime_session() can leak across wrap/unwrap cycles; move its declaration
out to module scope alongside _original_methods (name: _tracing_states:
Dict[int, RealtimeTracingState]) so all wrappers share the same mapping, then
update unwrap_realtime_session() to explicitly clear _tracing_states (e.g.,
_tracing_states.clear()) when restoring original methods to avoid leftover
references; adjust any references inside wrap_realtime_session() and the wrapper
functions to use the module-level _tracing_states.
- Line 29: The llm_spans dictionary is initialized but never used; remove the
unused attribute and any references to it to avoid dead state: delete the
self.llm_spans: Dict[str, Any] = {} initialization and ensure create_llm_span
(and any other methods referencing llm_spans) do not rely on it since spans are
created and ended immediately in create_llm_span; if tracking is desired
instead, implement proper population and use of llm_spans within create_llm_span
and related logic, otherwise remove the unused symbol entirely.

In @packages/sample-app/sample_app/openai_agents_realtime_example.py:
- Around line 629-636: Remove the unnecessary noqa directives on the two import
lines that reference agents and agents.realtime (the lines importing agents and
from agents.realtime import RealtimeAgent); simply delete the trailing "# noqa:
F401" comments so the imports remain unchanged but without the unused linter
suppression.
🧹 Nitpick comments (3)
packages/sample-app/sample_app/openai_agents_realtime_example.py (1)

268-281: Consider extracting duplicated tool definitions (optional).

The get_weather tool is defined in both run_realtime_demo (lines 66-79) and run_text_only_demo (lines 268-281). While the duplication is minor and each demo remains self-contained for clarity, you could optionally extract common tools to module level if you prefer DRY principles.

However, for sample/demo code, the current self-contained approach may be more pedagogically clear for readers.

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (1)

204-211: Unused role parameters in record_prompt and record_completion.

Both methods accept a role parameter but never use it. If these parameters are reserved for future functionality, consider documenting that intent. Otherwise, remove them to simplify the API.

♻️ Proposed fix (if role is not needed)
-    def record_prompt(self, role: str, content: str):
-        """Record a prompt message - buffers it for the LLM span."""
+    def record_prompt(self, content: str):
+        """Record a prompt message - buffers it for the LLM span."""
         if not content:
             return
         if not self.pending_prompts:
             self.prompt_start_time = time.time_ns()
             self.prompt_agent_name = self.current_agent_name or self.starting_agent_name
         self.pending_prompts.append(content)

-    def record_completion(self, role: str, content: str):
-        """Record a completion message - creates an LLM span with prompt and completion."""
+    def record_completion(self, content: str):
+        """Record a completion message - creates an LLM span with prompt and completion."""
         if not content:
             return
         self.create_llm_span(content)

Then update callers at lines 424, 439, 458, 463.

Also applies to: 213-217

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)

34-150: Consider adding size limits to prompt attribute values.

The _extract_prompt_attributes function sets content and arguments attributes without size limits (lines 105, 149). Unlike other parts of the codebase that truncate to 4096 characters (e.g., line 256 in _realtime_wrappers.py), these attributes could become extremely large if prompts contain substantial text or encoded data.

Add truncation to prevent excessive attribute sizes that could impact telemetry backend performance.

♻️ Proposed fix
         # Set content attribute
         if content is not None:
             if not isinstance(content, str):
                 content = json.dumps(content)
-            otel_span.set_attribute(f"{prefix}.content", content)
+            otel_span.set_attribute(f"{prefix}.content", content[:4096])

         # Set tool_call_id for tool result messages
         if tool_call_id:
             otel_span.set_attribute(f"{prefix}.tool_call_id", tool_call_id)

         # Set tool_calls for assistant messages with tool calls
         if tool_calls:
             for j, tool_call in enumerate(tool_calls):
                 # ... existing code ...
                 if tool_call.get('arguments'):
                     args = tool_call['arguments']
                     if not isinstance(args, str):
                         args = json.dumps(args)
-                    otel_span.set_attribute(f"{prefix}.tool_calls.{j}.arguments", args)
+                    otel_span.set_attribute(f"{prefix}.tool_calls.{j}.arguments", args[:4096])
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 7d3c83d and f07996d.

📒 Files selected for processing (4)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
  • packages/sample-app/sample_app/openai_agents_realtime_example.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
  • packages/sample-app/sample_app/openai_agents_realtime_example.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: duanyutong
Repo: traceloop/openllmetry PR: 3487
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:177-178
Timestamp: 2025-12-02T21:09:48.690Z
Learning: The opentelemetry-instrumentation-openai and opentelemetry-instrumentation-openai-agents packages must remain independent and not share code, so code duplication between them is acceptable.
📚 Learning: 2025-12-02T21:09:48.690Z
Learnt from: duanyutong
Repo: traceloop/openllmetry PR: 3487
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:177-178
Timestamp: 2025-12-02T21:09:48.690Z
Learning: The opentelemetry-instrumentation-openai and opentelemetry-instrumentation-openai-agents packages must remain independent and not share code, so code duplication between them is acceptable.

Applied to files:

  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
🧬 Code graph analysis (1)
packages/sample-app/sample_app/openai_agents_realtime_example.py (2)
packages/traceloop-sdk/traceloop/sdk/__init__.py (2)
  • Traceloop (37-268)
  • init (49-199)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py (2)
  • session (362-364)
  • item (560-562)
🪛 Ruff (0.14.10)
packages/sample-app/sample_app/openai_agents_realtime_example.py

231-231: Do not catch blind exception: Exception

(BLE001)


306-306: Avoid specifying long messages outside the exception class

(TRY003)


311-311: Avoid specifying long messages outside the exception class

(TRY003)


316-316: Avoid specifying long messages outside the exception class

(TRY003)


322-322: Avoid specifying long messages outside the exception class

(TRY003)


328-328: Prefer TypeError exception for invalid type

(TRY004)


328-328: Avoid specifying long messages outside the exception class

(TRY003)


333-333: Consider moving this statement to an else block

(TRY300)


462-462: Do not catch blind exception: Exception

(BLE001)


615-615: Do not catch blind exception: Exception

(BLE001)


629-629: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


634-634: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py

204-204: Unused method argument: role

(ARG002)


213-213: Unused method argument: role

(ARG002)


320-321: try-except-pass detected, consider logging the exception

(S110)


320-320: Do not catch blind exception: Exception

(BLE001)


338-339: try-except-pass detected, consider logging the exception

(S110)


338-338: Do not catch blind exception: Exception

(BLE001)


440-441: try-except-pass detected, consider logging the exception

(S110)


440-440: Do not catch blind exception: Exception

(BLE001)


470-471: try-except-pass detected, consider logging the exception

(S110)


470-470: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Build Packages (3.11)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Lint
🔇 Additional comments (6)
packages/sample-app/sample_app/openai_agents_realtime_example.py (5)

43-243: LGTM! Well-structured realtime demo.

The function demonstrates realtime capabilities effectively with:

  • Proper tool definitions using @function_tool decorator
  • Comprehensive agent and runner configuration
  • Safe event processing with getattr patterns to handle varying event structures
  • Event limit to prevent infinite loops in demo context

The broad exception catch on line 231 is acceptable for a sample application as it provides user-friendly error messaging.


283-335: Excellent secure implementation of safe expression evaluator.

The AST-based evaluation properly addresses the previous security concern by:

  • Using an allowlist of safe operators (no exponentiation to prevent 9**9**9 attacks)
  • Implementing depth limits to prevent deeply nested expressions
  • Handling both Python 3.7 (ast.Num) and 3.8+ (ast.Constant) for compatibility
  • Providing clear error messages for unsupported operations

The static analysis hints about long error messages and using TypeError are pedantic in this context - the implementation is appropriate for a sample/demo tool.


377-461: LGTM! Sequential message handling is well-implemented.

The event loop correctly:

  • Sends demo messages sequentially after agent completion
  • Uses brief pauses between messages for natural pacing
  • Filters verbose events to keep output readable
  • Includes safeguards (max_events limit) to prevent runaway loops

475-622: LGTM! Handoff demo clearly illustrates multi-agent coordination.

The function effectively demonstrates:

  • Agent hierarchy with specialized agents (Weather Expert, Time Expert)
  • Proper use of realtime_handoff with descriptive tool descriptions
  • Handoff event tracking with emphasized logging (*** markers)
  • Graceful session termination after sufficient events

653-684: LGTM! Main block provides clear entry points.

The main execution block:

  • Validates requirements before running demos
  • Provides clear command-line interface with three demo options
  • Includes helpful guidance messages for users
  • Properly handles async execution with asyncio.run()
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (1)

247-247: Hardcoded model name doesn't reflect the actual realtime model.

The model is hardcoded as "gpt-4o-realtime-preview", but realtime sessions can use different model variants (e.g., "gpt-4o-realtime-preview-2024-12-17"). This results in inaccurate telemetry data.

Capture the actual model from the realtime session configuration during initialization and use it when creating LLM spans.

Verify what model information is available from the RealtimeSession object:

#!/bin/bash
# Search for model configuration in RealtimeSession and related classes
rg -n --type=py -C5 "class RealtimeSession" packages/

Likely an incorrect or invalid review comment.

Comment on lines 580 to +624
        elif span_data:
            # Extract prompt data from input and add to response span (legacy support)
            input_data = getattr(span_data, 'input', [])
            if input_data:
                for i, message in enumerate(input_data):
                    if hasattr(message, 'role') and hasattr(message, 'content'):
                        otel_span.set_attribute(f"gen_ai.prompt.{i}.role", message.role)
                        content = message.content
                        if isinstance(content, dict):
                            content = json.dumps(content)
                        otel_span.set_attribute(f"gen_ai.prompt.{i}.content", content)
                    elif isinstance(message, dict):
                        if 'role' in message and 'content' in message:
                            otel_span.set_attribute(f"gen_ai.prompt.{i}.role", message['role'])
                            content = message['content']
                            if isinstance(content, dict):
                                content = json.dumps(content)
                            otel_span.set_attribute(f"gen_ai.prompt.{i}.content", content)
            _extract_prompt_attributes(otel_span, input_data)

            response = getattr(span_data, 'response', None)
            if response:

                # Extract model settings from the response
                model_settings = {}

                if hasattr(response, 'temperature') and response.temperature is not None:
                    model_settings['temperature'] = response.temperature
                    otel_span.set_attribute(GenAIAttributes.GEN_AI_REQUEST_TEMPERATURE, response.temperature)

                if hasattr(response, 'max_output_tokens') and response.max_output_tokens is not None:
                    model_settings['max_tokens'] = response.max_output_tokens
                    otel_span.set_attribute(GenAIAttributes.GEN_AI_REQUEST_MAX_TOKENS, response.max_output_tokens)

                if hasattr(response, 'top_p') and response.top_p is not None:
                    model_settings['top_p'] = response.top_p
                    otel_span.set_attribute(GenAIAttributes.GEN_AI_REQUEST_TOP_P, response.top_p)

                if hasattr(response, 'model') and response.model:
                    model_settings['model'] = response.model
                    otel_span.set_attribute("gen_ai.request.model", response.model)

                # Extract completions and add directly to response span using OpenAI semantic conventions
                if hasattr(response, 'output') and response.output:
                    for i, output in enumerate(response.output):
                        # Handle different output types
                        if hasattr(output, 'content') and output.content:
                            # Text message with content array (ResponseOutputMessage)
                            content_text = ""
                            for content_item in output.content:
                                if hasattr(content_item, 'text'):
                                    content_text += content_item.text

                            if content_text:
                                otel_span.set_attribute(
                                    f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.content", content_text)
                                otel_span.set_attribute(
                                    f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.role", getattr(
                                        output, 'role', 'assistant'))

                        elif hasattr(output, 'name'):
                            # Function/tool call (ResponseFunctionToolCall) - use OpenAI tool call format
                            tool_name = getattr(output, 'name', 'unknown_tool')
                            arguments = getattr(output, 'arguments', '{}')
                            tool_call_id = getattr(output, 'call_id', f"call_{i}")

                            # Set completion with tool call following OpenAI format
                            otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.role", "assistant")
                            otel_span.set_attribute(
                                f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.finish_reason", "tool_calls")
                            otel_span.set_attribute(
                                f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.tool_calls.0.name", tool_name)
                            otel_span.set_attribute(
                                f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.tool_calls.0.arguments", arguments)
                            otel_span.set_attribute(
                                f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.tool_calls.0.id", tool_call_id)

                        elif hasattr(output, 'text'):
                            # Direct text content
                            otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.content", output.text)
                            otel_span.set_attribute(
                                f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.role", getattr(
                                    output, 'role', 'assistant'))

                        # Add finish reason if available (for non-tool-call cases)
                        if hasattr(response, 'finish_reason') and not hasattr(output, 'name'):
                            otel_span.set_attribute(
                                f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.finish_reason", response.finish_reason)

                # Extract usage data and add directly to response span
                if hasattr(response, 'usage') and response.usage:
                    usage = response.usage
                    # Try both naming conventions: input_tokens/output_tokens and prompt_tokens/completion_tokens
                    if hasattr(usage, 'input_tokens') and usage.input_tokens is not None:
                        otel_span.set_attribute(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, usage.input_tokens)
                    elif hasattr(usage, 'prompt_tokens') and usage.prompt_tokens is not None:
                        otel_span.set_attribute(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, usage.prompt_tokens)

                    if hasattr(usage, 'output_tokens') and usage.output_tokens is not None:
                        otel_span.set_attribute(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, usage.output_tokens)
                    elif hasattr(usage, 'completion_tokens') and usage.completion_tokens is not None:
                        otel_span.set_attribute(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, usage.completion_tokens)

                    if hasattr(usage, 'total_tokens') and usage.total_tokens is not None:
                        otel_span.set_attribute(SpanAttributes.LLM_USAGE_TOTAL_TOKENS, usage.total_tokens)

                # Check for frequency_penalty
                if hasattr(response, 'frequency_penalty') and response.frequency_penalty is not None:
                    model_settings['frequency_penalty'] = response.frequency_penalty

                # Store model settings to add to the agent span (but NOT prompts/completions)
                model_settings = _extract_response_attributes(otel_span, response)
                self._last_model_settings = model_settings

        elif _has_realtime_spans and SpeechSpanData and isinstance(span_data, SpeechSpanData):
            input_text = getattr(span_data, 'input', None)
            if input_text:
                otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_PROMPT}.0.content", input_text)
                otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_PROMPT}.0.role", "user")

            output_audio = getattr(span_data, 'output', None)
            if output_audio:
                if isinstance(output_audio, (bytes, bytearray)):
                    otel_span.set_attribute("gen_ai.speech.output_bytes", len(output_audio))
                else:
                    otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content", str(output_audio))

            first_content_at = getattr(span_data, 'first_content_at', None)
            if first_content_at:
                otel_span.set_attribute("gen_ai.speech.first_content_at", first_content_at)

        elif _has_realtime_spans and TranscriptionSpanData and isinstance(span_data, TranscriptionSpanData):
            input_audio = getattr(span_data, 'input', None)
            if input_audio:
                if isinstance(input_audio, (bytes, bytearray)):
                    otel_span.set_attribute("gen_ai.transcription.input_bytes", len(input_audio))
                else:
                    otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_PROMPT}.0.content", str(input_audio))

            output_text = getattr(span_data, 'output', None)
            if output_text:
                otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content", output_text)
                otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.0.role", "assistant")

        elif _has_realtime_spans and SpeechGroupSpanData and isinstance(span_data, SpeechGroupSpanData):
            input_text = getattr(span_data, 'input', None)
            if input_text:
                otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_PROMPT}.0.content", input_text)
                otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_PROMPT}.0.role", "user")


⚠️ Potential issue | 🔴 Critical

Realtime span handling is unreachable due to overly broad legacy fallback condition.

The legacy fallback at line 580 (elif span_data:) matches ANY span with data, preventing the realtime-specific handlers (lines 589-624) from ever executing. This means SpeechSpanData, TranscriptionSpanData, and SpeechGroupSpanData will be processed by the generic legacy code instead of their specialized handlers, resulting in missing or incorrect attributes.

Reorder the conditions so specific realtime checks come before the broad legacy fallback.

🔧 Proposed fix
             if response:
                 model_settings = _extract_response_attributes(otel_span, response)
                 self._last_model_settings = model_settings

-        # Legacy fallback for other span types
-        elif span_data:
-            input_data = getattr(span_data, 'input', [])
-            _extract_prompt_attributes(otel_span, input_data)
-
-            response = getattr(span_data, 'response', None)
-            if response:
-                model_settings = _extract_response_attributes(otel_span, response)
-                self._last_model_settings = model_settings
-
-        elif _has_realtime_spans and SpeechSpanData and isinstance(span_data, SpeechSpanData):
+        elif _has_realtime_spans and SpeechSpanData and isinstance(span_data, SpeechSpanData):
             input_text = getattr(span_data, 'input', None)
             if input_text:
                 otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_PROMPT}.0.content", input_text)
                 otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_PROMPT}.0.role", "user")

             output_audio = getattr(span_data, 'output', None)
             if output_audio:
                 if isinstance(output_audio, (bytes, bytearray)):
                     otel_span.set_attribute("gen_ai.speech.output_bytes", len(output_audio))
                 else:
                     otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content", str(output_audio))

             first_content_at = getattr(span_data, 'first_content_at', None)
             if first_content_at:
                 otel_span.set_attribute("gen_ai.speech.first_content_at", first_content_at)

         elif _has_realtime_spans and TranscriptionSpanData and isinstance(span_data, TranscriptionSpanData):
             input_audio = getattr(span_data, 'input', None)
             if input_audio:
                 if isinstance(input_audio, (bytes, bytearray)):
                     otel_span.set_attribute("gen_ai.transcription.input_bytes", len(input_audio))
                 else:
                     otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_PROMPT}.0.content", str(input_audio))

             output_text = getattr(span_data, 'output', None)
             if output_text:
                 otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content", output_text)
                 otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_COMPLETION}.0.role", "assistant")

         elif _has_realtime_spans and SpeechGroupSpanData and isinstance(span_data, SpeechGroupSpanData):
             input_text = getattr(span_data, 'input', None)
             if input_text:
                 otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_PROMPT}.0.content", input_text)
                 otel_span.set_attribute(f"{GenAIAttributes.GEN_AI_PROMPT}.0.role", "user")

+        # Legacy fallback for other span types
+        elif span_data:
+            input_data = getattr(span_data, 'input', [])
+            _extract_prompt_attributes(otel_span, input_data)
+
+            response = getattr(span_data, 'response', None)
+            if response:
+                model_settings = _extract_response_attributes(otel_span, response)
+                self._last_model_settings = model_settings
+
         elif span_data and type(span_data).__name__ == 'AgentSpanData':

self.agent_spans: Dict[str, Span] = {}
self.tool_spans: Dict[str, Span] = {}
self.audio_spans: Dict[str, Span] = {}
self.span_contexts: Dict[str, Any] = {}

⚠️ Potential issue | 🟡 Minor

The llm_spans dictionary remains unused.

As previously noted, llm_spans is initialized but never populated or accessed anywhere in the code. LLM spans are created and immediately ended in create_llm_span() (line 238-262), so tracking is unnecessary.
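
For context, a create-and-end LLM span needs no bookkeeping dictionary at all. Roughly (the span name and attribute keys mirror the conventions used in this PR; the signature is illustrative, not the real create_llm_span):

def create_llm_span(tracer, parent_context, model_name, prompts, completions):
    """Open a realtime LLM span, attach prompts/completions, and end it immediately."""
    span = tracer.start_span("openai.realtime.llm", context=parent_context)
    try:
        span.set_attribute("gen_ai.request.model", model_name)
        for i, (role, content) in enumerate(prompts):
            span.set_attribute(f"gen_ai.prompt.{i}.role", role)
            span.set_attribute(f"gen_ai.prompt.{i}.content", content)
        for i, (role, content) in enumerate(completions):
            span.set_attribute(f"gen_ai.completion.{i}.role", role)
            span.set_attribute(f"gen_ai.completion.{i}.content", content)
    finally:
        span.end()  # ended right away, so there is nothing left to track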

🧹 Proposed fix
     self.tool_spans: Dict[str, Span] = {}
     self.audio_spans: Dict[str, Span] = {}
-    self.llm_spans: Dict[str, Span] = {}
     self.span_contexts: Dict[str, Any] = {}

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In
@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
at line 29, The llm_spans dictionary is initialized but never used; remove the
unused attribute and any references to it to avoid dead state: delete the
self.llm_spans: Dict[str, Any] = {} initialization and ensure create_llm_span
(and any other methods referencing llm_spans) do not rely on it since spans are
created and ended immediately in create_llm_span; if tracking is desired
instead, implement proper population and use of llm_spans within create_llm_span
and related logic, otherwise remove the unused symbol entirely.

    if hasattr(RealtimeSession, 'send_message'):
        _original_methods['send_message'] = RealtimeSession.send_message

    _tracing_states: Dict[int, RealtimeTracingState] = {}

⚠️ Potential issue | 🟠 Major

_tracing_states dictionary may leak memory across wrap/unwrap cycles.

The _tracing_states dictionary is defined inside wrap_realtime_session() and captured in the closure of the wrapper functions. If wrap_realtime_session() is called multiple times after unwrap_realtime_session(), each invocation creates a new _tracing_states dictionary, but the old ones may not be garbage collected if there are lingering references from previous wrappers.

Consider defining _tracing_states at module level (similar to _original_methods) and explicitly clearing it in unwrap_realtime_session().

♻️ Proposed fix
 _original_methods: Dict[str, Any] = {}
+_tracing_states: Dict[int, Any] = {}


 def wrap_realtime_session(tracer: Tracer):
     """Wrap the RealtimeSession class to add OpenTelemetry tracing."""
     # Guard against multiple wrapping - don't overwrite originals with wrapped methods
     if _original_methods:
         return

     try:
         from agents.realtime.session import RealtimeSession
     except ImportError:
         return

     _original_methods['__aenter__'] = RealtimeSession.__aenter__
     _original_methods['__aexit__'] = RealtimeSession.__aexit__
     _original_methods['_put_event'] = RealtimeSession._put_event
     if hasattr(RealtimeSession, 'send_message'):
         _original_methods['send_message'] = RealtimeSession.send_message

-    _tracing_states: Dict[int, RealtimeTracingState] = {}

     @dont_throw

Then in unwrap_realtime_session():

     if 'send_message' in _original_methods:
         RealtimeSession.send_message = _original_methods['send_message']

     _original_methods.clear()
+    _tracing_states.clear()

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In
@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
at line 308, The _tracing_states dict currently created inside
wrap_realtime_session() can leak across wrap/unwrap cycles; move its declaration
out to module scope alongside _original_methods (name: _tracing_states:
Dict[int, RealtimeTracingState]) so all wrappers share the same mapping, then
update unwrap_realtime_session() to explicitly clear _tracing_states (e.g.,
_tracing_states.clear()) when restoring original methods to avoid leftover
references; adjust any references inside wrap_realtime_session() and the wrapper
functions to use the module-level _tracing_states.

Comment on lines +73 to +120
    def test_speech_span_start_creates_otel_span(self, tracer_provider_and_exporter):
        """Test that SpeechSpanData creates an OpenTelemetry span."""
        from opentelemetry.instrumentation.openai_agents._hooks import (
            OpenTelemetryTracingProcessor,
        )

        provider, exporter = tracer_provider_and_exporter
        tracer = provider.get_tracer("test")
        processor = OpenTelemetryTracingProcessor(tracer)

        # Create a mock trace first
        mock_trace = MagicMock()
        mock_trace.trace_id = "test-trace-123"
        processor.on_trace_start(mock_trace)

        # Mock the span data with type name matching
        speech_data = MockSpeechSpanData(
            model="tts-1",
            output_format="mp3",
            input="Hello, world!",
        )
        speech_data.__class__.__name__ = "SpeechSpanData"

        mock_span = MockAgentSpan(speech_data, trace_id="test-trace-123")

        # Patch the module-level variables directly
        import opentelemetry.instrumentation.openai_agents._hooks as hooks_module
        with patch.object(hooks_module, '_has_realtime_spans', True), \
                patch.object(hooks_module, 'SpeechSpanData', MockSpeechSpanData), \
                patch.object(hooks_module, 'TranscriptionSpanData', MockTranscriptionSpanData), \
                patch.object(hooks_module, 'SpeechGroupSpanData', MockSpeechGroupSpanData):
            processor.on_span_start(mock_span)
            processor.on_span_end(mock_span)

        # End the trace
        processor.on_trace_end(mock_trace)

        spans = exporter.get_finished_spans()
        span_names = [s.name for s in spans]

        assert "Agent Workflow" in span_names
        assert "openai.realtime.speech" in span_names

        speech_span = next(s for s in spans if s.name == "openai.realtime.speech")
        assert speech_span.attributes[SpanAttributes.LLM_REQUEST_TYPE] == "realtime"
        assert speech_span.attributes["gen_ai.system"] == "openai"
        assert speech_span.attributes["gen_ai.operation.name"] == "speech"
        assert speech_span.status.status_code == StatusCode.OK

🛠️ Refactor suggestion | 🟠 Major

Tests may not detect the unreachable realtime span handling in on_span_end.

The test verifies span creation and attributes, but the critical bug in _hooks.py (line 580) means realtime-specific attribute extraction in on_span_end is unreachable. The tests pass because:

  1. Span creation and basic attributes are set in on_span_start (lines 449-520 in _hooks.py)
  2. The realtime-specific attributes that should be set in on_span_end (lines 589-624 in _hooks.py) are never checked

Add test assertions that verify realtime-specific attributes set during on_span_end, such as:

  • gen_ai.prompt.0.content for input text (line 592 in _hooks.py)
  • gen_ai.speech.output_bytes for audio output (line 598 in _hooks.py)
  • gen_ai.speech.first_content_at for timing (line 604 in _hooks.py)
🧪 Proposed additional test assertions
         speech_span = next(s for s in spans if s.name == "openai.realtime.speech")
         assert speech_span.attributes[SpanAttributes.LLM_REQUEST_TYPE] == "realtime"
         assert speech_span.attributes["gen_ai.system"] == "openai"
         assert speech_span.attributes["gen_ai.operation.name"] == "speech"
         assert speech_span.status.status_code == StatusCode.OK
+        # Verify on_span_end set these attributes
+        assert speech_span.attributes.get(f"{GenAIAttributes.GEN_AI_PROMPT}.0.content") == "Hello, world!"
+        assert speech_span.attributes.get(f"{GenAIAttributes.GEN_AI_PROMPT}.0.role") == "user"

Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed 5325e3f in 2 minutes and 38 seconds. Click for details.
  • Reviewed 65 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 5 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:40
  • Draft comment:
    Document the default 'model_name' value and its override mechanism to ensure consistency with session configuration.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
2. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:244
  • Draft comment:
    Good improvement—using 'self.model_name' for the GEN_AI_REQUEST_MODEL attribute allows configuration overrides.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
3. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:322
  • Draft comment:
    Consider handling cases where 'config' is a dict rather than an object; using getattr may fail. Using config.get('model_settings') (or similar) might be more robust.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%.
    The comment raises a valid point about dict vs object handling. However, I need to consider:
    1) The entire block is wrapped in a try-except that catches all exceptions, so if config is a dict and getattr fails, it will be caught silently.
    2) This is defensive code trying to extract the model name from various possible locations.
    3) The comment suggests a code improvement, but the current code won't crash thanks to the exception handler.
    4) The comment is speculative ("may fail", "might be more robust") rather than pointing to a definite bug.
    5) According to the rules, speculative comments like "If X, then Y is an issue" should be removed unless it's definitely an issue.
    The code is already protected by a try-except block that catches all exceptions, so even if config is a dict and getattr fails, the code won't crash. The comment suggests a code quality improvement rather than identifying a definite bug, and its speculative language ("may fail", "might be more robust") doesn't demonstrate that this is actually a problem. While the suggested approach would work for both dicts and objects, there is no evidence that config is ever a dict in practice, so this is a speculative refactor suggestion. According to the rules, speculative comments should be removed unless there's strong evidence of a definite issue.
4. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:339
  • Draft comment:
    Consider logging exception details in the traced_aenter block instead of silently swallowing errors; this would aid in debugging potential issues with model name extraction.
  • Reason this comment was not posted:
    Comment was on unchanged code.
5. packages/sample-app/sample_app/openai_agents_realtime_example.py:54
  • Draft comment:
    Removing the unnecessary '# noqa' comments on import statements cleans up the code.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_ro6nmpVV0WODpPT2

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed b3bc328 in 38 seconds. Click for details.
  • Reviewed 19 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/sample-app/sample_app/openai_agents_realtime_example.py:629
  • Draft comment:
    Add '# noqa: F401' to suppress unused import warnings; consider adding a brief comment explaining that this import is meant solely for dependency checking.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
2. packages/sample-app/sample_app/openai_agents_realtime_example.py:634
  • Draft comment:
    Using '# noqa: F401' suppresses unused import warnings; a short inline note on its purpose (dependency check) would be helpful.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_o0N4A4OiprupApfl

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py:
- Line 309: _tracing_states is currently defined inside wrap_realtime_session()
causing a closure and potential memory leaks; move the _tracing_states:
Dict[int, RealtimeTracingState] = {} declaration to module level alongside
_original_methods, remove its local declaration inside wrap_realtime_session(),
and update unwrap_realtime_session() to explicitly clear _tracing_states (e.g.,
_tracing_states.clear()) when restoring originals so old state dictionaries can
be garbage collected.
🧹 Nitpick comments (2)
packages/sample-app/sample_app/openai_agents_realtime_example.py (1)

624-650: Consider re-adding linter suppressions for intentional import-only checks.

The imports on lines 629 and 634 are intentionally unused—they exist solely to verify package availability via try-except. While the previous review requested removing # noqa: F401 directives, Flake8 now correctly flags them as unused (F401).

For clarity and to document intent, consider re-adding the suppressions:

📝 Suggested clarification
     try:
-        import agents
+        import agents  # noqa: F401 - import check only
     except ImportError:
         missing.append("openai-agents")
 
     try:
-        from agents.realtime import RealtimeAgent
+        from agents.realtime import RealtimeAgent  # noqa: F401 - import check only
     except ImportError:
         missing.append("openai-agents (realtime module)")
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (1)

205-218: Optional: Consider prefixing unused role parameters.

The role parameter in record_prompt() and record_completion() is unused (flagged by ARG002). While this is likely intentional for API consistency or future extensibility, consider prefixing with an underscore to document the intent:

📝 Optional cleanup
-    def record_prompt(self, role: str, content: str):
+    def record_prompt(self, _role: str, content: str):
         """Record a prompt message - buffers it for the LLM span."""
         
-    def record_completion(self, role: str, content: str):
+    def record_completion(self, _role: str, content: str):
         """Record a completion message - creates an LLM span with prompt and completion."""
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f07996d and 5325e3f.

📒 Files selected for processing (2)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
  • packages/sample-app/sample_app/openai_agents_realtime_example.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/sample-app/sample_app/openai_agents_realtime_example.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: duanyutong
Repo: traceloop/openllmetry PR: 3487
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:177-178
Timestamp: 2025-12-02T21:09:48.690Z
Learning: The opentelemetry-instrumentation-openai and opentelemetry-instrumentation-openai-agents packages must remain independent and not share code, so code duplication between them is acceptable.
📚 Learning: 2025-12-02T21:09:48.690Z
Learnt from: duanyutong
Repo: traceloop/openllmetry PR: 3487
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:177-178
Timestamp: 2025-12-02T21:09:48.690Z
Learning: The opentelemetry-instrumentation-openai and opentelemetry-instrumentation-openai-agents packages must remain independent and not share code, so code duplication between them is acceptable.

Applied to files:

  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
🪛 Flake8 (7.3.0)
packages/sample-app/sample_app/openai_agents_realtime_example.py

[error] 629-629: 'agents' imported but unused

(F401)


[error] 634-634: 'agents.realtime.RealtimeAgent' imported but unused

(F401)

🪛 Ruff (0.14.10)
packages/sample-app/sample_app/openai_agents_realtime_example.py

231-231: Do not catch blind exception: Exception

(BLE001)


306-306: Avoid specifying long messages outside the exception class

(TRY003)


311-311: Avoid specifying long messages outside the exception class

(TRY003)


316-316: Avoid specifying long messages outside the exception class

(TRY003)


322-322: Avoid specifying long messages outside the exception class

(TRY003)


328-328: Prefer TypeError exception for invalid type

(TRY004)


328-328: Avoid specifying long messages outside the exception class

(TRY003)


333-333: Consider moving this statement to an else block

(TRY300)


462-462: Do not catch blind exception: Exception

(BLE001)


615-615: Do not catch blind exception: Exception

(BLE001)

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py

205-205: Unused method argument: role

(ARG002)


214-214: Unused method argument: role

(ARG002)


339-340: try-except-pass detected, consider logging the exception

(S110)


339-339: Do not catch blind exception: Exception

(BLE001)


357-358: try-except-pass detected, consider logging the exception

(S110)


357-357: Do not catch blind exception: Exception

(BLE001)


459-460: try-except-pass detected, consider logging the exception

(S110)


459-459: Do not catch blind exception: Exception

(BLE001)


489-490: try-except-pass detected, consider logging the exception

(S110)


489-489: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Lint
  • GitHub Check: Build Packages (3.11)
🔇 Additional comments (15)
packages/sample-app/sample_app/openai_agents_realtime_example.py (5)

1-41: LGTM!

The module documentation is comprehensive, imports are appropriate, and Traceloop initialization follows standard patterns for OpenTelemetry instrumentation.


43-243: LGTM!

The voice demo implementation is well-structured with clear tool definitions, comprehensive event handling, and appropriate error management for sample code. The broad exception catching on line 231 is acceptable in a demonstration context.


284-335: Excellent implementation of safe expression evaluation!

The calculate() function properly addresses the previous security concern by using AST parsing with:

  • Operator allowlist (excluding exponentiation to prevent resource exhaustion)
  • Depth limit to prevent deeply nested expressions
  • Proper node type validation

The implementation is secure and appropriate for the sample app.


475-621: LGTM!

The handoff demo effectively demonstrates multi-agent workflows with clear agent hierarchy and handoff routing. The implementation is appropriate for a sample application.


653-684: LGTM!

The main block provides a clean CLI interface with clear usage guidance and appropriate error handling.

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (10)

1-18: LGTM!

The module documentation clearly explains the wrapper approach, and the setup follows proper patterns with _original_methods at module level for tracking original implementations.


20-106: LGTM!

The RealtimeTracingState initialization and workflow/agent span management are well-structured. The default model name on line 40 is appropriately documented as being updated from session configuration.


108-195: LGTM!

Tool, handoff, and audio span management properly handles span lifecycle, parent context propagation, and includes appropriate safeguards like content truncation to 4096 characters.


268-290: LGTM!

The cleanup() method comprehensively handles all remaining spans, properly setting status and clearing all tracking dictionaries. The previous concern about llm_spans cleanup has been resolved.


292-307: LGTM on the wrapping guard!

The guard on lines 295-296 properly prevents multiple wrapping from overwriting original methods. This addresses the previous concern about re-instrumentation.
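
The general shape of this guard, shown on a made-up target class and method (illustrative only; the real module patches RealtimeSession.__aenter__, __aexit__, and _put_event):

from typing import Any, Dict

_originals: Dict[str, Any] = {}

def wrap(target_cls, tracer):
    if _originals:  # already wrapped: don't overwrite the stashed originals
        return
    _originals["run"] = target_cls.run

    def traced_run(self, *args, **kwargs):
        # start a span with `tracer` here, then delegate to the original
        return _originals["run"](self, *args, **kwargs)

    target_cls.run = traced_run

def unwrap(target_cls):
    if "run" in _originals:
        target_cls.run = _originals["run"]
    _originals.clear()  # clearing makes a later re-wrap possible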


312-342: LGTM on session initialization and model capture!

The traced_aenter wrapper properly starts the workflow span and attempts to capture the actual model name from the session configuration (lines 322-338) with multiple fallback paths. This addresses the previous concern about hardcoded model names.

The try-except-pass pattern (lines 339-340) is appropriate for instrumentation code to prevent failures from breaking the application.


344-360: LGTM on session cleanup!

The traced_aexit wrapper properly performs cleanup and ends the workflow span with appropriate error status. The try-except-pass pattern is suitable for instrumentation resilience.


362-462: LGTM on comprehensive event handling!

The traced_put_event wrapper handles a wide variety of event types (agent_start/end, tool_start/end, handoff, audio, error, history_added, raw_model_event) with proper span lifecycle management and attribute capture. The try-except-pass pattern is appropriate for non-breaking instrumentation.


464-492: LGTM on user input capture!

The traced_send_message wrapper properly captures user input for prompt recording, handling both string and structured message formats. The try-except-pass pattern maintains instrumentation resilience.
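
A hedged sketch of that normalization step; the structured shape assumed here (a list of content parts carrying "text" or "transcript" fields) is a guess at the message format, not the SDK's documented schema.

def extract_user_text(message):
    """Flatten a user message into plain text for prompt recording."""
    if isinstance(message, str):
        return message
    parts = []
    for item in getattr(message, "content", None) or []:
        if isinstance(item, str):
            parts.append(item)
        elif isinstance(item, dict):
            text = item.get("text") or item.get("transcript")
            if text:
                parts.append(text)
        else:
            text = getattr(item, "text", None) or getattr(item, "transcript", None)
            if text:
                parts.append(text)
    return " ".join(parts)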


501-517: LGTM on method restoration!

The unwrap_realtime_session() function properly restores all original methods and clears the _original_methods dictionary. Note that this doesn't clear _tracing_states (which is in the closure), but that will be resolved when the module-level refactoring is applied.

    if hasattr(RealtimeSession, 'send_message'):
        _original_methods['send_message'] = RealtimeSession.send_message

    _tracing_states: Dict[int, RealtimeTracingState] = {}

⚠️ Potential issue | 🟠 Major

Memory leak risk: _tracing_states should be at module level.

The _tracing_states dictionary is defined inside wrap_realtime_session() and captured in the closure. Across wrap/unwrap/wrap cycles, old dictionaries may not be garbage collected if lingering references exist from previous wrappers.

Move _tracing_states to module level (like _original_methods) and explicitly clear it in unwrap_realtime_session().

♻️ Proposed fix

At module level (after line 17):

 _original_methods: Dict[str, Any] = {}
+_tracing_states: Dict[int, 'RealtimeTracingState'] = {}

Remove from wrap_realtime_session() (line 309):

     if hasattr(RealtimeSession, 'send_message'):
         _original_methods['send_message'] = RealtimeSession.send_message
 
-    _tracing_states: Dict[int, RealtimeTracingState] = {}
-
     @dont_throw

Add cleanup in unwrap_realtime_session():

     if 'send_message' in _original_methods:
         RealtimeSession.send_message = _original_methods['send_message']
 
     _original_methods.clear()
+    _tracing_states.clear()

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In
@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
at line 309, _tracing_states is currently defined inside wrap_realtime_session()
causing a closure and potential memory leaks; move the _tracing_states:
Dict[int, RealtimeTracingState] = {} declaration to module level alongside
_original_methods, remove its local declaration inside wrap_realtime_session(),
and update unwrap_realtime_session() to explicitly clear _tracing_states (e.g.,
_tracing_states.clear()) when restoring originals so old state dictionaries can
be garbage collected.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
packages/sample-app/sample_app/openai_agents_realtime_example.py (1)

284-336: Safe AST-based evaluation implementation is excellent.

The calculate function correctly implements AST-based parsing to safely evaluate math expressions, preventing the security issues of eval(). The depth limit and operator restrictions effectively prevent resource exhaustion attacks.

💡 Optional: Use TypeError for unsupported expression types

For line 328, consider using TypeError instead of ValueError for unsupported expression types, as suggested by Ruff:

-            else:
-                raise ValueError(f"Unsupported expression type: {type(node).__name__}")
+            else:
+                raise TypeError(f"Unsupported expression type: {type(node).__name__}")

This more accurately reflects that the error is about an inappropriate type rather than an inappropriate value.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 5325e3f and b3bc328.

📒 Files selected for processing (1)
  • packages/sample-app/sample_app/openai_agents_realtime_example.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/sample-app/sample_app/openai_agents_realtime_example.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: duanyutong
Repo: traceloop/openllmetry PR: 3487
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:177-178
Timestamp: 2025-12-02T21:09:48.690Z
Learning: The opentelemetry-instrumentation-openai and opentelemetry-instrumentation-openai-agents packages must remain independent and not share code, so code duplication between them is acceptable.
🪛 Ruff (0.14.10)
packages/sample-app/sample_app/openai_agents_realtime_example.py

231-231: Do not catch blind exception: Exception

(BLE001)


306-306: Avoid specifying long messages outside the exception class

(TRY003)


311-311: Avoid specifying long messages outside the exception class

(TRY003)


316-316: Avoid specifying long messages outside the exception class

(TRY003)


322-322: Avoid specifying long messages outside the exception class

(TRY003)


328-328: Prefer TypeError exception for invalid type

(TRY004)


328-328: Avoid specifying long messages outside the exception class

(TRY003)


333-333: Consider moving this statement to an else block

(TRY300)


462-462: Do not catch blind exception: Exception

(BLE001)


615-615: Do not catch blind exception: Exception

(BLE001)


629-629: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


634-634: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Build Packages (3.11)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Lint
🔇 Additional comments (4)
packages/sample-app/sample_app/openai_agents_realtime_example.py (4)

1-41: LGTM!

The module documentation is comprehensive, imports are appropriate, and initialization follows best practices. API key is correctly loaded from environment variables rather than hardcoded.


43-243: LGTM!

The voice demo implementation is well-structured with proper async context management, defensive event handling using getattr(), and appropriate error messages. The broad exception catch at line 231 (flagged by Ruff) is acceptable for a sample application where user-friendly error messages are prioritized over granular error handling.


475-622: LGTM!

The handoff demo effectively demonstrates agent handoffs in realtime sessions. The agent hierarchy is well-structured with specialized agents and proper handoff configuration using realtime_handoff(). Event handling appropriately captures handoff events.


653-684: LGTM!

The main execution block provides a clean command-line interface with clear help messages, requirement checking, and appropriate error handling. The demo type selection is well-structured and user-friendly.

        if tool_name in self.tool_spans:
            span = self.tool_spans[tool_name]
            if output is not None:
                span.set_attribute("gen_ai.tool.output", str(output)[:4096])
Member


Don't we have a const for this?

Comment on lines +160 to +161
"gen_ai.handoff.from_agent": from_agent,
"gen_ai.handoff.to_agent": to_agent,
Member


Let's create consts for these.

Member

@doronkopit5 doronkopit5 left a comment


I guess you didn't manage to record cassettes because of the sockets?
