feat(sse): add progress events for enrichment nodes#264

Merged
ComBba merged 3 commits into main from feat/enrichment-progress-events
Feb 9, 2026

Conversation

@ComBba ComBba commented Feb 9, 2026

Summary

  • Add SSE progress events for enrichment phase (RAG, Web Search, Code Analysis)
  • Users can now see real-time progress during the previously invisible enrichment phase
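
For readers unfamiliar with the transport, here is a minimal sketch of how one progress event could be serialized as a Server-Sent Events frame. The payload fields mirror the create_sommelier_event arguments quoted in the review comments further down; the `format_sse` helper is hypothetical and is not taken from this codebase.

```python
import json


def format_sse(event_type: str, payload: dict) -> str:
    """Serialize one event as an SSE frame (hypothetical helper)."""
    data = json.dumps(payload, ensure_ascii=False)
    # An SSE frame is a set of "field: value" lines ended by a blank line.
    return f"event: {event_type}\ndata: {data}\n\n"


frame = format_sse(
    "enrichment_start",
    {
        "sommelier": "rag",
        "progress_percent": 0,
        "message": "RAG context enrichment starting...",
    },
)
print(frame, end="")
```

A browser `EventSource` would dispatch this frame as an `enrichment_start` event whose `data` is the JSON string.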

Changes

  • Added ENRICHMENT_START, ENRICHMENT_COMPLETE, ENRICHMENT_ERROR to EventType enum
  • rag_enrich.py: Emit start/complete/error events
  • web_search_enrich.py: Emit start/complete/error events
  • code_analysis_enrich.py: Emit start/complete/error events
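
As a rough illustration of the enum change, the sketch below shows the three new members. It assumes EventType is a string-valued Enum whose values match the event_type strings used in the nodes ("enrichment_start" and so on); the existing members and the real base class in event_channel.py are not shown in this PR page and are omitted.

```python
from enum import Enum


class EventType(str, Enum):
    # Only the members added by this PR; values are assumed to match the
    # event_type strings passed to create_sommelier_event.
    ENRICHMENT_START = "enrichment_start"
    ENRICHMENT_COMPLETE = "enrichment_complete"
    ENRICHMENT_ERROR = "enrichment_error"


# Looking a member up by its string value returns the same enum member.
print(EventType("enrichment_error") is EventType.ENRICHMENT_ERROR)
```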

Architecture Impact

Stage 1 (parallel): RAG + Web Search + Code Analysis  ← NOW HAS PROGRESS EVENTS
        ↓
Stage 2 (parallel): 5 Sommeliers
        ↓
Stage 3: Jean-Pierre synthesis
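
The staging in the diagram can be sketched with plain asyncio. This is only an illustration of the ordering (parallel within a stage, sequential between stages); the project's actual graph definition is not shown here, and `run_node` and the sommelier names are placeholders.

```python
import asyncio


async def run_node(name: str, log: list[str]) -> str:
    log.append(f"{name}:start")
    await asyncio.sleep(0)  # stand-in for real work
    log.append(f"{name}:complete")
    return name


async def run_pipeline() -> list[str]:
    log: list[str] = []
    # Stage 1: the three enrichment nodes run concurrently.
    await asyncio.gather(
        *(run_node(n, log) for n in ("rag", "web_search", "code_analysis"))
    )
    # Stage 2: five sommeliers run concurrently (placeholder names).
    await asyncio.gather(*(run_node(f"sommelier_{i}", log) for i in range(1, 6)))
    # Stage 3: a single synthesis step.
    await run_node("synthesis", log)
    return log


log = asyncio.run(run_pipeline())
print(log[-1])
```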

Testing

  • All LSP diagnostics clean on modified files
  • No breaking changes to existing event handling

Summary by CodeRabbit

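Improvements

  • Added real-time progress tracking for the enrichment process
  • Provides detailed event information covering cached results, completion, skips, and errors
  • Per-stage status and metrics (files processed, chunk count, sources found) for code analysis, RAG, and web search can be monitored in real time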

… Code Analysis)

Users can now see real-time progress during the enrichment phase:
- RAG context enrichment
- Web search grounding
- Code analysis

Added ENRICHMENT_START/COMPLETE/ERROR event types to EventType enum
and emit events at start, completion, and error states for all three
enrichment nodes.

vercel bot commented Feb 9, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| somm-dev | Ready | Preview, Comment | Feb 9, 2026 3:43pm |

@gemini-code-assist

Summary of Changes

Hello @ComBba, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the user experience by making the previously opaque enrichment stages of the system transparent. By integrating Server-Sent Events, users will now receive real-time progress notifications as RAG, Web Search, and Code Analysis operations proceed, providing immediate feedback and improving visibility into the system's workflow.

Highlights

  • Real-time Progress Events: Introduced Server-Sent Events (SSE) to provide real-time progress updates for the enrichment phase, specifically for RAG, Web Search, and Code Analysis nodes.
  • New Event Types: Added ENRICHMENT_START, ENRICHMENT_COMPLETE, and ENRICHMENT_ERROR to the EventType enum to categorize these new progress events.
  • Event Emission: Implemented event emission logic within rag_enrich.py, web_search_enrich.py, and code_analysis_enrich.py to send start, complete, and error events at appropriate stages, including when results are cached or skipped.
Changelog
  • backend/app/graph/nodes/code_analysis_enrich.py
    • Imported event channel utilities (create_sommelier_event, get_event_channel).
    • Added logic to emit enrichment_start event at the beginning of the code_analysis_enrich function.
    • Added logic to emit enrichment_complete events when code analysis results are cached, when the analysis is skipped due to a missing repository URL, and upon successful completion.
    • Added logic to emit an enrichment_error event if an exception occurs during code analysis.
  • backend/app/graph/nodes/rag_enrich.py
    • Imported event channel utilities (create_sommelier_event, get_event_channel).
    • Added logic to emit enrichment_start event at the beginning of the rag_enrich function.
    • Added logic to emit enrichment_complete events when RAG context is cached, when RAG is skipped due to a missing API key, and upon successful completion.
    • Added logic to emit an enrichment_error event if an exception occurs during RAG embedding.
  • backend/app/graph/nodes/web_search_enrich.py
    • Imported event channel utilities (create_sommelier_event, get_event_channel).
    • Added logic to emit enrichment_start event at the beginning of the web_search_enrich function.
    • Added logic to emit enrichment_complete events when web search context is cached, when web search is skipped due to a missing API key, and upon successful completion.
    • Added logic to emit an enrichment_error event if an exception occurs during web search grounding.
  • backend/app/services/event_channel.py
    • Added new enum members ENRICHMENT_START, ENRICHMENT_COMPLETE, and ENRICHMENT_ERROR to the EventType class.
Activity
  • No human activity (comments, reviews, etc.) has been recorded for this pull request yet.

@coderabbitai

coderabbitai bot commented Feb 9, 2026

Warning

Rate limit exceeded

@ComBba has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 11 minutes and 30 seconds before requesting another review.

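Walkthrough

Adds event channel integration to the three data enrichment nodes (code analysis, RAG, web search). Each node emits events that track evaluation start, completion (cached/skipped), and error states. The event type enum is extended with three new members.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Enrichment node event integration: backend/app/graph/nodes/code_analysis_enrich.py, backend/app/graph/nodes/rag_enrich.py, backend/app/graph/nodes/web_search_enrich.py | Adds event channel utility imports to each node and conditionally emits start, cached, skipped, complete, and error events based on the evaluation ID. Core enrichment logic is unchanged. |
| Event type extension: backend/app/services/event_channel.py | Adds three new members, ENRICHMENT_START, ENRICHMENT_COMPLETE, and ENRICHMENT_ERROR, to the EventType enum. |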

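Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🐰 Ringing the bell of events on three forking paths,
the cache whispers and failure wails,
a new voice tracing the journey of enrichment,
carrying every story from start to finish,
a kind light flows along the trail! 🌟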

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title clearly summarizes the core of the change. It accurately describes the main change of adding SSE progress events to the three enrichment nodes. |

@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request successfully adds real-time progress events for the enrichment phase (RAG, Web Search, Code Analysis), which is a great improvement for user experience.

However, my review has identified a critical security vulnerability related to leaking sensitive information (like GitHub tokens) in error messages. I've also pointed out significant code duplication across the new event-emitting logic in all three enrichment nodes.

Addressing the security issue is paramount. Refactoring the duplicated code will improve the long-term maintainability of this new feature. Please see the detailed comments for suggestions.

sommelier="code_analysis",
event_type="enrichment_error",
progress_percent=100,
message=f"Code analysis failed: {e}",

critical

This line introduces a critical security vulnerability. Including the raw exception e in the SSE event message can leak sensitive information to the client. Specifically, if clone_and_analyze fails with a subprocess.TimeoutExpired exception, the exception's string representation will include the git clone command, which contains the GitHub access token in the URL. The detailed exception is already logged via logger.exception, so a generic message should be sent to the client.

Suggested change
- message=f"Code analysis failed: {e}",
+ message="Code analysis failed due to an internal error.",
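
The leak described above can be reproduced with the standard library alone. The sketch below builds a subprocess.TimeoutExpired for a made-up clone command (the URL and FAKE_TOKEN are placeholders, not values from this repository) and compares the two message styles.

```python
import subprocess

# Hypothetical clone command; the token is a fake placeholder.
cmd = ["git", "clone", "https://x-access-token:FAKE_TOKEN@github.com/owner/repo.git"]
exc = subprocess.TimeoutExpired(cmd, timeout=30)

# What the original line would send to the client: str(exc) embeds the command.
leaky = f"Code analysis failed: {exc}"
# The suggested replacement carries no exception text at all.
safe = "Code analysis failed due to an internal error."

print("FAKE_TOKEN" in leaky)
print("FAKE_TOKEN" in safe)
```

Because the string form of TimeoutExpired includes the full command, any credential embedded in the clone URL travels with it, which is why the generic message is the safer choice for the SSE payload.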

sommelier="rag",
event_type="enrichment_error",
progress_percent=100,
message=f"RAG enrichment failed: {e}",

high

Exposing raw exception messages to the client is a security risk as it can lead to information leakage about the application's internal workings. While less critical than in code_analysis_enrich, as it's less likely to contain secrets, it's still a bad practice. A generic error message should be sent to the client, while the detailed error is logged for debugging.

Suggested change
- message=f"RAG enrichment failed: {e}",
+ message="RAG enrichment failed due to an internal error.",

sommelier="web_search",
event_type="enrichment_error",
progress_percent=100,
message=f"Web search failed: {e}",

high

Exposing raw exception messages to the client is a security risk as it can lead to information leakage about the application's internal workings. It's a bad practice to send potentially verbose or sensitive exception details in a user-facing message. A generic error message should be sent to the client, while the detailed error is logged for debugging.

Suggested change
- message=f"Web search failed: {e}",
+ message="Web search failed due to an internal error.",

Comment on lines +21 to +31
if evaluation_id:
    event_channel.emit_sync(
        evaluation_id,
        create_sommelier_event(
            evaluation_id=evaluation_id,
            sommelier="code_analysis",
            event_type="enrichment_start",
            progress_percent=0,
            message="Code analysis starting...",
        ),
    )

medium

There is significant code duplication in how SSE events are emitted. This block, and four other similar blocks in this file, repeat the if evaluation_id: check and the call to event_channel.emit_sync. This pattern is also present in rag_enrich.py and web_search_enrich.py.

To improve maintainability and reduce redundancy, consider refactoring this logic into a local helper function within code_analysis_enrich. For example:

def _emit_event(event_type: str, progress: int, message: str):
    if evaluation_id:
        event = create_sommelier_event(
            evaluation_id=evaluation_id,
            sommelier="code_analysis",
            event_type=event_type,
            progress_percent=progress,
            message=message,
        )
        event_channel.emit_sync(evaluation_id, event)

# Then you can call it like this:
_emit_event("enrichment_start", 0, "Code analysis starting...")

Comment on lines +108 to +118
if evaluation_id:
    event_channel.emit_sync(
        evaluation_id,
        create_sommelier_event(
            evaluation_id=evaluation_id,
            sommelier="rag",
            event_type="enrichment_start",
            progress_percent=0,
            message="RAG context enrichment starting...",
        ),
    )

medium

Similar to other enrichment nodes in this PR, there's significant code duplication in the event-emitting logic. The if evaluation_id: ... pattern is repeated multiple times.

To improve maintainability, this could be refactored into a helper function that encapsulates creating and emitting the event. This would make the main function body cleaner and easier to read.

Example helper:

def _emit_event(event_type: str, progress: int, message: str):
    if evaluation_id:
        event = create_sommelier_event(
            evaluation_id=evaluation_id,
            sommelier="rag",
            event_type=event_type,
            progress_percent=progress,
            message=message,
        )
        event_channel.emit_sync(evaluation_id, event)

Comment on lines +30 to +40
if evaluation_id:
    event_channel.emit_sync(
        evaluation_id,
        create_sommelier_event(
            evaluation_id=evaluation_id,
            sommelier="web_search",
            event_type="enrichment_start",
            progress_percent=0,
            message="Web search enrichment starting...",
        ),
    )

medium

This block for emitting an SSE event is repeated multiple times throughout the function, with slight variations. This pattern of if evaluation_id: event_channel.emit_sync(...) is also duplicated in the other enrichment nodes (rag_enrich.py, code_analysis_enrich.py).

Consider creating a small helper function to handle event creation and emission. This will reduce code duplication and make the logic more maintainable.

Example helper:

def _emit_event(event_type: str, progress: int, message: str):
    if evaluation_id:
        event = create_sommelier_event(
            evaluation_id=evaluation_id,
            sommelier="web_search",
            event_type=event_type,
            progress_percent=progress,
            message=message,
        )
        event_channel.emit_sync(evaluation_id, event)


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/graph/nodes/rag_enrich.py (1)

164-169: ⚠️ Potential issue | 🔴 Critical

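Events are missing on the empty-documents path, leaving the client in a "stalled" state.

When docs is empty (Line 166-169), enrichment_start has already been emitted, but the function returns immediately without emitting enrichment_complete or enrichment_error. From an SSE subscriber's point of view, RAG enrichment appears to be in progress forever.

Suggested fix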
         docs = _build_documents_from_context(repo_context)
         if not docs:
+            if evaluation_id:
+                event_channel.emit_sync(
+                    evaluation_id,
+                    create_sommelier_event(
+                        evaluation_id=evaluation_id,
+                        sommelier="rag",
+                        event_type="enrichment_complete",
+                        progress_percent=100,
+                        message="RAG enrichment complete (no documents)",
+                    ),
+                )
             return {
                 "rag_context": {"query": query, "chunks": [], "error": None},
             }
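
One way to rule out this class of bug is to emit the terminal event in a single place that wraps the work, so that no early return can skip it. The sketch below is an illustration only: `emit` stands in for event_channel.emit_sync, and `run_enrichment` and `no_documents` are hypothetical names, not functions in rag_enrich.py.

```python
from typing import Callable

events: list[tuple[str, int, str]] = []


def emit(event_type: str, progress: int, message: str) -> None:
    events.append((event_type, progress, message))  # stand-in for emit_sync


def run_enrichment(work: Callable[[], str]) -> None:
    emit("enrichment_start", 0, "RAG context enrichment starting...")
    try:
        message = work()  # may take any early-return path internally
    except Exception:
        emit("enrichment_error", 100, "RAG enrichment failed due to an internal error.")
        return
    # Every non-raising path, including early returns, ends up here.
    emit("enrichment_complete", 100, message)


def no_documents() -> str:
    docs: list[str] = []
    if not docs:
        return "RAG enrichment complete (no documents)"  # early return
    return f"RAG enrichment complete ({len(docs)} documents)"


run_enrichment(no_documents)
print([e[0] for e in events])
```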
🤖 Fix all issues with AI agents
In `@backend/app/graph/nodes/code_analysis_enrich.py`:
- Around line 101-112: The event message always says "Code analysis complete"
even when the local variable status can be "skipped" or "partial"; update the
block that emits the sommelier event (event_channel.emit_sync using
create_sommelier_event with evaluation_id and clone_result.main_files) to
inspect the status variable and set a message and progress appropriately (e.g.,
"Code analysis complete", "Code analysis skipped", or "Code analysis partial (X
files)" and corresponding progress_percent) so the SSE payload accurately
reflects status instead of always reporting completion.
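
A minimal sketch of the status-aware message this comment asks for, using the three wordings it proposes. The helper name `completion_message` and its parameters are hypothetical, and the file count on the "complete" message is an added assumption.

```python
def completion_message(status: str, file_count: int) -> str:
    """Pick the SSE message from the analysis status (hypothetical helper)."""
    if status == "skipped":
        return "Code analysis skipped"
    if status == "partial":
        return f"Code analysis partial ({file_count} files)"
    # Any other status is treated as a full completion in this sketch.
    return f"Code analysis complete ({file_count} files)"


print(completion_message("partial", 3))
```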

In `@backend/app/graph/nodes/web_search_enrich.py`:
- Around line 155-165: Replace the sensitive inline exception message sent to
clients with a generic error string and log the full exception server-side: in
web_search_enrich.py where event_channel.emit_sync(...) creates the sommelier
"enrichment_error" event (using create_sommelier_event and evaluation_id),
change the message payload to something non-sensitive like "Web search failed"
and ensure you call the server logger (e.g., processLogger or the module logger)
to record the full exception details and stacktrace; apply the same fix pattern
to the analogous occurrences in rag_enrich.py (around the create_sommelier_event
call) and code_analysis_enrich.py so client SSE events never contain raw
exception text while full errors remain in server logs.

In `@backend/app/services/event_channel.py`:
- Around line 69-72: Add the new ENRICHMENT_ERROR event to the
CRITICAL_EVENT_TYPES set so it is treated like sommelier_error and
technique_error; locate the CRITICAL_EVENT_TYPES definition and include
ENRICHMENT_ERROR (matching the constant name) to ensure emit_sync will not use
put_nowait for enrichment_error events.
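
To show why membership in CRITICAL_EVENT_TYPES matters, here is a simplified model of a bounded event buffer. It is not the project's emit_sync: the eviction strategy for critical events is an assumption chosen to keep the sketch deterministic, and the real implementation may block or await instead.

```python
import queue

CRITICAL_EVENT_TYPES = {"sommelier_error", "technique_error", "enrichment_error"}


def emit_sync(q: queue.Queue, event: dict) -> bool:
    """Return True if the event was enqueued (simplified stand-in)."""
    if event["event_type"] in CRITICAL_EVENT_TYPES:
        if q.full():
            q.get_nowait()  # assumption: evict the oldest event to make room
        q.put(event)
        return True
    try:
        q.put_nowait(event)  # non-critical events never wait for space
        return True
    except queue.Full:
        return False  # silently dropped when the buffer is full


buffer: queue.Queue = queue.Queue(maxsize=1)
results = [
    emit_sync(buffer, {"event_type": "enrichment_start"}),
    emit_sync(buffer, {"event_type": "enrichment_complete"}),
    emit_sync(buffer, {"event_type": "enrichment_error"}),
]
print(results)
```

With a buffer of size one, the second progress event is dropped while the error event still gets through, which is the behavior the comment wants for enrichment_error.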
🧹 Nitpick comments (2)
backend/app/graph/nodes/rag_enrich.py (2)

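209-219: progress_percent=100 on error events is semantically confusing.

Sending progress_percent=100 when an error occurs can lead the client to misread it as "completed successfully". The same pattern is used in all three files, so it appears intentional, but for an error state it would be clearer to send -1 or the progress at the point of failure.


105-118: The event emission pattern is repeated across the three enrichment nodes; consider extracting a helper.

rag_enrich.py, web_search_enrich.py, and code_analysis_enrich.py all repeat the same if evaluation_id: emit_sync(create_sommelier_event(...)) pattern on the start/complete/error/cached/skipped paths. Extracting it into a simple helper function would reduce boilerplate and guard against inconsistencies (e.g., missing-event bugs).

Example: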

def _emit_enrichment_event(
    event_channel, evaluation_id: str | None, source: str,
    event_type: str, progress: int, message: str,
) -> None:
    if not evaluation_id:
        return
    event_channel.emit_sync(
        evaluation_id,
        create_sommelier_event(
            evaluation_id=evaluation_id,
            sommelier=source,
            event_type=event_type,
            progress_percent=progress,
            message=message,
        ),
    )

Comment on lines +155 to +165
if evaluation_id:
    event_channel.emit_sync(
        evaluation_id,
        create_sommelier_event(
            evaluation_id=evaluation_id,
            sommelier="web_search",
            event_type="enrichment_error",
            progress_percent=100,
            message=f"Web search failed: {e}",
        ),
    )

⚠️ Potential issue | 🟠 Major

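Error messages may contain sensitive information.

f"Web search failed: {e}" (Line 163) includes the exception message verbatim in the SSE event. When an API call fails, the exception message can contain API keys, internal URLs, authentication tokens, and similar details, and it is delivered directly to the client over SSE. The same pattern applies to rag_enrich.py (Line 217) and code_analysis_enrich.py (Line 141).

It is safer to use a generic string in the error event's message and keep detailed exception information only in server-side logs.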

Fixes based on Gemini and CodeRabbit reviews:

1. [CRITICAL] Add 'enrichment_error' to CRITICAL_EVENT_TYPES
   - Prevents error events from being silently dropped when buffer full

2. [SECURITY] Remove raw exception from SSE error messages (3 files)
   - Prevents leaking sensitive info (e.g., GitHub tokens in git clone errors)
   - Generic 'internal error' message sent to client, full error logged server-side

3. [BUG] Add missing complete event for empty docs case in rag_enrich.py
   - Previously SSE subscribers would remain stuck in 'in progress' state

4. [IMPROVE] code_analysis_enrich.py: reflect actual status in message
   - Message now shows 'complete', 'partial', or 'skipped' accurately

Pre-existing issues fixed:
- Add extra='ignore' to Settings model_config for backward compatibility
  with legacy env vars (GEMINI_API_KEY, OPENAI_API_KEY)
- Add default empty values to GITHUB_CLIENT_ID and GITHUB_CLIENT_SECRET
  for testing/development environments

These changes allow tests to run without requiring full production config.
@ComBba ComBba merged commit cd6947b into main Feb 9, 2026
4 of 5 checks passed
@ComBba ComBba deleted the feat/enrichment-progress-events branch February 9, 2026 15:50
ComBba added a commit that referenced this pull request Feb 10, 2026
…events

feat(sse): add progress events for enrichment nodes