feat(sse): add progress events for enrichment nodes#264
Conversation
… Code Analysis) Users can now see real-time progress during the enrichment phase: - RAG context enrichment - Web search grounding - Code analysis Added ENRICHMENT_START/COMPLETE/ERROR event types to EventType enum and emit events at start, completion, and error states for all three enrichment nodes.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Summary of ChangesHello @ComBba, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the user experience by making the previously opaque enrichment stages of the system transparent. By integrating Server-Sent Events, users will now receive real-time progress notifications as RAG, Web Search, and Code Analysis operations proceed, providing immediate feedback and improving visibility into the system's workflow. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 워크스루세 개의 데이터 풍부화 노드(코드 분석, RAG, 웹 검색)에 이벤트 채널 통합을 추가합니다. 각 노드는 평가 시작, 완료(캐시됨/건너뜀), 오류 상태를 추적하는 이벤트를 발행합니다. 이벤트 타입 열거형은 세 가지 새로운 문제 종류로 확장됩니다. 변경 사항
코드 검토 예상 소요 시간🎯 2 (Simple) | ⏱️ ~12 분 시
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
There was a problem hiding this comment.
Code Review
This pull request successfully adds real-time progress events for the enrichment phase (RAG, Web Search, Code Analysis), which is a great improvement for user experience.
However, my review has identified a critical security vulnerability related to leaking sensitive information (like GitHub tokens) in error messages. I've also pointed out significant code duplication across the new event-emitting logic in all three enrichment nodes.
Addressing the security issue is paramount. Refactoring the duplicated code will improve the long-term maintainability of this new feature. Please see the detailed comments for suggestions.
| sommelier="code_analysis", | ||
| event_type="enrichment_error", | ||
| progress_percent=100, | ||
| message=f"Code analysis failed: {e}", |
There was a problem hiding this comment.
This line introduces a critical security vulnerability. Including the raw exception e in the SSE event message can leak sensitive information to the client. Specifically, if clone_and_analyze fails with a subprocess.TimeoutExpired exception, the exception's string representation will include the git clone command, which contains the GitHub access token in the URL. The detailed exception is already logged via logger.exception, so a generic message should be sent to the client.
| message=f"Code analysis failed: {e}", | |
| message="Code analysis failed due to an internal error.", |
| sommelier="rag", | ||
| event_type="enrichment_error", | ||
| progress_percent=100, | ||
| message=f"RAG enrichment failed: {e}", |
There was a problem hiding this comment.
Exposing raw exception messages to the client is a security risk as it can lead to information leakage about the application's internal workings. While less critical than in code_analysis_enrich, as it's less likely to contain secrets, it's still a bad practice. A generic error message should be sent to the client, while the detailed error is logged for debugging.
| message=f"RAG enrichment failed: {e}", | |
| message="RAG enrichment failed due to an internal error.", |
| sommelier="web_search", | ||
| event_type="enrichment_error", | ||
| progress_percent=100, | ||
| message=f"Web search failed: {e}", |
There was a problem hiding this comment.
Exposing raw exception messages to the client is a security risk as it can lead to information leakage about the application's internal workings. It's a bad practice to send potentially verbose or sensitive exception details in a user-facing message. A generic error message should be sent to the client, while the detailed error is logged for debugging.
| message=f"Web search failed: {e}", | |
| message="Web search failed due to an internal error.", |
| if evaluation_id: | ||
| event_channel.emit_sync( | ||
| evaluation_id, | ||
| create_sommelier_event( | ||
| evaluation_id=evaluation_id, | ||
| sommelier="code_analysis", | ||
| event_type="enrichment_start", | ||
| progress_percent=0, | ||
| message="Code analysis starting...", | ||
| ), | ||
| ) |
There was a problem hiding this comment.
There is significant code duplication in how SSE events are emitted. This block, and four other similar blocks in this file, repeat the if evaluation_id: check and the call to event_channel.emit_sync. This pattern is also present in rag_enrich.py and web_search_enrich.py.
To improve maintainability and reduce redundancy, consider refactoring this logic into a local helper function within code_analysis_enrich. For example:
def _emit_event(event_type: str, progress: int, message: str):
if evaluation_id:
event = create_sommelier_event(
evaluation_id=evaluation_id,
sommelier="code_analysis",
event_type=event_type,
progress_percent=progress,
message=message,
)
event_channel.emit_sync(evaluation_id, event)
# Then you can call it like this:
_emit_event("enrichment_start", 0, "Code analysis starting...")| if evaluation_id: | ||
| event_channel.emit_sync( | ||
| evaluation_id, | ||
| create_sommelier_event( | ||
| evaluation_id=evaluation_id, | ||
| sommelier="rag", | ||
| event_type="enrichment_start", | ||
| progress_percent=0, | ||
| message="RAG context enrichment starting...", | ||
| ), | ||
| ) |
There was a problem hiding this comment.
Similar to other enrichment nodes in this PR, there's significant code duplication in the event-emitting logic. The if evaluation_id: ... pattern is repeated multiple times.
To improve maintainability, this could be refactored into a helper function that encapsulates creating and emitting the event. This would make the main function body cleaner and easier to read.
Example helper:
def _emit_event(event_type: str, progress: int, message: str):
if evaluation_id:
event = create_sommelier_event(
evaluation_id=evaluation_id,
sommelier="rag",
event_type=event_type,
progress_percent=progress,
message=message,
)
event_channel.emit_sync(evaluation_id, event)| if evaluation_id: | ||
| event_channel.emit_sync( | ||
| evaluation_id, | ||
| create_sommelier_event( | ||
| evaluation_id=evaluation_id, | ||
| sommelier="web_search", | ||
| event_type="enrichment_start", | ||
| progress_percent=0, | ||
| message="Web search enrichment starting...", | ||
| ), | ||
| ) |
There was a problem hiding this comment.
This block for emitting an SSE event is repeated multiple times throughout the function, with slight variations. This pattern of if evaluation_id: event_channel.emit_sync(...) is also duplicated in the other enrichment nodes (rag_enrich.py, code_analysis_enrich.py).
Consider creating a small helper function to handle event creation and emission. This will reduce code duplication and make the logic more maintainable.
Example helper:
def _emit_event(event_type: str, progress: int, message: str):
if evaluation_id:
event = create_sommelier_event(
evaluation_id=evaluation_id,
sommelier="web_search",
event_type=event_type,
progress_percent=progress,
message=message,
)
event_channel.emit_sync(evaluation_id, event)There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/graph/nodes/rag_enrich.py (1)
164-169:⚠️ Potential issue | 🔴 Critical빈 문서 경로에서 이벤트가 누락되어 클라이언트가 "중단" 상태로 남습니다.
docs가 비어 있을 때 (Line 166-169)enrichment_start는 이미 발행되었지만enrichment_complete나enrichment_error가 발행되지 않고 바로 반환됩니다. SSE 구독자 관점에서 RAG enrichment가 영원히 진행 중인 것처럼 보이게 됩니다.제안된 수정
docs = _build_documents_from_context(repo_context) if not docs: + if evaluation_id: + event_channel.emit_sync( + evaluation_id, + create_sommelier_event( + evaluation_id=evaluation_id, + sommelier="rag", + event_type="enrichment_complete", + progress_percent=100, + message="RAG enrichment complete (no documents)", + ), + ) return { "rag_context": {"query": query, "chunks": [], "error": None}, }
🤖 Fix all issues with AI agents
In `@backend/app/graph/nodes/code_analysis_enrich.py`:
- Around line 101-112: The event message always says "Code analysis complete"
even when the local variable status can be "skipped" or "partial"; update the
block that emits the sommelier event (event_channel.emit_sync using
create_sommelier_event with evaluation_id and clone_result.main_files) to
inspect the status variable and set a message and progress appropriately (e.g.,
"Code analysis complete", "Code analysis skipped", or "Code analysis partial (X
files)" and corresponding progress_percent) so the SSE payload accurately
reflects status instead of always reporting completion.
In `@backend/app/graph/nodes/web_search_enrich.py`:
- Around line 155-165: Replace the sensitive inline exception message sent to
clients with a generic error string and log the full exception server-side: in
web_search_enrich.py where event_channel.emit_sync(...) creates the sommelier
"enrichment_error" event (using create_sommelier_event and evaluation_id),
change the message payload to something non-sensitive like "Web search failed"
and ensure you call the server logger (e.g., processLogger or the module logger)
to record the full exception details and stacktrace; apply the same fix pattern
to the analogous occurrences in rag_enrich.py (around the create_sommelier_event
call) and code_analysis_enrich.py so client SSE events never contain raw
exception text while full errors remain in server logs.
In `@backend/app/services/event_channel.py`:
- Around line 69-72: Add the new ENRICHMENT_ERROR event to the
CRITICAL_EVENT_TYPES set so it is treated like sommelier_error and
technique_error; locate the CRITICAL_EVENT_TYPES definition and include
ENRICHMENT_ERROR (matching the constant name) to ensure emit_sync will not use
put_nowait for enrichment_error events.
🧹 Nitpick comments (2)
backend/app/graph/nodes/rag_enrich.py (2)
209-219: 에러 이벤트에서progress_percent=100은 의미적으로 혼란스럽습니다.에러 발생 시
progress_percent=100을 보내면 클라이언트가 "성공적으로 완료"로 오해할 수 있습니다. 이 패턴이 세 파일 모두에서 동일하게 사용되고 있어 의도적인 것으로 보이지만, 에러 상태에서는-1또는 실패 시점의 진행률을 보내는 것이 더 명확합니다.
105-118: 이벤트 발행 패턴이 세 enrichment 노드에서 반복됩니다 — 헬퍼 추출을 고려해 보세요.
rag_enrich.py,web_search_enrich.py,code_analysis_enrich.py모두 동일한if evaluation_id: emit_sync(create_sommelier_event(...))패턴을 start/complete/error/cached/skipped 경로에서 반복하고 있습니다. 이를 간단한 헬퍼 함수로 추출하면 boilerplate를 줄이고 일관성(예: 이벤트 누락 버그)을 방지할 수 있습니다.예시:
def _emit_enrichment_event( event_channel, evaluation_id: str | None, source: str, event_type: str, progress: int, message: str, ) -> None: if not evaluation_id: return event_channel.emit_sync( evaluation_id, create_sommelier_event( evaluation_id=evaluation_id, sommelier=source, event_type=event_type, progress_percent=progress, message=message, ), )
| if evaluation_id: | ||
| event_channel.emit_sync( | ||
| evaluation_id, | ||
| create_sommelier_event( | ||
| evaluation_id=evaluation_id, | ||
| sommelier="web_search", | ||
| event_type="enrichment_error", | ||
| progress_percent=100, | ||
| message=f"Web search failed: {e}", | ||
| ), | ||
| ) |
There was a problem hiding this comment.
에러 메시지에 민감한 정보가 포함될 수 있습니다.
f"Web search failed: {e}" (Line 163)에서 예외 메시지를 그대로 SSE 이벤트에 포함시키고 있습니다. API 호출 실패 시 예외 메시지에 API 키, 내부 URL, 인증 토큰 등이 포함될 수 있으며, 이는 SSE를 통해 클라이언트에게 직접 전달됩니다. 이 패턴은 rag_enrich.py (Line 217)와 code_analysis_enrich.py (Line 141)에도 동일하게 적용됩니다.
에러 이벤트의 메시지에는 일반화된 문자열을 사용하고, 상세 예외 정보는 서버 측 로그에만 남기는 것이 안전합니다.
🤖 Prompt for AI Agents
In `@backend/app/graph/nodes/web_search_enrich.py` around lines 155 - 165, Replace
the sensitive inline exception message sent to clients with a generic error
string and log the full exception server-side: in web_search_enrich.py where
event_channel.emit_sync(...) creates the sommelier "enrichment_error" event
(using create_sommelier_event and evaluation_id), change the message payload to
something non-sensitive like "Web search failed" and ensure you call the server
logger (e.g., processLogger or the module logger) to record the full exception
details and stacktrace; apply the same fix pattern to the analogous occurrences
in rag_enrich.py (around the create_sommelier_event call) and
code_analysis_enrich.py so client SSE events never contain raw exception text
while full errors remain in server logs.
Fixes based on Gemini and CodeRabbit reviews: 1. [CRITICAL] Add 'enrichment_error' to CRITICAL_EVENT_TYPES - Prevents error events from being silently dropped when buffer full 2. [SECURITY] Remove raw exception from SSE error messages (3 files) - Prevents leaking sensitive info (e.g., GitHub tokens in git clone errors) - Generic 'internal error' message sent to client, full error logged server-side 3. [BUG] Add missing complete event for empty docs case in rag_enrich.py - Previously SSE subscribers would remain stuck in 'in progress' state 4. [IMPROVE] code_analysis_enrich.py: reflect actual status in message - Message now shows 'complete', 'partial', or 'skipped' accurately
Pre-existing issues fixed: - Add extra='ignore' to Settings model_config for backward compatibility with legacy env vars (GEMINI_API_KEY, OPENAI_API_KEY) - Add default empty values to GITHUB_CLIENT_ID and GITHUB_CLIENT_SECRET for testing/development environments These changes allow tests to run without requiring full production config.
…events feat(sse): add progress events for enrichment nodes
Summary
Changes
ENRICHMENT_START,ENRICHMENT_COMPLETE,ENRICHMENT_ERRORtoEventTypeenumrag_enrich.py: Emit start/complete/error eventsweb_search_enrich.py: Emit start/complete/error eventscode_analysis_enrich.py: Emit start/complete/error eventsArchitecture Impact
Testing
Summary by CodeRabbit
개선 사항