Fix testcase generation message flooding by using internalMessages #3950
Conversation
- Change testcaseAnnotation from messages to internalMessages to avoid auto-concat to parent graph
- Update generateTestcaseNode to process stream directly without streamLLMResponse to prevent SSE message dispatch
- Add start message in continueToRequirements to notify processing once
- Update routeAfterGenerate and test files to use internalMessages
- Change start message to English to comply with linting rules

This implements the simple approach specified in issue #5645 to control message output during parallel testcase generation.

Co-Authored-By: hirotaka.miyagi@route06.co.jp <h.miyagi.cnw@gmail.com>
Check changeset necessity
Status: NOT REQUIRED
Reason / Changeset (copy & paste): N/A – no changeset required because only "@liam-hq/agent" (ignored) is affected.
- Add MessagesAnnotation.spec back to testcaseAnnotation for ToolNode compatibility
- Keep internalMessages for actual message storage to prevent auto-concat to parent
- Update saveToolNode to map between internalMessages and messages for ToolNode
- Revert continueToRequirements to synchronous function (fire-and-forget dispatch)
- Update distributeRequirements to pass both messages and internalMessages (empty)
- Update test files to include both fields

This fixes the agent-deep-modeling CI failure while maintaining the message flood prevention.

Co-Authored-By: hirotaka.miyagi@route06.co.jp <h.miyagi.cnw@gmail.com>
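The dual-state idea from this commit can be sketched as follows. This is a minimal plain-TypeScript model of the behavior, not the repo's actual annotation code: the type names, the `reduceInternal` helper, and the `Message` shape are all illustrative stand-ins for LangGraph's annotation/reducer machinery.

```typescript
type Message = { role: string; content: string };

type TestcaseState = {
  messages: Message[];         // kept empty; present only for ToolNode compatibility
  internalMessages: Message[]; // real message storage, never merged into the parent graph
};

// Reducer mimicking message-channel concatenation, applied only to the internal channel.
function reduceInternal(
  state: TestcaseState,
  update: Partial<TestcaseState>,
): TestcaseState {
  return {
    messages: state.messages, // untouched: the empty channel never grows
    internalMessages: [
      ...state.internalMessages,
      ...(update.internalMessages ?? []),
    ],
  };
}

const initial: TestcaseState = { messages: [], internalMessages: [] };
const next = reduceInternal(initial, {
  internalMessages: [{ role: "ai", content: "Generating test cases..." }],
});
console.log(next.messages.length, next.internalMessages.length); // → 0 1
```

Because `messages` never accumulates, nothing from the subgraph auto-concats into the parent graph's timeline, while `internalMessages` keeps the full conversation for use inside the subgraph.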
🤖 Agent Deep Modeling Execution
Started at: 2025-11-05 03:32:31 UTC

Command Output
RUN v3.2.4 /home/runner/work/liam/liam/frontend/internal-packages/agent
(node:7691) ExperimentalWarning: WASI is an experimental feature and might change at any time
✅ [INFO] 2025-11-05T03:32:34.761Z

🧠 Thinking (pm)...

Analyzing requirements for PM Agent
I'm tasked with analyzing requirements and creating a structured Business Requirements Document (BRD). My first step is to create a concise checklist of 3-7 items to clarify what I will do. After that, I need to review the user's input, which is quite simple: "Design a simple user management system." I'll fill in assumptions and detail requirements for the Database Agent and QA Agent. When it comes time to save, I must ensure the test case categories are well-defined, with both positive and negative scenarios included.

Structuring database design requirements
I need to ensure that the Database Agent can design the database by including entities, attributes, relationships, and constraints. My primary task is to create the BRD, but the saving tool only captures the goal and test cases. The current schema structure is empty, so I need to craft a checklist of 3-7 conceptual bullet points and outline various sections for the BRD, like Scope/Objectives, Assumptions, Functional Requirements, and Acceptance Criteria. I must organize everything clearly, focusing on database design while preparing to save everything correctly.

Drafting test cases and entities
I need to list test cases grouped into categories, keeping in mind that the saving tool requires an object. I want to keep everything comprehensive yet concise. First, for the scope, I'll outline the simple user management features like registration, email verification, login/logout, role assignment, password resets, and profile management. I'll assume it's a single-tenant application for simplicity. Next, I'll detail the entities:

Deciding on roles and permissions
For the simple system, I'm considering including roles and optional permissions to assist the Database Agent. I want to keep it minimal, so I'll include roles and a basic structure for role permissions, which include keys and descriptions. While I'm concerned that adding permissions might complicate things, clarity for QA and database design is crucial. So, I'll opt to include both roles and permissions but label the permissions as an optional extension. Also, regarding sessions, they're necessary for authentication records, but only if we want persistent sessions in the database.

Structuring requirements and test cases
Since the Database Agent will design the system, I can include a sessions table for token management, especially if implementing JWT. We'll provide functionality like login and password resets with tokens, including password_reset_tokens and email_verification_tokens. Next, I'll outline audit logs, constraints for unique fields, and business rules, like setting user status during registration and implementing soft deletes. For GDPR compliance, I'll ensure minimal storage of PII, while addressing non-functional needs like scalability and data retention. For test case categories, I'll aim for about 32 cases grouped into areas like Registration & Verification, Authentication and Lockout, and more, ensuring a good balance of positive and negative scenarios. I'll craft test case titles for detailed scenarios, focusing on typical operations and constraints.

Drafting test cases and document structure
I'm generating test cases for several key functionalities. For email verification, I'll create the token upon registration and ensure users can activate their accounts within token validity. For authentication, I'll track successful and failed login attempts while managing account locks after repeated failures.

In roles and permissions, I'll define tests for role creation, assignment, and restrictions on removing roles assigned to users. Profile management will focus on updating user data while ensuring format validity. For account status, I'll include tests for suspending and deleting users. Password reset scenarios will cover token generation and validation, and audit logging will ensure entries are correctly created and filtered. Finally, I'll outline integrity constraints to maintain data quality in the BRD. I'll ensure the test cases are structured into categories for clarity before calling the processAnalyzedRequirementsTool.

⏺ (pm): Checklist:
Business Requirements Document (BRD)
Saving analyzed requirements: goal summary and categorized testcases with titles and SQL operation types.

🔧 processAnalyzedRequirementsTool()
⎿ processAnalyzedRequirementsTool: Requirements processed and streamed successfully

🧠 Thinking (db)...

Designing database table structure
I need to use a tool to create tables for users, roles, user_roles, email verification tokens, and more. I should clearly state the purpose and minimal inputs before making the call. It's essential to define enums for user status like 'pending_verification' and include constraints such as unique checks on email and username, as well as formatting for phone numbers. I also have to consider the foreign key constraints and how they affect deleting roles or users. I need to add detailed operational info as I go.

⏺ (db): Purpose: Create enums and core tables for a single-tenant user management system (users, roles, user_roles, email_verification_tokens, password_reset_tokens, audit_logs) with required columns, constraints, and indexes. Minimal inputs: enum values and table definitions including columns, primary keys, FKs, unique indexes, and comments. I'll now apply the schema changes.

🔧 createMigrationTool()
⎿ createMigrationTool: Schema successfully updated. The operations have been applied to the database schema, DDL validation successful (79/79 statements executed successfully), and new version created.

⏺ (db): Validation: Enum and tables created successfully with PKs, FKs, indexes, and comments. Summary of changes applied:
All constraints and indexes required by the session goal and requirements are present (unique email/username, FK integrity, token uniqueness, audit logging). If you want, I can now:
⏺ (qa): Generating test cases (processing 33 requirements)...
⎿ runTestTool: 0/33 test cases passed, 33 failed
❌ Test Case: Register a new user with unique email sets status to pending_verification
Error:
Complete fix for testcase generation message flooding during parallel execution.

Changes:
- saveTestcaseTool.ts: Update Command to use internalMessages instead of messages
- saveTestcaseTool.ts: Remove unused imports and ToolMessage variables
- saveToolNode.ts: ToolNode integration with internalMessages ↔ messages mapping
- saveToolNode.ts: Add ESLint suppressions for ToolNode result type issues

This ensures ToolMessages stay isolated within the subgraph and don't propagate to the parent graph's timeline during parallel execution. Result: Only "Generating test cases..." and final runTestTool results appear, completely eliminating the 37x ToolMessage flooding.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
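The internalMessages ↔ messages mapping described above can be sketched roughly like this. This is a simplified stand-in, assuming the adapter pattern the commit describes; the state shape, the `toolNode` stub, and `saveToolNode` here are illustrative and not the repo's actual implementation:

```typescript
type Message = { role: string; content: string };
type SubgraphState = { messages: Message[]; internalMessages: Message[] };

// Stand-in for ToolNode: it reads the `messages` channel and appends a tool result.
function toolNode(state: SubgraphState): SubgraphState {
  const result: Message = { role: "tool", content: "saved" };
  return { ...state, messages: [...state.messages, result] };
}

function saveToolNode(state: SubgraphState): SubgraphState {
  // internalMessages → messages (input side): give ToolNode the channel it expects.
  const input: SubgraphState = { ...state, messages: state.internalMessages };
  const output = toolNode(input);
  // messages → internalMessages (output side): pull the result back in,
  // leaving `messages` empty so nothing propagates to the parent graph.
  return { messages: [], internalMessages: output.messages };
}

const before: SubgraphState = {
  messages: [],
  internalMessages: [{ role: "ai", content: "tool call" }],
};
const after = saveToolNode(before);
console.log(after.messages.length, after.internalMessages.length); // → 0 2
```

The key invariant is that `messages` is empty both before and after the adapter runs, so the ToolMessage stays confined to the subgraph's `internalMessages` timeline.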
Replace model.stream() with model.invoke() to simplify implementation. Since we're not dispatching SSE events (to prevent message flooding), there's no benefit to streaming - we can directly invoke and return the result without manual chunk accumulation.

Changes:
- Replace stream() with invoke()
- Remove unused AIMessage and AIMessageChunk imports
- Eliminate chunk accumulation loop
- Return model response directly

This maintains the same behavior while reducing code complexity.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
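A rough sketch of why the swap is a pure simplification, using a mock model rather than a real LangChain chat model (the `stream`/`invoke` shapes here only mimic the general pattern):

```typescript
type Chunk = { content: string };

const model = {
  // stream(): yields chunks that the caller must accumulate manually
  async *stream(_prompt: string): AsyncGenerator<Chunk> {
    yield { content: "SELECT " };
    yield { content: "1;" };
  },
  // invoke(): returns the full response in one call
  async invoke(_prompt: string): Promise<Chunk> {
    return { content: "SELECT 1;" };
  },
};

async function viaStream(prompt: string): Promise<string> {
  let acc = "";
  // The accumulation loop only pays off if each chunk is forwarded somewhere
  // (e.g. as an SSE event); here it is dead weight.
  for await (const chunk of model.stream(prompt)) acc += chunk.content;
  return acc;
}

async function viaInvoke(prompt: string): Promise<string> {
  return (await model.invoke(prompt)).content; // one call, same final text
}

(async () => {
  const a = await viaStream("generate a testcase");
  const b = await viaInvoke("generate a testcase");
  console.log(a === b); // both paths produce the same result
})();
```

With no per-chunk consumer, the streaming path is just the invoke path plus a loop, which is what the commit removes.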
Force-pushed from ce1e009 to 736ba30.
Previously, continueToRequirements used dispatchCustomEvent which only sent SSE streaming messages but didn't persist messages in graph state. Since conditional edge functions cannot update state, the start message was lost.

Solution: Add a dedicated prepareTestcaseGeneration node that:
- Adds start message to messages state before parallel processing
- Runs before continueToRequirements distributes work via Send API
- Works for both initial execution (START) and retry (resetFailedSqlTests)

Changes:
- distributeRequirements/index.ts: Add prepareTestcaseGeneration node
- distributeRequirements/index.ts: Remove dispatchCustomEvent from continueToRequirements
- createQaAgentGraph.ts: Add prepareTestcaseGeneration node to graph
- createQaAgentGraph.ts: Update edges to route through prepareTestcaseGeneration
- README.md: Update graph diagram and component descriptions
- createQaAgentGraph.test.ts: Update expected mermaid diagram

Result: Start messages now properly persist in state and appear in conversation history, while maintaining clean separation between state updates (nodes) and routing logic (conditional edges).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
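The node-vs-edge split can be sketched in plain TypeScript. This is a hypothetical model, not the repo's graph code: `QaState`, the return shapes, and the `testcaseGeneration:` target naming are illustrative; in LangGraph the edge would return Send objects rather than strings.

```typescript
type Message = { role: "ai" | "human"; content: string };
type QaState = { messages: Message[]; requirements: string[] };

// Node: the only place allowed to update state. It persists the single
// start message before the parallel fan-out begins.
function prepareTestcaseGeneration(_state: QaState): Partial<QaState> {
  return { messages: [{ role: "ai", content: "Generating test cases..." }] };
}

// Conditional edge: pure routing, no state mutation. One target per requirement.
function continueToRequirements(state: QaState): string[] {
  return state.requirements.map((r) => `testcaseGeneration:${r}`);
}

const state: QaState = { messages: [], requirements: ["req-1", "req-2"] };
const update = prepareTestcaseGeneration(state);
const targets = continueToRequirements(state);
console.log(update.messages?.[0]?.content, targets.length);
```

Because the edge function's return value is routing information only, any message it tried to emit would be dropped, which is exactly the bug the dedicated node fixes.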
After establishing internalMessages as the sole message handling mechanism, the messages field is no longer needed. This cleanup removes:
- MessagesAnnotation.spec from testcaseAnnotation
- messages initialization from distributeRequirements
- messages field from test states
- messages spreading in saveToolNode

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Remove "Removed dispatchCustomEvent to prevent message flooding during parallel execution" comments that are no longer needed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
The prepareTestcaseGeneration node now sends messages via both:
1. dispatchCustomEvent for real-time SSE streaming display
2. messages array for conversation history persistence

Changes:
- Add dispatchCustomEvent call with AIMessage
- Keep messages array update for state persistence
- Simplify implementation by using AIMessage for both SSE and state

This ensures users see the "Generating test cases..." notification in real-time while also preserving it in conversation history.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
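The dual dispatch described in this commit can be sketched as follows. The `dispatchCustomEvent` here is a mock standing in for LangGraph's streaming-event dispatch, and the state shape is illustrative:

```typescript
type Message = { role: "ai"; content: string };
type QaState = { messages: Message[] };

const sseEvents: Message[] = [];
// Mock of the real-time streaming channel (stand-in for dispatchCustomEvent).
function dispatchCustomEvent(_name: string, payload: Message): void {
  sseEvents.push(payload);
}

function prepareTestcaseGeneration(_state: QaState): Partial<QaState> {
  const msg: Message = { role: "ai", content: "Generating test cases..." };
  dispatchCustomEvent("on_custom_event", msg); // 1) real-time SSE display
  return { messages: [msg] };                  // 2) conversation-history persistence
}

const update = prepareTestcaseGeneration({ messages: [] });
console.log(sseEvents.length, update.messages?.length); // → 1 1
```

Reusing the same message object for both channels keeps the live notification and the persisted history entry identical by construction.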
Issue

Why is this change needed?

The QA Agent's `testcaseGeneration` subgraph runs in parallel for multiple requirements. Each subgraph was sending individual SSE messages via `streamLLMResponse`, causing the timeline to flood with messages and burying important information.

Changes

This PR implements a dual-state approach to prevent message flooding while maintaining ToolNode compatibility:

Core Solution

- Dual Message State: Added an `internalMessages` field alongside the existing `messages` in `testcaseAnnotation`
  - `messages`: kept empty to prevent auto-concat to the parent graph (but present for ToolNode compatibility)
  - `internalMessages`: used for actual message storage within the subgraph
- Removed Streaming Events: Changed `generateTestcaseNode` to process LLM streams directly without dispatching SSE events, eliminating per-subgraph message flooding
- ToolNode Adapter: Added mapping logic in `saveToolNode` to translate between `internalMessages` (subgraph state) and `messages` (ToolNode requirement)
- Single Start Message: Added a fire-and-forget start message in `continueToRequirements` to notify users once before parallel processing begins

Key Implementation Details

- `continueToRequirements` remains synchronous (fire-and-forget dispatch) to avoid breaking graph introspection
- `saveToolNode` adapter maps: `internalMessages` → `messages` (input) → `internalMessages` (output)
- `internalMessages`

Review Focus Areas

- `saveToolNode.ts` (lines 17-35) - verify it correctly handles all edge cases and properly maps messages in both directions
- `messages: []` throughout subgraph execution doesn't break any LangGraph internals or assumptions
- All nodes (`generateTestcaseNode`, `routeAfterGenerate`, etc.) consistently use `internalMessages`

Testing

- `agent-deep-modeling` and E2E tests
- `SUPABASE_SERVICE_ROLE_KEY`

Additional Context