
test: add unit tests for orchestration modules #426

Closed
nikolasdehor wants to merge 3 commits into SynkraAI:main from nikolasdehor:test/consolidated-orchestration

Conversation


@nikolasdehor nikolasdehor commented Feb 19, 2026

Add test coverage for all orchestration modules (16 test files). All tests pass locally. Closes #425

Summary by CodeRabbit

  • Tests
    • Added unit test coverage across core modules: orchestration pipeline components (invocation, checklist runner, CLI commands, context/condition/gate evaluators, dashboard integration, recovery/parallel/executor assignment), circuit breaker, agent/skill/subagent prompt behaviors, surface checking and tech-stack detection, execution-profile resolution, and quality/summary utilities, improving reliability and edge-case handling.

Copilot AI review requested due to automatic review settings February 19, 2026 19:16

vercel bot commented Feb 19, 2026

@nikolasdehor is attempting to deploy a commit to Pedro Valério Lopez's projects Team on Vercel.

A member of the Team first needs to authorize it.


coderabbitai bot commented Feb 19, 2026

Walkthrough

Adds comprehensive unit tests for 16 core and orchestration modules (CircuitBreaker, AgentInvoker, ChecklistRunner, ConditionEvaluator, ContextManager, DashboardIntegration, ExecutionProfileResolver, ExecutorAssignment, GateEvaluator, ParallelExecutor, RecoveryHandler, SkillDispatcher, SubagentPromptBuilder, SurfaceChecker, TechStackDetector, and CLI commands), validating behaviors, edge cases, and state transitions without modifying production code.

Changes

Cohort / File(s) | Summary
Resilience & infra tests
tests/core/ids/circuit-breaker.test.js
New tests exercise CircuitBreaker lifecycle, thresholds, timeouts, probe behavior, stats, and reset paths.
Agent & prompt orchestration
tests/core/orchestration/agent-invoker.test.js, tests/core/orchestration/subagent-prompt-builder.test.js
Tests for agent invocation, task/frontmatter parsing, loading agents/tasks/templates/checklists, prompt assembly, and parsing of agent outputs.
Checklist & CLI orchestration
tests/core/orchestration/checklist-runner.test.js, tests/core/orchestration/cli-commands.test.js
Tests for checklist parsing/evaluation, validators, CLI command flows (orchestrate/status/stop/resume) and error/edge behaviors.
Context, dashboard & integrations
tests/core/orchestration/context-manager.test.js, tests/core/orchestration/dashboard-integration.test.js
State persistence, phase outputs/handoffs, summary/confidence calculations, dashboard status builds, history/notifications, and IO error paths.
Execution policies & profiles
tests/core/orchestration/execution-profile-resolver.test.js, tests/core/orchestration/executor-assignment.test.js
Profile normalization/resolution, policy reasoning, executor assignment detection, validation, and mappings between story types and executors.
Conditioning & gate checks
tests/core/orchestration/condition-evaluator.test.js, tests/core/orchestration/gate-evaluator.test.js
Condition evaluation logic (boolean, dot-notation, AND/OR/negation), phase applicability, gate checks, scoring, verdicts, and summaries.
Parallelism & recovery
tests/core/orchestration/parallel-executor.test.js, tests/core/orchestration/recovery-handler.test.js
ParallelExecutor concurrency, task lifecycle, cancellation and summaries; RecoveryHandler error classification, retry/escalation strategies, and attempt tracking.
Quality surface & skill dispatch
tests/core/orchestration/surface-checker.test.js, tests/core/orchestration/skill-dispatcher.test.js
Surface criteria evaluation and message interpolation; SkillDispatcher mapping, payload building, output parsing, available agents, and logging/formatting.
Tech detection
tests/core/orchestration/tech-stack-detector.test.js
TechStackDetector detection flows for DB/frontend/backend/TS/tests, phase applicability, confidence scoring, and package.json handling.


Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • SynkraAI/aios-core#44: Adds/modified orchestration modules and behaviors that these tests target (ChecklistRunner, ConditionEvaluator, ContextManager, SkillDispatcher, TechStackDetector).
  • SynkraAI/aios-core#59: Changes to dashboard-integration and related event hooks referenced by the new DashboardIntegration tests.

Possibly related issues

  • #425: Implements requested unit test coverage for the listed orchestration and infrastructure modules.

Suggested labels

tests, core, workflows

Suggested reviewers

  • Pedrovaleriolopez
🚥 Pre-merge checks: ✅ 5 passed
Check name | Status | Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'test: add unit tests for orchestration modules' is concise, clear, and accurately summarizes the primary change of adding comprehensive test coverage for 16 orchestration modules.
Linked Issues check ✅ Passed The PR successfully addresses issue #425 by adding unit tests for all 16 required orchestration modules (agent-invoker, checklist-runner, cli-commands, condition-evaluator, context-manager, dashboard-integration, execution-profile-resolver, executor-assignment, gate-evaluator, parallel-executor, recovery-handler, skill-dispatcher, subagent-prompt-builder, surface-checker, tech-stack-detector, and circuit-breaker).
Out of Scope Changes check ✅ Passed All changes are within scope: 16 new test files covering the specified orchestration modules with no unrelated modifications to production code or public declarations.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.




Copilot AI left a comment


Pull request overview

This PR adds comprehensive unit test coverage for 16 orchestration modules as described in issue #425. The tests cover agent invocation, checklist running, CLI commands, condition evaluation, context management, dashboard integration, execution profile resolution, executor assignment, gate evaluation, parallel execution, recovery handling, skill dispatching, subagent prompt building, surface checking, tech stack detection, and circuit breaker functionality.

Changes:

  • Adds 16 comprehensive test files covering all orchestration modules
  • All tests use proper mocking patterns for file system and external dependencies
  • Tests cover both happy paths and error cases with good edge case coverage

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated no comments.

File | Description
tests/core/orchestration/tech-stack-detector.test.js Tests tech stack detection from project files with database, frontend, backend, and TypeScript detection
tests/core/orchestration/surface-checker.test.js Tests surface criteria evaluation for when Bob should ask human for decisions
tests/core/orchestration/subagent-prompt-builder.test.js Tests prompt assembly from agent definitions, tasks, checklists, and templates
tests/core/orchestration/skill-dispatcher.test.js Tests agent-to-skill mapping and dispatch payload building
tests/core/orchestration/recovery-handler.test.js Tests automatic error recovery with multiple strategies and retry logic
tests/core/orchestration/parallel-executor.test.js Tests concurrent phase execution with configurable concurrency
tests/core/orchestration/gate-evaluator.test.js Tests quality gate evaluation between epics
tests/core/orchestration/executor-assignment.test.js Tests dynamic executor assignment based on story content keywords
tests/core/orchestration/execution-profile-resolver.test.js Tests execution profile resolution with context-based enforcement
tests/core/orchestration/dashboard-integration.test.js Tests real-time orchestrator monitoring and status management
tests/core/orchestration/context-manager.test.js Tests workflow state persistence and handoff package management
tests/core/orchestration/condition-evaluator.test.js Tests workflow condition evaluation based on tech stack
tests/core/orchestration/cli-commands.test.js Tests CLI command handlers for orchestrator control
tests/core/orchestration/checklist-runner.test.js Tests validation checklist execution with code-based rules
tests/core/orchestration/agent-invoker.test.js Tests agent invocation interface with retry logic and audit logging
tests/core/ids/circuit-breaker.test.js Tests circuit breaker pattern with state transitions



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (11)
tests/core/ids/circuit-breaker.test.js (1)

219-251: Consider adding a brief comment explaining the probe-reset dance in the lifecycle test.

Lines 233–239 manually toggle _halfOpenProbeInFlight between recordSuccess() calls to simulate consecutive probe cycles. This is valid and necessary, but a one-line comment above the block (e.g., // Simulate 3 consecutive successful probes) would make it immediately clear to future readers why internal state is being manipulated mid-test.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/core/ids/circuit-breaker.test.js` around lines 219 - 251, The test
manipulates internal probe state (_halfOpenProbeInFlight) between
recordSuccess() calls which is necessary but unclear; add a one-line comment
above the block (e.g., "// Simulate 3 consecutive successful probes by resetting
the half-open probe flag between successes") to explain why the test toggles
_halfOpenProbeInFlight in the lifecycle test (references: recordSuccess(),
_halfOpenProbeInFlight, getState()).
tests/core/orchestration/skill-dispatcher.test.js (1)

427-431: Hardcoded agent count assertion is intentionally brittle but could benefit from a comment.

The assertion expect(agents).toHaveLength(11) will break whenever a new primary agent is added. This is a reasonable canary, but a brief comment explaining this intent (e.g., // Update this count when adding new primary agents) would help future contributors understand why their test is failing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/core/orchestration/skill-dispatcher.test.js` around lines 427 - 431,
Add a brief explanatory comment above the brittle assertion so future
contributors know the length check is intentional; e.g., before the
expect(agents).toHaveLength(11) line, add a one-line comment referencing
dispatcher.getAvailableAgents() that says this is a canary for the current
number of primary agents and should be updated when new primary agents are added
(e.g., "// Intentional: update this count when adding/removing primary agents").
tests/core/orchestration/executor-assignment.test.js (1)

20-24: console.warn spy is never restored after tests complete.

jest.resetAllMocks() resets call counts and implementations but does not restore the original console.warn. Consider adding an afterEach with jest.restoreAllMocks() (or manually restoring the spy), consistent with the pattern used in cli-commands.test.js for console.log.

♻️ Proposed fix
   beforeEach(() => {
     jest.resetAllMocks();
     jest.spyOn(console, 'warn').mockImplementation();
   });
+
+  afterEach(() => {
+    jest.restoreAllMocks();
+  });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/core/orchestration/executor-assignment.test.js` around lines 20 - 24,
The test sets a spy on console.warn in the beforeEach (jest.spyOn(console,
'warn').mockImplementation()) but never restores it; add an afterEach that calls
jest.restoreAllMocks() (or explicitly restore the console.warn spy) to return
console.warn to its original implementation after each test, mirroring the
pattern used for console.log in cli-commands.test.js.
tests/core/orchestration/tech-stack-detector.test.js (1)

99-192: fs.pathExists mock returns raw booleans instead of Promises in mockImplementation callbacks.

The beforeEach correctly uses mockResolvedValue(false) (returning a Promise), but the mockImplementation overrides in individual tests (e.g., line 100, 117, 131) return raw booleans. This works because await true resolves to true, but it's inconsistent with the Promise-based contract of fs.pathExists. If the production code ever chains .then() instead of await, these mocks would break.

This is a low-risk nit since await handles it, but using Promise.resolve(...) in the implementations would be more robust.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/core/orchestration/tech-stack-detector.test.js` around lines 99 - 192,
The fs.pathExists mocks in tests like the 'detects Supabase from directory' and
'detects Prisma from directory' cases (the mockImplementation callbacks inside
those tests) return raw booleans instead of Promises; update each
mockImplementation to return Promise.resolve(boolean) (e.g.,
Promise.resolve(true) or Promise.resolve(false)) so fs.pathExists consistently
returns a Promise (matching the beforeEach mockResolvedValue(false)) and avoids
breaking code paths that use .then() instead of await.
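
The Promise-consistent stub the comment describes can be sketched as follows. This is a minimal illustration, not the project's code; `makePathExistsStub` is a hypothetical helper name:

```javascript
// Hypothetical sketch: a pathExists stub that always resolves a Promise,
// so `await` and `.then()` callers behave identically.
function makePathExistsStub(existingPaths) {
  // Return Promise.resolve(...) rather than a raw boolean from the callback.
  return (p) => Promise.resolve(existingPaths.has(p));
}

const pathExists = makePathExistsStub(new Set(['supabase/', 'prisma/']));

// Both consumption styles now see a plain boolean.
pathExists('supabase/').then((found) => console.log(found)); // true
```

In a Jest test the same idea reads `fs.pathExists.mockImplementation((p) => Promise.resolve(p === expectedPath))`, matching the Promise contract established by the `mockResolvedValue(false)` default.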
tests/core/orchestration/surface-checker.test.js (1)

406-507: Repetitive SurfaceChecker instantiation and state setup across helper method tests.

Each test in getActionConfig, getCriteria, getDestructiveActions, isDestructiveAction, and getMetadata creates a new SurfaceChecker, assigns criteria and _loaded manually. This could be extracted into a shared beforeEach block for the outer describe or a helper function, reducing boilerplate.

♻️ Example: extract shared setup
+  // Helper to create a loaded checker with mock criteria
+  function loadedChecker(criteria = MOCK_CRITERIA) {
+    const checker = new SurfaceChecker();
+    checker.criteria = criteria;
+    checker._loaded = true;
+    return checker;
+  }
+
   describe('getActionConfig', () => {
-    let checker;
-
-    beforeEach(() => {
-      checker = new SurfaceChecker();
-      checker.criteria = MOCK_CRITERIA;
-      checker._loaded = true;
-    });
-
     test('returns action config when found', () => {
+      const checker = loadedChecker();
       const config = checker.getActionConfig('confirm_cost');
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/core/orchestration/surface-checker.test.js` around lines 406 - 507, The
tests for getActionConfig, getCriteria, getDestructiveActions,
isDestructiveAction, and getMetadata repeat SurfaceChecker instantiation and
state setup; refactor by moving the repeated setup into a shared beforeEach (or
a small helper like createChecker) so each describe uses a single let checker
and a beforeEach that does checker = new SurfaceChecker(); checker.criteria =
MOCK_CRITERIA; checker._loaded = true (and override in individual tests where
needed), updating tests to use that shared checker instead of duplicating the
setup.
tests/core/orchestration/parallel-executor.test.js (1)

19-23: Missing afterEach to restore console.log spy — same pattern issue as in executor-assignment.test.js.

Adding jest.restoreAllMocks() in an afterEach would ensure clean teardown.

♻️ Proposed fix
   beforeEach(() => {
     jest.resetAllMocks();
     jest.spyOn(console, 'log').mockImplementation();
     executor = new ParallelExecutor();
   });
+
+  afterEach(() => {
+    jest.restoreAllMocks();
+  });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/core/orchestration/parallel-executor.test.js` around lines 19 - 23, The
test sets up a spy on console.log in the beforeEach for ParallelExecutor but
never restores mocks, risking cross-test pollution; add an afterEach that calls
jest.restoreAllMocks() (matching the pattern used in
executor-assignment.test.js) to restore the console.log spy and other mocks so
ParallelExecutor tests are cleanly torn down.
tests/core/orchestration/agent-invoker.test.js (2)

432-441: Prefer not.toContainEqual over toBeLessThan for copy-semantics verification.

The current assertion expect(invoker.logs.length).toBeLessThan(logs.length) is weaker than the pattern used elsewhere in the suite (e.g., recovery-handler.test.js lines 501–502). It could pass because the original array happens to be shorter, rather than strictly proving that the pushed element didn't propagate back.

♻️ Suggested refactor
     const logs = invoker.getLogs();
-    logs.push({ fake: true });
-
-    expect(invoker.logs.length).toBeLessThan(logs.length);
+    logs.push({ fake: true });
+
+    expect(invoker.logs).not.toContainEqual({ fake: true });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/core/orchestration/agent-invoker.test.js` around lines 432 - 441, The
test "getLogs returns copy" should assert that modifying the returned array does
not affect the original by checking the original logs do not contain the pushed
object; call invoker.getLogs() to get a copy, push the fake entry onto that
local array, and replace the length comparison with an assertion like
expect(invoker.logs).not.toContainEqual({ fake: true }) so the test relies on
element-level equality and proves the copy semantics of invoker.getLogs().

128-135: Misleading comment: 'dev' is a known agent — it's the file that isn't loaded, not the agent itself.

// Agent also not found since pathExists returns false

'dev' exists in SUPPORTED_AGENTS, so the agent is recognized. When fs.pathExists returns false, _loadAgent('dev') returns {name: 'dev', loaded: false, content: null} (as tested at line 201–209). The failure is because loaded: false or the task is missing — not because the agent is unknown. The comment creates confusion with the "Unknown agent" error path tested just above.

✏️ Suggested comment update
-      // Agent also not found since pathExists returns false
-      expect(result.success).toBe(false);
+      // Agent file unloaded (pathExists = false) and task doesn't exist → invocation fails
+      expect(result.success).toBe(false);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/core/orchestration/agent-invoker.test.js` around lines 128 - 135,
Update the misleading test comment in the test 'returns failure when task not
found' to accurately reflect that the agent 'dev' is recognized (it's in
SUPPORTED_AGENTS) but not loaded because fs.pathExists() was mocked to return
false; mention that _loadAgent('dev') will return { name: 'dev', loaded: false,
content: null } and that the failure is due to loaded: false or a missing task
rather than an unknown agent, keeping references to invoker.invokeAgent and
fs.pathExists so maintainers can locate the logic being asserted.
tests/core/orchestration/checklist-runner.test.js (1)

116-125: Optional: decouple deduplication test from the 30-char threshold implementation detail.

The comment explicitly names the first-30-characters heuristic used by the implementation. If the deduplication algorithm changes (e.g., different threshold, Levenshtein distance, exact match), this test will either produce a false negative or an opaque failure without explaining why.

Consider restructuring the test content so the YAML and markdown items are either identical strings or clearly distinct, to test the intent (deduplication) rather than the mechanism:

♻️ Proposed refactor
-    test('skips duplicate items from YAML and markdown', () => {
-      const content = '```yaml\npre-conditions:\n  - Check files exist for deployment\n```\n- [ ] Check files exist for deployment readiness';
-      yaml.load.mockReturnValue({
-        'pre-conditions': ['Check files exist for deployment'],
-      });
-
-      const items = runner.parseChecklistItems(content);
-      // The markdown checkbox matches first 30 chars of YAML item, so should be skipped
-      expect(items).toHaveLength(1);
-    });
+    test('skips duplicate items from YAML and markdown', () => {
+      const item = 'Check files exist for deployment';
+      const content = '```yaml\npre-conditions:\n  - ' + item + '\n```\n- [ ] ' + item;
+      yaml.load.mockReturnValue({
+        'pre-conditions': [item],
+      });
+
+      const items = runner.parseChecklistItems(content);
+      // Exact duplicate between YAML and markdown should be collapsed to one item
+      expect(items).toHaveLength(1);
+    });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/core/orchestration/checklist-runner.test.js` around lines 116 - 125,
The test "skips duplicate items from YAML and markdown" leaks an implementation
detail (30-char prefix heuristic) and can break if deduplication changes; update
the test to assert deduplication intent by using exact identical strings for the
YAML and markdown entries: set a single variable (e.g., item = 'Check files
exist for deployment'), build content so the YAML block and markdown checkbox
use that same item, keep yaml.load.mockReturnValue returning that same item
under 'pre-conditions', then call runner.parseChecklistItems(content) and expect
a single item; this keeps the test focused on the deduplication behavior of
parseChecklistItems without referencing the internal threshold.
tests/core/orchestration/dashboard-integration.test.js (1)

108-113: Misleading comment in the idempotency test.

The comment // ensureDir should only be called once (from first start) reads as "only one total call", but toHaveBeenCalledTimes(2) reflects that a single start() invocation makes two ensureDir calls (two directories). The intent is correct — the second start() is a no-op — but the comment needs clarification.

✏️ Suggested comment update
-      // ensureDir should only be called once (from first start)
-      expect(fs.ensureDir).toHaveBeenCalledTimes(2);
+      // ensureDir called twice (2 dirs) only from the first start(); second call is a no-op
+      expect(fs.ensureDir).toHaveBeenCalledTimes(2);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/core/orchestration/dashboard-integration.test.js` around lines 108 -
113, Update the misleading comment in the test "start is idempotent" to clarify
that ensureDir is expected to be called twice by the first dashboard.start()
(creating two directories) and that the second start() should not add any more
calls; reference the test name and the fs.ensureDir assertion so the comment
explains "two calls total from first start, zero from second" instead of "only
once".
tests/core/orchestration/recovery-handler.test.js (1)

381-395: Misleading inline comment — the actual error trigger is fs.ensureDir, not orchestrator._log.

// Force an error in ESCALATE by providing a bad orchestrator._log
handler.orchestrator = { _log: jest.fn() };
// Mock fs to throw during escalation report save
fs.ensureDir.mockRejectedValueOnce(new Error('disk full'));

The first comment is incorrect — providing _log: jest.fn() doesn't force an error. The real trigger is fs.ensureDir rejecting. Consider removing or correcting the first comment.

✏️ Suggested fix
-      handler.orchestrator = {
-        // Force an error in ESCALATE by providing a bad orchestrator._log
-        _log: jest.fn(),
-      };
-
-      // Mock fs to throw during escalation report save
-      fs.ensureDir.mockRejectedValueOnce(new Error('disk full'));
+      // Force an error during escalation report save (fs.ensureDir rejects)
+      handler.orchestrator = { _log: jest.fn() };
+      fs.ensureDir.mockRejectedValueOnce(new Error('disk full'));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/core/orchestration/recovery-handler.test.js` around lines 381 - 395,
The inline comment in the test is misleading: it claims an error is forced by
setting handler.orchestrator._log, but the failing operation is actually
fs.ensureDir rejecting; update the comment (or remove it) to accurately state
that fs.ensureDir.mockRejectedValueOnce(new Error('disk full')) is the error
trigger for handler._executeRecoveryStrategy when using
RecoveryStrategy.ESCALATE_TO_HUMAN, and keep the rest of the test
(handler.orchestrator = { _log: jest.fn() }, the mock of fs.ensureDir, and the
assertions) unchanged so it's clear where the failure originates.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/core/orchestration/context-manager.test.js`:
- Around line 469-478: The test "uses custom threshold from env" sets
process.env.AIOS_DELIVERY_CONFIDENCE_THRESHOLD but only deletes it at the end of
the test body, which leaks on failure; wrap the test body in a try/finally so
the env var is always cleaned up (set env, await mgr.savePhaseOutput(...);
getDeliveryConfidence()/expect assertions inside try, then in finally delete
process.env.AIOS_DELIVERY_CONFIDENCE_THRESHOLD), or move the setup/teardown to a
scoped beforeEach/afterEach for this describe block; reference the test name,
the env var key, and the methods mgr.savePhaseOutput and
mgr.getDeliveryConfidence when applying the change.
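
The try/finally cleanup described above can be factored into a small helper. This is a hedged sketch, not code from the PR; `withEnv` is a hypothetical wrapper name:

```javascript
// Hypothetical sketch: run a test body with an env var set, restoring the
// previous value (or deleting the key) even if the body throws.
async function withEnv(key, value, body) {
  const previous = process.env[key];
  process.env[key] = value;
  try {
    return await body();
  } finally {
    // Cleanup runs on both success and failure, so the var never leaks.
    if (previous === undefined) delete process.env[key];
    else process.env[key] = previous;
  }
}
```

A test would then read `await withEnv('AIOS_DELIVERY_CONFIDENCE_THRESHOLD', '0.9', async () => { /* savePhaseOutput + assertions */ })`, guaranteeing cleanup even when an assertion throws.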

In `@tests/core/orchestration/dashboard-integration.test.js`:
- Around line 178-188: The console.warn spy created in the 'handles write errors
with console.warn when no error listener' test is not restored and leaks; add an
afterEach that calls console.warn.mockRestore() (or if you prefer
jest.restoreAllMocks()) to undo jest.spyOn(console,
'warn').mockImplementation(), and either move the spy into a beforeEach for the
whole suite or remove the inline spy in the test and rely on the global spy;
ensure the test still triggers dashboard.updateStatus() and asserts
console.warn, then restore the mock so subsequent tests are not affected.

---

Nitpick comments:
In `@tests/core/ids/circuit-breaker.test.js`:
- Around line 219-251: The test manipulates internal probe state
(_halfOpenProbeInFlight) between recordSuccess() calls which is necessary but
unclear; add a one-line comment above the block (e.g., "// Simulate 3
consecutive successful probes by resetting the half-open probe flag between
successes") to explain why the test toggles _halfOpenProbeInFlight in the
lifecycle test (references: recordSuccess(), _halfOpenProbeInFlight,
getState()).

In `@tests/core/orchestration/agent-invoker.test.js`:
- Around line 432-441: The test "getLogs returns copy" should assert that
modifying the returned array does not affect the original by checking the
original logs do not contain the pushed object; call invoker.getLogs() to get a
copy, push the fake entry onto that local array, and replace the length
comparison with an assertion like expect(invoker.logs).not.toContainEqual({
fake: true }) so the test relies on element-level equality and proves the copy
semantics of invoker.getLogs().
- Around line 128-135: Update the misleading test comment in the test 'returns
failure when task not found' to accurately reflect that the agent 'dev' is
recognized (it's in SUPPORTED_AGENTS) but not loaded because fs.pathExists() was
mocked to return false; mention that _loadAgent('dev') will return { name:
'dev', loaded: false, content: null } and that the failure is due to loaded:
false or a missing task rather than an unknown agent, keeping references to
invoker.invokeAgent and fs.pathExists so maintainers can locate the logic being
asserted.

In `@tests/core/orchestration/checklist-runner.test.js`:
- Around line 116-125: The test "skips duplicate items from YAML and markdown"
leaks an implementation detail (30-char prefix heuristic) and can break if
deduplication changes; update the test to assert deduplication intent by using
exact identical strings for the YAML and markdown entries: set a single variable
(e.g., item = 'Check files exist for deployment'), build content so the YAML
block and markdown checkbox use that same item, keep yaml.load.mockReturnValue
returning that same item under 'pre-conditions', then call
runner.parseChecklistItems(content) and expect a single item; this keeps the
test focused on the deduplication behavior of parseChecklistItems without
referencing the internal threshold.

In `@tests/core/orchestration/dashboard-integration.test.js`:
- Around line 108-113: Update the misleading comment in the test "start is
idempotent" to clarify that ensureDir is expected to be called twice by the
first dashboard.start() (creating two directories) and that the second start()
should not add any more calls; reference the test name and the fs.ensureDir
assertion so the comment explains "two calls total from first start, zero from
second" instead of "only once".

In `@tests/core/orchestration/executor-assignment.test.js`:
- Around line 20-24: The test sets a spy on console.warn in the beforeEach
(jest.spyOn(console, 'warn').mockImplementation()) but never restores it; add an
afterEach that calls jest.restoreAllMocks() (or explicitly restore the
console.warn spy) to return console.warn to its original implementation after
each test, mirroring the pattern used for console.log in cli-commands.test.js.

In `@tests/core/orchestration/parallel-executor.test.js`:
- Around line 19-23: The test sets up a spy on console.log in the beforeEach for
ParallelExecutor but never restores mocks, risking cross-test pollution; add an
afterEach that calls jest.restoreAllMocks() (matching the pattern used in
executor-assignment.test.js) to restore the console.log spy and other mocks so
ParallelExecutor tests are cleanly torn down.

In `@tests/core/orchestration/recovery-handler.test.js`:
- Around line 381-395: The inline comment in the test is misleading: it claims
an error is forced by setting handler.orchestrator._log, but the failing
operation is actually fs.ensureDir rejecting; update the comment (or remove it)
to accurately state that fs.ensureDir.mockRejectedValueOnce(new Error('disk
full')) is the error trigger for handler._executeRecoveryStrategy when using
RecoveryStrategy.ESCALATE_TO_HUMAN, and keep the rest of the test
(handler.orchestrator = { _log: jest.fn() }, the mock of fs.ensureDir, and the
assertions) unchanged so it's clear where the failure originates.

In `@tests/core/orchestration/skill-dispatcher.test.js`:
- Around line 427-431: Add a brief explanatory comment above the brittle
assertion so future contributors know the length check is intentional; e.g.,
before the expect(agents).toHaveLength(11) line, add a one-line comment
referencing dispatcher.getAvailableAgents() that says this is a canary for the
current number of primary agents and should be updated when new primary agents
are added (e.g., "// Intentional: update this count when adding/removing primary
agents").

In `@tests/core/orchestration/surface-checker.test.js`:
- Around line 406-507: The tests for getActionConfig, getCriteria,
getDestructiveActions, isDestructiveAction, and getMetadata repeat
SurfaceChecker instantiation and state setup; refactor by moving the repeated
setup into a shared beforeEach (or a small helper like createChecker) so each
describe uses a single let checker and a beforeEach that does checker = new
SurfaceChecker(); checker.criteria = MOCK_CRITERIA; checker._loaded = true (and
override in individual tests where needed), updating tests to use that shared
checker instead of duplicating the setup.

In `@tests/core/orchestration/tech-stack-detector.test.js`:
- Around line 99-192: The fs.pathExists mocks in tests like the 'detects
Supabase from directory' and 'detects Prisma from directory' cases (the
mockImplementation callbacks inside those tests) return raw booleans instead of
Promises; update each mockImplementation to return Promise.resolve(boolean)
(e.g., Promise.resolve(true) or Promise.resolve(false)) so fs.pathExists
consistently returns a Promise (matching the beforeEach
mockResolvedValue(false)) and avoids breaking code paths that use .then()
instead of await.
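The failure mode behind that `Promise.resolve` fix can be demonstrated with plain stand-ins (hypothetical mocks, not code from the PR): `await true` happens to work, but any caller that chains `.then()` on the mock's return value crashes.

```javascript
// Hypothetical stand-in mocks for fs.pathExists — not the PR's code.
const rawMock = (p) => true;                      // raw boolean return
const promiseMock = (p) => Promise.resolve(true); // consistent async shape

async function awaitStyle(pathExists) {
  // `await true` coerces to true, so the raw mock appears to work here.
  return await pathExists('/project/supabase');
}

function thenStyle(pathExists) {
  // Only the Promise-returning mock supports this call path:
  // a raw boolean has no .then(), so rawMock throws a TypeError.
  return pathExists('/project/supabase').then((exists) => exists);
}
```

This is why returning `Promise.resolve(boolean)` (or using `mockResolvedValue`) keeps the mock safe for both `await` and `.then()` call sites in the code under test.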

- Wrap env var setup/teardown in try/finally in context-manager test to prevent
  AIOS_DELIVERY_CONFIDENCE_THRESHOLD from leaking on test failure
- Add jest.restoreAllMocks() to afterEach in dashboard-integration to restore
  console.warn spy
@Pedrovaleriolopez
Contributor

Hi @nikolasdehor — Please address CodeRabbit's feedback when you get a chance. Most issues are minor nitpicks (assertion specificity, edge cases). Once addressed, we'll approve. Thanks for the great test coverage work! 🎯

- Add try/finally for env var leak in context-manager (already done)
- Add jest.restoreAllMocks() in dashboard-integration afterEach (already done)
- Add clarifying comment on half-open probe reset in circuit-breaker
- Strengthen getLogs copy assertion in agent-invoker (not.toContainEqual)
- Fix misleading comment on agent file loading in agent-invoker
- Use exact duplicate strings in checklist-runner (no 30-char impl leak)
- Fix ensureDir comment in dashboard-integration (2 dirs, not 1 call)
- Add afterEach(jest.restoreAllMocks) in executor-assignment
- Add afterEach(jest.restoreAllMocks) in parallel-executor
- Fix misleading error trigger comment in recovery-handler
- Add canary comment on agent count in skill-dispatcher
- Extract loadedChecker factory in surface-checker helper methods
- Replace raw boolean returns with Promise.resolve in tech-stack-detector
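The try/finally env-var guard referenced in the checklists above can be sketched as a small helper (`withEnv` is a hypothetical name for illustration, not a helper from the PR):

```javascript
// Sketch of the try/finally pattern that stops an env override from
// leaking into later tests when an assertion throws mid-test.
function withEnv(key, value, fn) {
  const previous = process.env[key];
  process.env[key] = value;
  try {
    return fn();
  } finally {
    // Runs even if fn() throws: restore the prior value, or delete the
    // key entirely if it was unset before the test.
    if (previous === undefined) {
      delete process.env[key];
    } else {
      process.env[key] = previous;
    }
  }
}

// Usage: scope the override to a single block of assertions.
withEnv('AIOS_DELIVERY_CONFIDENCE_THRESHOLD', '0.9', () => {
  // ... assertions that read the threshold go here ...
});
```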


@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (8)
tests/core/orchestration/checklist-runner.test.js (2)

330-348: pathExists mock returns raw booleans instead of Promise.resolve().

Lines 332-333 return raw true/false from a mockImplementation of fs.pathExists, which is an async API. While await true resolves to true in JS, this is inconsistent with the rest of this file and other test files in this PR, which use Promise.resolve() (or mockResolvedValue). The commit history also mentions fixing this pattern.

Suggested fix
     fs.pathExists.mockImplementation((p) => {
-      if (p.includes('checklists')) return true;
-      return false;
+      if (p.includes('checklists')) return Promise.resolve(true);
+      return Promise.resolve(false);
     });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/core/orchestration/checklist-runner.test.js` around lines 330 - 348, In
the 'fails when blocker item fails' test, the fs.pathExists mock returns raw
booleans; update the mock so fs.pathExists returns promises (e.g., use
Promise.resolve(true)/Promise.resolve(false) or use
fs.pathExists.mockResolvedValue / mockResolvedValueOnce) to match the async API
and the rest of the tests—locate the mockImplementation for fs.pathExists in
that test and replace the boolean returns with resolved Promises.

16-22: Missing afterEach(() => jest.restoreAllMocks()).

Same as flagged in agent-invoker.test.js — adding afterEach with jest.restoreAllMocks() ensures any spies created during tests (e.g., on console) are properly cleaned up.

Suggested fix
   beforeEach(() => {
     jest.resetAllMocks();
     runner = new ChecklistRunner('/project');
   });
+
+  afterEach(() => {
+    jest.restoreAllMocks();
+  });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/core/orchestration/checklist-runner.test.js` around lines 16 - 22, Add
an afterEach hook to restore mocks after each test: in the test suite for
ChecklistRunner (where beforeEach creates runner = new
ChecklistRunner('/project')), add afterEach(() => jest.restoreAllMocks()) so any
spies or mocks (e.g., console spies) are cleaned up; ensure this mirrors the
change made in agent-invoker.test.js and sits alongside the existing beforeEach
block.
tests/core/orchestration/tech-stack-detector.test.js (2)

14-24: Missing afterEach(() => jest.restoreAllMocks()).

Same consistency issue flagged in the other test files.

Suggested fix
   beforeEach(() => {
     jest.resetAllMocks();
     fs.pathExists.mockResolvedValue(false);
     fs.readJson.mockResolvedValue(null);
     fs.readdir.mockResolvedValue([]);
     fs.readFile.mockResolvedValue('');
     detector = new TechStackDetector('/project');
   });
+
+  afterEach(() => {
+    jest.restoreAllMocks();
+  });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/core/orchestration/tech-stack-detector.test.js` around lines 14 - 24,
Add an afterEach hook to restore Jest mocks by calling jest.restoreAllMocks() to
mirror the beforeEach reset and keep tests isolated; specifically, in the
tests/core/orchestration/tech-stack-detector.test.js test suite that
instantiates TechStackDetector (variable detector) add afterEach(() =>
jest.restoreAllMocks()) so fs.* mocks and any other spies are fully restored
after each test.

198-340: Consider extracting the repeated pathExists + readJson mock setup into a helper.

Nearly every frontend detection test (and many backend/database tests) repeats:

fs.pathExists.mockImplementation((p) => Promise.resolve(p.endsWith('package.json')));
fs.readJson.mockResolvedValue({ dependencies: { ... } });

A small helper like mockPackageJson(deps, devDeps) would reduce boilerplate and make each test body focus on the assertion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/core/orchestration/tech-stack-detector.test.js` around lines 198 - 340,
Many tests repeat the same fs.pathExists/fs.readJson mocking; add a helper to
reduce duplication. Create a helper function (e.g., mockPackageJson(deps,
devDeps)) used by the test suite that sets fs.pathExists.mockImplementation((p)
=> Promise.resolve(p.endsWith('package.json'))) and
fs.readJson.mockResolvedValue({ dependencies: deps, devDependencies: devDeps }),
then replace the repeated mock lines in each frontend test with calls to
mockPackageJson(...) before calling detector.detect() so tests still exercise
detect() and assertions (profile.frontend.*, profile.hasFrontend, etc.) remain
unchanged.
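A minimal sketch of the suggested `mockPackageJson` helper, assuming `fs` is the jest-mocked fs-extra instance from the file's existing setup (the signature and name follow CodeRabbit's suggestion, not the PR's code):

```javascript
// Hypothetical helper: only package.json "exists"; everything else
// resolves false, and readJson yields the given dependency maps.
function mockPackageJson(fs, dependencies = {}, devDependencies = {}) {
  fs.pathExists.mockImplementation((p) =>
    Promise.resolve(p.endsWith('package.json'))
  );
  fs.readJson.mockResolvedValue({ dependencies, devDependencies });
}
```

Each frontend test would then reduce to `mockPackageJson(fs, { react: '^18.0.0' })` followed by `detector.detect()` and its assertions.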
tests/core/orchestration/agent-invoker.test.js (2)

128-135: Test name is misleading — mock prevents agent loading too, not just task lookup.

fs.pathExists.mockResolvedValue(false) causes both the agent file and the task file to be missing. The test is named "returns failure when task not found" but the assertion only checks result.success === false without verifying the error relates to the task specifically. This makes it indistinguishable from the "unknown agent" test above.

Consider either (a) using mockImplementation to return true for the agent path and false for the task path, or (b) asserting on result.error to confirm the failure reason.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/core/orchestration/agent-invoker.test.js` around lines 128 - 135, The
test "returns failure when task not found" currently sets
fs.pathExists.mockResolvedValue(false) which prevents both agent and task
loading; update the test to specifically simulate a present agent but missing
task by using fs.pathExists.mockImplementation(path =>
path.endsWith('.agent.js') ? Promise.resolve(true) : Promise.resolve(false)) (or
equivalent path-matching) so invoker.invokeAgent('dev','nonexistent-task')
exercises task-not-found logic, and add an assertion on result.error (or
result.error.message) to verify the failure reason is the missing task rather
than an unknown agent.

20-26: Missing afterEach(() => jest.restoreAllMocks()) for spy cleanup.

Other test files in this PR (e.g., executor-assignment.test.js, lines 26-28) include an afterEach block that calls jest.restoreAllMocks(). This file only has jest.resetAllMocks() in beforeEach, which resets mock state but does not restore original implementations of spied-on methods. If any test (or the production code under test) spies on globals like console.warn, missing restoreAllMocks can leak spies across suites.

Suggested fix
   beforeEach(() => {
     jest.resetAllMocks();
     invoker = new AgentInvoker({ projectRoot: '/project' });
   });
+
+  afterEach(() => {
+    jest.restoreAllMocks();
+  });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/core/orchestration/agent-invoker.test.js` around lines 20 - 26, The
test suite for AgentInvoker currently calls jest.resetAllMocks() in the
beforeEach but lacks teardown to restore spied implementations; add an afterEach
hook that calls jest.restoreAllMocks() (mirroring other tests) to ensure spies
on globals (e.g., console.warn) or module methods are restored after each test;
update the tests/core/orchestration/AgentInvoker suite surrounding the existing
beforeEach that creates invoker and calls jest.resetAllMocks() to include
afterEach(() => jest.restoreAllMocks()) so spies created during AgentInvoker
tests are cleaned up.
tests/core/orchestration/surface-checker.test.js (2)

67-70: Missing afterEach(() => jest.restoreAllMocks()).

Same issue as the other test files. This is particularly relevant here because of the inline console.warn spy at line 237 — without afterEach cleanup, a test failure before consoleSpy.mockRestore() (line 240) would leave the spy attached. Adding afterEach would also let you simplify that inline spy/restore block.

Suggested fix
   beforeEach(() => {
     jest.resetAllMocks();
   });
+
+  afterEach(() => {
+    jest.restoreAllMocks();
+  });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/core/orchestration/surface-checker.test.js` around lines 67 - 70, Add a
global afterEach cleanup to restore Jest mocks in the SurfaceChecker test suite:
inside the describe('SurfaceChecker', ...) block add afterEach(() =>
jest.restoreAllMocks()) so any spies (e.g., the inline console.warn spy) are
always removed even if a test fails; then simplify/remove manual mockRestore()
calls around the console.warn spy (the inline spy at the console.warn usage)
since afterEach will handle restoration.

236-241: Inline spy restore can be skipped if afterEach cleanup is added.

If the afterEach(() => jest.restoreAllMocks()) block is added at the suite level, you can simplify this test by moving the spy setup into beforeEach (like executor-assignment.test.js does) or simply removing the manual consoleSpy.mockRestore() call.

If you prefer to keep the spy scoped to this single test, wrapping in try/finally would be more defensive:

Alternative with try/finally
     const consoleSpy = jest.spyOn(console, 'warn').mockImplementation();
-    expect(checker.evaluateCondition('invalid %%% condition', {})).toBe(false);
-    expect(consoleSpy).toHaveBeenCalledWith(expect.stringContaining('Unknown condition'));
-    consoleSpy.mockRestore();
+    try {
+      expect(checker.evaluateCondition('invalid %%% condition', {})).toBe(false);
+      expect(consoleSpy).toHaveBeenCalledWith(expect.stringContaining('Unknown condition'));
+    } finally {
+      consoleSpy.mockRestore();
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/core/orchestration/surface-checker.test.js` around lines 236 - 241, Add
suite-level automatic mock restore to avoid manual consoleSpy.mockRestore(): add
an afterEach(() => jest.restoreAllMocks()) at the test suite level so individual
tests like the "returns false for unknown condition format" test can simply spy
on console with jest.spyOn(console, 'warn') without calling
consoleSpy.mockRestore(); alternatively, if you want the spy scoped only to this
test keep the jest.spyOn(console, 'warn') but wrap assertions and the spy setup
in a try/finally and call consoleSpy.mockRestore() in finally; update tests
referencing checker.evaluateCondition and consoleSpy accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/core/orchestration/agent-invoker.test.js`:
- Around line 128-135: The test "returns failure when task not found" currently
sets fs.pathExists.mockResolvedValue(false) which prevents both agent and task
loading; update the test to specifically simulate a present agent but missing
task by using fs.pathExists.mockImplementation(path =>
path.endsWith('.agent.js') ? Promise.resolve(true) : Promise.resolve(false)) (or
equivalent path-matching) so invoker.invokeAgent('dev','nonexistent-task')
exercises task-not-found logic, and add an assertion on result.error (or
result.error.message) to verify the failure reason is the missing task rather
than an unknown agent.
- Around line 20-26: The test suite for AgentInvoker currently calls
jest.resetAllMocks() in the beforeEach but lacks teardown to restore spied
implementations; add an afterEach hook that calls jest.restoreAllMocks()
(mirroring other tests) to ensure spies on globals (e.g., console.warn) or
module methods are restored after each test; update the
tests/core/orchestration/AgentInvoker suite surrounding the existing beforeEach
that creates invoker and calls jest.resetAllMocks() to include afterEach(() =>
jest.restoreAllMocks()) so spies created during AgentInvoker tests are cleaned
up.

In `@tests/core/orchestration/checklist-runner.test.js`:
- Around line 330-348: In the 'fails when blocker item fails' test, the
fs.pathExists mock returns raw booleans; update the mock so fs.pathExists
returns promises (e.g., use Promise.resolve(true)/Promise.resolve(false) or use
fs.pathExists.mockResolvedValue / mockResolvedValueOnce) to match the async API
and the rest of the tests—locate the mockImplementation for fs.pathExists in
that test and replace the boolean returns with resolved Promises.
- Around line 16-22: Add an afterEach hook to restore mocks after each test: in
the test suite for ChecklistRunner (where beforeEach creates runner = new
ChecklistRunner('/project')), add afterEach(() => jest.restoreAllMocks()) so any
spies or mocks (e.g., console spies) are cleaned up; ensure this mirrors the
change made in agent-invoker.test.js and sits alongside the existing beforeEach
block.

In `@tests/core/orchestration/surface-checker.test.js`:
- Around line 67-70: Add a global afterEach cleanup to restore Jest mocks in the
SurfaceChecker test suite: inside the describe('SurfaceChecker', ...) block add
afterEach(() => jest.restoreAllMocks()) so any spies (e.g., the inline
console.warn spy) are always removed even if a test fails; then simplify/remove
manual mockRestore() calls around the console.warn spy (the inline spy at the
console.warn usage) since afterEach will handle restoration.
- Around line 236-241: Add suite-level automatic mock restore to avoid manual
consoleSpy.mockRestore(): add an afterEach(() => jest.restoreAllMocks()) at the
test suite level so individual tests like the "returns false for unknown
condition format" test can simply spy on console with jest.spyOn(console,
'warn') without calling consoleSpy.mockRestore(); alternatively, if you want the
spy scoped only to this test keep the jest.spyOn(console, 'warn') but wrap
assertions and the spy setup in a try/finally and call consoleSpy.mockRestore()
in finally; update tests referencing checker.evaluateCondition and consoleSpy
accordingly.

In `@tests/core/orchestration/tech-stack-detector.test.js`:
- Around line 14-24: Add an afterEach hook to restore Jest mocks by calling
jest.restoreAllMocks() to mirror the beforeEach reset and keep tests isolated;
specifically, in the tests/core/orchestration/tech-stack-detector.test.js test
suite that instantiates TechStackDetector (variable detector) add afterEach(()
=> jest.restoreAllMocks()) so fs.* mocks and any other spies are fully restored
after each test.
- Around line 198-340: Many tests repeat the same fs.pathExists/fs.readJson
mocking; add a helper to reduce duplication. Create a helper function (e.g.,
mockPackageJson(deps, devDeps)) used by the test suite that sets
fs.pathExists.mockImplementation((p) =>
Promise.resolve(p.endsWith('package.json'))) and fs.readJson.mockResolvedValue({
dependencies: deps, devDependencies: devDeps }), then replace the repeated mock
lines in each frontend test with calls to mockPackageJson(...) before calling
detector.detect() so tests still exercise detect() and assertions
(profile.frontend.*, profile.hasFrontend, etc.) remain unchanged.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d17153e and 1b08230.

📒 Files selected for processing (11)
  • tests/core/ids/circuit-breaker.test.js
  • tests/core/orchestration/agent-invoker.test.js
  • tests/core/orchestration/checklist-runner.test.js
  • tests/core/orchestration/context-manager.test.js
  • tests/core/orchestration/dashboard-integration.test.js
  • tests/core/orchestration/executor-assignment.test.js
  • tests/core/orchestration/parallel-executor.test.js
  • tests/core/orchestration/recovery-handler.test.js
  • tests/core/orchestration/skill-dispatcher.test.js
  • tests/core/orchestration/surface-checker.test.js
  • tests/core/orchestration/tech-stack-detector.test.js
✅ Files skipped from review due to trivial changes (1)
  • tests/core/orchestration/recovery-handler.test.js
🚧 Files skipped from review as they are similar to previous changes (5)
  • tests/core/ids/circuit-breaker.test.js
  • tests/core/orchestration/context-manager.test.js
  • tests/core/orchestration/skill-dispatcher.test.js
  • tests/core/orchestration/dashboard-integration.test.js
  • tests/core/orchestration/parallel-executor.test.js

@nikolasdehor
Contributor Author

Closing: this batch PR was superseded by dedicated PRs with deeper, more focused tests:

The remaining modules (agent-invoker, checklist-runner, cli-commands, etc.) will be covered in future individual PRs.



Development

Successfully merging this pull request may close these issues.

Add unit tests for orchestration modules

3 participants