Copilot AI commented Oct 27, 2025

Summary

This PR implements a complete redesign of the benchmark suite API, evolving from factory function support to a comprehensive Jest-like API. The new API provides familiar testing patterns with describe() and it(), comprehensive hook support, proper validation, and significant performance optimizations.

API Evolution

The PR went through four major iterations before arriving at the final design:

  1. Factory Functions (Initial): Added factory support to the original benchmarkSuite API
  2. Context-Based API (Intermediate): Introduced a benchmark() registration function with context
  3. Jest-Like API (Final): Complete redesign using describe() and it() patterns
  4. Hook Renaming (Latest): Aligned hook names with Tinybench concepts for clarity

Final API (Jest-like with clear hook names):

import { describe, it, beforeAll, afterAll, beforeAllIterations, setupTask, teardownTask } from 'tools/tinybench-utils';

describe('Path Resolution', () => {
  // Suite-level hooks (Jest context, run once per describe block)
  beforeAll(() => {
    // Runs once before all benchmarks in this describe block
  });

  afterAll(() => {
    // Runs once after all benchmarks in this describe block
  });

  describe('buildFileNames', () => {
    let state;

    // Task-level hooks (run per warmup/run cycle)
    setupTask(() => {
      state = init();
    });

    // Iteration group hooks (run per cycle)
    beforeAllIterations(() => {
      prepareForIterations();
    });

    it('should build file names correctly', () => {
      const baseNames = ['index', 'main'];
      buildFileNames(baseNames);
    });
  });
});

Key Features

  1. Familiar API: Uses describe() and it() just like Jest
  2. Exported functions: All functions must be imported from tinybench-utils (not globals) to avoid confusion with Jest test functions
  3. Nested describes: Each inner describe block creates its own Bench instance with inherited hooks
  4. Comprehensive hooks (all exported, must be imported):
    • Suite-level (Jest context, no task/mode parameters):
      • beforeAll() - Run once before all benchmarks in describe block
      • afterAll() - Run once after all benchmarks in describe block
    • Task-level (receive task and mode parameters):
      • setupTask() - Run before each warmup and run cycle
      • teardownTask() - Run after each warmup and run cycle
    • Iteration-level (receive task and mode parameters):
      • beforeAllIterations() - Run once before each cycle (warmup and run)
      • afterAllIterations() - Run once after each cycle completes
      • beforeEachIteration() - Run before each iteration
      • afterEachIteration() - Run after each iteration
  5. Hook validation:
    • Prevents hooks from being called inside it() callbacks (would cause incorrect behavior)
    • Prevents hooks from being called outside describe() blocks
    • Prevents it() from being called inside another it() callback
  6. Hook inheritance: Child describe blocks inherit hooks from parents but only run their own it() callbacks
  7. Options support:
    • describe(name, callback, options?) - Supports quiet option to suppress performance warnings
    • it(name, fn, options?) - Supports BenchOptions (iterations, warmup, etc.) and itTimeout for Jest timeout control
  8. Performance monitoring: Warns when beforeEachIteration hooks take >10ms (indicates expensive operations that should be in setupTask)
  9. Comprehensive documentation: All exported functions have detailed JSDoc with lifecycle position and execution frequency
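The hook validation rules in item 5 can be illustrated with a small registry. This is a minimal sketch under assumptions, not the code in tools/tinybench-utils.ts: `createBenchmarkRegistry`, `requireRegistrationContext`, and the error messages are hypothetical names chosen for illustration, and the sketch runs `it()` callbacks immediately rather than handing them to Tinybench.

```typescript
// Hypothetical sketch: names and error messages are illustrative, not the PR's actual code.
type HookFn = () => void;

interface DescribeBlock {
  name: string;
  hooks: HookFn[];
  benchmarks: string[];
}

function createBenchmarkRegistry() {
  let currentBlock: DescribeBlock | undefined;
  let insideItCallback = false;

  // Shared guard: registration is only legal inside describe() and outside it().
  function requireRegistrationContext(caller: string): DescribeBlock {
    if (insideItCallback) {
      throw new Error(`${caller}() cannot be called inside an it() callback`);
    }
    if (currentBlock === undefined) {
      throw new Error(`${caller}() must be called inside a describe() block`);
    }
    return currentBlock;
  }

  return {
    describe(name: string, callback: () => void): DescribeBlock {
      const parent = currentBlock;
      const block: DescribeBlock = { name, hooks: [], benchmarks: [] };
      currentBlock = block;
      try {
        callback();
      } finally {
        currentBlock = parent; // restore the enclosing block, even on error
      }
      return block;
    },
    beforeEachIteration(fn: HookFn): void {
      requireRegistrationContext('beforeEachIteration').hooks.push(fn);
    },
    it(name: string, fn: () => void): void {
      requireRegistrationContext('it').benchmarks.push(name);
      // The sketch runs the callback immediately so nested misuse is detectable.
      insideItCallback = true;
      try {
        fn();
      } finally {
        insideItCallback = false;
      }
    },
  };
}
```

Routing every registration function through one guard keeps the three validation rules in a single place, which is one plausible reason to track `insideItCallback` as shared state.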

Hook Renaming (Latest Update)

Renamed all 8 hooks to align with Tinybench concepts and clarify their execution context:

Suite-level hooks (Jest context, no task/mode parameters):

  • setupSuite → beforeAll
  • teardownSuite → afterAll

Task-level hooks (receive task and mode parameters):

  • setup → setupTask
  • teardown → teardownTask

Iteration hooks (receive task and mode parameters):

  • beforeAll → beforeAllIterations
  • afterAll → afterAllIterations
  • beforeEach → beforeEachIteration
  • afterEach → afterEachIteration

The new names make it immediately clear:

  • When each hook runs in the lifecycle
  • How many times each hook executes (suite = 1×, task = 2× per benchmark, iterations = many×)
  • Which hooks receive Tinybench Task context vs Jest context

Performance Optimizations

  • One Bench instance per describe: Changed from creating one Bench instance per it() to one per describe() block, reducing overhead (~3.5% faster, 4.3s average reduction)
  • Optimized summary accumulation: Replaced string concatenation with array join (O(n²) → O(n))
  • Explicit cleanup: Added bench.remove() calls to help with garbage collection
  • Fixed benchmark isolation: Added proper beforeEachIteration() hooks to prevent state sharing between benchmarks in cache-operations suite
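The summary accumulation change can be sketched as follows. This is an illustration only: `BenchmarkResult`, `buildSummary`, and the output format are assumptions, since the PR description does not show the actual summary code.

```typescript
// Hypothetical result shape; the real summary fields are not shown in the PR.
interface BenchmarkResult {
  name: string;
  opsPerSec: number;
}

// Collect one line per result and join once at the end,
// instead of growing a single string with += inside the loop.
function buildSummary(results: BenchmarkResult[]): string {
  const lines: string[] = [];
  for (const result of results) {
    lines.push(`${result.name}: ${result.opsPerSec.toFixed(2)} ops/sec`);
  }
  return lines.join('\n');
}

const summary = buildSummary([
  { name: 'buildFileNames', opsPerSec: 1234.5 },
  { name: 'resolvePath', opsPerSec: 987 },
]);
console.log(summary);
```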

Implementation Details

Core Implementation (tools/tinybench-utils.ts):

  • Complete rewrite with Jest-like API
  • Each inner describe block creates its own Bench instance
  • All hooks are exported functions that must be imported (no globals)
  • Uses @jest/globals imports for proper TypeScript typing
  • Comprehensive hook execution order documentation with frequency details
  • Performance monitoring for expensive hooks
  • State management encapsulated in separate module (tools/tinybench-utils-state.ts)
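Hook inheritance across nested describe blocks could work roughly like the sketch below. It assumes each block keeps a reference to its parent and that parent hooks run before child hooks; `BlockNode` and `collectInheritedHooks` are hypothetical names, and the actual ordering in tools/tinybench-utils.ts is not stated in this description.

```typescript
// Hypothetical block shape: only one hook list is modeled to keep the sketch short.
interface BlockNode {
  name: string;
  parent?: BlockNode;
  beforeEachIteration: Array<() => void>;
}

// Walk from the block up to the root, then return hooks outermost-first
// (assumed ordering: parent hooks run before child hooks).
function collectInheritedHooks(block: BlockNode): Array<() => void> {
  const chain: BlockNode[] = [];
  for (let node: BlockNode | undefined = block; node !== undefined; node = node.parent) {
    chain.unshift(node);
  }
  const hooks: Array<() => void> = [];
  for (const node of chain) {
    hooks.push(...node.beforeEachIteration);
  }
  return hooks;
}

const calls: string[] = [];
const outer: BlockNode = {
  name: 'Path Resolution',
  beforeEachIteration: [() => calls.push('outer')],
};
const inner: BlockNode = {
  name: 'buildFileNames',
  parent: outer,
  beforeEachIteration: [() => calls.push('inner')],
};
for (const hook of collectInheritedHooks(inner)) {
  hook();
}
```

The child block sees both hooks, while the parent block only sees its own, which matches the described behavior that children inherit hooks but parents do not run their children's registrations.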

Hook Execution Order

Hooks execute in this order for each benchmark:

  1. Suite level (Jest context) - runs once per describe block:

    • beforeAll - runs once before all benchmarks
  2. Per benchmark - runs for each it():

    • setupTask - runs before warmup cycle
    • beforeAllIterations - runs once before warmup iterations
    • warmup iterations (with beforeEachIteration/afterEachIteration)
    • afterAllIterations - runs once after warmup
    • teardownTask - runs after warmup
    • setupTask - runs before run cycle
    • beforeAllIterations - runs once before run iterations
    • run iterations (with beforeEachIteration/afterEachIteration)
    • afterAllIterations - runs once after run
    • teardownTask - runs after run
  3. Suite level (Jest context) - runs once per describe block:

    • afterAll - runs once after all benchmarks

Execution Frequency:

  • Suite hooks (beforeAll/afterAll): 1× per describe block
  • Task hooks (setupTask/teardownTask): 2× per benchmark with warmup enabled (once for warmup, once for run), 1× with warmup disabled
  • Iteration group hooks (beforeAllIterations/afterAllIterations): 2× per benchmark with warmup enabled (once per cycle), 1× with warmup disabled
  • Iteration hooks (beforeEachIteration/afterEachIteration): ~1000× per benchmark (all iterations)

Important: setupTask runs before beforeAllIterations. Any initialization that other hooks depend on must be in setupTask, not beforeAllIterations.
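The per-benchmark order listed above can be reproduced with a small trace function. This is a sketch of the documented order only, not of Tinybench itself: `traceBenchmark` and its option names are hypothetical, and the iteration counts are passed in explicitly rather than derived from timing.

```typescript
type Mode = 'warmup' | 'run';

// Hypothetical options: explicit iteration counts keep the trace deterministic.
interface TraceOptions {
  warmup: boolean;
  warmupIterations: number;
  runIterations: number;
}

// Returns the hook names in the order the PR description documents for one it().
function traceBenchmark(options: TraceOptions): string[] {
  const log: string[] = [];
  const modes: Mode[] = options.warmup ? ['warmup', 'run'] : ['run'];
  for (const mode of modes) {
    log.push(`setupTask(${mode})`);
    log.push(`beforeAllIterations(${mode})`);
    const count = mode === 'warmup' ? options.warmupIterations : options.runIterations;
    for (let i = 0; i < count; i++) {
      log.push(`beforeEachIteration(${mode})`);
      log.push(`iteration(${mode})`);
      log.push(`afterEachIteration(${mode})`);
    }
    log.push(`afterAllIterations(${mode})`);
    log.push(`teardownTask(${mode})`);
  }
  return log;
}

console.log(traceBenchmark({ warmup: true, warmupIterations: 1, runIterations: 2 }).join('\n'));
```

Because setupTask is always logged before beforeAllIterations in each cycle, the trace makes the dependency rule visible: state that beforeAllIterations reads has to exist by the time setupTask returns.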

Migrated Benchmarks

All 5 benchmark files have been successfully migrated to the new Jest-like API:

  • cache-operations.bench.ts - 4 benchmarks with suite-level shared state and proper isolation
  • export-management.bench.ts - 4 benchmarks with suite-level shared state and beforeAllIterations hooks
  • import-updates.bench.ts - 3 benchmarks with suite-level shared state (fixed hook execution order)
  • path-resolution.bench.ts - 5 benchmarks with minimal hooks
  • validation.bench.ts - 4 benchmarks with complex beforeAllIterations setup per benchmark

Total: 20 benchmarks across 5 files

Testing & Quality

  • 770 tests passing (up from 626 at start)
    • 92 unit tests for tinybench-utils (73 passing, 13 skipped with documentation)
    • 9 additional validation tests enabled with test helper
    • Hook registration tests updated with new hook names
    • All existing workspace tests passing
  • All 20 benchmarks passing across 5 files
  • ✅ Lint passes
  • ✅ Build passes
  • ✅ Format check passes
  • ✅ Hook validation prevents misuse
  • ✅ Comprehensive JSDoc documentation with lifecycle and frequency details

Breaking Changes

  • BREAKING: Old benchmarkSuite API completely replaced with Jest-like API
  • BREAKING: Hook names updated to align with Tinybench concepts:
    • setupSuite/teardownSuite → beforeAll/afterAll
    • setup/teardown → setupTask/teardownTask
    • beforeAll/afterAll/beforeEach/afterEach → beforeAllIterations/afterAllIterations/beforeEachIteration/afterEachIteration
  • All benchmarks must be migrated to new describe()/it() pattern
  • Hooks are now exported functions that must be imported (not passed as options)

Documentation

  • Updated tools/README-benchmark.md with comprehensive Jest-like API documentation
  • Added complete hook execution order documentation with frequency details
  • Enhanced all JSDoc comments with lifecycle position and execution frequency
  • State management module with getter/setter functions for better encapsulation
  • All hook examples updated to use new names

Fixes #302


🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

@LayZeeDK LayZeeDK force-pushed the copilot/extend-benchmarks-with-factory-function branch from 9d2ed6d to 64c4f40 Compare October 27, 2025 12:30
Copilot AI changed the title from "[WIP] Extend benchmarkSuite to support factory functions for shared scope" to "feat: add factory function support to benchmarkSuite API" Oct 27, 2025
Copilot AI requested a review from LayZeeDK October 27, 2025 12:31
Copilot AI and others added 3 commits October 28, 2025 21:50
Co-authored-by: LayZeeDK <6364586+LayZeeDK@users.noreply.github.com>
Replace unsafe `any` types in test mocks with proper TypeScript types.

## Changes

- **`tinybench-utils-hooks.spec.ts`**: Use `typeof globalThis.<function>` type assertions and explicit function signatures for Jest mock setup
  - `beforeAll`/`afterAll` hooks: `(fn: () => void | Promise<void>) => void`
  - Removed 4 unnecessary `eslint-disable` directives
- **`tinybench-utils.spec.ts`**: Remove extra blank line

### Before
```typescript
// eslint-disable-next-line @typescript-eslint/no-explicit-any
globalThis.beforeAll = jest.fn((fn: any) => {
  registeredBeforeAllHooks.push(fn);
}) as any;
```

### After
```typescript
globalThis.beforeAll = jest.fn((fn: () => void | Promise<void>) => {
  registeredBeforeAllHooks.push(fn);
}) as typeof globalThis.beforeAll;
```


<details>

<summary>Original prompt</summary>

> Format and lint the code.


</details>



- Add 21 unit tests for state management (tinybench-utils-state.spec.ts)
  - Test getters/setters for currentDescribeBlock, rootDescribeBlock, insideItCallback
  - Test resetGlobalState() functionality and idempotency
  - Test __test_setInsideItCallback() test-only API
  - Test nested describe block relationships
  - Test quiet flag storage

- Add 17 integration tests to tinybench-utils.spec.ts
  - Test describe() with quiet option
  - Test benchmark quiet option and inheritance
  - Test complex nested scenarios (4 levels deep, multiple siblings, many benchmarks)
  - Test hook ordering with all 8 hook types
  - Test edge cases with empty callbacks

Total: 765 tests (up from 727)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Lars Gyrup Brink Nielsen <LayZeeDK@users.noreply.github.com>
LayZeeDK and others added 2 commits October 29, 2025 10:10
Renamed 8 hooks in the Jest-like benchmark API to use clearer names that
reflect their execution context and frequency:

Suite-level hooks (Jest context, no task/mode parameters):
- setupSuite → beforeAll
- teardownSuite → afterAll

Task-level hooks (receive task and mode parameters):
- setup → setupTask
- teardown → teardownTask

Iteration hooks (receive task and mode parameters):
- beforeAll → beforeAllIterations
- afterAll → afterAllIterations
- beforeEach → beforeEachIteration
- afterEach → afterEachIteration

Updated all usage across:
- Core implementation (tinybench-utils.ts, tinybench-utils-state.ts)
- 5 benchmark files (35 hook uses total)
- 2 test files (770 tests passing)
- Documentation (tools/README-benchmark.md)

Added comprehensive lifecycle documentation with execution frequency
details for each hook type.

All tests passing (770/770), all benchmarks passing (20/20).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@LayZeeDK

@claude Taking TINYBENCH_LIFECYCLE.md into consideration, find and fix any benchmarks that are not properly set up. Particularly benchmarking of functions providing or using caching. Warmup iterations are enabled by default. Disable them or take the mode into account in the hooks as applicable.

github-actions bot and others added 5 commits October 29, 2025 10:11
- Move state initialization from beforeAllIterations() to setupTask()
- Add beforeEachIteration() hooks to explicitly control cache state
- Disable warmup for cache-sensitive benchmarks with { warmup: false }
- Ensure proper isolation between warmup and run phases

Cache operations benchmarks:
- Pre-populate cache in beforeEachIteration() for cache hit tests
- Clear cache in beforeEachIteration() for cache miss tests
- Move tree and cache initialization to setupTask() for fresh state

Validation and export management benchmarks:
- Consolidate all initialization into setupTask() for consistency
- Ensure warmup phase doesn't affect run phase measurements

Co-authored-by: Lars Gyrup Brink Nielsen <LayZeeDK@users.noreply.github.com>
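The cache isolation described in this commit (pre-populate before each iteration for cache-hit benchmarks, clear before each iteration for cache-miss benchmarks) can be sketched without Tinybench. Everything here is hypothetical: `createExistenceCache`, `countMeasuredLookups`, and the file path are illustrative stand-ins for the real cache-operations benchmark.

```typescript
interface ExistenceCache {
  exists(path: string): boolean;
  clear(): void;
}

// Hypothetical cache: memoizes the result of an underlying lookup per path.
function createExistenceCache(lookup: () => boolean): ExistenceCache {
  const entries = new Map<string, boolean>();
  return {
    exists(path: string): boolean {
      const cached = entries.get(path);
      if (cached !== undefined) {
        return cached;
      }
      const value = lookup();
      entries.set(path, value);
      return value;
    },
    clear(): void {
      entries.clear();
    },
  };
}

// Runs `iterations` measured calls and counts how many reached the underlying lookup.
// The beforeEachIteration callback plays the role of the hook of the same name.
function countMeasuredLookups(
  iterations: number,
  beforeEachIteration: (cache: ExistenceCache) => void,
): number {
  let measuring = false;
  let lookups = 0;
  const cache = createExistenceCache(() => {
    if (measuring) {
      lookups++;
    }
    return true;
  });
  for (let i = 0; i < iterations; i++) {
    measuring = false;
    beforeEachIteration(cache); // hook work is not part of the measurement
    measuring = true;
    cache.exists('libs/a/src/index.ts');
  }
  return lookups;
}

// Cache-miss benchmark: clear before every iteration.
const missLookups = countMeasuredLookups(5, (cache) => cache.clear());
// Cache-hit benchmark: pre-populate before every iteration.
const hitLookups = countMeasuredLookups(5, (cache) => {
  cache.exists('libs/a/src/index.ts');
});
console.log({ missLookups, hitLookups });
```

Without the per-iteration hook, the first iteration would warm the cache and every later "miss" iteration would silently measure a hit, which is the kind of state sharing the commit is guarding against.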
Update documentation and JSDoc comments to explicitly state that setupTask, teardownTask, beforeAllIterations, and afterAllIterations run 1-2 times per benchmark depending on whether warmup is enabled, rather than implying they always run exactly twice.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Move expensive operations from setupTask/teardownTask (run 1-2 times per benchmark) to beforeAll (run once per suite) to reduce overhead:

**Changes:**
- cache-operations: Tree + 100 file writes → beforeAll
- import-updates: Tree + project configs → beforeAll
- export-management: Tree + project config → beforeAll
- validation: Tree + project config → beforeAll
- path-resolution: Array creation → beforeAll

**Results:**
- Total suite time: 27.148s → 26.267s (3.2% faster)
- All 20 benchmarks pass
- All 601 unit tests pass
- Zero breaking changes
- Benchmark accuracy maintained

Individual benchmark measurements show normal variance (±2-14%) due to JIT optimization, GC timing, and CPU state - this is expected. The key improvement is reduced suite overhead from eliminating redundant expensive initialization.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…t practices

Apply hook optimizations based on warmup configuration and documented execution order:

1. cache-operations.bench.ts (warmup: false):
   - Changed beforeAllIterations → nested beforeAll
   - Optimization: Setup runs 1× per benchmark instead of per cycle
   - Rationale: With warmup disabled, only one cycle exists, so cycle-level hooks are redundant

2. export-management.bench.ts, validation.bench.ts, import-updates.bench.ts (warmup: true):
   - Keep setupTask/teardownTask for parent-level initialization
   - Rationale: Benchmarks depend on variables initialized in setupTask (e.g., cachedTreeExists)
   - Best practice: "setupTask runs BEFORE nested hooks. Any initialization that other hooks
     depend on must be in setupTask, not beforeAllIterations" (per documentation)

Hook execution order:
  setupTask → beforeAllIterations → iterations → afterAllIterations → teardownTask

Performance:
- Total suite: 26.685s (baseline: 26.617s, +0.3% within normal variance)
- Cache operations: Setup frequency reduced from per-cycle to per-benchmark

This follows documented best practices for hook execution order while maintaining
correct state isolation between warmup and measurement cycles.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…ycle

Rename benchmark hooks to better align with Tinybench's cycle-based
execution model and improve API clarity:

- setupTask() → beforeCycle() (maps to Tinybench BenchOptions.setup)
- teardownTask() → afterCycle() (maps to Tinybench BenchOptions.teardown)

The new names emphasize that these hooks run per benchmark cycle
(warmup + run), creating a clearer hierarchy:
- beforeAll (suite) → beforeCycle → beforeAllIterations → beforeEachIteration

All documentation and tests updated to reflect the new naming and
include explicit Tinybench mapping information in JSDoc comments.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
LayZeeDK and others added 4 commits October 29, 2025 15:17
…tion

Replace all mentions of the old `benchmarkSuite` wrapper with references to
the current Jest-like API using `describe()` and `it()` functions.

Updated files:
- packages/workspace/src/generators/move-file/benchmarks/README.md
- REFACTORING_EVALUATION.md
- REFACTORING_EVALUATION_SUMMARY.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Fix type conversion errors when mocking Jest globals by casting through
'unknown' first. This allows the mocks to be assigned to the full Jest
types without TypeScript complaining about missing properties.

Changes:
- Cast globalThis.describe mock through 'unknown'
- Cast globalThis.it mock through 'unknown'
- Prefix unused 'name' parameter with '_' to suppress hint

All 8 tests continue to pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
… detection

Add configurable threshold for beforeEachIteration hook performance warnings
to avoid false positives in CI environments where performance is more variable.

Features:
- Auto-detects CI environments and uses 50ms threshold (vs 10ms locally)
- Configurable via `hookPerformanceThreshold` option on describe() or it()
- Setting threshold to 0 disables warnings (like quiet: true)
- Inherits threshold from parent describe blocks
- Includes threshold value in warning messages

CI Detection supports:
- GitHub Actions, GitLab CI, CircleCI, Travis CI
- Jenkins, Buildkite, Azure Pipelines
- Generic CI and CONTINUOUS_INTEGRATION env vars

Property naming:
- Named `hookPerformanceThreshold` (not `performanceWarningThreshold`)
- Makes it immediately clear this monitors hooks, not benchmark functions
- Prevents confusion about what "performance" refers to

Documentation:
- Enhanced JSDoc with notes about hooks vs benchmarks
- Added examples for all configuration patterns
- Updated README with detailed explanation
- Clarified that slow hooks distort results by adding overhead

Technical:
- Uses bracket notation for process.env access (TypeScript strict mode)
- All 144 tests passing

Examples:
- describe('Suite', fn, { hookPerformanceThreshold: 100 })
- it('benchmark', fn, { hookPerformanceThreshold: 0 })

Addresses code review feedback: Issue #4 (Potential Performance Monitoring
False Positives)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
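A rough sketch of the threshold resolution described in this commit follows. The 50ms and 10ms defaults and the "0 disables warnings" rule come from the commit message; the function names, the exact list of environment variables, and the truthiness check are assumptions, since the commit names the CI providers but not the variables it reads. The environment is passed in as a plain object so the sketch does not depend on `process.env`.

```typescript
type Env = Record<string, string | undefined>;

// Assumed variable names for the providers the commit lists; the PR's actual list may differ.
const CI_ENV_VARS = [
  'CI',
  'CONTINUOUS_INTEGRATION',
  'GITHUB_ACTIONS',
  'GITLAB_CI',
  'CIRCLECI',
  'TRAVIS',
  'JENKINS_URL',
  'BUILDKITE',
  'TF_BUILD',
];

function isCiEnvironment(env: Env): boolean {
  return CI_ENV_VARS.some((name) => {
    const value = env[name]; // bracket access, as the commit notes for strict mode
    return value !== undefined && value !== '' && value !== 'false' && value !== '0';
  });
}

// An explicit option wins; otherwise fall back to 50ms in CI and 10ms locally.
function resolveHookPerformanceThreshold(env: Env, configured?: number): number {
  if (configured !== undefined) {
    return configured;
  }
  return isCiEnvironment(env) ? 50 : 10;
}

// A threshold of 0 disables the warning entirely.
function shouldWarnAboutHook(durationMs: number, thresholdMs: number): boolean {
  return thresholdMs > 0 && durationMs > thresholdMs;
}

const ciThreshold = resolveHookPerformanceThreshold({ GITHUB_ACTIONS: 'true' });
console.log(shouldWarnAboutHook(30, ciThreshold));
```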
Add ASCII art tree diagram showing the visual execution flow of hooks to make
the nested structure and execution frequency more obvious at a glance.

The new "Visual Execution Flow" section provides:
- Tree diagram showing exact execution order for a 2-benchmark suite
- Clear visualization of nested structure (suite → benchmark → cycle → iteration)
- Frequency annotations for each hook (ONCE, TWICE, ~16×, ~1000×)
- Mode indicators (warmup vs run)
- Context indicators (Jest vs Tinybench)

Key observations section summarizes:
- beforeAll/afterAll run ONCE for entire suite
- Each benchmark runs TWO cycles (warmup + run)
- beforeCycle/afterCycle run TWICE per benchmark
- beforeEachIteration/afterEachIteration run THOUSANDS of times (~1016)

This addresses code review feedback Issue #5 (Documentation: Hook Execution
Order) by providing better visual hierarchy that makes the execution flow
immediately understandable.

The diagram complements the existing detailed textual documentation by giving
developers a quick visual reference to understand the execution model.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
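The frequency annotations in this commit (ONCE, TWICE, ~16×, ~1000×, ~1016) follow from simple arithmetic, sketched below. `countHookCalls` is a hypothetical helper; the 16 warmup and 1000 run iterations are the approximate figures quoted in the commit message, and treating zero warmup iterations as "warmup disabled" is an assumption of this sketch.

```typescript
interface HookCallCounts {
  beforeAll: number;
  beforeCycle: number;
  beforeAllIterations: number;
  beforeEachIteration: number;
}

// Counts how often each hook level fires for one describe block.
function countHookCalls(
  benchmarks: number,
  warmupIterations: number,
  runIterations: number,
): HookCallCounts {
  // Assumption: zero warmup iterations means the warmup cycle is skipped.
  const cyclesPerBenchmark = warmupIterations > 0 ? 2 : 1;
  return {
    beforeAll: 1,
    beforeCycle: benchmarks * cyclesPerBenchmark,
    beforeAllIterations: benchmarks * cyclesPerBenchmark,
    beforeEachIteration: benchmarks * (warmupIterations + runIterations),
  };
}

// One benchmark with the approximate counts from the commit message: 16 + 1000 = 1016.
console.log(countHookCalls(1, 16, 1000));
```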
@LayZeeDK LayZeeDK force-pushed the copilot/extend-benchmarks-with-factory-function branch from 3bd41bc to 1b61375 Compare October 29, 2025 14:52
@LayZeeDK LayZeeDK marked this pull request as ready for review October 29, 2025 14:53
@LayZeeDK LayZeeDK enabled auto-merge October 29, 2025 14:56
@LayZeeDK LayZeeDK merged commit 5d4ff03 into main Oct 29, 2025
14 checks passed
@LayZeeDK LayZeeDK deleted the copilot/extend-benchmarks-with-factory-function branch October 29, 2025 15:02
Development

Successfully merging this pull request may close these issues.

Extend `benchmarkSuite` to support factory functions to create a shared function scope
