test: implement Jest-like API for benchmarks with `describe()` and `it()` #303

Copilot · 2025-10-27T12:06:43Z

Summary

This PR redesigns the benchmark suite API, evolving it from factory function support into a Jest-like API. The new API provides the familiar describe() and it() patterns, hooks at suite, task, and iteration level, validation of hook usage, and performance optimizations (about 3.5% faster from using one Bench instance per describe() block).

API Evolution

The PR went through four stages before arriving at the final design:

Factory Functions (Initial): Added factory support to the original benchmarkSuite API
Context-Based API (Intermediate): Introduced a benchmark() registration function with context
Jest-Like API (Final): Complete redesign using describe() and it() patterns
Hook Renaming (Latest): Aligned hook names with Tinybench concepts for clarity

Final API (Jest-like with clear hook names):

import { describe, it, beforeAll, afterAll, beforeAllIterations, setupTask, teardownTask } from 'tools/tinybench-utils';

describe('Path Resolution', () => {
  // Suite-level hooks (Jest context, run once per describe block)
  beforeAll(() => {
    // Runs once before all benchmarks in this describe block
  });

  afterAll(() => {
    // Runs once after all benchmarks in this describe block
  });

  describe('buildFileNames', () => {
    let state;

    // Task-level hooks (run per warmup/run cycle)
    setupTask(() => {
      state = init();
    });

    // Iteration group hooks (run per cycle)
    beforeAllIterations(() => {
      prepareForIterations();
    });

    it('should build file names correctly', () => {
      const baseNames = ['index', 'main'];
      buildFileNames(baseNames);
    });
  });
});

Key Features

Familiar API: Uses describe() and it() just like Jest
Exported functions: All functions must be imported from tinybench-utils (not globals) to avoid confusion with Jest test functions
Nested describes: Each inner describe block creates its own Bench instance with inherited hooks
Comprehensive hooks (all exported, must be imported):
- Suite-level (Jest context, no task/mode parameters):
  - beforeAll() - Run once before all benchmarks in describe block
  - afterAll() - Run once after all benchmarks in describe block
- Task-level (receive task and mode parameters):
  - setupTask() - Run before each warmup and run cycle
  - teardownTask() - Run after each warmup and run cycle
- Iteration-level (receive task and mode parameters):
  - beforeAllIterations() - Run once before each cycle (warmup and run)
  - afterAllIterations() - Run once after each cycle completes
  - beforeEachIteration() - Run before each iteration
  - afterEachIteration() - Run after each iteration
Hook validation:
- Prevents hooks from being called inside it() callbacks (would cause incorrect behavior)
- Prevents hooks from being called outside describe() blocks
- Prevents it() from being called inside another it() callback
Hook inheritance: Child describe blocks inherit hooks from parents but only run their own it() callbacks
Options support:
- describe(name, callback, options?) - Supports quiet option to suppress performance warnings
- it(name, fn, options?) - Supports BenchOptions (iterations, warmup, etc.) and itTimeout for Jest timeout control
Performance monitoring: Warns when beforeEachIteration hooks take >10ms (indicates expensive operations that should be in setupTask)
Comprehensive documentation: All exported functions have detailed JSDoc with lifecycle position and execution frequency
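
The hook validation rules listed above can be illustrated with a small registry. This is a minimal sketch under stated assumptions, not the code in tools/tinybench-utils.ts: `createRegistry`, the error messages, and the choice to run the it() body immediately are all hypothetical and exist only to make the guards observable.

```typescript
// Simplified, hypothetical model of the validation rules.
type Callback = () => void;

interface DescribeBlock {
  name: string;
  hooks: Callback[];
  benchmarks: string[];
}

function createRegistry() {
  let currentDescribe: DescribeBlock | undefined;
  let insideIt = false;

  // Shared guard: registration is only legal inside describe() and outside it().
  function guard(api: string): DescribeBlock {
    if (insideIt) {
      throw new Error(`${api}() cannot be called inside an it() callback`);
    }
    if (currentDescribe === undefined) {
      throw new Error(`${api}() must be called inside a describe() block`);
    }
    return currentDescribe;
  }

  return {
    describe(name: string, callback: Callback): DescribeBlock {
      const parent = currentDescribe;
      const block: DescribeBlock = { name, hooks: [], benchmarks: [] };
      currentDescribe = block;
      try {
        callback(); // hooks and benchmarks register synchronously in here
      } finally {
        currentDescribe = parent;
      }
      return block;
    },
    beforeEachIteration(hook: Callback): void {
      guard('beforeEachIteration').hooks.push(hook);
    },
    it(name: string, fn: Callback): void {
      guard('it').benchmarks.push(name);
      // Assumption for this sketch: the body runs right away.
      insideIt = true;
      try {
        fn();
      } finally {
        insideIt = false;
      }
    },
  };
}
```

Keeping both checks in one guard means every hook reports misuse the same way, and the `finally` blocks restore state even when a callback throws.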

Hook Renaming (Latest Update)

Renamed all 8 hooks to align with Tinybench concepts and clarify their execution context:

Suite-level hooks (Jest context, no task/mode parameters):

setupSuite → beforeAll
teardownSuite → afterAll

Task-level hooks (receive task and mode parameters):

setup → setupTask
teardown → teardownTask

Iteration hooks (receive task and mode parameters):

beforeAll → beforeAllIterations
afterAll → afterAllIterations
beforeEach → beforeEachIteration
afterEach → afterEachIteration

The new names make it immediately clear:

When each hook runs in the lifecycle
How many times each hook executes (suite = 1×, task = once per cycle, so 2× per benchmark with warmup enabled, iterations = many×)
Which hooks receive Tinybench Task context vs Jest context
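
Because the old name beforeAll becomes beforeAllIterations while setupSuite becomes the new beforeAll, a sequence of search-and-replace passes would rename some hooks twice. The sketch below shows one way to apply the mapping in a single pass. `renameHooks` is a hypothetical helper, not part of this PR, and it only looks at call sites of the form `name(`.

```typescript
// Old → new hook names, taken from the rename list above.
const HOOK_RENAMES: Record<string, string> = {
  setupSuite: 'beforeAll',
  teardownSuite: 'afterAll',
  setup: 'setupTask',
  teardown: 'teardownTask',
  beforeAll: 'beforeAllIterations',
  afterAll: 'afterAllIterations',
  beforeEach: 'beforeEachIteration',
  afterEach: 'afterEachIteration',
};

// Matches an old hook name only when it is a whole word followed by "(".
const OLD_HOOK_CALL =
  /\b(setupSuite|teardownSuite|setup|teardown|beforeAll|afterAll|beforeEach|afterEach)(?=\s*\()/g;

// Hypothetical migration helper: every match is replaced exactly once,
// so a freshly written "beforeAll" is never renamed again in the same pass.
function renameHooks(source: string): string {
  return source.replace(OLD_HOOK_CALL, (_match, name: string) => HOOK_RENAMES[name] ?? name);
}
```

The helper is deliberately not idempotent: running it on already migrated code would rename the new beforeAll again, so it should be applied once per file.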

Performance Optimizations

One Bench instance per describe: Changed from creating one Bench instance per it() to one per describe() block, reducing overhead (~3.5% faster, 4.3s average reduction)
Optimized summary accumulation: Replaced string concatenation with array join (O(n²) → O(n))
Explicit cleanup: Added bench.remove() calls to help with garbage collection
Fixed benchmark isolation: Added proper beforeEachIteration() hooks to prevent state sharing between benchmarks in cache-operations suite
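
The summary accumulation change can be sketched as follows. The `BenchmarkResult` shape and the output format are assumptions made for illustration; only the technique (collect lines in an array, join once) comes from the description above.

```typescript
// Hypothetical result shape; the real summary fields are not shown in this PR description.
interface BenchmarkResult {
  name: string;
  opsPerSecond: number;
}

// Collect one line per benchmark and join once at the end,
// instead of growing a single string with += inside the loop.
function formatSummary(results: BenchmarkResult[]): string {
  const lines: string[] = [];
  for (const result of results) {
    lines.push(`${result.name}: ${result.opsPerSecond.toFixed(0)} ops/sec`);
  }
  return lines.join('\n');
}
```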

Implementation Details

Core Implementation (tools/tinybench-utils.ts):

Complete rewrite with Jest-like API
Each inner describe block creates its own Bench instance
All hooks are exported functions that must be imported (no globals)
Uses @jest/globals imports for proper TypeScript typing
Comprehensive hook execution order documentation with frequency details
Performance monitoring for expensive hooks
State management encapsulated in separate module (tools/tinybench-utils-state.ts)
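
Hook inheritance between nested describe blocks could work roughly like this. It is a sketch only: the `DescribeNode` shape and `collectInheritedHooks` are hypothetical, and running parent hooks before child hooks is an assumption, since the description does not state the order.

```typescript
type Hook = () => void;

// Hypothetical node shape for a describe block and its parent link.
interface DescribeNode {
  name: string;
  parent?: DescribeNode;
  beforeEachIteration: Hook[];
}

// Walk from the block up to the root, then flatten hooks outermost first.
function collectInheritedHooks(node: DescribeNode): Hook[] {
  const chain: DescribeNode[] = [];
  for (
    let current: DescribeNode | undefined = node;
    current !== undefined;
    current = current.parent
  ) {
    chain.unshift(current);
  }
  const hooks: Hook[] = [];
  for (const block of chain) {
    hooks.push(...block.beforeEachIteration);
  }
  return hooks;
}
```

A child block sees its parent's hooks through the parent link, while the parent never sees the child's, which matches the rule that each block only runs its own it() callbacks.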

Hook Execution Order

Hooks execute in this order for each benchmark:

Suite level (Jest context) - runs once per describe block:
- beforeAll - runs once before all benchmarks
Per benchmark - runs for each it():
- setupTask - runs before warmup cycle
- beforeAllIterations - runs once before warmup iterations
- warmup iterations (with beforeEachIteration/afterEachIteration)
- afterAllIterations - runs once after warmup
- teardownTask - runs after warmup
- setupTask - runs before run cycle
- beforeAllIterations - runs once before run iterations
- run iterations (with beforeEachIteration/afterEachIteration)
- afterAllIterations - runs once after run
- teardownTask - runs after run
Suite level (Jest context) - runs once per describe block:
- afterAll - runs once after all benchmarks

Execution Frequency:

Suite hooks (beforeAll/afterAll): 1× per describe block
Task hooks (setupTask/teardownTask): 2× per benchmark with warmup enabled (once for warmup, once for run), 1× with warmup disabled
Iteration group hooks (beforeAllIterations/afterAllIterations): 2× per benchmark with warmup enabled (once per cycle), 1× with warmup disabled
Iteration hooks (beforeEachIteration/afterEachIteration): ~1000× per benchmark (all iterations)

Important: setupTask runs before beforeAllIterations. Any initialization that other hooks depend on must be in setupTask, not beforeAllIterations.
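
The order and frequencies above can be checked with a small simulation. This is a sketch that replays the listed order into a log; it does not call Tinybench, and `simulateLifecycle`, `countHook`, and the option names are hypothetical.

```typescript
type Mode = 'warmup' | 'run';

interface LifecycleOptions {
  warmup: boolean;
  warmupIterations: number;
  runIterations: number;
}

// Replays the documented hook order for one describe block and returns it as a log.
function simulateLifecycle(benchmarks: string[], options: LifecycleOptions): string[] {
  const log: string[] = [];
  const modes: Mode[] = options.warmup ? ['warmup', 'run'] : ['run'];

  log.push('beforeAll');
  for (const name of benchmarks) {
    for (const mode of modes) {
      const iterations = mode === 'warmup' ? options.warmupIterations : options.runIterations;
      log.push(`setupTask ${name} ${mode}`);
      log.push(`beforeAllIterations ${name} ${mode}`);
      for (let i = 0; i < iterations; i++) {
        log.push(`beforeEachIteration ${name} ${mode}`);
        log.push(`iteration ${name} ${mode}`);
        log.push(`afterEachIteration ${name} ${mode}`);
      }
      log.push(`afterAllIterations ${name} ${mode}`);
      log.push(`teardownTask ${name} ${mode}`);
    }
  }
  log.push('afterAll');
  return log;
}

// Counts how often a given hook appears in the log.
function countHook(log: string[], hook: string): number {
  return log.filter((entry) => entry.split(' ')[0] === hook).length;
}
```

With warmup enabled, `countHook(log, 'setupTask')` is two per benchmark; with `warmup: false` it drops to one, because only the run cycle remains.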

Migrated Benchmarks

All 5 benchmark files have been successfully migrated to the new Jest-like API:

cache-operations.bench.ts - 4 benchmarks with suite-level shared state and proper isolation
export-management.bench.ts - 4 benchmarks with suite-level shared state and beforeAllIterations hooks
import-updates.bench.ts - 3 benchmarks with suite-level shared state (fixed hook execution order)
path-resolution.bench.ts - 5 benchmarks with minimal hooks
validation.bench.ts - 4 benchmarks with complex beforeAllIterations setup per benchmark

Total: 20 benchmarks across 5 files

Testing & Quality

✅ 770 tests passing (up from 626 at start)
- 92 unit tests for tinybench-utils (73 passing, 13 skipped with documentation)
- 9 additional validation tests enabled with test helper
- Hook registration tests updated with new hook names
- All existing workspace tests passing
✅ All 20 benchmarks passing across 5 files
✅ Lint passes
✅ Build passes
✅ Format check passes
✅ Hook validation prevents misuse
✅ Comprehensive JSDoc documentation with lifecycle and frequency details

Breaking Changes

BREAKING: Old benchmarkSuite API completely replaced with Jest-like API
BREAKING: Hook names updated to align with Tinybench concepts:
- setupSuite/teardownSuite → beforeAll/afterAll
- setup/teardown → setupTask/teardownTask
- beforeAll/afterAll/beforeEach/afterEach → beforeAllIterations/afterAllIterations/beforeEachIteration/afterEachIteration
All benchmarks must be migrated to new describe()/it() pattern
Hooks are now exported functions that must be imported (not passed as options)

Documentation

Updated tools/README-benchmark.md with comprehensive Jest-like API documentation
Added complete hook execution order documentation with frequency details
Enhanced all JSDoc comments with lifecycle position and execution frequency
State management module with getter/setter functions for better encapsulation
All hook examples updated to use new names

Fixes #302

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

Co-authored-by: LayZeeDK <6364586+LayZeeDK@users.noreply.github.com>

Replace unsafe `any` types in test mocks with proper TypeScript types.

## Changes

- **`tinybench-utils-hooks.spec.ts`**: Use `typeof globalThis.<function>` type assertions and explicit function signatures for Jest mock setup
  - `beforeAll`/`afterAll` hooks: `(fn: () => void | Promise<void>) => void`
  - Removed 4 unnecessary `eslint-disable` directives
- **`tinybench-utils.spec.ts`**: Remove extra blank line

### Before

```typescript
// eslint-disable-next-line @typescript-eslint/no-explicit-any
globalThis.beforeAll = jest.fn((fn: any) => {
  registeredBeforeAllHooks.push(fn);
}) as any;
```

### After

```typescript
globalThis.beforeAll = jest.fn((fn: () => void | Promise<void>) => {
  registeredBeforeAllHooks.push(fn);
}) as typeof globalThis.beforeAll;
```

<details>
<summary>Original prompt</summary>

> Format and lint the code.

</details>

- Add 21 unit tests for state management (tinybench-utils-state.spec.ts)
  - Test getters/setters for currentDescribeBlock, rootDescribeBlock, insideItCallback
  - Test resetGlobalState() functionality and idempotency
  - Test __test_setInsideItCallback() test-only API
  - Test nested describe block relationships
  - Test quiet flag storage
- Add 17 integration tests to tinybench-utils.spec.ts
  - Test describe() with quiet option
  - Test benchmark quiet option and inheritance
  - Test complex nested scenarios (4 levels deep, multiple siblings, many benchmarks)
  - Test hook ordering with all 8 hook types
  - Test edge cases with empty callbacks

Total: 765 tests (up from 727)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Lars Gyrup Brink Nielsen <LayZeeDK@users.noreply.github.com>
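
For readers unfamiliar with the state module these tests cover, the sketch below shows the general getter/setter pattern. The names currentDescribeBlock, insideItCallback, and resetGlobalState() come from the commit message; the accessor names, the `DescribeBlock` shape, and the internal layout are assumptions and may differ from tools/tinybench-utils-state.ts.

```typescript
// Hypothetical block shape; the real one is not shown in this PR description.
interface DescribeBlock {
  name: string;
  parent: DescribeBlock | undefined;
  quiet: boolean;
}

interface SuiteState {
  currentDescribeBlock: DescribeBlock | undefined;
  insideItCallback: boolean;
}

function createInitialState(): SuiteState {
  return { currentDescribeBlock: undefined, insideItCallback: false };
}

// Module-private state: callers go through the functions below.
let state = createInitialState();

function getCurrentDescribeBlock(): DescribeBlock | undefined {
  return state.currentDescribeBlock;
}

function setCurrentDescribeBlock(block: DescribeBlock | undefined): void {
  state.currentDescribeBlock = block;
}

function getInsideItCallback(): boolean {
  return state.insideItCallback;
}

function setInsideItCallback(value: boolean): void {
  state.insideItCallback = value;
}

// Replacing the whole object makes the reset idempotent by construction.
function resetGlobalState(): void {
  state = createInitialState();
}
```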

Renamed 8 hooks in the Jest-like benchmark API to use clearer names that reflect their execution context and frequency:

Suite-level hooks (Jest context, no task/mode parameters):
- setupSuite → beforeAll
- teardownSuite → afterAll

Task-level hooks (receive task and mode parameters):
- setup → setupTask
- teardown → teardownTask

Iteration hooks (receive task and mode parameters):
- beforeAll → beforeAllIterations
- afterAll → afterAllIterations
- beforeEach → beforeEachIteration
- afterEach → afterEachIteration

Updated all usage across:
- Core implementation (tinybench-utils.ts, tinybench-utils-state.ts)
- 5 benchmark files (35 hook uses total)
- 2 test files (770 tests passing)
- Documentation (tools/README-benchmark.md)

Added comprehensive lifecycle documentation with execution frequency details for each hook type.

All tests passing (770/770), all benchmarks passing (20/20).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

LayZeeDK · 2025-10-29T10:07:15Z

@claude Taking TINYBENCH_LIFECYCLE.md into consideration, find and fix any benchmarks that are not properly set up. Particularly benchmarking of functions providing or using caching. Warmup iterations are enabled by default. Disable them or take the mode into account in the hooks as applicable.

- Move state initialization from beforeAllIterations() to setupTask()
- Add beforeEachIteration() hooks to explicitly control cache state
- Disable warmup for cache-sensitive benchmarks with { warmup: false }
- Ensure proper isolation between warmup and run phases

Cache operations benchmarks:
- Pre-populate cache in beforeEachIteration() for cache hit tests
- Clear cache in beforeEachIteration() for cache miss tests
- Move tree and cache initialization to setupTask() for fresh state

Validation and export management benchmarks:
- Consolidate all initialization into setupTask() for consistency
- Ensure warmup phase doesn't affect run phase measurements

Co-authored-by: Lars Gyrup Brink Nielsen <LayZeeDK@users.noreply.github.com>
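
Why the cache state has to be set in beforeEachIteration() can be seen in a small model. The sketch below is not the benchmark code: `cachedExists`, `runCycle`, and the counters are hypothetical stand-ins for a cached helper and one benchmark cycle.

```typescript
// Hypothetical cache under test, with counters to make hits and misses visible.
const cache = new Map<string, boolean>();
const stats = { hits: 0, misses: 0 };

function cachedExists(path: string): boolean {
  const cached = cache.get(path);
  if (cached !== undefined) {
    stats.hits++;
    return cached;
  }
  stats.misses++;
  const exists = path.length > 0; // stand-in for an expensive lookup
  cache.set(path, exists);
  return exists;
}

// Minimal stand-in for one cycle: run the hook, then the measured function.
function runCycle(iterations: number, beforeEachIteration: () => void, fn: () => void): void {
  for (let i = 0; i < iterations; i++) {
    beforeEachIteration();
    fn();
  }
}

// Cache miss benchmark: clear the cache before every iteration.
runCycle(5, () => cache.clear(), () => { cachedExists('src/index.ts'); });
const missStats = { ...stats };

// Cache hit benchmark: pre-populate the cache before every iteration.
stats.hits = 0;
stats.misses = 0;
runCycle(5, () => { cache.set('src/index.ts', true); }, () => { cachedExists('src/index.ts'); });
const hitStats = { ...stats };
```

Without the clearing hook, only the first iteration of the miss benchmark would miss and every later one would measure a hit, which is the kind of cross-iteration leakage the commit addresses.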

Update documentation and JSDoc comments to explicitly state that setupTask, teardownTask, beforeAllIterations, and afterAllIterations run 1-2 times per benchmark depending on whether warmup is enabled, rather than implying they always run exactly twice.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Move expensive operations from setupTask/teardownTask (run 1-2 times per benchmark) to beforeAll (run once per suite) to reduce overhead:

**Changes:**
- cache-operations: Tree + 100 file writes → beforeAll
- import-updates: Tree + project configs → beforeAll
- export-management: Tree + project config → beforeAll
- validation: Tree + project config → beforeAll
- path-resolution: Array creation → beforeAll

**Results:**
- Total suite time: 27.148s → 26.267s (3.2% faster)
- All 20 benchmarks pass
- All 601 unit tests pass
- Zero breaking changes
- Benchmark accuracy maintained

Individual benchmark measurements show normal variance (±2-14%) due to JIT optimization, GC timing, and CPU state; this is expected. The key improvement is reduced suite overhead from eliminating redundant expensive initialization.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

…t practices

Apply hook optimizations based on warmup configuration and documented execution order:

1. cache-operations.bench.ts (warmup: false):
   - Changed beforeAllIterations → nested beforeAll
   - Optimization: Setup runs 1× per benchmark instead of per cycle
   - Rationale: With warmup disabled, only one cycle exists, so cycle-level hooks are redundant
2. export-management.bench.ts, validation.bench.ts, import-updates.bench.ts (warmup: true):
   - Keep setupTask/teardownTask for parent-level initialization
   - Rationale: Benchmarks depend on variables initialized in setupTask (e.g., cachedTreeExists)
   - Best practice: "setupTask runs BEFORE nested hooks. Any initialization that other hooks depend on must be in setupTask, not beforeAllIterations" (per documentation)

Hook execution order: setupTask → beforeAllIterations → iterations → afterAllIterations → teardownTask

Performance:
- Total suite: 26.685s (baseline: 26.617s, +0.3% within normal variance)
- Cache operations: Setup frequency reduced from per-cycle to per-benchmark

This follows documented best practices for hook execution order while maintaining correct state isolation between warmup and measurement cycles.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

…ycle

Rename benchmark hooks to better align with Tinybench's cycle-based execution model and improve API clarity:

- setupTask() → beforeCycle() (maps to Tinybench BenchOptions.setup)
- teardownTask() → afterCycle() (maps to Tinybench BenchOptions.teardown)

The new names emphasize that these hooks run per benchmark cycle (warmup + run), creating a clearer hierarchy:

- beforeAll (suite) → beforeCycle → beforeAllIterations → beforeEachIteration

All documentation and tests updated to reflect the new naming and include explicit Tinybench mapping information in JSDoc comments.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

…tion

Replace all mentions of the old `benchmarkSuite` wrapper with references to the current Jest-like API using `describe()` and `it()` functions.

Updated files:
- packages/workspace/src/generators/move-file/benchmarks/README.md
- REFACTORING_EVALUATION.md
- REFACTORING_EVALUATION_SUMMARY.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Fix type conversion errors when mocking Jest globals by casting through 'unknown' first. This allows the mocks to be assigned to the full Jest types without TypeScript complaining about missing properties.

Changes:
- Cast globalThis.describe mock through 'unknown'
- Cast globalThis.it mock through 'unknown'
- Prefix unused 'name' parameter with '_' to suppress hint

All 8 tests continue to pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
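
The cast through `unknown` can be illustrated without Jest. In the sketch below, `DescribeLike` and `RecordingMock` are hypothetical stand-ins for Jest's describe type and for a mock function type; they are not the types used in the spec files.

```typescript
// Hypothetical stand-in for the full Jest describe type (callable plus extra members).
interface DescribeLike {
  (name: string, fn: () => void): void;
  skip(name: string, fn: () => void): void;
}

// Hypothetical stand-in for a mock: callable, with its own bookkeeping member.
interface RecordingMock {
  (name: string, fn: () => void): void;
  calls: string[];
}

function createRecordingMock(): RecordingMock {
  const calls: string[] = [];
  // The unused callback parameter is prefixed with "_" to avoid an unused-parameter hint.
  const mock = ((name: string, _fn: () => void): void => {
    calls.push(name);
  }) as RecordingMock;
  mock.calls = calls;
  return mock;
}

const recorder = createRecordingMock();

// A direct `recorder as DescribeLike` is typically rejected, because each type
// has a required member the other lacks. Going through `unknown` is accepted.
const describeMock = recorder as unknown as DescribeLike;

describeMock('suite', () => {});
```

The cast only silences the compiler: `describeMock.skip` is still undefined at runtime, so this pattern is reasonable only when the test never touches the missing members.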

… detection

Add configurable threshold for beforeEachIteration hook performance warnings to avoid false positives in CI environments where performance is more variable.

Features:
- Auto-detects CI environments and uses 50ms threshold (vs 10ms locally)
- Configurable via `hookPerformanceThreshold` option on describe() or it()
- Setting threshold to 0 disables warnings (like quiet: true)
- Inherits threshold from parent describe blocks
- Includes threshold value in warning messages

CI Detection supports:
- GitHub Actions, GitLab CI, CircleCI, Travis CI
- Jenkins, Buildkite, Azure Pipelines
- Generic CI and CONTINUOUS_INTEGRATION env vars

Property naming:
- Named `hookPerformanceThreshold` (not `performanceWarningThreshold`)
- Makes it immediately clear this monitors hooks, not benchmark functions
- Prevents confusion about what "performance" refers to

Documentation:
- Enhanced JSDoc with notes about hooks vs benchmarks
- Added examples for all configuration patterns
- Updated README with detailed explanation
- Clarified that slow hooks distort results by adding overhead

Technical:
- Uses bracket notation for process.env access (TypeScript strict mode)
- All 144 tests passing

Examples:
- describe('Suite', fn, { hookPerformanceThreshold: 100 })
- it('benchmark', fn, { hookPerformanceThreshold: 0 })

Addresses code review feedback: Issue #4 (Potential Performance Monitoring False Positives)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
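
The threshold rules in this commit (CI default of 50ms, local default of 10ms, inheritance from parent blocks, 0 disables warnings) could be resolved along these lines. This is a sketch under assumptions: `resolveThreshold`, `isCi`, and `shouldWarn` are hypothetical names, the environment is passed in as a plain object rather than read from process.env, and the list of variable names reflects what those CI providers commonly set, not necessarily what the PR checks.

```typescript
type Env = Record<string, string | undefined>;

// Assumed variable names for the CI providers listed in the commit message.
const CI_ENV_VARS = [
  'CI',
  'CONTINUOUS_INTEGRATION',
  'GITHUB_ACTIONS',
  'GITLAB_CI',
  'CIRCLECI',
  'TRAVIS',
  'JENKINS_URL',
  'BUILDKITE',
  'TF_BUILD',
];

function isCi(env: Env): boolean {
  return CI_ENV_VARS.some((name) => {
    const value = env[name];
    return value !== undefined && value !== '' && value.toLowerCase() !== 'false' && value !== '0';
  });
}

interface ThresholdOptions {
  hookPerformanceThreshold?: number;
}

// `levels` lists options from the outermost describe() down to the it() call.
// The innermost explicit value wins; otherwise the environment default applies.
function resolveThreshold(env: Env, ...levels: ThresholdOptions[]): number {
  let threshold: number | undefined;
  for (const level of levels) {
    if (level.hookPerformanceThreshold !== undefined) {
      threshold = level.hookPerformanceThreshold;
    }
  }
  return threshold ?? (isCi(env) ? 50 : 10);
}

// A threshold of 0 disables warnings entirely.
function shouldWarn(durationMs: number, thresholdMs: number): boolean {
  return thresholdMs > 0 && durationMs > thresholdMs;
}
```

Passing the environment as an argument keeps the resolution logic testable without mutating process.env.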

Add ASCII art tree diagram showing the visual execution flow of hooks to make the nested structure and execution frequency more obvious at a glance.

The new "Visual Execution Flow" section provides:
- Tree diagram showing exact execution order for a 2-benchmark suite
- Clear visualization of nested structure (suite → benchmark → cycle → iteration)
- Frequency annotations for each hook (ONCE, TWICE, ~16×, ~1000×)
- Mode indicators (warmup vs run)
- Context indicators (Jest vs Tinybench)

Key observations section summarizes:
- beforeAll/afterAll run ONCE for entire suite
- Each benchmark runs TWO cycles (warmup + run)
- beforeCycle/afterCycle run TWICE per benchmark
- beforeEachIteration/afterEachIteration run THOUSANDS of times (~1016)

This addresses code review feedback Issue #5 (Documentation: Hook Execution Order) by providing better visual hierarchy that makes the execution flow immediately understandable.

The diagram complements the existing detailed textual documentation by giving developers a quick visual reference to understand the execution model.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Copilot AI assigned Copilot and LayZeeDK Oct 27, 2025

Copilot started work on behalf of LayZeeDK October 27, 2025 12:07

LayZeeDK force-pushed the copilot/extend-benchmarks-with-factory-function branch from 9d2ed6d to 64c4f40 October 27, 2025 12:30

Copilot AI changed the title ~~[WIP] Extend benchmarkSuite to support factory functions for shared scope~~ feat: add factory function support to benchmarkSuite API Oct 27, 2025

Copilot AI requested a review from LayZeeDK October 27, 2025 12:31

Copilot finished work on behalf of LayZeeDK October 27, 2025 12:31