test: implement Jest-like API for benchmarks with describe() and it()
#303
Merged
LayZeeDK
merged 46 commits into
main
from
copilot/extend-benchmarks-with-factory-function
Oct 29, 2025
9d2ed6d to
64c4f40
Compare
Copilot
AI
changed the title
[WIP] Extend benchmarkSuite to support factory functions for shared scope
feat: add factory function support to benchmarkSuite API
Oct 27, 2025
Co-authored-by: LayZeeDK <6364586+LayZeeDK@users.noreply.github.com>
Replace unsafe `any` types in test mocks with proper TypeScript types.
## Changes
- **`tinybench-utils-hooks.spec.ts`**: Use `typeof globalThis.<function>` type assertions and explicit function signatures for Jest mock setup
  - `beforeAll`/`afterAll` hooks: `(fn: () => void | Promise<void>) => void`
  - Removed 4 unnecessary `eslint-disable` directives
- **`tinybench-utils.spec.ts`**: Remove extra blank line
### Before
```typescript
// eslint-disable-next-line @typescript-eslint/no-explicit-any
globalThis.beforeAll = jest.fn((fn: any) => {
  registeredBeforeAllHooks.push(fn);
}) as any;
```
### After
```typescript
globalThis.beforeAll = jest.fn((fn: () => void | Promise<void>) => {
  registeredBeforeAllHooks.push(fn);
}) as typeof globalThis.beforeAll;
```
- Add 21 unit tests for state management (tinybench-utils-state.spec.ts)
  - Test getters/setters for currentDescribeBlock, rootDescribeBlock, insideItCallback
  - Test resetGlobalState() functionality and idempotency
  - Test __test_setInsideItCallback() test-only API
  - Test nested describe block relationships
  - Test quiet flag storage
- Add 17 integration tests to tinybench-utils.spec.ts
  - Test describe() with quiet option
  - Test benchmark quiet option and inheritance
  - Test complex nested scenarios (4 levels deep, multiple siblings, many benchmarks)
  - Test hook ordering with all 8 hook types
  - Test edge cases with empty callbacks
Total: 765 tests (up from 727)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Lars Gyrup Brink Nielsen <LayZeeDK@users.noreply.github.com>
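The state-management tests listed above exercise an `insideItCallback` flag and an idempotent `resetGlobalState()`. Below is a minimal sketch of how such a guard could work, assuming a simplified state object. `benchState` and `benchIt` are hypothetical names, and unlike the real API this sketch runs the callback immediately rather than handing it to Tinybench.

```typescript
// Hypothetical, simplified state; the real module tracks describe blocks too.
interface BenchState {
  insideItCallback: boolean;
  registered: string[];
}

const benchState: BenchState = { insideItCallback: false, registered: [] };

// Resetting twice in a row leaves the same state as resetting once.
function resetGlobalState(): void {
  benchState.insideItCallback = false;
  benchState.registered = [];
}

// Stand-in for it(): rejects nested registration while a callback is running.
function benchIt(name: string, fn: () => void): void {
  if (benchState.insideItCallback) {
    throw new Error(`it('${name}') cannot be called inside another it() callback`);
  }
  benchState.registered.push(name);
  benchState.insideItCallback = true;
  try {
    fn();
  } finally {
    // Always clear the flag, even if the callback throws.
    benchState.insideItCallback = false;
  }
}

let nestedCallRejected = false;
benchIt('outer', () => {
  try {
    benchIt('inner', () => {});
  } catch {
    nestedCallRejected = true;
  }
});
console.log(nestedCallRejected, benchState.registered);
```

The `finally` block is the design choice worth noting: without it, a throwing callback would leave the flag set and every later registration would be rejected.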
Renamed 8 hooks in the Jest-like benchmark API to use clearer names that reflect their execution context and frequency:
Suite-level hooks (Jest context, no task/mode parameters):
- setupSuite → beforeAll
- teardownSuite → afterAll
Task-level hooks (receive task and mode parameters):
- setup → setupTask
- teardown → teardownTask
Iteration hooks (receive task and mode parameters):
- beforeAll → beforeAllIterations
- afterAll → afterAllIterations
- beforeEach → beforeEachIteration
- afterEach → afterEachIteration
Updated all usage across:
- Core implementation (tinybench-utils.ts, tinybench-utils-state.ts)
- 5 benchmark files (35 hook uses total)
- 2 test files (770 tests passing)
- Documentation (tools/README-benchmark.md)
Added comprehensive lifecycle documentation with execution frequency details for each hook type.
All tests passing (770/770), all benchmarks passing (20/20).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Member
|
@claude Taking |
- Move state initialization from beforeAllIterations() to setupTask()
- Add beforeEachIteration() hooks to explicitly control cache state
- Disable warmup for cache-sensitive benchmarks with { warmup: false }
- Ensure proper isolation between warmup and run phases
Cache operations benchmarks:
- Pre-populate cache in beforeEachIteration() for cache hit tests
- Clear cache in beforeEachIteration() for cache miss tests
- Move tree and cache initialization to setupTask() for fresh state
Validation and export management benchmarks:
- Consolidate all initialization into setupTask() for consistency
- Ensure warmup phase doesn't affect run phase measurements
Co-authored-by: Lars Gyrup Brink Nielsen <LayZeeDK@users.noreply.github.com>
Update documentation and JSDoc comments to explicitly state that setupTask, teardownTask, beforeAllIterations, and afterAllIterations run 1-2 times per benchmark depending on whether warmup is enabled, rather than implying they always run exactly twice.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Move expensive operations from setupTask/teardownTask (run 1-2 times per benchmark) to beforeAll (run once per suite) to reduce overhead:
**Changes:**
- cache-operations: Tree + 100 file writes → beforeAll
- import-updates: Tree + project configs → beforeAll
- export-management: Tree + project config → beforeAll
- validation: Tree + project config → beforeAll
- path-resolution: Array creation → beforeAll
**Results:**
- Total suite time: 27.148s → 26.267s (3.2% faster)
- All 20 benchmarks pass
- All 601 unit tests pass
- Zero breaking changes
- Benchmark accuracy maintained
Individual benchmark measurements show normal variance (±2-14%) due to JIT optimization, GC timing, and CPU state - this is expected. The key improvement is reduced suite overhead from eliminating redundant expensive initialization.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…t practices
Apply hook optimizations based on warmup configuration and documented execution order:
1. cache-operations.bench.ts (warmup: false):
- Changed beforeAllIterations → nested beforeAll
- Optimization: Setup runs 1× per benchmark instead of per cycle
- Rationale: With warmup disabled, only one cycle exists, so cycle-level hooks are redundant
2. export-management.bench.ts, validation.bench.ts, import-updates.bench.ts (warmup: true):
- Keep setupTask/teardownTask for parent-level initialization
- Rationale: Benchmarks depend on variables initialized in setupTask (e.g., cachedTreeExists)
- Best practice: "setupTask runs BEFORE nested hooks. Any initialization that other hooks
depend on must be in setupTask, not beforeAllIterations" (per documentation)
Hook execution order:
setupTask → beforeAllIterations → iterations → afterAllIterations → teardownTask
Performance:
- Total suite: 26.685s (baseline: 26.617s, +0.3% within normal variance)
- Cache operations: Setup frequency reduced from per-cycle to per-benchmark
This follows documented best practices for hook execution order while maintaining
correct state isolation between warmup and measurement cycles.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
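The hook execution order stated in the commit message above (setupTask → beforeAllIterations → iterations → afterAllIterations → teardownTask, once per cycle) can be traced with a small simulation. This is only a sketch of the documented order: `traceBenchmark` is a hypothetical helper that records names in sequence and does not call Tinybench or the real `tinybench-utils` hooks.

```typescript
// Hypothetical simulation of the documented order; no real hooks are invoked.
type CycleMode = 'warmup' | 'run';

function traceBenchmark(
  warmup: boolean,
  iterationsPerCycle: Record<CycleMode, number>,
): string[] {
  const hookTrace: string[] = [];
  // With warmup disabled only the run cycle exists.
  const modes: CycleMode[] = warmup ? ['warmup', 'run'] : ['run'];
  for (const mode of modes) {
    hookTrace.push(`setupTask(${mode})`);
    hookTrace.push(`beforeAllIterations(${mode})`);
    for (let i = 0; i < iterationsPerCycle[mode]; i++) {
      hookTrace.push(`beforeEachIteration(${mode})`);
      hookTrace.push(`fn(${mode})`);
      hookTrace.push(`afterEachIteration(${mode})`);
    }
    hookTrace.push(`afterAllIterations(${mode})`);
    hookTrace.push(`teardownTask(${mode})`);
  }
  return hookTrace;
}

// One warmup iteration and two run iterations keep the trace readable.
const withWarmup = traceBenchmark(true, { warmup: 1, run: 2 });
const withoutWarmup = traceBenchmark(false, { warmup: 1, run: 2 });
console.log(withWarmup.join('\n'));
```

Comparing `withWarmup` and `withoutWarmup` shows why cycle-level hooks become redundant for `{ warmup: false }` benchmarks: each of them fires exactly once per benchmark in that case.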
…ycle
Rename benchmark hooks to better align with Tinybench's cycle-based execution model and improve API clarity:
- setupTask() → beforeCycle() (maps to Tinybench BenchOptions.setup)
- teardownTask() → afterCycle() (maps to Tinybench BenchOptions.teardown)
The new names emphasize that these hooks run per benchmark cycle (warmup + run), creating a clearer hierarchy:
- beforeAll (suite) → beforeCycle → beforeAllIterations → beforeEachIteration
All documentation and tests updated to reflect the new naming and include explicit Tinybench mapping information in JSDoc comments.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…tion
Replace all mentions of the old `benchmarkSuite` wrapper with references to the current Jest-like API using `describe()` and `it()` functions.
Updated files:
- packages/workspace/src/generators/move-file/benchmarks/README.md
- REFACTORING_EVALUATION.md
- REFACTORING_EVALUATION_SUMMARY.md
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Fix type conversion errors when mocking Jest globals by casting through 'unknown' first. This allows the mocks to be assigned to the full Jest types without TypeScript complaining about missing properties.
Changes:
- Cast globalThis.describe mock through 'unknown'
- Cast globalThis.it mock through 'unknown'
- Prefix unused 'name' parameter with '_' to suppress hint
All 8 tests continue to pass.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
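The double assertion mentioned above can be illustrated without Jest. In this sketch `DescribeLike` is a hypothetical stand-in for a Jest global's type (callable, with extra members), not the real Jest typing. TypeScript rejects a direct assertion only when the two types are not sufficiently comparable; asserting to `unknown` first is always accepted, at the cost of losing the check.

```typescript
// Hypothetical stand-in for a callable global that also carries extra members.
interface DescribeLike {
  (name: string, fn: () => void): void;
  only: (name: string, fn: () => void) => void;
  skip: (name: string, fn: () => void) => void;
}

let mockCallCount = 0;

// The mock implements only the call signature. The leading underscore marks
// the unused parameter, as in the commit above.
const describeMock = ((_name: string, fn: () => void): void => {
  mockCallCount++;
  fn();
}) as unknown as DescribeLike;

describeMock('suite', () => {});

// The assertion is unchecked: members the mock never defined stay undefined.
console.log(mockCallCount, typeof describeMock.only);
```

The trade-off is that the compiler no longer catches use of members such as `only` or `skip` on the mock, so tests that reach for them would fail at runtime rather than at compile time.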
… detection
Add configurable threshold for beforeEachIteration hook performance warnings
to avoid false positives in CI environments where performance is more variable.
Features:
- Auto-detects CI environments and uses 50ms threshold (vs 10ms locally)
- Configurable via `hookPerformanceThreshold` option on describe() or it()
- Setting threshold to 0 disables warnings (like quiet: true)
- Inherits threshold from parent describe blocks
- Includes threshold value in warning messages
CI Detection supports:
- GitHub Actions, GitLab CI, CircleCI, Travis CI
- Jenkins, Buildkite, Azure Pipelines
- Generic CI and CONTINUOUS_INTEGRATION env vars
Property naming:
- Named `hookPerformanceThreshold` (not `performanceWarningThreshold`)
- Makes it immediately clear this monitors hooks, not benchmark functions
- Prevents confusion about what "performance" refers to
Documentation:
- Enhanced JSDoc with notes about hooks vs benchmarks
- Added examples for all configuration patterns
- Updated README with detailed explanation
- Clarified that slow hooks distort results by adding overhead
Technical:
- Uses bracket notation for process.env access (TypeScript strict mode)
- All 144 tests passing
Examples:
- describe('Suite', fn, { hookPerformanceThreshold: 100 })
- it('benchmark', fn, { hookPerformanceThreshold: 0 })
Addresses code review feedback: Issue #4 (Potential Performance Monitoring
False Positives)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
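The threshold rules in the commit message above (50ms in CI, 10ms locally, inherited from parent describe blocks, 0 disables warnings) could be resolved roughly as follows. This is a sketch under assumptions: the environment variable names are ones the listed providers commonly set, the real implementation may check a different set, and `BlockOptions`, `isCi`, `resolveThreshold`, and `shouldWarn` are hypothetical names.

```typescript
type EnvVars = Record<string, string | undefined>;

// Assumed variable names; the actual detection list may differ.
const CI_ENV_VARS = [
  'CI',
  'CONTINUOUS_INTEGRATION',
  'GITHUB_ACTIONS',
  'GITLAB_CI',
  'CIRCLECI',
  'TRAVIS',
  'JENKINS_URL',
  'BUILDKITE',
  'TF_BUILD', // Azure Pipelines
];

function isCi(env: EnvVars): boolean {
  return CI_ENV_VARS.some((name) => {
    const value = env[name]; // bracket notation, as noted in the commit
    return value !== undefined && value !== '' && value !== 'false';
  });
}

// Hypothetical shape of a describe block's options with a parent link.
interface BlockOptions {
  hookPerformanceThreshold?: number;
  parent?: BlockOptions;
}

// The nearest explicit threshold wins; otherwise fall back to the environment default.
function resolveThreshold(block: BlockOptions | undefined, env: EnvVars): number {
  let current = block;
  while (current !== undefined) {
    if (current.hookPerformanceThreshold !== undefined) {
      return current.hookPerformanceThreshold;
    }
    current = current.parent;
  }
  return isCi(env) ? 50 : 10;
}

// A threshold of 0 disables warnings entirely.
function shouldWarn(elapsedMs: number, thresholdMs: number): boolean {
  return thresholdMs > 0 && elapsedMs > thresholdMs;
}

const suiteOptions: BlockOptions = { hookPerformanceThreshold: 100 };
const nestedOptions: BlockOptions = { parent: suiteOptions };
console.log(resolveThreshold(nestedOptions, {}));
```

Passing the environment as a parameter (for example `process.env` in Node) keeps the functions easy to test without mutating global state.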
Add ASCII art tree diagram showing the visual execution flow of hooks to make the nested structure and execution frequency more obvious at a glance.
The new "Visual Execution Flow" section provides:
- Tree diagram showing exact execution order for a 2-benchmark suite
- Clear visualization of nested structure (suite → benchmark → cycle → iteration)
- Frequency annotations for each hook (ONCE, TWICE, ~16×, ~1000×)
- Mode indicators (warmup vs run)
- Context indicators (Jest vs Tinybench)
Key observations section summarizes:
- beforeAll/afterAll run ONCE for entire suite
- Each benchmark runs TWO cycles (warmup + run)
- beforeCycle/afterCycle run TWICE per benchmark
- beforeEachIteration/afterEachIteration run THOUSANDS of times (~1016)
This addresses code review feedback Issue #5 (Documentation: Hook Execution Order) by providing better visual hierarchy that makes the execution flow immediately understandable.
The diagram complements the existing detailed textual documentation by giving developers a quick visual reference to understand the execution model.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
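The frequency annotations above are simple arithmetic over benchmarks, cycles, and iterations. The sketch below works the numbers for the 2-benchmark example, taking the approximate iteration counts (~16 warmup, ~1000 run) from the commit message as inputs. `countHookCalls` is a hypothetical helper, and real iteration counts depend on how Tinybench is configured.

```typescript
interface HookCallCounts {
  beforeAll: number;
  beforeCycle: number;
  beforeAllIterations: number;
  beforeEachIteration: number;
}

// Counts how often each "before" hook fires for one describe block.
// The matching "after" hooks fire the same number of times.
function countHookCalls(
  benchmarks: number,
  warmupIterations: number,
  runIterations: number,
  warmup = true,
): HookCallCounts {
  const cycles = warmup ? 2 : 1;
  const iterations = (warmup ? warmupIterations : 0) + runIterations;
  return {
    beforeAll: 1, // once per describe block
    beforeCycle: benchmarks * cycles,
    beforeAllIterations: benchmarks * cycles,
    beforeEachIteration: benchmarks * iterations,
  };
}

const perBenchmark = countHookCalls(1, 16, 1000);
const twoBenchmarkSuite = countHookCalls(2, 16, 1000);
console.log(perBenchmark.beforeEachIteration); // 1016
console.log(twoBenchmarkSuite.beforeEachIteration); // 2032
```

Under these inputs the ~1016 figure is a per-benchmark count (16 + 1000), which doubles to about 2032 across the two benchmarks in the diagram.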
3bd41bc to
1b61375
Compare
LayZeeDK
approved these changes
Oct 29, 2025
Summary
This PR implements a complete redesign of the benchmark suite API, evolving from factory function support to a comprehensive Jest-like API. The new API provides familiar testing patterns with `describe()` and `it()`, comprehensive hook support, proper validation, and significant performance optimizations.
API Evolution
The PR went through three major iterations before arriving at the final design:
- … `benchmarkSuite` API
- … `benchmark()` registration function with context
- … `describe()` and `it()` patterns
Final API (Jest-like with clear hook names):
Key Features
- … `describe()` and `it()` just like Jest
- … `tinybench-utils` (not globals) to avoid confusion with Jest test functions
- … `describe` block creates its own Bench instance with inherited hooks
- `beforeAll()` - Run once before all benchmarks in describe block
- `afterAll()` - Run once after all benchmarks in describe block
- `setupTask()` - Run before each warmup and run cycle
- `teardownTask()` - Run after each warmup and run cycle
- `beforeAllIterations()` - Run once before each cycle (warmup and run)
- `afterAllIterations()` - Run once after each cycle completes
- `beforeEachIteration()` - Run before each iteration
- `afterEachIteration()` - Run after each iteration
- … `it()` callbacks (would cause incorrect behavior)
- … `describe()` blocks
- … `it()` from being called inside another `it()` callback
- … `it()` callbacks
- `describe(name, callback, options?)` - Supports `quiet` option to suppress performance warnings
- `it(name, fn, options?)` - Supports BenchOptions (`iterations`, `warmup`, etc.) and `itTimeout` for Jest timeout control
- … `beforeEachIteration` hooks take >10ms (indicates expensive operations that should be in `setupTask`)
Hook Renaming (Latest Update)
Renamed all 8 hooks to align with Tinybench concepts and clarify their execution context:
Suite-level hooks (Jest context, no task/mode parameters):
- `setupSuite` → `beforeAll`
- `teardownSuite` → `afterAll`
Task-level hooks (receive task and mode parameters):
- `setup` → `setupTask`
- `teardown` → `teardownTask`
Iteration hooks (receive task and mode parameters):
- `beforeAll` → `beforeAllIterations`
- `afterAll` → `afterAllIterations`
- `beforeEach` → `beforeEachIteration`
- `afterEach` → `afterEachIteration`
The new names make it immediately clear:
Performance Optimizations
- … `it()` to one per `describe()` block, reducing overhead (~3.5% faster, 4.3s average reduction)
- … `bench.remove()` calls to help with garbage collection
- … `beforeEachIteration()` hooks to prevent state sharing between benchmarks in cache-operations suite
Implementation Details
Core Implementation (`tools/tinybench-utils.ts`):
- … `@jest/globals` imports for proper TypeScript typing
- … (`tools/tinybench-utils-state.ts`)
Hook Execution Order
Hooks execute in this order for each benchmark:
Suite level (Jest context) - runs once per describe block:
- `beforeAll` - runs once before all benchmarks
Per benchmark - runs for each `it()`:
- `setupTask` - runs before warmup cycle
- `beforeAllIterations` - runs once before warmup iterations
- … (`beforeEachIteration`/`afterEachIteration`)
- `afterAllIterations` - runs once after warmup
- `teardownTask` - runs after warmup
- `setupTask` - runs before run cycle
- `beforeAllIterations` - runs once before run iterations
- … (`beforeEachIteration`/`afterEachIteration`)
- `afterAllIterations` - runs once after run
- `teardownTask` - runs after run
Suite level (Jest context) - runs once per describe block:
- `afterAll` - runs once after all benchmarks
Execution Frequency:
- … (`beforeAll`/`afterAll`): 1× per describe block
- … (`setupTask`/`teardownTask`): 2× per benchmark (once for warmup, once for run)
- … (`beforeAllIterations`/`afterAllIterations`): 2× per benchmark (once per cycle)
- … (`beforeEachIteration`/`afterEachIteration`): ~1000× per benchmark (all iterations)
Important: `setupTask` runs before `beforeAllIterations`. Any initialization that other hooks depend on must be in `setupTask`, not `beforeAllIterations`.
Migrated Benchmarks
All 5 benchmark files have been successfully migrated to the new Jest-like API:
- `cache-operations.bench.ts` - 4 benchmarks with suite-level shared state and proper isolation
- `export-management.bench.ts` - 4 benchmarks with suite-level shared state and beforeAllIterations hooks
- `import-updates.bench.ts` - 3 benchmarks with suite-level shared state (fixed hook execution order)
- `path-resolution.bench.ts` - 5 benchmarks with minimal hooks
- `validation.bench.ts` - 4 benchmarks with complex beforeAllIterations setup per benchmark
Total: 20 benchmarks across 5 files
Testing & Quality
Breaking Changes
- … `benchmarkSuite` API completely replaced with Jest-like API
- `setupSuite`/`teardownSuite` → `beforeAll`/`afterAll`
- `setup`/`teardown` → `setupTask`/`teardownTask`
- `beforeAll`/`afterAll`/`beforeEach`/`afterEach` → `beforeAllIterations`/`afterAllIterations`/`beforeEachIteration`/`afterEachIteration`
- … `describe()`/`it()` pattern
Documentation
- … `tools/README-benchmark.md` with comprehensive Jest-like API documentation
Fixes #302
🤖 Generated with Claude Code
Co-Authored-By: Claude noreply@anthropic.com