Summary
Add a Go benchmark suite that measures dedup pipeline performance across different data sizes and configurations. Publish results in CI.
Motivation
Publishable benchmark numbers are useful for:
- Blog posts and conference talks ("12ms for 50 chunks" backed by reproducible data)
- Detecting performance regressions on PRs
- Comparing algorithm variants (e.g., average vs complete linkage)
- Giving users confidence in production readiness
Benchmarks to add
// pkg/contextlab/
BenchmarkCluster_10Chunks
BenchmarkCluster_50Chunks
BenchmarkCluster_100Chunks
BenchmarkCluster_500Chunks
BenchmarkMMR_10Chunks
BenchmarkMMR_50Chunks
BenchmarkSelector_10Clusters
BenchmarkSelector_50Clusters
// pkg/dedup/
BenchmarkDistanceMatrix_50
BenchmarkDistanceMatrix_200
// pkg/compress/
BenchmarkCompress_ShortText
BenchmarkCompress_LongText
// End-to-end
BenchmarkFullPipeline_50Chunks
BenchmarkFullPipeline_200Chunks
CI integration
- Add a `benchmark` job to `ci.yml` that runs `go test -bench=. -benchmem ./...`
- Use `benchstat` to compare against the baseline on PRs
- Store baseline results in `testdata/benchmarks/baseline.txt`
- Comment on the PR with the performance diff if the regression is > 10%
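The 10% threshold check could be sketched as follows. This is a naive illustration, not a replacement for `benchstat`, which also accounts for run-to-run noise. The helper names `parseNsPerOp` and `regressions` are hypothetical, and the sketch assumes the baseline and the PR run on the same machine class, so benchmark names (including the `-N` GOMAXPROCS suffix) match exactly.

```go
package main

import (
	"sort"
	"strconv"
	"strings"
)

// parseNsPerOp extracts the ns/op value for each benchmark from the text
// output of `go test -bench`. If a benchmark appears several times
// (for example with -count=5), the values are averaged.
func parseNsPerOp(output string) map[string]float64 {
	sum := make(map[string]float64)
	count := make(map[string]int)
	for _, line := range strings.Split(output, "\n") {
		fields := strings.Fields(line)
		// Result lines look like: BenchmarkX-8  1000  12345 ns/op  512 B/op ...
		if len(fields) < 4 || !strings.HasPrefix(fields[0], "Benchmark") {
			continue
		}
		for j := 3; j < len(fields); j++ {
			if fields[j] != "ns/op" {
				continue
			}
			if v, err := strconv.ParseFloat(fields[j-1], 64); err == nil {
				sum[fields[0]] += v
				count[fields[0]]++
			}
			break
		}
	}
	mean := make(map[string]float64, len(sum))
	for name, s := range sum {
		mean[name] = s / float64(count[name])
	}
	return mean
}

// regressions returns the sorted names of benchmarks whose ns/op grew by
// more than threshold (0.10 means 10%) relative to the baseline.
// Benchmarks missing from either side are skipped.
func regressions(base, cur map[string]float64, threshold float64) []string {
	var out []string
	for name, b := range base {
		c, ok := cur[name]
		if !ok || b <= 0 {
			continue
		}
		if (c-b)/b > threshold {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}
```

A CI step could feed the contents of `testdata/benchmarks/baseline.txt` and the fresh benchmark output through `parseNsPerOp`, then fail or post a comment when `regressions(base, cur, 0.10)` is non-empty. Comparing raw means is sensitive to noisy runners, which is one reason to keep `benchstat` as the primary comparison tool.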
Deliverables
- Benchmark functions in `*_test.go` files
- Synthetic test data generator (deterministic embeddings)
- `make bench` target
- CI workflow for benchmark comparison
- `BENCHMARKS.md` with latest results table
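The first two deliverables could look roughly like the sketch below: a seeded generator for synthetic embeddings plus a benchmark that consumes them. Everything here is an assumption for illustration. `syntheticEmbeddings` and `distanceMatrix` are hypothetical names, the cosine-distance loop is a stand-in for the project's real code, and the 384-dimension size and seed 42 are arbitrary. It is written as `package main` so it compiles standalone; in the repository it would live in the relevant package's `*_test.go` file.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"testing"
)

// syntheticEmbeddings returns n unit-length vectors of dimension dim.
// A fixed seed makes the data identical on every run, so benchmark
// inputs do not vary between the baseline and a PR.
func syntheticEmbeddings(n, dim int, seed int64) [][]float32 {
	rng := rand.New(rand.NewSource(seed))
	out := make([][]float32, n)
	for i := range out {
		v := make([]float32, dim)
		var norm float64
		for j := range v {
			x := rng.NormFloat64()
			v[j] = float32(x)
			norm += x * x
		}
		norm = math.Sqrt(norm)
		if norm == 0 {
			norm = 1 // avoid dividing by zero for an all-zero draw
		}
		for j := range v {
			v[j] = float32(float64(v[j]) / norm)
		}
		out[i] = v
	}
	return out
}

// distanceMatrix is a placeholder workload: pairwise cosine distance
// between unit vectors. Swap in the real function under test.
func distanceMatrix(vecs [][]float32) [][]float32 {
	n := len(vecs)
	d := make([][]float32, n)
	for i := range d {
		d[i] = make([]float32, n)
	}
	for i := 0; i < n; i++ {
		for j := i + 1; j < n; j++ {
			var dot float32
			for k := range vecs[i] {
				dot += vecs[i][k] * vecs[j][k]
			}
			d[i][j] = 1 - dot
			d[j][i] = 1 - dot
		}
	}
	return d
}

// BenchmarkDistanceMatrix runs one sub-benchmark per input size.
func BenchmarkDistanceMatrix(b *testing.B) {
	for _, n := range []int{50, 200} {
		vecs := syntheticEmbeddings(n, 384, 42) // built outside the timed loop
		b.Run(fmt.Sprintf("%d", n), func(b *testing.B) {
			b.ReportAllocs()
			b.ResetTimer()
			for i := 0; i < b.N; i++ {
				distanceMatrix(vecs)
			}
		})
	}
}
```

Sub-benchmarks report as `BenchmarkDistanceMatrix/50` and `BenchmarkDistanceMatrix/200` rather than the `_50` and `_200` function names listed above. Separate top-level functions work equally well if the exact names matter for published results.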
Acceptance Criteria
- `go test -bench=. ./...` runs all benchmarks
- Results are reproducible (deterministic test data)
- CI detects regressions > 10%
- README links to benchmark results