
Performance testing #8

Merged
AivanF merged 8 commits into main from
aivan/2026-04-01-Performance-Testing
Apr 2, 2026

Conversation

Collaborator

@AivanF AivanF commented Apr 1, 2026

Designed and implemented:

  1. Parametrised, scalable, deterministic pseudo-random e-commerce data seeding with seasonality, driven by a weighted random selection algorithm.
  2. Generalised performance benchmarking with easily configurable sets of data sizes and test cases.
  3. A script to convert output to CSV for building charts.
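The deterministic weighted selection behind point 1 can be sketched as follows. This is a hedged illustration, not the PR's actual seeder: the helper name `weighted_choice` and the December seasonality weights are hypothetical.

```python
import random

def weighted_choice(items, weights, rng):
    """Pick one item with probability proportional to its weight."""
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for item, weight in zip(items, weights):
        acc += weight
        if r < acc:
            return item
    return items[-1]  # guard against floating-point drift

# A fixed seed makes every generated dataset fully reproducible.
rng = random.Random(42)
months = list(range(1, 13))
# Hypothetical seasonality: December is 3x more likely than other months.
weights = [3 if m == 12 else 1 for m in months]
picks = [weighted_choice(months, weights, rng) for _ in range(1000)]
```

Re-running with the same seed reproduces `picks` exactly, which is what makes benchmark data deterministic across machines.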

Summary by CodeRabbit

  • Tests

    • Added a full performance benchmarking suite: deterministic dataset generation and seeding, session-scoped benchmark environments across multiple dataset scales (1k–40k), parameterized query workloads, and a tool to convert pytest-benchmark JSON into pivoted CSV with estimated complexity.
  • Chores

    • Added pytest-benchmark to dev dependencies, updated .gitignore to ignore local benchmark artifacts, and introduced Makefile targets to run, export, and convert benchmark reports.

@AivanF AivanF requested a review from ZmeiGorynych April 1, 2026 17:55
Contributor

coderabbitai bot commented Apr 1, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Walkthrough

Adds a performance benchmarking suite: deterministic data generator and seeder, pytest session fixtures for multiple dataset scales, parameterized pytest-benchmark tests, a bench JSON→CSV converter and CLI, Makefile targets for running/reporting benchmarks, and dev tooling/config updates.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Config**<br>`.gitignore`, `pyproject.toml` | Added a `benchmarks/` ignore entry under `examples/postgres/slayer_data/`; added dev dependency `pytest-benchmark = "^5.2.3"`. |
| **Benchmark CSV Tool**<br>`tests/perf/bench_to_csv.py` | New CLI/module to convert pytest-benchmark JSON into a pivoted CSV; infers per-query complexity via a log-log median exponent; supports stdout or `-o` output file. |
| **Fixtures & Params**<br>`tests/perf/conftest.py`, `tests/perf/params.py` | New session-scoped `env_<scale>` fixtures driven by `SCALES`: create a temporary or external DB, seed data, apply `INDEXES`, configure `YAMLStorage`, instantiate `SlayerQueryEngine`, and warm up queries. `params.py` centralizes the DB backend, scales, dates, indexes, and seed. |
| **Data Generation & Seeding**<br>`tests/perf/seed.py` | New deterministic LCG-based dataset generator and SQL seeder with dataclasses (`Region`, `Shop`, `Customer`, `Order`, `Dataset`), deterministic timestamps/fields, weighted sampling, and batched raw-SQL inserts. |
| **Benchmark Tests**<br>`tests/perf/test_bench.py` | New pytest-benchmark test module: many predefined query workloads, plus dynamically generated per-scale benchmark classes that run queries via fixtures and record timings. |
| **Make Targets**<br>`Makefile` | New Makefile with `BENCH_FLAGS` and targets `test`, `lint`, `bench`, `bench-report`, and `bench-csv` to run benchmarks, produce JSON reports, and convert them to CSV. |
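The CSV tool's complexity inference (a log-log median exponent, per the summary above) can be approximated like this; the function name `estimate_exponent` is illustrative, and the merged `bench_to_csv.py` may differ in detail:

```python
import math
from statistics import median

def estimate_exponent(sizes, timings):
    """Estimate k in t ~ n^k from (size, median time) pairs.

    Computes the slope between consecutive points on a log-log scale
    and returns the median slope, which is robust to outliers.
    """
    slopes = []
    for (n1, t1), (n2, t2) in zip(zip(sizes, timings), zip(sizes[1:], timings[1:])):
        slopes.append((math.log(t2) - math.log(t1)) / (math.log(n2) - math.log(n1)))
    return median(slopes)

# Perfectly linear workload: doubling the data doubles the time.
sizes = [1_000, 5_000, 10_000, 40_000]
linear = [0.01, 0.05, 0.10, 0.40]
print(round(estimate_exponent(sizes, linear), 2))  # prints 1.0
```

An exponent near 1 suggests O(n) query behaviour, near 2 suggests O(n²), which is what makes the metric useful for spotting scaling regressions across the 1k-40k dataset scales.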

Sequence Diagram

```mermaid
sequenceDiagram
    participant Runner as Test Runner
    participant Fixture as pytest fixture
    participant Generator as Data Generator
    participant DB as Database
    participant Engine as SlayerQueryEngine

    note over Runner,Engine: benchmark setup and execution

    Runner->>Fixture: request env_<scale>
    Fixture->>Generator: generate_dataset(order_count)
    activate Generator
    Generator-->>Fixture: Dataset
    deactivate Generator

    Fixture->>DB: create/connect DB and seed_database(dataset)
    activate DB
    DB-->>Fixture: DB ready (schema, rows, INDEXES applied)
    deactivate DB

    Fixture->>Engine: init (YAMLStorage + DB)
    Fixture-->>Runner: return (Engine, Dataset)

    Runner->>Engine: execute SlayerQuery (benchmarked)
    activate Engine
    Engine->>DB: run SQL
    DB-->>Engine: rows
    Engine-->>Runner: SlayerResponse (timing)
    deactivate Engine
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • ZmeiGorynych

Poem

🐰 I seeded fields with careful hops and code,
Shops, customers, and orders down the road.
Fixtures hum, engines wake to run—
Benchmarks captured, CSVs begun.
Hop—timings stored and tales of speed are told.

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 52.63%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'Performance testing' is vague and generic; it does not clearly specify the main change (e.g., adding benchmarking infrastructure, seeder, or CSV conversion tools). | Consider a more specific title such as 'Add performance benchmarking infrastructure with parametrized seeder and CSV conversion'. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch aivan/2026-04-01-Performance-Testing

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/perf/seed.py (2)

408-459: Consider keyword-only parameters and moving import to top of file.

Two minor suggestions:

  1. Per coding guidelines, functions with >1 parameter should use keyword arguments
  2. The sqlalchemy import could be moved to the top of the file for consistency
♻️ Optional refactor

Move import to top:

```diff
+import sqlalchemy as sa
 from dataclasses import dataclass
 from datetime import datetime, timedelta
```

Use keyword-only parameters:

```diff
-def seed_database(engine, dataset: Dataset) -> None:
+def seed_database(*, engine, dataset: Dataset) -> None:
```

As per coding guidelines: "Use keyword arguments for functions with more than 1 parameter" and "Place imports at the top of files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/perf/seed.py` around lines 408 - 459, The function seed_database
currently takes positional parameters (engine, dataset) and imports sqlalchemy
inside the function; change the signature to require keyword-only parameters
(def seed_database(*, engine, dataset): or def seed_database(engine, *,
dataset):) so callers must pass by name, move the "import sqlalchemy as sa" to
the top-level imports, and update any call sites to pass engine=... and
dataset=...; ensure the function still references sa in its body (e.g., sa.text)
after the import move.

140-154: Consider using keyword-only arguments for public API.

Per coding guidelines, functions with more than 1 parameter should use keyword arguments. The generate_dataset function signature could be clearer with keyword-only parameters.

♻️ Optional refactor
```diff
 def generate_dataset(
-    order_count: int,
-    start_date: str = "2023-01-01",
-    end_date: str = "2024-12-31",
-    seed: int = 42,
+    *,
+    order_count: int,
+    start_date: str = "2023-01-01",
+    end_date: str = "2024-12-31",
+    seed: int = 42,
 ) -> Dataset:
```

This would require updating the call site in conftest.py:

```python
dataset = generate_dataset(order_count=order_count)
```

As per coding guidelines: "Use keyword arguments for functions with more than 1 parameter".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/perf/seed.py` around lines 140 - 154, The public function
generate_dataset currently accepts multiple positional parameters; change its
signature to enforce keyword-only arguments (e.g., def
generate_dataset(order_count: int, *, start_date: str = "2023-01-01", end_date:
str = "2024-12-31", seed: int = 42) -> Dataset) so callers must pass start_date,
end_date, and seed by name; update any call sites (such as the test conftest.py
that invokes generate_dataset) to use keyword invocation like
generate_dataset(order_count=order_count) and leave internal usage of _prng
unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/perf/bench_to_csv.py`:
- Around line 68-69: The code reads the output path via
sys.argv[sys.argv.index("-o") + 1] which will raise IndexError if "-o" is the
last arg; update the logic around out_path to first find the index =
sys.argv.index("-o") and confirm index + 1 < len(sys.argv) (or use a try/except)
before accessing the next element, and if missing, print a clear error/usage
message or exit; modify the block that sets out_path to validate presence of the
following value rather than directly indexing into sys.argv.

---

Nitpick comments:
In `@tests/perf/seed.py`:
- Around line 408-459: The function seed_database currently takes positional
parameters (engine, dataset) and imports sqlalchemy inside the function; change
the signature to require keyword-only parameters (def seed_database(*, engine,
dataset): or def seed_database(engine, *, dataset):) so callers must pass by
name, move the "import sqlalchemy as sa" to the top-level imports, and update
any call sites to pass engine=... and dataset=...; ensure the function still
references sa in its body (e.g., sa.text) after the import move.
- Around line 140-154: The public function generate_dataset currently accepts
multiple positional parameters; change its signature to enforce keyword-only
arguments (e.g., def generate_dataset(order_count: int, *, start_date: str =
"2023-01-01", end_date: str = "2024-12-31", seed: int = 42) -> Dataset) so
callers must pass start_date, end_date, and seed by name; update any call sites
(such as the test conftest.py that invokes generate_dataset) to use keyword
invocation like generate_dataset(order_count=order_count) and leave internal
usage of _prng unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bc476819-2dae-4fe8-b2bb-8b063f9b8ec3

📥 Commits

Reviewing files that changed from the base of the PR and between 0fc1d72 and 2e94060.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • .gitignore
  • pyproject.toml
  • tests/perf/__init__.py
  • tests/perf/bench_to_csv.py
  • tests/perf/conftest.py
  • tests/perf/seed.py
  • tests/perf/test_bench.py

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (2)
Makefile (1)

5-5: Consider adding all and clean phony targets.

checkmake is already flagging this file for missing standard entrypoints. Thin aliases are enough if you want to keep the Makefile minimal.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` at line 5, Add the standard phony targets "all" and "clean" to
satisfy checkmake: declare them alongside the existing .PHONY line and implement
thin aliases so "all" depends on the primary validation targets (e.g., test
lint) and "clean" removes build/test artifacts (or delegates to existing cleanup
commands) using the Makefile's existing targets and conventions (refer to .PHONY
and the test, lint, bench, bench-report, bench-csv target names to locate where
to wire these aliases).
tests/perf/seed.py (1)

92-133: Use Pydantic models for the benchmark entities.

Region, Shop, Customer, Order, and Dataset are the module’s canonical models, but they’re plain dataclasses. That diverges from the repo standard and drops the validation/serialization layer right where bad seed data is hardest to debug.

As per coding guidelines, "Use Python 3.11+ with Pydantic v2 for all models".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/perf/seed.py` around lines 92 - 133, Replace the plain dataclasses with
Pydantic v2 models: convert Region, Shop, Customer, Order, and Dataset to
inherit from pydantic.BaseModel (import from pydantic) and use typing
annotations (e.g., int, str, datetime, list[int], Optional[datetime]) and
pydantic.Field where defaults/metadata are needed; ensure Order fields
completed_at and cancelled_at are Optional[datetime], keep avg_cost as int
(cents) and avg_frequency/size as int, and make Dataset reference lists of the
Pydantic model classes (list[Region], list[Shop], etc.) so
validation/serialization are automatic. Also add any lightweight validators (if
needed) on Shop.avg_cost or Order.cost to enforce non-negative values using
model validators in the corresponding model names (Region, Shop, Customer,
Order, Dataset).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Makefile`:
- Around line 1-3: The comment and BENCH_FLAGS disagree: the comment says
minimum 10 rounds but BENCH_FLAGS sets --benchmark-min-rounds=5; pick one and
make them consistent by either updating BENCH_FLAGS to use
--benchmark-min-rounds=10 or changing the comment to state 5 rounds; edit the
Makefile, update the BENCH_FLAGS definition (the variable name BENCH_FLAGS and
its --benchmark-min-rounds flag) or the top-line comment to match the chosen
value so documentation and flags align.

In `@tests/perf/conftest.py`:
- Around line 126-132: The current loop in db_engine.connect() that executes
INDEXES swallows all exceptions and can let benchmarks run on the wrong schema;
update the block around conn.execute(sa.text(idx_sql)) to capture failures
(collect failing idx_sql values and their exception messages), log them via the
test logger or process logger including the dialect (db_engine.dialect.name) and
then fail fast by raising an exception after the loop (or re-raising
immediately) so the test run aborts for that dialect instead of proceeding
without the intended indexes; reference the INDEXES constant, the
conn.execute(...) call, and the surrounding with db_engine.connect() /
conn.commit() logic when making the change.
- Around line 112-117: The current DB_BACKEND=="url" branch calls
seed_database(engine=db_engine, dataset=dataset, clean=True) unconditionally
which can destructively reset any external DB; change this to require an
explicit opt-in before running destructive cleanup: add a check for a dedicated
benchmark marker or a boolean env/param like BENCH_DB_RESET (or require the
DB_URL to reference a dedicated schema/database name such as containing
"bench"), and only call seed_database when that opt-in is present; otherwise
raise a clear ValueError requiring BENCH_DB_RESET or a bench-specific DB_URL.
Update the logic around DB_BACKEND, DB_URL, DB_TYPE, seed_database and
DatasourceConfig so the DatasourceConfig is still created for read-only runs but
destructive seed_database(clean=True) is gated behind the explicit opt-in.
- Around line 156-162: The fixtures created inside the loop are registered with
the inner function name "_fixture" so pytest can't find names like "env_1k";
update the decorator call in _make_fixture so the pytest fixture uses the
desired name by passing name=f"env_{_name}" to `@pytest.fixture` (i.e., change the
decorator on _fixture inside _make_fixture to `@pytest.fixture`(scope="session",
name=f"env_{_name}")), ensuring SCALES, _make_fixture and the returned fixtures
match the names requested by request.getfixturevalue(f"env_{scale_name}").

In `@tests/perf/seed.py`:
- Around line 140-145: In generate_dataset, validate inputs up front: ensure
order_count is non-negative (raise ValueError if order_count < 0) and ensure the
date window is valid (parse start_date and end_date and raise ValueError if
start_date > end_date or if parsing fails) so day_cumsum/total_day_weight cannot
be empty/zero; also apply the same precondition checks to the similar generator
at lines 160-162. Use clear ValueError messages referencing the invalid
parameter (order_count or start_date/end_date) so invalid perf params fail fast.
- Around line 313-315: The current remap assigns late-opening orders to an
arbitrary early shop, breaking the customer→shop affinity; change the branch
that handles (shop_id > late_shop_threshold and created_at < mid_date) to
instead re-roll the shop_id by sampling from that customer’s eligible early
shops (i.e., filter the customer’s weighted shop list to ids <=
late_shop_threshold and sample using the same weights) and assign that sampled
id to shop_id; if the customer has no eligible early shops, shift created_at
forward to mid_date (or the customer’s first available shop open date) so the
original late shop assignment remains valid. Ensure you reference and use the
same customer shop list and weighting logic used elsewhere in seed.py when
re-rolling to preserve the affinity distribution.
- Around line 463-475: The insert currently stringifies timestamps with
.isoformat() in the batch comprehension inside conn.execute (the INSERT INTO
orders using sa.text); change the batch payload so the "created_at",
"completed_at", and "cancelled_at" values are the native datetime objects (use
o.created_at, o.completed_at or None, o.cancelled_at or None) instead of calling
.isoformat(), so SQLAlchemy/DBAPI can bind them as TIMESTAMPs with the correct
dialect-specific processing.

---

Nitpick comments:
In `@Makefile`:
- Line 5: Add the standard phony targets "all" and "clean" to satisfy checkmake:
declare them alongside the existing .PHONY line and implement thin aliases so
"all" depends on the primary validation targets (e.g., test lint) and "clean"
removes build/test artifacts (or delegates to existing cleanup commands) using
the Makefile's existing targets and conventions (refer to .PHONY and the test,
lint, bench, bench-report, bench-csv target names to locate where to wire these
aliases).

In `@tests/perf/seed.py`:
- Around line 92-133: Replace the plain dataclasses with Pydantic v2 models:
convert Region, Shop, Customer, Order, and Dataset to inherit from
pydantic.BaseModel (import from pydantic) and use typing annotations (e.g., int,
str, datetime, list[int], Optional[datetime]) and pydantic.Field where
defaults/metadata are needed; ensure Order fields completed_at and cancelled_at
are Optional[datetime], keep avg_cost as int (cents) and avg_frequency/size as
int, and make Dataset reference lists of the Pydantic model classes
(list[Region], list[Shop], etc.) so validation/serialization are automatic. Also
add any lightweight validators (if needed) on Shop.avg_cost or Order.cost to
enforce non-negative values using model validators in the corresponding model
names (Region, Shop, Customer, Order, Dataset).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a69d49c7-bdf6-4112-bcf9-d44f9528cd0d

📥 Commits

Reviewing files that changed from the base of the PR and between 2bf0ee7 and 5538c8c.

📒 Files selected for processing (7)
  • .gitignore
  • Makefile
  • tests/perf/bench_to_csv.py
  • tests/perf/conftest.py
  • tests/perf/params.py
  • tests/perf/seed.py
  • tests/perf/test_bench.py
✅ Files skipped from review due to trivial changes (2)
  • .gitignore
  • tests/perf/params.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/perf/bench_to_csv.py
  • tests/perf/test_bench.py

@AivanF AivanF merged commit 147f9b5 into main Apr 2, 2026
2 of 3 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Apr 7, 2026
@coderabbitai coderabbitai bot mentioned this pull request Apr 14, 2026
