Parallelize fitness evaluation with rayon #9

Merged
urmzd merged 8 commits into main from feat/parallel-fitness-benchmark
Mar 29, 2026

@urmzd urmzd commented Mar 29, 2026

Summary

  • Parallelizes fitness evaluation using rayon's par_iter_mut(), replacing the sequential loop over the individuals in the population
  • Adds Send + Sync + Clone bounds to State and environment traits to support parallel execution
  • Refactors eval_fitness to take &[Self::State] (immutable slice) instead of &mut Vec<Self::State>, cloning trials per-individual for thread safety
  • Adds a Criterion benchmark comparing sequential vs parallel fitness evaluation on the Iris problem
  • Stops debug/verbose logging from stalling computation by switching all stdout writers to non-blocking I/O via tracing_appender::non_blocking
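The signature change and the new trait bounds can be illustrated with a minimal, std-only sketch. It is not the PR's code: `State`, `ToyTrial`, `Individual`, and `eval_population` are hypothetical stand-ins, and `std::thread::scope` with manual chunking is swapped in for rayon's `par_iter_mut()` so the example compiles without external crates. The point it shows is that the trials are shared as an immutable slice and each worker clones a trial before mutating it.

```rust
use std::thread;

/// Hypothetical stand-in for the trial-state trait.
/// `Clone + Send + Sync` mirrors the bounds described in the summary.
pub trait State: Clone + Send + Sync {
    /// Advance the trial with an action and return the reward earned.
    fn step(&mut self, action: f64) -> f64;
}

/// Toy trial used only for illustration.
#[derive(Clone)]
pub struct ToyTrial {
    pub bias: f64,
}

impl State for ToyTrial {
    fn step(&mut self, action: f64) -> f64 {
        self.bias += action;
        self.bias
    }
}

/// Hypothetical individual: a fixed action plus its cached fitness.
pub struct Individual {
    pub action: f64,
    pub fitness: f64,
}

/// Score one individual against shared, read-only trials.
/// Each trial is cloned so the caller's copy is never mutated.
pub fn eval_fitness<S: State>(ind: &mut Individual, trials: &[S]) {
    if trials.is_empty() {
        ind.fitness = 0.0;
        return;
    }
    let total: f64 = trials
        .iter()
        .map(|trial| {
            let mut local = trial.clone();
            local.step(ind.action)
        })
        .sum();
    ind.fitness = total / trials.len() as f64;
}

/// Evaluate the population on up to `n_threads` scoped threads.
/// With rayon, the chunking below would collapse into a `par_iter_mut()` call.
pub fn eval_population<S: State>(pop: &mut [Individual], trials: &[S], n_threads: usize) {
    if pop.is_empty() {
        return;
    }
    let workers = n_threads.max(1);
    let chunk_len = (pop.len() + workers - 1) / workers;
    thread::scope(|scope| {
        for chunk in pop.chunks_mut(chunk_len) {
            // `trials` is only borrowed, which is why `S: Sync` is required.
            scope.spawn(move || {
                for ind in chunk.iter_mut() {
                    eval_fitness(ind, trials);
                }
            });
        }
    });
}
```

Because every worker operates on its own clone, the order in which individuals are evaluated cannot change the result, which is consistent with the "no functional changes" claim in the test plan.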

Benchmark Results

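| Population Size | Sequential | Parallel | Speedup |
|-----------------|------------|----------|---------|
| 50              | 1.57 ms    | 462 µs   | 3.4x    |
| 100             | 1.83 ms    | 416 µs   | 4.4x    |
| 200             | 3.17 ms    | 600 µs   | 5.3x    |
| 500             | 6.18 ms    | 1.04 ms  | 5.9x    |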

Speedup scales with population size as expected — rayon distributes individual evaluations across available cores.
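For reference, the speedup column is just the sequential time divided by the parallel time. The short sketch below recomputes it from the benchmark figures above, converted to microseconds; the names `speedup`, `RESULTS`, and `speedup_table` are illustrative and not part of the crate.

```rust
/// Speedup = sequential time / parallel time (same unit for both).
pub fn speedup(sequential_us: f64, parallel_us: f64) -> f64 {
    sequential_us / parallel_us
}

/// (population size, sequential µs, parallel µs) taken from the reported results.
pub const RESULTS: [(u32, f64, f64); 4] = [
    (50, 1570.0, 462.0),
    (100, 1830.0, 416.0),
    (200, 3170.0, 600.0),
    (500, 6180.0, 1040.0),
];

/// Format each row's speedup to one decimal place, as in the table.
pub fn speedup_table() -> Vec<(u32, String)> {
    RESULTS
        .iter()
        .map(|&(n, seq, par)| (n, format!("{:.1}x", speedup(seq, par))))
        .collect()
}
```

Rounded to one decimal place, the four rows reproduce the reported 3.4x, 4.4x, 5.3x, and 5.9x.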

Test plan

  • cargo bench --bench parallel_fitness runs successfully and shows speedups
  • cargo test passes (no functional changes to evaluation logic)
  • Verify existing performance_after_training benchmark still works with --features gym

urmzd added 8 commits March 28, 2026 22:29
Implement parallel fitness evaluation using rayon's par_iter_mut() to evaluate individuals concurrently across multiple trials.

Change eval_fitness signature from mutable trials vector to immutable slice reference to support parallel iteration. Add Clone + Send + Sync bounds to Core::State trait to enable safe parallel access.

Refactor the fitness calculation logic to compute total score in parallel and reduce to average, improving performance on multi-core systems.

Enable gym environments to work with parallel fitness evaluation by adding Send + Sync trait bounds.

Add Send + Sync requirements to GymRsEnvExt trait and its Observation type parameter across all impl blocks. This allows gym-based problem definitions to be safely used in the parallel evaluation framework.

Make IrisState cloneable to support parallel fitness evaluation.

Add Clone derive macro to enable trial state cloning in the parallel fitness evaluation pipeline. Required by updated Core::State trait bounds.

Update benchmark tools to align with the new eval_fitness signature that accepts immutable trials slice.

Change trials from mutable vector binding to immutable to match the updated API contract. Update eval_fitness call to pass immutable reference.

…talling computation

Stdout logging was using blocking I/O, causing the system to hang or
crawl in verbose/debug mode due to the high volume of trace events from
hot paths (instruction execution, fitness eval). Switch all stdout
writers to tracing_appender::non_blocking, matching the existing file
logging approach. Introduce TracingGuard to hold all WorkerGuards.
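The mechanism behind this commit can be sketched without external crates. The code below is not `tracing_appender`'s implementation and not the PR's `TracingGuard`. It is a minimal std-only analogue, assuming a channel-backed writer, a background worker thread, and a guard that drains the queue on drop. `NonBlockingWriter`, `Guard`, `non_blocking`, and this `TracingGuard` are hypothetical names chosen to mirror the ones in the commit message.

```rust
use std::io::Write;
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

enum Msg {
    Line(String),
    Shutdown,
}

/// Cheap handle for hot paths: logging only enqueues, it never touches the sink.
#[derive(Clone)]
pub struct NonBlockingWriter {
    tx: Sender<Msg>,
}

impl NonBlockingWriter {
    pub fn log(&self, line: &str) {
        // If the worker has already shut down, the message is silently dropped.
        let _ = self.tx.send(Msg::Line(line.to_owned()));
    }
}

/// Hypothetical analogue of a worker guard: dropping it drains queued
/// lines, flushes the sink, and joins the worker thread.
pub struct Guard {
    tx: Sender<Msg>,
    worker: Option<JoinHandle<()>>,
}

impl Drop for Guard {
    fn drop(&mut self) {
        let _ = self.tx.send(Msg::Shutdown);
        if let Some(handle) = self.worker.take() {
            let _ = handle.join();
        }
    }
}

/// Holds every guard for the lifetime of the program (illustrative only).
pub struct TracingGuard {
    _guards: Vec<Guard>,
}

impl TracingGuard {
    pub fn new(guards: Vec<Guard>) -> Self {
        Self { _guards: guards }
    }
}

/// Move the blocking sink onto a dedicated thread and return a handle plus guard.
/// An unbounded channel is used here for brevity.
pub fn non_blocking<W: Write + Send + 'static>(mut sink: W) -> (NonBlockingWriter, Guard) {
    let (tx, rx) = mpsc::channel();
    let worker = thread::spawn(move || {
        for msg in rx {
            match msg {
                Msg::Line(line) => {
                    let _ = writeln!(sink, "{}", line);
                }
                Msg::Shutdown => break,
            }
        }
        let _ = sink.flush();
    });
    (
        NonBlockingWriter { tx: tx.clone() },
        Guard { tx, worker: Some(worker) },
    )
}
```

The guard matters because buffered lines are only guaranteed to reach the sink once the worker has drained its queue. Keeping all guards in one struct that lives until the end of `main` is one way to make sure no writer is torn down early.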
Compare rayon-parallelized fitness evaluation against the sequential
baseline across population sizes (50, 100, 200, 500) using the Iris
problem. Demonstrates 3.4x-5.9x speedups scaling with population size.

Add --n-threads option to HyperParameters, ExperimentParams, and
ExperimentConfig to control the number of rayon threads used for
parallel fitness evaluation. Defaults to all available cores when
not specified.
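The defaulting rule for `--n-threads` can be sketched as follows. This is an assumption about the behaviour, not the PR's code: `resolve_n_threads` is a hypothetical helper, and treating `0` like "not specified" is an extra choice made here for illustration.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Resolve the worker count: an explicit positive value wins, otherwise
/// fall back to the number of available cores (at least 1).
pub fn resolve_n_threads(requested: Option<usize>) -> usize {
    match requested {
        Some(n) if n > 0 => n,
        _ => thread::available_parallelism()
            .map(NonZeroUsize::get)
            .unwrap_or(1),
    }
}
```

With rayon, a value resolved this way could be passed to `rayon::ThreadPoolBuilder::new().num_threads(n).build_global()` once at startup. How the PR actually wires the option into rayon is not shown in this conversation.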
@urmzd urmzd merged commit cffdc5e into main Mar 29, 2026
2 checks passed
@urmzd urmzd deleted the feat/parallel-fitness-benchmark branch March 29, 2026 04:56