multithreading via Rayon #50
The initial numbers are promising: DiT is over 3x faster at size 16777216. The gains are smaller at smaller sizes, and the smallest sizes that use this codepath regress.
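The regression at the smallest sizes is the usual cost of parallel recursion: splitting work has a fixed overhead, so a divide-and-conquer FFT normally stops splitting below some cutoff. Here is a dependency-free illustration of that idea, not this crate's code: a recursive radix-2 decimation-in-time FFT whose two half-size sub-transforms run concurrently only when the length is at least `cutoff`. The `C64` type, `fft_dit`, `dft_naive` and the `cutoff` parameter are all hypothetical, and the sketch uses `std::thread::scope` (one new thread per split) where the PR uses rayon's pool.

```rust
use std::f64::consts::PI;
use std::thread;

/// Minimal complex number, defined here only to keep the sketch dependency-free.
#[derive(Clone, Copy, Debug, PartialEq)]
struct C64 {
    re: f64,
    im: f64,
}

impl C64 {
    fn add(self, o: C64) -> C64 {
        C64 { re: self.re + o.re, im: self.im + o.im }
    }
    fn sub(self, o: C64) -> C64 {
        C64 { re: self.re - o.re, im: self.im - o.im }
    }
    fn mul(self, o: C64) -> C64 {
        C64 { re: self.re * o.re - self.im * o.im, im: self.re * o.im + self.im * o.re }
    }
}

/// Recursive radix-2 decimation-in-time FFT (forward, unnormalized).
/// `input.len()` must be a power of two. Sub-problems of length >= `cutoff`
/// compute their two halves concurrently; smaller ones stay on the current thread.
fn fft_dit(input: &[C64], cutoff: usize) -> Vec<C64> {
    let n = input.len();
    assert!(n.is_power_of_two(), "length must be a power of two");
    if n == 1 {
        return vec![input[0]];
    }
    // Decimation in time: split into even- and odd-indexed samples.
    let even: Vec<C64> = input.iter().step_by(2).copied().collect();
    let odd: Vec<C64> = input.iter().skip(1).step_by(2).copied().collect();

    let (e, o) = if n >= cutoff {
        // Stand-in for rayon::join: run the even half on a scoped thread.
        thread::scope(|s| {
            let handle = s.spawn(|| fft_dit(&even, cutoff));
            let o = fft_dit(&odd, cutoff);
            (handle.join().unwrap(), o)
        })
    } else {
        (fft_dit(&even, cutoff), fft_dit(&odd, cutoff))
    };

    // Butterflies: X[k] = E[k] + w^k O[k], X[k + n/2] = E[k] - w^k O[k].
    let half = n / 2;
    let mut out = vec![C64 { re: 0.0, im: 0.0 }; n];
    for k in 0..half {
        let angle = -2.0 * PI * k as f64 / n as f64;
        let t = C64 { re: angle.cos(), im: angle.sin() }.mul(o[k]);
        out[k] = e[k].add(t);
        out[k + half] = e[k].sub(t);
    }
    out
}

/// O(n^2) reference DFT used to check the result.
fn dft_naive(input: &[C64]) -> Vec<C64> {
    let n = input.len();
    (0..n)
        .map(|k| {
            input.iter().enumerate().fold(C64 { re: 0.0, im: 0.0 }, |acc, (j, &x)| {
                let angle = -2.0 * PI * ((j * k) % n) as f64 / n as f64;
                acc.add(x.mul(C64 { re: angle.cos(), im: angle.sin() }))
            })
        })
        .collect()
}

fn main() {
    let x: Vec<C64> = (0..16).map(|i| C64 { re: i as f64, im: 0.0 }).collect();
    let parallel = fft_dit(&x, 4); // tiny cutoff so the threaded branch actually runs
    let sequential = fft_dit(&x, usize::MAX); // never splits across threads
    assert_eq!(parallel, sequential);
    let reference = dft_naive(&x);
    let max_err = parallel
        .iter()
        .zip(&reference)
        .map(|(a, b)| (a.re - b.re).abs().max((a.im - b.im).abs()))
        .fold(0.0, f64::max);
    println!("max error vs naive DFT: {max_err:e}");
}
```

Spawning a fresh OS thread at every split above the cutoff is the kind of overhead that reusing rayon's thread pool avoids; replacing the `thread::scope` block with `rayon::join(|| fft_dit(&even, cutoff), || fft_dit(&odd, cutoff))` would keep the same structure.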
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main      #50      +/-   ##
==========================================
+ Coverage   99.29%   99.82%   +0.52%
==========================================
  Files          12       13       +1
  Lines        2277     2281       +4
==========================================
+ Hits         2261     2277      +16
+ Misses         16        4      -12
…halves" This reverts commit 42cb42b.
…s by eliminating the thread spawning and termination overhead
…awning overhead is no longer a concern. Benchmarks show an improvement even at 15 on my machine, but I'm being conservative for now; we'll need to auto-tune COBRA in the future because hardware varies so much.
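As a rough illustration of gating a parallel bit-reversal step on input size, here is a sketch that assumes the threshold is expressed as log2 of the length (the commit message does not say what "15" measures) and is passed in at runtime so it could later be auto-tuned. It is a plain out-of-place gather split into one chunk per core with `std::thread::scope`, not COBRA's cache-blocked algorithm; `bit_reverse_permute`, `reverse_bits` and the threshold parameter are hypothetical names.

```rust
use std::thread;

/// Reverse the lowest `log_n` bits of `i` (e.g. 0b001 -> 0b100 for log_n = 3).
fn reverse_bits(i: usize, log_n: u32) -> usize {
    if log_n == 0 {
        return 0;
    }
    i.reverse_bits() >> (usize::BITS - log_n)
}

/// Out-of-place bit-reversal permutation: out[j] = input[reverse_bits(j)].
/// Runs on the current thread when log2(len) < `log_n_threshold`; otherwise
/// splits the output into one contiguous chunk per available core.
fn bit_reverse_permute<T: Copy + Default + Send + Sync>(input: &[T], log_n_threshold: u32) -> Vec<T> {
    let n = input.len();
    assert!(n.is_power_of_two(), "length must be a power of two");
    let log_n = n.trailing_zeros();
    let mut out = vec![T::default(); n];

    if log_n < log_n_threshold {
        // Small input: threading overhead would outweigh the gain.
        for (j, slot) in out.iter_mut().enumerate() {
            *slot = input[reverse_bits(j, log_n)];
        }
        return out;
    }

    let workers = thread::available_parallelism().map(|p| p.get()).unwrap_or(1);
    let chunk_len = (n + workers - 1) / workers;
    thread::scope(|s| {
        for (c, part) in out.chunks_mut(chunk_len).enumerate() {
            // Each thread owns a disjoint output chunk and only reads `input`.
            s.spawn(move || {
                let base = c * chunk_len;
                for (off, slot) in part.iter_mut().enumerate() {
                    *slot = input[reverse_bits(base + off, log_n)];
                }
            });
        }
    });
    out
}

fn main() {
    // Hypothetical tuning knob; the right value likely differs per machine.
    let threshold = 15;
    let data: Vec<u32> = (0..8).collect();
    println!("{:?}", bit_reverse_permute(&data, threshold)); // [0, 4, 2, 6, 1, 5, 3, 7]
}
```

Keeping the threshold as a parameter rather than a constant is what would make per-machine auto-tuning possible without recompiling.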
With the rayon thread pool now also used for COBRA instead of spawning new threads, sizes 131072 and above are 2x faster than main, and our largest size, 16777216, is 2.5x faster. Admittedly, my CPU has a lot of threads, so the gains may be less pronounced on lower core counts. @smu160 I'd appreciate benchmarks on ARM.
…'t have to stop testing README snippets that Rust treats as doctest
So, fun fact: disabling … Not sure whether it's a code layout artifact, the result of more inlining, or something else. We never had the chance to see these gains before because the std::thread::scope codepath was compiled unconditionally. I'm not going to do anything about it just yet, but it is something to keep in mind for the future.
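The remark implies the threaded codepath is now compiled only when requested. A minimal sketch of that pattern, assuming a Cargo feature named `parallel` that enables an optional rayon dependency (the feature name and the `scale` function are hypothetical, not this crate's API): with the feature off, `#[cfg]` removes the parallel definition entirely, so the single-threaded build contains no threading code at all.

```rust
/// Multiply every element by `factor`.
/// Only compiled with the hypothetical `parallel` feature, which must also
/// pull in the rayon crate; large inputs are split across rayon's pool.
#[cfg(feature = "parallel")]
pub fn scale(data: &mut [f64], factor: f64) {
    if data.len() >= 1 << 16 {
        let mid = data.len() / 2;
        let (left, right) = data.split_at_mut(mid);
        rayon::join(|| scale(left, factor), || scale(right, factor));
    } else {
        data.iter_mut().for_each(|x| *x *= factor);
    }
}

/// Default build: the definition above is stripped before compilation,
/// so no parallel dispatch code exists in the binary.
#[cfg(not(feature = "parallel"))]
pub fn scale(data: &mut [f64], factor: f64) {
    data.iter_mut().for_each(|x| *x *= factor);
}

fn main() {
    let mut v = vec![1.0, 2.0, 3.0];
    scale(&mut v, 2.0);
    println!("{v:?}"); // [2.0, 4.0, 6.0]
}
```

In Cargo.toml this would typically be declared as `rayon = { version = "1", optional = true }` plus `parallel = ["dep:rayon"]` under `[features]`; building with `--features parallel` selects the first definition.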
…hmarks show that it regresses small sizes far less, but not enough to break even with the single-threaded implementation, and it also regresses large sizes a lot. So it loses out to both rayon and single-threaded depending on the size and doesn't seem to be worth it.
…on. Benchmarks show that it regresses small sizes far less, but not enough to break even with the single-threaded implementation, and it also regresses large sizes a lot. So it loses out to both rayon and single-threaded depending on the size and doesn't seem to be worth it." This reverts commit d703fe3.
… not be split further




Multi-thread the part that spans the two halves
Doesn't seem to be beneficial; difficult to measure due to noise