tl;dr: What's the proper "number of threads" when using OpenCilk as the parallelism backend for FFTW?
I am trying to parallelize FFTW using Cilk. Following the manual, I registered a `cilk_for`-based parallel loop with FFTW via `fftw_threads_set_callback`. However, I've noticed that:
- The parallel routine is only called if the "number of threads" is set with `fftw_plan_with_nthreads`.
- The number of jobs (the `njobs` parameter) never seems to exceed the number of threads previously passed in.
This makes sense for a pthreads- or OpenMP-based backend, where the work is divided evenly among a fixed set of threads. It does not carry over well to Cilk (or TBB), where the program should expose all of its logical parallelism and leave load balancing to the work-stealing scheduler. So when running on 8 workers, the parallel loop (`cilk_for`) should really have at least an order of magnitude more iterations than workers.
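The rule of thumb above is simple arithmetic. A minimal sketch of turning it into an `fftw_plan_with_nthreads` argument, assuming (as observed in this issue) that `njobs` is bounded by that value; `nthreads_with_slack` is a hypothetical helper and the slack factor is an assumption, not an FFTW or OpenCilk setting:

```cpp
// Hypothetical helper: the value to pass to fftw_plan_with_nthreads so that a
// work-stealing scheduler sees about `slack` jobs per worker. The default of
// 10 encodes the "order of magnitude" rule of thumb; it is an assumption, not
// an FFTW or OpenCilk constant.
int nthreads_with_slack(unsigned workers, unsigned slack = 10) {
  if (workers == 0) workers = 1;  // e.g. hardware_concurrency() may report 0
  return static_cast<int>(workers * slack);
}
```

For example, `nthreads_with_slack(8)` gives 80, while the value 1024 discussed here corresponds to a slack of 128 on 8 workers. In an OpenCilk build the worker count would more naturally come from the runtime (for instance `__cilkrts_get_nworkers()` from `<cilk/cilk_api.h>`) than from `std::thread::hardware_concurrency()`.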
To remedy this, I tried setting the number of threads to the problem size, which causes a segfault. If I pass 1024 instead, it seems to work, and the `parallel_loop` hook (`parallel_for` in the code below) receives `njobs` equal to 1024, 2, or 512.
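The `njobs` values above can be collected systematically by registering a logging callback while investigating. A minimal sketch, assuming running the jobs serially is acceptable during diagnosis; `logging_loop`, `g_njobs_seen`, and `g_njobs_mutex` are hypothetical names, and the signature mirrors `parallel_for`:

```cpp
#include <cstddef>
#include <mutex>
#include <set>

// Hypothetical diagnostics: every distinct njobs value the callback received.
std::mutex g_njobs_mutex;
std::set<int> g_njobs_seen;

// Same signature as the parallel_for callback: record njobs, then run every
// job serially so results stay correct while diagnosing.
void logging_loop(void *(*work)(char *), char *jobdata, std::size_t elsize,
                  int njobs, void * /*data*/) {
  {
    std::lock_guard<std::mutex> lock(g_njobs_mutex);
    g_njobs_seen.insert(njobs);
  }
  for (int i = 0; i < njobs; ++i)
    work(jobdata + static_cast<std::size_t>(i) * elsize);
}
```

Passing it to `fftw_threads_set_callback` in place of `parallel_for` and printing `g_njobs_seen` after executing the plan lists every job count FFTW requested.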
1024 is probably enough parallelism, but it is an arbitrary number, so I wanted to know whether there is an established practice for choosing it. I'm benchmarking a change to `cilk_for` scheduling, so I don't want to rely on guesses.
Full code example (compiled with OpenCilk 2.0.0):
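```cpp
#include <cstddef>
#include <cilk/cilk.h>  // cilk_for
#include "fftw3.h"

// Registered via fftw_threads_set_callback: run all njobs jobs in parallel.
void parallel_for(void *(*work)(char *), char *jobdata, size_t elsize,
                  int njobs, void *data) {
  // std::cout << "parallel_for: " << njobs << std::endl;
  cilk_for(int i = 0; i < njobs; ++i) { work(jobdata + i * elsize); }
}

int main(int argc, char **argv) {
  fftw_init_threads();
  // Applies to plans created afterwards; njobs never seems to exceed it.
  fftw_plan_with_nthreads(1024);
  fftw_threads_set_callback(parallel_for, nullptr);

  int const N = 16 * 1024 * 1024;
  // Passing the problem size instead causes a segfault:
  // fftw_plan_with_nthreads(N);

  auto *buf = static_cast<fftw_complex *>(fftw_malloc(...));
  fftw_plan plan = fftw_plan_dft(...);
  fftw_execute(plan);

  fftw_destroy_plan(plan);
  fftw_free(buf);
  fftw_cleanup_threads();
}
```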
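To reproduce this without OpenCilk, a rough portable stand-in for `parallel_for` can use `std::thread` with a shared atomic counter, so that idle threads claim the next unclaimed job (dynamic scheduling, not Cilk's work stealing). This is a sketch, not an FFTW-provided backend; `thread_pool_loop` is a hypothetical name, and treating `data` as a pointer to a worker count is an assumption of this sketch:

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the Cilk parallel_for callback. Each thread
// repeatedly claims the next job index from a shared atomic counter until
// all njobs jobs have been run exactly once.
void thread_pool_loop(void *(*work)(char *), char *jobdata, std::size_t elsize,
                      int njobs, void *data) {
  // Assumption: `data` points at an unsigned worker count, or is null.
  unsigned nworkers = data ? *static_cast<unsigned *>(data)
                           : std::thread::hardware_concurrency();
  if (nworkers == 0) nworkers = 1;
  nworkers = std::min<unsigned>(nworkers, njobs > 0 ? njobs : 1);

  std::atomic<int> next{0};
  auto worker = [&] {
    for (int i = next.fetch_add(1); i < njobs; i = next.fetch_add(1))
      work(jobdata + static_cast<std::size_t>(i) * elsize);
  };
  std::vector<std::thread> threads;
  for (unsigned t = 1; t < nworkers; ++t) threads.emplace_back(worker);
  worker();  // the calling thread participates too
  for (auto &th : threads) th.join();  // return only after every job is done
}
```

It would be registered as `fftw_threads_set_callback(thread_pool_loop, &nworkers)` with an `unsigned nworkers` that stays alive while plans execute. Spawning threads on every call is costly; a serious backend would reuse a pool.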