[Question] Slow CPU computations (even with MKL) #66
Description
Hello!
I have been testing computations on the CPU using the MKL library. When I compared them with GPU computations, the difference in execution time was enormous. The reason I made this comparison is that I only have easy access to CPUs, not GPUs.
Because of this, I have several questions:
1. Am I correct in understanding that currently there is no MKL + MPI backend available in wakis for CPU computations?
2. Have you performed benchmarks comparing GPU and CPU performance, and if so, did you observe a similar difference? (I want to make sure that the issue is not on my side.)
3. My investigation suggests that the main bottleneck in CPU computations is the SpMV operation, which runs much faster on GPUs. Would it be faster on CPUs to replace the SpMV operation with a straightforward iteration over all grid nodes?
PS1. Point 3 may contain inaccuracies, since I have limited experience with computations of this type. What I mainly mean is moving away from the matrix-based approach toward something similar to what is implemented in wakis/solver3D.py (the methods there are different, but I hope the general idea is clear). ChatGPT suggests that this could provide a noticeable speedup on CPUs, but I could not find research papers confirming this. What do you think?
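To make the idea concrete, here is a minimal sketch of what I mean by replacing SpMV with direct iteration over grid nodes. It is not wakis code: the 1D forward-difference operator and the function names (`build_forward_difference_csr`, `csr_matvec`, `forward_difference_stencil`) are hypothetical and chosen only for illustration. The first version stores the operator as a sparse matrix in CSR form and applies it through an index gather; the second applies the same operator directly with array slices.

```python
import numpy as np


def build_forward_difference_csr(n):
    """Return CSR arrays (data, indices, indptr) of the n x n matrix D with
    (D x)[i] = x[i+1] - x[i], where x[n] is treated as zero."""
    # Each row has -1 on the diagonal and +1 on the superdiagonal;
    # the last row has only the diagonal entry.
    data = np.tile([-1.0, 1.0], n)[:-1]
    indices = np.repeat(np.arange(n), 2)
    indices[1::2] += 1
    indices = indices[:-1]
    # Row i starts at entry 2*i; the matrix has 2*n - 1 stored entries in total.
    indptr = np.minimum(2 * np.arange(n + 1), 2 * n - 1)
    return data, indices, indptr


def csr_matvec(data, indices, indptr, x):
    """Matrix-based approach: SpMV on CSR arrays (gather, multiply, row-wise sum)."""
    products = data * x[indices]  # indirect reads of x through the column indices
    # Valid here because every row has at least one stored entry.
    return np.add.reduceat(products, indptr[:-1])


def forward_difference_stencil(x):
    """Matrix-free approach: apply the same operator with contiguous slices."""
    out = np.empty_like(x)
    out[:-1] = x[1:] - x[:-1]
    out[-1] = -x[-1]  # boundary row: x[n] is treated as zero
    return out


x = np.array([1.0, 4.0, 9.0, 16.0])
data, indices, indptr = build_forward_difference_csr(x.size)
print(csr_matvec(data, indices, indptr, x))
print(forward_difference_stencil(x))
```

Both functions compute the same result. The matrix-based version has to read the `indices` and `data` arrays in addition to `x`, while the stencil version only touches `x` through contiguous slices. Whether this translates into a real speedup for the 3D operators in wakis is exactly what I am unsure about.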
PS2. If you know the reason why CPU computations are slow and are aware of strategies that could reliably speed them up (within the wakis architecture), could you please share them?