@WillTrojak
Member

Added JIT-ing of kernel variants for the float4 and double2 vector types in the CUDA generator.

@FreddieWitherden
Contributor

Do you have any benchmarks for these? They will increase compile times, so I want to be sure that we see a benefit somewhere.

gimmik/cuda.py Outdated
platform = 'cuda'
basemeta = {'block': (128, 1, 1), 'width': 1, 'shared': 0,
'dynamic_shared': 0}
vtypes = {'float': {'float2': (2, 2), 'float4': (4, 4)},
Contributor

Not sure if we need to bother with the alignment.
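
To make the alignment question concrete, here is a minimal sketch of how the second tuple entry could gate vector-type selection. It assumes each `vtypes` value is `(width, alignment)` measured in elements, which the excerpt above does not confirm, and it only uses the entries visible in the truncated diff. The function `usable_vector_types` and the parameters `n`, `ldb` and `ldc` are hypothetical names for illustration, not GiMMiK's actual API.

```python
# Assumed meaning: vector type name -> (width, alignment), both in elements.
# Only the entries visible in the diff excerpt are reproduced here.
VTYPES = {'float': {'float2': (2, 2), 'float4': (4, 4)}}


def usable_vector_types(dtype, n, ldb, ldc, vtypes=VTYPES):
    """Return the vector types that could be used for the given layout.

    n        -- number of columns in the operand matrices
    ldb, ldc -- leading dimensions of the input and output matrices
    (all names here are illustrative assumptions)
    """
    usable = []
    for name, (width, align) in vtypes.get(dtype, {}).items():
        # The width must divide the column count so no partial vector
        # is left over at the end of a row.
        fits = n % width == 0
        # The leading dimensions must respect the alignment so every
        # row starts on a vector boundary.
        aligned = ldb % align == 0 and ldc % align == 0
        if fits and aligned:
            usable.append(name)
    return usable


print(usable_vector_types('float', n=8, ldb=8, ldc=8))
print(usable_vector_types('float', n=8, ldb=6, ldc=8))
```

If the alignment always equals the width, as it does for both visible entries, the second tuple element carries no extra information and the check could be expressed with the width alone.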

@WillTrojak
Member Author

Here is the FP64 performance improvement, in %, for N=48^3 (positive values are speed-ups, negative values are slow-downs):

| p | mat | hex (%) | pri (%) | tet (%) |
|---|------|--------------|--------------|--------------|
| 2 | m0 | -3.533371763 | 4.843308571 | 1.9023938 |
| 2 | m3 | -0.295154775 | 0.297843008 | 5.808760296 |
| 2 | m6 | -1.418440302 | -0.049529643 | -0.071124285 |
| 2 | m132 | -0.097657075 | -4.690991397 | 11.03387621 |
| 2 | m460 | -1.240993204 | 0.059032363 | 2.869200715 |
| 3 | m0 | 8.185044182 | 1.21527681 | 0 |
| 3 | m3 | 0 | -2.477560435 | -0.112825122 |
| 3 | m6 | -1.608576253 | 0.772438562 | -0.080645839 |
| 3 | m132 | -1.10576634 | 1.523184537 | 3.296704216 |
| 3 | m460 | -0.029065774 | 0 | 0.058428688 |
| 4 | m0 | -1.746725526 | -0.626782048 | 1.411388864 |
| 4 | m3 | 0.653198534 | 0 | 1.183432388 |
| 4 | m6 | -0.139722193 | 5.780346028 | 0.009443318 |
| 4 | m132 | -0.512157025 | 0.557107275 | -0.012451606 |
| 4 | m460 | -1.4669897 | -0.008854826 | 0 |
| 5 | m0 | 0.008564597 | -0.011397145 | -0.766288992 |
| 5 | m3 | 1.480361034 | 0.222750352 | -0.922382174 |
| 5 | m6 | -0.720272118 | -0.297150933 | -0.326509939 |
| 5 | m132 | -0.807297097 | 0.494840479 | -1.078167653 |
| 5 | m460 | 0.510315992 | 12.28417455 | -0.014238971 |
| 6 | m0 | 0.1712243 | 0 | 0.040380291 |
| 6 | m3 | 10.2580566 | -0.13422927 | -0.007912969 |
| 6 | m6 | 0.118645903 | 0.049778555 | 0.006923897 |
| 6 | m132 | 0.036737528 | 5.175943762 | 0.004347303 |
| 6 | m460 | 0.494356025 | -0.675444632 | -0.00839014 |

@WillTrojak
Member Author

It's not clear to me why the performance would go down; it could be noise.
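
One rough way to judge whether the slow-downs are noise is to summarise the figures per element type and count how many fall inside a small band around zero. The sketch below does that with the numbers posted in this thread; the 1% band is an arbitrary assumption rather than a measured run-to-run variance, and `summarise` is a hypothetical helper, not part of GiMMiK.

```python
from statistics import mean, median

# Improvements in % copied from the results in this thread.
# Columns: p, mat, hex, pri, tet.
TABLE = """\
2 m0 -3.533371763 4.843308571 1.9023938
2 m3 -0.295154775 0.297843008 5.808760296
2 m6 -1.418440302 -0.049529643 -0.071124285
2 m132 -0.097657075 -4.690991397 11.03387621
2 m460 -1.240993204 0.059032363 2.869200715
3 m0 8.185044182 1.21527681 0
3 m3 0 -2.477560435 -0.112825122
3 m6 -1.608576253 0.772438562 -0.080645839
3 m132 -1.10576634 1.523184537 3.296704216
3 m460 -0.029065774 0 0.058428688
4 m0 -1.746725526 -0.626782048 1.411388864
4 m3 0.653198534 0 1.183432388
4 m6 -0.139722193 5.780346028 0.009443318
4 m132 -0.512157025 0.557107275 -0.012451606
4 m460 -1.4669897 -0.008854826 0
5 m0 0.008564597 -0.011397145 -0.766288992
5 m3 1.480361034 0.222750352 -0.922382174
5 m6 -0.720272118 -0.297150933 -0.326509939
5 m132 -0.807297097 0.494840479 -1.078167653
5 m460 0.510315992 12.28417455 -0.014238971
6 m0 0.1712243 0 0.040380291
6 m3 10.2580566 -0.13422927 -0.007912969
6 m6 0.118645903 0.049778555 0.006923897
6 m132 0.036737528 5.175943762 0.004347303
6 m460 0.494356025 -0.675444632 -0.00839014
"""


def summarise(table, band=1.0):
    """Per-shape statistics; `band` (in %) is an assumed noise threshold."""
    cols = {'hex': [], 'pri': [], 'tet': []}
    for line in table.splitlines():
        _p, _mat, *vals = line.split()
        # Dicts keep insertion order, so values map to hex, pri, tet.
        for shape, v in zip(cols, vals):
            cols[shape].append(float(v))
    return {
        shape: {
            'n': len(v),
            'mean': mean(v),
            'median': median(v),
            'min': min(v),
            'max': max(v),
            # Number of cases whose change is no larger than the band.
            'within_band': sum(abs(x) <= band for x in v),
        }
        for shape, v in cols.items()
    }


for shape, stats in summarise(TABLE).items():
    print(shape, stats)
```

A summary like this cannot separate noise from a real effect on its own; repeating each benchmark several times and comparing the spread against these differences would be needed for that.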
