
Conversation

@WillTrojak (Member)

Added register spilling to shared memory for the two kernels that don't already use shared memory. This feature requires CUDA >= 12.9.

Additionally, I added launch bounds to the CUDA kernels. This generally gives a performance boost, and it especially helps when spilling to shared memory.
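For reference, a minimal sketch of the launch-bounds annotation mentioned above (the kernel and the bound values are hypothetical, not taken from this PR): `__launch_bounds__` tells the compiler the maximum threads per block, and optionally a minimum number of resident blocks per SM, so it can budget registers accordingly.

```cuda
// Hypothetical kernel used only to illustrate __launch_bounds__; the
// arguments (128 threads per block, 2 resident blocks per SM) are
// assumptions, not the values used in this PR.
template<typename T>
__global__ void __launch_bounds__(128, 2)
axpy(int n, T a, const T* __restrict__ x, T* __restrict__ y)
{
    int i = blockIdx.x*blockDim.x + threadIdx.x;

    if (i < n)
        y[i] = a*x[i] + y[i];
}
```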

const ${dtype}* __restrict__ b, int ldb,
${dtype}* __restrict__ c, int ldc)
{
#if ( ( defined(__CUDACC_VER_MAJOR__) && ( __CUDACC_VER_MAJOR__ >= 13 ) ) || \
Contributor

When would __CUDACC_VER_MAJOR__ not be defined?

@WillTrojak (Member Author), Dec 1, 2025

No, I can't think of a case where those wouldn't be defined when compiling with NVIDIA tools. But there are third-party tools, like SCALE, that claim to be able to compile CUDA for other accelerators, and I don't know what those define. So I thought it was good practice to check that the macros exist first.

Contributor

I think we can just check directly. Also do we need to care about CUDA 12? Seems easier to just require 13 or later.
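A quick illustration of the two guard styles under discussion. The direct form works because, in a standard C/C++ preprocessor, an identifier that is not defined evaluates to 0 inside an #if expression, so a toolchain that does not define the macro simply skips the guarded code.

```cuda
// Defensive form: only enable the feature if the version macro exists
// and reports CUDA 13 or later.
#if defined(__CUDACC_VER_MAJOR__) && (__CUDACC_VER_MAJOR__ >= 13)
// ... CUDA 13+ path ...
#endif

// Direct form: equivalent in practice, since an undefined
// __CUDACC_VER_MAJOR__ is treated as 0 and the test fails.
#if __CUDACC_VER_MAJOR__ >= 13
// ... CUDA 13+ path ...
#endif
```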

Member Author

OK, changed this to just CUDA 13.

const ${dtype}* __restrict__ b, int ldb,
${dtype}* __restrict__ c, int ldc)
{
#if ( __CUDACC_VER_MAJOR__ >= 13 )
Contributor

Does it have to go at the start of a function or can we move it down after the variable declarations so that it only needs to appear once?

Member Author

No, it needs to come first.
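For context, a hedged sketch of the placement constraint. The pragma spelling (enable_smem_spilling) is an assumption about the CUDA 12.9+ shared-memory spilling feature, and the kernel body is hypothetical; the point is only that the pragma must be the very first thing inside the kernel body, before any declarations, so it cannot be hoisted below them or shared between kernels.

```cuda
// Illustrative only: the pragma spelling below is an assumption about the
// CUDA 12.9+ register-spilling feature; the kernel itself is hypothetical.
__global__ void __launch_bounds__(128)
example(int n, const double* __restrict__ b, double* __restrict__ c)
{
#if __CUDACC_VER_MAJOR__ >= 13
    #pragma enable_smem_spilling   // must be the first statement in the body
#endif
    // Declarations and the rest of the kernel follow the pragma
    double acc = 0.0;

    for (int i = 0; i < n; i++)
        acc += b[i];

    if (threadIdx.x == 0 && blockIdx.x == 0)
        c[0] = acc;
}
```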

@WillTrojak (Member Author)

Here is the FP64 performance improvement in % for N = 48^3.

| p | mat  | hex          | pri          | tet          |
|---|------|--------------|--------------|--------------|
| 2 | m0   | -1.826484905 | 2.009701195  | 6.944436481  |
| 2 | m3   | -0.295154775 | 8.019251568  | -7.427670957 |
| 2 | m6   | -2.818445786 | -2.887391268 | 2.33066366   |
| 2 | m132 | -5.844456427 | -7.246381148 | 7.939180555  |
| 2 | m460 | 0.203088554  | 0            | -11.92196406 |
| 3 | m0   | 8.157557453  | -0.128480629 | 0            |
| 3 | m3   | 0.017075499  | -2.372395288 | 0.037665092  |
| 3 | m6   | -1.078167805 | 0.760461761  | -0.100782278 |
| 3 | m132 | -0.080955364 | 0.077853383  | 3.225807287  |
| 3 | m460 | 5.879638405  | 0            | 0.058428688  |
| 4 | m0   | 0.896861374  | -0.018543607 | 1.434853369  |
| 4 | m3   | -0.615201494 | -0.514135272 | 0            |
| 4 | m6   | 0.290004409  | 26.83560534  | 0.009443318  |
| 4 | m132 | -0.007251841 | 0.051970419  | 0            |
| 4 | m460 | -0.239808226 | 2.616519779  | 0.012443141  |
| 5 | m0   | 0.559627573  | 3.384434896  | 0            |
| 5 | m3   | 0.830226822  | 2.435323222  | 0            |
| 5 | m6   | -5.626592215 | 17.21821181  | 0            |
| 5 | m132 | -0.603549919 | 6.455492999  | 0            |
| 5 | m460 | 0.863485539  | 33.61152995  | 0.45769551   |
| 6 | m0   | 0.343045801  | 10.4882413   | 0.032300137  |
| 6 | m3   | 0.683529765  | 1.207283004  | 0.47698948   |
| 6 | m6   | 3.375551986  | 2.312175893  | -0.114073593 |
| 6 | m132 | 0.071276331  | 21.72202425  | 0.021733253  |
| 6 | m460 | 1.651250203  | 9.148251482  | 0.004187673  |

@FreddieWitherden (Contributor)

Do you have absolute numbers, i.e. what fraction of peak FLOPs/bandwidth we achieve?
