Open
Description
There's an ugly hack in the SwiGLU reference here:
`IRON/operators/swiglu_decode/reference.py`, line 30 at commit 130b6ea:
`w_gate = torch.randn(N, K, dtype=torch.bfloat16).T * val_range  # gate projection`
The first two weight matrices are generated with the wrong dimensions and then transposed. I did it this way initially so that the random weights were drawn in the same order as in the old CMake-based implementation, which guaranteed identical inputs and let me check that both implementations produced identical outputs. Now that this is verified and the old CMake-based implementation is gone (#37), it's time to remove this hack.
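For illustration, here is a minimal sketch of the two patterns side by side. The sizes `K` and `N`, the value of `val_range`, and the two helper functions are made up for this example and are not taken from `reference.py`. The point is that both variants yield a `(K, N)` matrix, but under the same seed the transposed draw lays the random numbers out differently, so the generated weights are not expected to match element for element once the hack is removed.

```python
import torch

# Hypothetical sizes and scale, chosen only for illustration.
K, N = 4, 6
val_range = 0.5


def gate_weights_transposed(seed: int) -> torch.Tensor:
    """Current pattern: draw an (N, K) matrix, then transpose it to (K, N)."""
    torch.manual_seed(seed)
    return torch.randn(N, K, dtype=torch.bfloat16).T * val_range


def gate_weights_direct(seed: int) -> torch.Tensor:
    """Pattern without the hack: draw the (K, N) matrix directly."""
    torch.manual_seed(seed)
    return torch.randn(K, N, dtype=torch.bfloat16) * val_range


old = gate_weights_transposed(0)
new = gate_weights_direct(0)

# Both tensors have the shape the projection expects.
same_shape = old.shape == new.shape
# Element-wise comparison of the two layouts under the same seed.
same_values = torch.equal(old, new)
print(same_shape, same_values)
```

If any test compares against values generated with the old layout, those values would presumably need to be regenerated after the change.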
Thanks @asyms for pointing this out.