
Clean up SwiGLU reference code weight generation #52

@andrej


There's an ugly hack in the SwiGLU reference here:

w_gate = torch.randn(N, K, dtype=torch.bfloat16).T * val_range # gate projection

The first two weight matrices are generated with their dimensions swapped (for example, (N, K) for w_gate) and then transposed into the intended shape. I originally did this so that the random weights were drawn in the same order as in the old CMake-based implementation, which guaranteed identical inputs and let me verify that both implementations produced identical outputs. Now that this has been verified and the old CMake-based implementation is gone (#37), it's time to remove this hack.
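
A minimal sketch of what the cleanup could look like, assuming the intended final shape is (K, N) as implied by the transpose above. The sizes, the seed, and the value of val_range are placeholders chosen for illustration, and w_gate_old / w_gate_new are hypothetical names, not identifiers from the repository.

```python
import torch

# Placeholder sizes and scale, for illustration only.
K, N = 8, 4
val_range = 0.1

# Current form from this issue: allocate as (N, K), then transpose to (K, N).
torch.manual_seed(0)
w_gate_old = torch.randn(N, K, dtype=torch.bfloat16).T * val_range

# Possible cleaned-up form: allocate directly in the final (K, N) shape.
torch.manual_seed(0)
w_gate_new = torch.randn(K, N, dtype=torch.bfloat16) * val_range

# Both forms give the same shape and dtype.
same_shape = w_gate_old.shape == w_gate_new.shape
# The values are not expected to match element for element, because the
# random draws land in different positions of the matrix.
same_values = torch.equal(w_gate_old, w_gate_new)
print(same_shape, same_values)
```

Since the two forms are not expected to produce identical tensors for the same seed, any stored reference outputs that depend on the exact weights would likely need to be regenerated after this change.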

Thanks @asyms for pointing this out.
