
Clean up SwiGLU reference code weight generation #52

@andrej

Description


There's an ugly hack in the SwiGLU reference here:

w_gate = torch.randn(N, K, dtype=torch.bfloat16).T * val_range # gate projection

The first two weight matrices are generated with the wrong dimensions and then transposed. I initially did it this way to guarantee identical outputs between the old CMake-based implementation and the new one: generating the random weights in the same order ensured that both received identical inputs. Now that this has been verified and the old CMake-based implementation is gone (#37), it's time to remove this hack.
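A minimal sketch of what the cleanup could look like, assuming PyTorch and the `N`, `K`, and `val_range` names from the snippet above. Everything else is hypothetical and for illustration only: the concrete sizes, the seed, the names `w_up` and `w_down`, and the helpers `make_weights_old`, `make_weights_new`, and `swiglu_reference`. The idea is to generate each matrix directly in the `(K, N)` shape it is used in, so no transpose is needed. One consequence worth keeping in mind: with the same seed, `torch.randn(N, K).T` and `torch.randn(K, N)` place the random values at different positions, so the two versions are generally not element-wise identical, and any stored reference outputs would need to be regenerated.

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes; the real reference defines its own N, K, and val_range.
M, K, N = 4, 8, 16
val_range = 0.1

def make_weights_old(seed: int):
    # Reproduces the hack for comparison: generate (N, K), then transpose to (K, N).
    torch.manual_seed(seed)
    w_gate = torch.randn(N, K, dtype=torch.bfloat16).T * val_range  # gate projection
    w_up = torch.randn(N, K, dtype=torch.bfloat16).T * val_range    # up projection
    return w_gate, w_up

def make_weights_new(seed: int):
    # Cleaned-up version: generate every matrix directly in the shape it is used in.
    torch.manual_seed(seed)
    w_gate = torch.randn(K, N, dtype=torch.bfloat16) * val_range  # gate projection
    w_up = torch.randn(K, N, dtype=torch.bfloat16) * val_range    # up projection
    w_down = torch.randn(N, K, dtype=torch.bfloat16) * val_range  # down projection
    return w_gate, w_up, w_down

def swiglu_reference(x, w_gate, w_up, w_down):
    # SwiGLU: (SiLU(x @ W_gate) * (x @ W_up)) @ W_down.
    # Upcasting to float32 is an assumption made here for a stable reference.
    x, w_gate, w_up, w_down = (t.float() for t in (x, w_gate, w_up, w_down))
    return (F.silu(x @ w_gate) * (x @ w_up)) @ w_down

old_gate, old_up = make_weights_old(seed=0)
new_gate, new_up, new_down = make_weights_new(seed=0)
assert old_gate.shape == new_gate.shape == (K, N)
# Same seed and same shapes, but the values land in different positions.
print(torch.equal(old_gate, new_gate))

x = torch.randn(M, K, dtype=torch.bfloat16)
y = swiglu_reference(x, new_gate, new_up, new_down)
print(y.shape)  # torch.Size([4, 8])
```

Generating `(K, N)` directly keeps the shapes in the code identical to the shapes in the math (`x` is `(M, K)`, so `x @ w_gate` is `(M, N)`), which is easier to read than a transposed view.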

Thanks @asyms for pointing this out.

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
