We have a custom-defined `LayerNorm` in annotated-transformer/the_annotated_transformer.py (lines 315 to 327 at debc9fd):
```python
class LayerNorm(nn.Module):
    "Construct a layernorm module (See citation for details)."

    def __init__(self, features, eps=1e-6):
        super(LayerNorm, self).__init__()
        self.a_2 = nn.Parameter(torch.ones(features))
        self.b_2 = nn.Parameter(torch.zeros(features))
        self.eps = eps

    def forward(self, x):
        mean = x.mean(-1, keepdim=True)
        std = x.std(-1, keepdim=True)
        return self.a_2 * (x - mean) / (std + self.eps) + self.b_2
```
Looking at line 326, there is no `correction=0` specified, so the default `correction=1` applies, which uses Bessel's correction. If that correction were removed, the module could be implemented directly with PyTorch's native `nn.LayerNorm`. Is there any reason we opted for the custom route? Thank you.
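For concreteness, here is a minimal sketch (not from the repo) of the comparison I have in mind. It assumes the `LayerNorm` class quoted above is in scope, a PyTorch version where `Tensor.std` accepts the `correction` keyword (2.0+), and arbitrary test shapes chosen only for illustration. Note also that the custom module adds `eps` to the standard deviation, whereas `nn.LayerNorm` adds it to the variance before the square root, so even with `correction=0` the match is only approximate:

```python
# Hypothetical comparison script, not part of the repository.
import torch
import torch.nn as nn

features = 512
x = torch.randn(2, 10, features)

# Custom module as defined in the file (unbiased std, i.e. correction=1).
custom = LayerNorm(features)

# Same computation with Bessel's correction removed (biased std, correction=0).
mean = x.mean(-1, keepdim=True)
std_biased = x.std(-1, keepdim=True, correction=0)
manual_biased = (x - mean) / (std_biased + 1e-6)

# PyTorch's native implementation (always uses the biased variance).
native = nn.LayerNorm(features, eps=1e-6)

print(torch.allclose(manual_biased, native(x), atol=1e-4))  # close to native
print(torch.allclose(custom(x), native(x), atol=1e-4))      # False in general
```

With the default initialization (`a_2 = 1`, `b_2 = 0`), the unbiased version differs from the native one by roughly a constant factor of sqrt(n/(n-1)) on the normalized values, which the learnable scale could in principle absorb during training.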