Is there any reason we used a different LayerNorm implementation? #122

We have a custom-defined LayerNorm:
https://github.com/harvardnlp/annotated-transformer/blob/debc9fd747bb2123160a98046ad1c2d4da44a567/the_annotated_transformer.py#L315-L327

Looking at [line 326](https://github.com/harvardnlp/annotated-transformer/blob/debc9fd747bb2123160a98046ad1c2d4da44a567/the_annotated_transformer.py#L326), the call to `std` does not specify `correction=0`. By default this means `correction=1`, which applies [Bessel’s correction](https://en.wikipedia.org/wiki/Bessel%27s_correction). Had we removed this correction, we could simply have used PyTorch's native LayerNorm class. Is there any reason we opted for the custom route? Thank you.
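To make the effect of Bessel's correction concrete, here is a small sketch using only the Python standard library. The sample values are made up for illustration. `statistics.stdev` divides by `n - 1` (the same behavior as `correction=1`), while `statistics.pstdev` divides by `n` (the same behavior as `correction=0`).

```python
import math
import statistics

# Made-up sample values, chosen only for illustration.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(values)

# Sample standard deviation: divides by n - 1 (Bessel's correction).
std_sample = statistics.stdev(values)

# Population standard deviation: divides by n (no correction).
std_population = statistics.pstdev(values)

# The two differ by a constant factor of sqrt(n / (n - 1)).
ratio = std_sample / std_population
print(std_sample, std_population, ratio, math.sqrt(n / (n - 1)))
```

Since the ratio is `sqrt(n / (n - 1))`, the gap shrinks as the feature dimension grows. For a feature size of 512 (used here only as an example), it is roughly 0.1%.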


	import torch
	import torch.nn as nn

	class LayerNorm(nn.Module):
	    "Construct a layernorm module (See citation for details)."

	    def __init__(self, features, eps=1e-6):
	        super(LayerNorm, self).__init__()
	        # Learnable per-feature scale (a_2) and shift (b_2).
	        self.a_2 = nn.Parameter(torch.ones(features))
	        self.b_2 = nn.Parameter(torch.zeros(features))
	        self.eps = eps

	    def forward(self, x):
	        mean = x.mean(-1, keepdim=True)
	        # No `correction` argument, so the default (correction=1) applies.
	        std = x.std(-1, keepdim=True)
	        # eps is added to std, outside any square root.
	        return self.a_2 * (x - mean) / (std + self.eps) + self.b_2
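For comparison, here is a rough sketch of how the custom module relates to `torch.nn.LayerNorm`. The class `LayerNormSketch` and its `correction` parameter are hypothetical and are not part of the repository. The sketch assumes a PyTorch version whose `Tensor.std` accepts the `correction` keyword. There is a second difference besides the correction: `nn.LayerNorm` divides by `sqrt(var + eps)` with the biased variance, whereas the custom module divides by `std + eps`. Even with `correction=0`, the outputs therefore agree only approximately.

```python
import torch
import torch.nn as nn


class LayerNormSketch(nn.Module):
    """Hypothetical copy of the custom LayerNorm with a configurable correction."""

    def __init__(self, features, eps=1e-6, correction=1):
        super().__init__()
        self.a_2 = nn.Parameter(torch.ones(features))
        self.b_2 = nn.Parameter(torch.zeros(features))
        self.eps = eps
        self.correction = correction  # 1 = Bessel's correction, 0 = none

    def forward(self, x):
        mean = x.mean(-1, keepdim=True)
        std = x.std(-1, correction=self.correction, keepdim=True)
        # eps is added to std here; nn.LayerNorm adds it to the variance.
        return self.a_2 * (x - mean) / (std + self.eps) + self.b_2


torch.manual_seed(0)
features = 512  # example size, chosen only for illustration
x = torch.randn(2, 10, features)

with_bessel = LayerNormSketch(features, correction=1)  # behaves like the custom module
no_correction = LayerNormSketch(features, correction=0)
native = nn.LayerNorm(features, eps=1e-6)

with torch.no_grad():
    out_bessel = with_bessel(x)
    out_no_corr = no_correction(x)
    out_native = native(x)

# Largest absolute deviation from the native module for each variant.
diff_bessel = (out_bessel - out_native).abs().max().item()
diff_no_corr = (out_no_corr - out_native).abs().max().item()
print(diff_bessel, diff_no_corr)
```

With `correction=0`, the remaining gap comes only from where `eps` is placed and from floating-point rounding. With the default correction, the outputs are additionally scaled by about `sqrt((n - 1) / n)`.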
