Fix typo error for lstm operations #385
Conversation
index.bs
Outdated
```diff
 - *bias*: an {{MLOperand}}. The 1-D input bias tensor of shape [4 * hidden_size]. The ordering of the bias vectors in the first dimension of the tensor shape is specified according to the *options.layout* argument.
 - *recurrentBias*: an {{MLOperand}}. The 1-D recurrent bias tensor of shape [4 * hidden_size]. The ordering of the bias vectors in the first dimension of the tensor shape is specified according to the *options.layout* argument.
- - *peepholeWeight*: an {{MLOperand}}. The 1-D weight tensor for peepholes of shape [3 * hidden_size]. The pack ordering of the weight vectors is for the *input (i)*, *output (o)*, and *forget (f)* gate respectively.
+ - *peepholeWeight*: an {{MLOperand}}. The 1-D weight tensor for peepholes of shape [4 * hidden_size]. The pack ordering of the weight vectors is for the *input (i)*, *output (o)*, and *forget (f)* gate respectively.
```
I'm not sure about TF and PT (I can't figure out yet which one corresponds to the "peephole", since they evidently use a different term), but the DML API definitely shows `PeepholeTensor` sizes = { 1, 1, num_directions, 3 * hidden_size }, and ONNX does too: "P (optional, differentiable): The weight tensor for peepholes. ... It has shape [num_directions, 3*hidden_size]" — not 4?
- https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM
- https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html
- https://learn.microsoft.com/en-us/windows/win32/api/directml/ns-directml-dml_lstm_operator_desc
- https://github.com/onnx/onnx/blob/main/docs/Operators.md#lstm
That would be consistent with the wording after it, concatenating 3 tensors: The pack ordering of the weight vectors is for the *input (i)*, *output (o)*, and *forget (f)* gate respectively.
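That packing can be sketched in plain JavaScript (a hypothetical `packPeepholeWeights` helper, with plain arrays standing in for MLOperands — not the spec's API):

```javascript
// Sketch, assuming plain arrays: pack the three per-gate peephole vectors
// (input i, output o, forget f) into one 1-D tensor of length
// 3 * hidden_size. The cell/candidate gate has no peephole connection,
// which is why the factor is 3 rather than the 4 used for weights/biases.
function packPeepholeWeights(pi, po, pf) {
  const hiddenSize = pi.length;
  if (po.length !== hiddenSize || pf.length !== hiddenSize) {
    throw new Error('each per-gate peephole vector must have hidden_size elements');
  }
  return [...pi, ...po, ...pf]; // pack ordering: i, o, f
}

const packed = packPeepholeWeights([0.1, 0.2], [0.3, 0.4], [0.5, 0.6]);
console.log(packed.length); // 6, i.e. 3 * hidden_size with hidden_size = 2
```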
Thanks @fdwr!
I've updated the commit, referring to the materials you shared above and also the arXiv paper LSTM: A Search Space Odyssey, and updated the lstm implementation for the WebNN-Polyfill API (webmachinelearning/webnn-polyfill@b25e817). Please take another look, thanks.
In the spec, the lstm operation describes peepholeWeight as follows:
> peepholeWeight: an [MLOperand](https://webmachinelearning.github.io/webnn/#mloperand). The 2-D weight tensor for peepholes of shape [num_directions, 4 * hidden_size]. The pack ordering of the weight vectors is for the input (i), output (o), and forget (f) gate respectively.
So I originally modified the lstmCell operation to align with the second value of the shape above, [num_directions, 4 * hidden_size].
However, the peepholeWeight shape [num_directions, 4 * hidden_size] is itself the typo; it should be [num_directions, 3 * hidden_size]. That matches the explanation on the next line: "The pack ordering of the weight vectors is for the input (i), output (o), and forget (f) gate respectively."
The shape [num_directions, 3 * hidden_size] is also confirmed by the much clearer description in part II (VANILLA LSTM) of the paper LSTM: A Search Space Odyssey; please see these three screenshots:
- figure-1
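For reference, the vanilla LSTM forward pass with peepholes can be written as below (standard notation: $\sigma$ is the logistic sigmoid, $\odot$ is the elementwise product, and $p_i$, $p_f$, $p_o$ are the peephole vectors, each of length hidden_size). Only the input, forget, and output gates carry a peephole term, hence 3 * hidden_size:

$$
\begin{aligned}
g_t &= \tanh(W_g x_t + R_g y_{t-1} + b_g) \\
i_t &= \sigma(W_i x_t + R_i y_{t-1} + p_i \odot c_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + R_f y_{t-1} + p_f \odot c_{t-1} + b_f) \\
c_t &= i_t \odot g_t + f_t \odot c_{t-1} \\
o_t &= \sigma(W_o x_t + R_o y_{t-1} + p_o \odot c_t + b_o) \\
y_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$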
> but the DML API definitely shows `PeepholeTensor` sizes = { 1, 1, num_directions, 3 * hidden_size }, and ONNX too: "P (optional, differentiable): The weight tensor for peepholes. ... It has shape [num_directions, 3*hidden_size]"
At first, [3 * hidden_size] appeared not to work as the peepholeWeight shape for the lstm operation: it raised an error when executing slice compute, while [3 * hidden_size] worked for the lstmCell operation. The slice compute error turned out to be caused by not updating the currentPeepholeWeight slice size inside the for loop, as below:
```diff
 for (let dir = 0; dir < numDirections; ++dir) {
   .....
   currentPeepholeWeight.push(options.peepholeWeight ?
-    (builder.squeeze(builder.slice(options.peepholeWeight, [dir, 0], [1, 4 * hidden_size]), { axes: [0] })) : null);
+    (builder.squeeze(builder.slice(options.peepholeWeight, [dir, 0], [1, 3 * hidden_size]), { axes: [0] })) : null);
 }
```
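The failure mode can be sketched in plain JavaScript (a hypothetical `sliceRow` helper, with nested arrays standing in for the polyfill's MLOperands): each row of a [numDirections, 3 * hidden_size] tensor is only 3 * hidden_size wide, so requesting a slice of width 4 * hidden_size must fail.

```javascript
// Sketch of the per-direction slice + squeeze from the loop above,
// assuming nested arrays rather than MLOperands.
function sliceRow(tensor2d, dir, width) {
  const row = tensor2d[dir];
  if (width > row.length) {
    throw new Error(`slice width ${width} exceeds dimension ${row.length}`);
  }
  return row.slice(0, width); // "squeeze": return the row as a 1-D vector
}

const hiddenSize = 2;
const peephole = [[1, 2, 3, 4, 5, 6]]; // numDirections = 1, 3 * hiddenSize = 6
console.log(sliceRow(peephole, 0, 3 * hiddenSize)); // [1, 2, 3, 4, 5, 6]
// sliceRow(peephole, 0, 4 * hiddenSize) would throw, matching the error above.
```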
Oof, you had to dig back into the original paper. Thank you for the spec wording and pseudocode fix.
Force-pushed from 006af72 to 1732b82.
wchao1115 left a comment:
Looks right. Thanks for fixing it.
@huningxin feel free to merge at will if you're happy with this fix.

@BruceDai What is the reason this PR hasn't been merged? It was approved a few weeks ago.

@anssiko @huningxin Would you please help merge this PR? Thanks.

@BruceDai thanks for this contribution!



The fix is verified by the webnn-polyfill lstm and lstmCell implementations and tests: webmachinelearning/webnn-polyfill#227.
@wchao1115 @fdwr @huningxin PTAL, thanks.