The comment `# output : [batch_size, len_seq, n_hidden]` should indeed be corrected to `# output : [batch_size, len_seq, n_hidden*2]`, because the model is a bidirectional LSTM. A Bi-LSTM concatenates the forward and backward hidden states along the feature dimension, so the output feature size is effectively doubled. After the permutation, the output shape is therefore `[batch_size, len_seq, n_hidden * 2]`.
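
A minimal sketch to verify this, assuming a plain PyTorch `nn.LSTM` and made-up dimensions (the names `n_hidden`, `len_seq`, etc. just follow the convention in the original comment):

```python
import torch
import torch.nn as nn

# Illustrative dimensions only
batch_size, len_seq, emb_dim, n_hidden = 4, 10, 8, 16

# Bidirectional LSTM: output feature size becomes n_hidden * 2
lstm = nn.LSTM(input_size=emb_dim, hidden_size=n_hidden, bidirectional=True)

# Default (batch_first=False) input layout: [len_seq, batch_size, emb_dim]
x = torch.randn(len_seq, batch_size, emb_dim)

output, (h_n, c_n) = lstm(x)
print(output.shape)               # torch.Size([10, 4, 32]) -> [len_seq, batch_size, n_hidden*2]

output = output.permute(1, 0, 2)  # -> [batch_size, len_seq, n_hidden*2]
print(output.shape)               # torch.Size([4, 10, 32])
```

The last dimension is `32 = n_hidden * 2` rather than `n_hidden`, which is exactly why the comment needs the `*2`.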