I was studying your paper and code and have a question about formula 17 in Section A.17. When I apply the formula, neither the rows nor the columns of the resulting attention score matrix sum to 1, unlike a standard attention matrix. Could you clarify whether I am misunderstanding something, or whether an additional normalization step is required so that each row (or column) sums to 1?
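
To make the check concrete, here is a minimal pure-Python sketch of the kind of row/column sum test in question. It does not implement formula 17: the `raw_scores` values are made up, the helper names (`softmax`, `row_sums`, `col_sums`, `sums_to_one`) are hypothetical, and the row-wise softmax is only the standard normalization used for comparison.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def row_sums(matrix):
    return [sum(row) for row in matrix]

def col_sums(matrix):
    # zip(*matrix) transposes the list-of-rows into columns.
    return [sum(col) for col in zip(*matrix)]

def sums_to_one(values, tol=1e-9):
    return all(abs(v - 1.0) <= tol for v in values)

# Made-up unnormalized scores, standing in for whatever a formula produces.
raw_scores = [
    [0.2, 1.5, -0.3],
    [2.0, 0.1, 0.4],
    [-1.0, 0.0, 3.0],
]

# Standard attention normalizes each row with softmax.
attn = [softmax(row) for row in raw_scores]

rows_ok = sums_to_one(row_sums(attn))
cols_ok = sums_to_one(col_sums(attn))
print("rows sum to 1:", rows_ok)
print("columns sum to 1:", cols_ok)
```

With row-wise softmax, the rows sum to 1 but the columns generally do not, so a matrix where neither holds would suggest that the scores are unnormalized or normalized in some other way.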