Why does the HAN attention code use reduce_sum and reduce_max?

In the attention part of a HAN (Hierarchical Attention Network) implementation I came across this:
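`
# Sum over the last axis (axis 2) to get one logit per position
attention_logits = tf.reduce_sum(hidden_state_context_similarity, axis=2)
# Per-row maximum over axis 1, kept as a size-1 axis so it broadcasts
attention_logits_max = tf.reduce_max(attention_logits, axis=1, keepdims=True)
p_attention = tf.nn.softmax(attention_logits - attention_logits_max)`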
I couldn't find this operation in the original paper. Why is it done?
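
For anyone trying to reproduce what the snippet does, here is a minimal pure-Python sketch of one possible reading. It assumes that `hidden_state_context_similarity` is the elementwise product of each hidden state with a context vector (that tensor is not defined in the snippet, so this is a guess). Under that assumption, the sum over the last axis is just a dot product, and subtracting the maximum looks like the usual numerical-stability trick for softmax, which leaves the result unchanged. The names `softmax`, `stable_softmax`, `hidden` and `context` and all the values are made up for illustration.

```python
import math

def softmax(logits):
    # Plain softmax: exp of each logit, normalised by the sum of the exps.
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def stable_softmax(logits):
    # Same function, but shift by the maximum first so that exp() never
    # receives a large positive argument.
    m = max(logits)
    return softmax([x - m for x in logits])

# Hypothetical example: 3 positions, hidden size 2.
hidden = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
context = [0.2, 0.4]

# ASSUMPTION: the similarity tensor is the elementwise product of each
# hidden state with the context vector.
similarity = [[h * c for h, c in zip(row, context)] for row in hidden]

# Summing over the hidden axis turns each row into a dot product,
# giving one logit per position.
logits = [sum(row) for row in similarity]

# The shift does not change the probabilities for ordinary inputs.
p_plain = softmax(logits)
p_stable = stable_softmax(logits)

# With large logits the plain version overflows, the shifted one does not.
big_logits = [1000.0, 1001.0]
try:
    softmax(big_logits)
    plain_overflowed = False
except OverflowError:
    plain_overflowed = True
p_big = stable_softmax(big_logits)
```

If this reading is right, neither step changes the attention formula itself: the sum implements the dot product with the context vector, and the max subtraction only guards against overflow in the exponentials.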