Is the model diagram in the paper drawn incorrectly?

Hello author, I noticed that the model diagram in your paper shows the largest feature map from the pixel decoder being multiplied with the class branch. However, in both the code and the model diagram of Mask2Former, that feature map appears to be multiplied with the mask branch instead. Is this part of your diagram drawn incorrectly?
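To make the question concrete, here is a minimal NumPy sketch of the computation as I understand it from the Mask2Former code. It is not taken from your repository: the function name `predict_heads`, the weight names `w_cls` and `w_mask`, and all shapes are hypothetical, and the mask head is reduced to a single linear layer for brevity (I believe the original uses a small MLP).

```python
import numpy as np

def predict_heads(query_feats, mask_features, w_cls, w_mask):
    """Sketch of the two prediction branches (hypothetical names).

    query_feats:   (Q, C)    per-query outputs of the transformer decoder
    mask_features: (C, H, W) largest feature map from the pixel decoder
    """
    # Class branch: depends only on the queries, not on the feature map.
    class_logits = query_feats @ w_cls                      # (Q, K + 1)
    # Mask branch: project the queries to mask embeddings ...
    mask_embed = query_feats @ w_mask                       # (Q, C)
    # ... and multiply them with the pixel-decoder feature map.
    mask_logits = np.einsum("qc,chw->qhw", mask_embed, mask_features)
    return class_logits, mask_logits

# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
Q, C, H, W, K = 5, 8, 4, 6, 3
query_feats = rng.normal(size=(Q, C))
mask_features = rng.normal(size=(C, H, W))
w_cls = rng.normal(size=(C, K + 1))
w_mask = rng.normal(size=(C, C))

class_logits, mask_logits = predict_heads(query_feats, mask_features, w_cls, w_mask)
print(class_logits.shape, mask_logits.shape)
```

In this sketch the feature map only enters through the mask branch, which is why the diagram looks inconsistent to me.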