Are the mask decoder weights inherited from the teacher model's decoder?

If so, in full-stage knowledge distillation the image encoder is randomly initialized while the mask decoder starts from pretrained weights. Is the mask decoder then fine-tuned with a smaller learning rate than the lightweight image encoder? Is this consistent with your implementation?
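
To make the question concrete, here is a minimal sketch of the setup I have in mind, assuming PyTorch. The module stand-ins, shapes, and learning rates are hypothetical placeholders and are not taken from your code.

```python
import torch
from torch import nn

# Hypothetical stand-ins for the real modules; shapes are arbitrary.
image_encoder = nn.Linear(16, 8)  # lightweight student encoder, randomly initialized
mask_decoder = nn.Linear(8, 4)    # assumed to be initialized from the teacher's decoder

# Placeholder learning rates, chosen only to illustrate the question:
# the pretrained decoder gets a smaller rate than the fresh encoder.
optimizer = torch.optim.AdamW(
    [
        {"params": image_encoder.parameters(), "lr": 1e-3},
        {"params": mask_decoder.parameters(), "lr": 1e-4},
    ]
)

# Collect the per-group learning rates to confirm the split.
group_lrs = [group["lr"] for group in optimizer.param_groups]
```

Is the actual training configured along these lines, or do both modules share one learning rate?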