MoEUTmnist

A test implementation of a Mixture of Experts Universal Transformer (MoEUT) for conditional MNIST generation.

The main goal of this short project was to familiarize myself with expert parallelism and MoE, but I also wanted to experiment with parameter sharing across layers for fun.
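For readers new to the two ideas, here is a minimal NumPy sketch of how they combine. It is a generic illustration and not this repository's code. The names `SharedMoEMLP` and `universal_forward` are hypothetical. Top-k routing with a softmax over the selected logits is one common gating choice and is an assumption here. Tokens are grouped by expert before each expert's MLP runs, and that per-expert dispatch is the step expert parallelism distributes across devices; the sketch does not show the distributed part. Parameter sharing means the same layer object, with the same weights, is applied at every depth step.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class SharedMoEMLP:
    """Hypothetical top-k routed mixture of expert MLPs (illustrative sketch)."""

    def __init__(self, d_model, d_hidden, n_experts, k, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        self.n_experts = n_experts
        self.w_router = rng.normal(0, d_model ** -0.5, (d_model, n_experts))
        self.w_in = rng.normal(0, d_model ** -0.5, (n_experts, d_model, d_hidden))
        self.w_out = rng.normal(0, d_hidden ** -0.5, (n_experts, d_hidden, d_model))

    def __call__(self, x):
        # x: (n_tokens, d_model)
        logits = x @ self.w_router  # (n_tokens, n_experts)
        topk = np.argsort(-logits, axis=-1)[:, : self.k]  # k distinct experts per token
        # Assumed gating: softmax renormalized over the selected experts only.
        gates = softmax(np.take_along_axis(logits, topk, axis=-1))  # (n_tokens, k)
        out = np.zeros_like(x)
        # Dispatch: gather the tokens routed to each expert and run that expert once.
        for e in range(self.n_experts):
            rows, slots = np.nonzero(topk == e)  # each token hits expert e at most once
            if rows.size == 0:
                continue
            h = np.maximum(x[rows] @ self.w_in[e], 0.0)  # ReLU MLP
            out[rows] += gates[rows, slots][:, None] * (h @ self.w_out[e])
        return out, topk


def universal_forward(x, layer, n_steps):
    # Universal-Transformer-style sharing: the same weights are reused at every step.
    for _ in range(n_steps):
        y, _ = layer(x)
        x = x + y  # residual connection
    return x


x = np.random.default_rng(1).normal(size=(8, 16))
layer = SharedMoEMLP(d_model=16, d_hidden=32, n_experts=4, k=2)
y = universal_forward(x, layer, n_steps=3)
print(y.shape)  # (8, 16)
```

With sharing, increasing `n_steps` adds depth without adding parameters. Routing lets each step spend compute on only `k` of the experts per token.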