In this notebook I explored the Transformer architecture, a neural network that exploits parallel processing to substantially speed up training. The Transformer was introduced by Vaswani et al. (2017).
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin (2017). Attention Is All You Need.
- Deep Learning Specialization: https://www.coursera.org/specializations/deep-learning
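The parallelism mentioned above comes from the fact that attention is computed for all sequence positions at once as matrix operations, rather than step by step as in a recurrent network. Below is a minimal NumPy sketch of the scaled dot-product attention from Vaswani et al. (2017); the function name and toy tensor shapes are illustrative choices, not part of any specific library.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Every query position attends to every key position in one
    matrix multiply, so the whole sequence is processed in parallel.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (num_queries, num_keys) similarities
    # numerically stable softmax over the key dimension
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# toy example: 3 query positions, 4 key/value positions, d_k = 8
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out, weights = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # each query gets a weighted mix of the value vectors
```

Each row of `weights` is a probability distribution over the key positions, and the output is the corresponding weighted average of the value vectors.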