Understanding Transformer-Based Self-Supervised Architectures — Transformer (Vaswani et. al.) is great, it attends to a longer context, it offers parallelization in computation which RNNs don’t, and most importantly, they have the state of the art results. In this article, we’ll be covering the Reformer Model, which was proposed in the paper Reformer: The Efficient Transformer…