Understanding Transformer-Based Self-Supervised Architectures

The Transformer (Vaswani et al.) is great: it attends to a longer context, it offers parallelization in computation, which RNNs don’t, and most importantly, it achieves state-of-the-art results.

In this article, we’ll be covering the Reformer model, which was proposed in the paper Reformer: The Efficient Transformer.

BERT (Devlin et al.) is a pioneering language model that is pretrained on a denoising autoencoding objective and produces state-of-the-art results on many NLP tasks. However, there is still room for improvement in the original BERT model with respect to its pretraining objectives, the data on which it is…

Generating Meaningful Data from Noise

This (GANs), and the variations that are now being proposed is the most interesting idea in the last 10 years in ML, in my opinion

— Yann LeCun (VP & Chief AI Scientist at Facebook) in his Quora Session

GANs are one of the latest wonders of Deep Learning. These…

Towards Training Better Reinforcement Learning Agents

Dynamic Programming is a mathematical optimization approach typically used to improve recursive algorithms. It basically involves breaking a large problem down into smaller sub-problems. There are two properties that a problem must exhibit to be solved using dynamic programming:

  1. Overlapping Subproblems
  2. Optimal Substructure
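
The two properties above can be illustrated with a small sketch. The grid problem and the names `min_path_cost` and `best` are assumptions made up for this illustration and are not taken from the article: the cheapest path from a cell depends only on the cheapest paths from its neighbours (optimal substructure), and the same cell is reached along many routes (overlapping subproblems), so caching each result avoids recomputing it.

```python
from functools import lru_cache


def min_path_cost(grid):
    """Cheapest cost from the top-left to the bottom-right cell, moving right or down."""
    rows, cols = len(grid), len(grid[0])

    @lru_cache(maxsize=None)  # cache each cell's answer: overlapping subproblems
    def best(r, c):
        if r == rows - 1 and c == cols - 1:
            return grid[r][c]
        options = []
        if r + 1 < rows:
            options.append(best(r + 1, c))
        if c + 1 < cols:
            options.append(best(r, c + 1))
        # Optimal substructure: the best path from here extends the best path
        # from one of the neighbouring cells.
        return grid[r][c] + min(options)

    return best(0, 0)


# Hypothetical example grid of step costs.
grid = [
    [1, 3, 1],
    [1, 5, 1],
    [4, 2, 1],
]
print(min_path_cost(grid))  # prints 7
```

Without the cache, `best` would be called repeatedly for the same cell; with it, each cell is solved once.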

We’ll be discussing ‘Planning in RL’ using dynamic…

Transformer-based language models have been leading the NLP benchmarks lately. Models like BERT and RoBERTa have been state-of-the-art for a while. However, one major drawback of these models is that they cannot “attend” to longer sequences. For example, BERT is limited to a maximum of 512 tokens at a time.
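
To make the 512-token limit concrete, here is a minimal sketch of one common workaround: splitting a long token sequence into overlapping windows that each fit the limit. This is an illustration only, not a method described in the text above. The function name `split_into_windows` and the `stride` value of 256 are assumptions chosen for the example.

```python
def split_into_windows(token_ids, max_len=512, stride=256):
    """Split a token sequence into overlapping windows of at most max_len tokens."""
    if max_len <= 0 or not 0 < stride <= max_len:
        raise ValueError("need max_len > 0 and 0 < stride <= max_len")
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # this window already reaches the end of the sequence
        start += stride
    return windows


# Hypothetical input: 1200 placeholder token ids.
tokens = list(range(1200))
windows = split_into_windows(tokens)
print([len(w) for w in windows])  # prints [512, 512, 512, 432]
```

Each window can be processed separately, but tokens in different windows never attend to each other, which is exactly the drawback the paragraph above points at.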


Rohan Jagtap

Immensely interested in AI Research | I read papers and post my notes on Medium

