How I did at the Google Foobar Challenge, Twice!

Google Foobar Invite

Foobar is “said to be” Google’s secret hiring challenge, and it’s really exciting. It is an invite-only challenge, so “you have to be chosen.” The picture above is how I received my invite.

In this article, I’ll be sharing my experience of the foobar challenge. I’ll give a walkthrough of the levels and a glimpse of what one can expect from the challenge.

Note that I won’t be sharing any of the questions or solutions in this post as it kills all the fun.

How to Get an Invite?

Well, it’s not in our hands. There’s no specific word from Google, but it’s said that…

Understanding Transformer-Based Self-Supervised Architectures

GPT-3 in Action via OpenAI Blog

In this article, we’ll be discussing the renowned GPT-3 model proposed in the paper “Language Models are Few-Shot Learners” by OpenAI. It is the successor to GPT-2, and the two models share a very similar architecture.

If you’re unfamiliar with GPT-2, consider giving my article on GPT-2 a read. Most of GPT-3 is based on it, so it will help in understanding the model better.

Quick Recap

Going back to GPT-2, it is essentially an autoregressive model based on the Transformer architecture (Vaswani et al.). But the novelty of GPT-2 lies in its pre-training approach.
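For reference, “autoregressive” here means that the model factorizes the probability of a token sequence into a product of next-token conditionals. The notation below (tokens x_1 … x_T, parameters θ) is my own shorthand for illustration, not taken from the article:

```latex
p_\theta(x_1, \dots, x_T) = \prod_{t=1}^{T} p_\theta\left(x_t \mid x_1, \dots, x_{t-1}\right)
```

Each factor is predicted from the preceding tokens only, which is why generation proceeds one token at a time.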

The pre-training leverages multi-task learning at…

Towards Training Better Reinforcement Learning Agents

Grid World Scenarios via Value Iteration Networks Paper

Dynamic Programming is a mathematical optimization approach typically used to improve recursive algorithms. It basically involves breaking a large problem down into smaller sub-problems and reusing their solutions. There are two properties that a problem must exhibit to be solved using dynamic programming:

  1. Overlapping Subproblems
  2. Optimal Substructure
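To make the two properties concrete, here is a minimal sketch on a toy problem that is not part of the original article: finding the cheapest path from the top-left to the bottom-right of a grid, moving only right or down. The function name `min_path_cost` and the grid values are hypothetical, chosen only for illustration.

```python
from functools import lru_cache


def min_path_cost(grid):
    """Cheapest top-left to bottom-right path, moving only right or down."""
    rows, cols = len(grid), len(grid[0])

    # Overlapping subproblems: best(r, c) is requested along many different
    # paths, so the cache ensures each cell is solved only once.
    @lru_cache(maxsize=None)
    def best(r, c):
        if r == rows - 1 and c == cols - 1:
            return grid[r][c]
        options = []
        if r + 1 < rows:
            options.append(best(r + 1, c))
        if c + 1 < cols:
            options.append(best(r, c + 1))
        # Optimal substructure: the best path from (r, c) is this cell's cost
        # plus the best path from whichever neighbor is cheaper.
        return grid[r][c] + min(options)

    return best(0, 0)


print(min_path_cost([[1, 3, 1], [1, 5, 1], [4, 2, 1]]))
```

Without the cache the same recursion would still be correct, only slower, which is the sense in which dynamic programming improves a recursive algorithm.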

We’ll be discussing ‘Planning in RL’ using dynamic programming. Planning requires complete knowledge of the environment (usually an MDP), or a model of the environment, in advance. Using this knowledge, we can solve for the optimal policy.
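As a rough illustration of planning with a known model, the sketch below runs value iteration, one common dynamic programming method, on a made-up three-state MDP. The `transitions` table, the action names, and the discount factor are all assumptions invented for this example; they do not come from the article.

```python
# Hypothetical toy MDP: transitions[state][action] is a list of
# (probability, next_state, reward) outcomes. State 2 is absorbing.
transitions = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(1.0, 2, 1.0)]},
    2: {"stay": [(1.0, 2, 0.0)], "go": [(1.0, 2, 0.0)]},
}


def action_value(outcomes, values, gamma):
    """Expected reward plus discounted value of the next state."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Apply the Bellman optimality backup until the values stop changing."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            best = max(action_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(transitions, values, gamma=0.9):
    """Pick, in each state, the action with the highest backed-up value."""
    return {
        s: max(actions, key=lambda a: action_value(actions[a], values, gamma))
        for s, actions in transitions.items()
    }


values = value_iteration(transitions)
policy = greedy_policy(transitions, values)
print(values, policy)
```

The full transition table is passed in up front, which is exactly the “complete knowledge of the environment” that planning assumes.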

In my previous article on Reinforcement Learning, I have covered the formulation of RL problems as a Markov Decision Process…

Understanding Transformer-Based Self-Supervised Architectures

Photo by Joe Gardner on Unsplash

Transformer-based language models have been leading the NLP benchmarks lately. Models like BERT and RoBERTa have been state-of-the-art for a while. However, one major drawback of these models is that they cannot “attend” to longer sequences. For example, BERT is limited to a maximum of 512 tokens at a time.

To overcome these long-sequence issues, several approaches have emerged. Models like Transformer-XL and Reformer propose ways to handle longer sequences while keeping memory and compute costs manageable. I have already covered Transformer-XL and the Reformer in separate articles. Consider giving them a read if you’re interested.

In this article…

Optimizing GPU Performance with TensorFlow

TensorFlow Profiler Walkthrough by Author

We want our models to train real fast. We use GPUs to make operations execute faster. However, it is possible that even after speeding up the computations, the model may have inefficiencies in the pipeline itself, and thus, may train slower. In such cases, it becomes really difficult to debug the code, or as a matter of fact, even tell what is slow.

This can be addressed by using the TensorFlow Profiler. The Profiler ‘profiles’ the TensorFlow code execution. We’ll be discussing the Profiler, how to use it, best practices, and how to optimize the GPU performance in this article.

Understanding Transformer-Based Self-Supervised Architectures

Photo by Edurne Chopeitia on Unsplash

Multilingual Language Models are one of the recent milestones of NLP research and a step towards generalizing NLP algorithms. Masked Language Models (MLM) like multilingual BERT (mBERT) and XLM (Cross-lingual Language Model) have achieved state-of-the-art results on cross-lingual tasks.

In this article, we’ll discuss the XLM-RoBERTa (or XLM-R) model proposed in “Unsupervised Cross-lingual Representation Learning at Scale.” This paper essentially analyses how training a cross-lingual model at scale can significantly boost performance, and proposes a new model that achieves state-of-the-art results on cross-lingual benchmarks.


XLM-RoBERTa is trained on a multi-lingual language modeling objective using only monolingual…

Understanding Transformer-Based Self-Supervised Architectures

Photo by Prateek Katyal on Unsplash

BERT popularized pretraining for language modeling, and the state of the art in NLP has been evolving ever since. Conventional wisdom says that larger models perform better, but large models are hard to scale: they are difficult and expensive to train, and training slows down as the model grows.

In this article, we’ll be discussing the ALBERT model by Google AI proposed in the paper, “ALBERT: A Lite BERT for Self-supervised Learning of Language Representations.” This paper essentially proposes 2 techniques for parameter reduction (to overcome the above issues) with the original BERT architecture:

  1. Factorized Embedding…
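As a back-of-the-envelope sketch of why factorizing the embedding saves parameters: per the ALBERT paper, a single vocabulary-by-hidden embedding matrix is replaced with a smaller vocabulary-by-embedding matrix followed by a projection up to the hidden size. The helper `embedding_params` and the sizes used below are illustrative assumptions, not figures quoted from the article.

```python
def embedding_params(vocab_size, hidden_size, embedding_size=None):
    """Count embedding parameters, with or without factorization.

    Unfactorized: one V x H matrix.
    Factorized:   a V x E matrix followed by an E x H projection.
    """
    if embedding_size is None:
        return vocab_size * hidden_size
    return vocab_size * embedding_size + embedding_size * hidden_size


# Illustrative sizes only: V = 30,000, H = 768, E = 128.
full = embedding_params(30_000, 768)
factorized = embedding_params(30_000, 768, embedding_size=128)
print(full, factorized)
```

The saving grows with the gap between the hidden size and the embedding size, since the vocabulary dimension is multiplied by the smaller of the two.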

Training Faster RNNs with Quasi-RNN

Photo by Braden Collum on Unsplash

Recurrent Neural Networks (RNNs) have been in the sequence modeling business for a long time. But RNNs are slow; they process one token at a time. Moreover, the recurrent architecture adds a limitation of fixed-length encoding vectors for the complete sequence. To overcome these issues, architectures like CNN-LSTM, Transformer, and QRNN have emerged.

In this article, we’ll be discussing the QRNN model proposed in the paper, “Quasi-Recurrent Neural Networks.” It is essentially an approach for adding convolution to recurrence and recurrence to convolution. This will become clearer as you proceed through the article.
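To give a feel for the idea, here is a heavily simplified sketch of a QRNN-style layer with scalar features. A causal convolution computes a candidate and a forget gate for every timestep at once, and a lightweight recurrent “f-pooling” step from the QRNN paper then mixes them over time. The function names and filter weights are hypothetical, and a real layer would use vector-valued features and learned filters.

```python
import math


def causal_conv1d(xs, weights):
    """Masked 1-D convolution: output t only sees xs[t-k+1 .. t]."""
    k = len(weights)
    padded = [0.0] * (k - 1) + list(xs)  # zero-pad on the left
    return [
        sum(w * padded[t + i] for i, w in enumerate(weights))
        for t in range(len(xs))
    ]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def qrnn_f_pooling(xs, w_z, w_f):
    """Convolution (parallel over time) followed by f-pooling (sequential)."""
    z = [math.tanh(v) for v in causal_conv1d(xs, w_z)]  # candidate values
    f = [sigmoid(v) for v in causal_conv1d(xs, w_f)]    # forget gates
    h, hidden_states = 0.0, []
    for z_t, f_t in zip(z, f):
        # h_t = f_t * h_{t-1} + (1 - f_t) * z_t
        h = f_t * h + (1.0 - f_t) * z_t
        hidden_states.append(h)
    return hidden_states


print(qrnn_f_pooling([1.0, 2.0, 3.0], w_z=[0.0, 1.0], w_f=[0.0, 0.0]))
```

The only sequential part is the cheap elementwise loop; the gates themselves depend only on the inputs, not on the previous hidden state, which is what lets them be computed in parallel.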

Long Short-Term Memory (LSTM)

Towards Training Better Reinforcement Learning Agents


In this article, we’ll be discussing the framework with which most Reinforcement Learning (RL) problems can be addressed: the Markov Decision Process (MDP). An MDP is a mathematical framework for modeling decision-making problems where the outcomes are partly random and partly controllable. We’ll discuss MDPs in greater detail as we walk through the article.
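As a compact reference, an MDP is commonly written as a tuple of states, actions, transition probabilities, rewards, and a discount factor. The notation below follows the convention used in David Silver’s lectures; treat it as an assumption about the notation rather than a quote from this article:

```latex
\mathcal{M} = \langle \mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma \rangle,
\qquad
\mathcal{P}^{a}_{ss'} = \Pr\left[ S_{t+1} = s' \mid S_t = s, A_t = a \right],
\qquad
\mathcal{R}^{a}_{s} = \mathbb{E}\left[ R_{t+1} \mid S_t = s, A_t = a \right],
\qquad
\gamma \in [0, 1]
```

The transition probabilities capture the “partly random” part, and the choice of action captures the “partly controllable” part.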

We are essentially going to describe the RL problem in a broad sense. Moreover, we’ll try to get an intuition on this using real-life examples framed as RL tasks.

This article is inspired by David Silver’s Lecture on MDP, and the equations used in this…

Understanding Transformer-Based Self-Supervised Architectures

Photo by Leonardo Toshiro Okubo on Unsplash

Models like BERT (Devlin et al.) or GPT (Radford et al.) have achieved the state of the art in language understanding. However, these models are pre-trained on only one language. Recently, efforts have been made to move beyond monolingual representations and build universal cross-lingual models capable of encoding any sentence into a shared embedding space.

In this article, we will be discussing the paper, Cross-lingual Language Model Pretraining, proposed by Facebook AI. The authors propose 2 approaches for cross-lingual language modeling:

  1. Unsupervised, relies on monolingual data
  2. Supervised, relies on parallel data

Cross-lingual Language Model (XLM)

In this section, we will discuss the…

Rohan Jagtap

Immensely interested in AI Research | I read papers and post my notes on Medium
