I get your point. It's been a while since I last saw/did this, so I missed that part. It struck me after seeing your reply, so thank you.

And yes, the Stack Exchange answer explains the expression in terms of probabilities, so it gets a bit complicated. Moreover, it works over an MDP, hence the policy π and the action 'a' (my solution concerns MRPs, so there are no actions; that hardly changes anything in the derivation though). But no worries.

So what you're looking at is called the 'Law of Iterated Expectations'. It is stated as:

E[E[X | Y]] = E[X]

Or its nested form,

E[E[X | Y] | Z] = E[X | Z] ... we want this one.
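
If a quick numerical sanity check helps, here is a small numpy sketch of the unconditional form on a toy discrete joint distribution (the numbers are entirely made up for illustration):

import numpy as np

# Made-up joint pmf: p[i, j] = P(X = x[i], Y = y_j); the values are arbitrary.
x = np.array([0.0, 1.0, 2.0])
p = np.array([[0.10, 0.05],
              [0.20, 0.25],
              [0.15, 0.25]])          # entries sum to 1

p_y = p.sum(axis=0)                                # marginal P(Y = y_j)
e_x_given_y = (x[:, None] * p).sum(axis=0) / p_y   # E[X | Y = y_j]

lhs = (e_x_given_y * p_y).sum()   # E[ E[X | Y] ]  -> 1.25
rhs = (x[:, None] * p).sum()      # E[X]           -> 1.25
print(lhs, rhs)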

So, we can write:

E[v(S_{t+1}) | S_t] = E[E[G_{t + 1} | S_{t + 1}] | S_t]

By the law of iterated expectations we get:

E[E[G_{t + 1} | S_{t + 1}] | S_t] = E[G_{t + 1} | S_t]

(Strictly speaking, this step also leans on the Markov property: given S_{t + 1}, the return G_{t + 1} no longer depends on S_t, so conditioning on S_{t + 1} alone already captures everything S_t could tell us.)

Hence, E[G_{t + 1} | S_t] = E[v(S_{t+1}) | S_t]
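
To see this equality numerically, here is a small Monte Carlo sketch on a toy two-state MRP (the transition matrix, rewards, and discount are my own made-up example, not from your problem): both sides, E[G_{t+1} | S_t = s] and E[v(S_{t+1}) | S_t = s], come out roughly equal.

import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.8, 0.2],        # P[s, s'] = Pr(S_{t+1} = s' | S_t = s)
              [0.3, 0.7]])
r = np.array([1.0, -1.0])        # reward R_{t+1} received when leaving state s
gamma = 0.9

def sample_return(s, steps=200):
    # One sample of G_t = sum_k gamma^k * R_{t+1+k} starting from S_t = s
    # (truncated at 'steps'; gamma**200 is negligible).
    g, discount = 0.0, 1.0
    for _ in range(steps):
        g += discount * r[s]
        s = rng.choice(2, p=P[s])
        discount *= gamma
    return g

def mc_value(s, n=1000):
    return np.mean([sample_return(s) for _ in range(n)])

v = np.array([mc_value(0), mc_value(1)])   # v(s') ~ E[G | S = s']

for s in (0, 1):
    # E[G_{t+1} | S_t = s]: step once from s, then sample a return from S_{t+1}.
    lhs = np.mean([sample_return(rng.choice(2, p=P[s])) for _ in range(1000)])
    # E[v(S_{t+1}) | S_t = s] = sum over s' of P(s' | s) * v(s')
    rhs = P[s] @ v
    print(s, round(lhs, 2), round(rhs, 2))  # the two columns should roughly match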

Now, substituting this back, we get the Bellman equation:

v(s) = E[R_{t + 1} + γv(S_{t+1}) | S_t = s]
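
As a final check on the same toy MRP as above, the Bellman equation can be solved directly as the linear system (I − γP)v = R, and the solution indeed satisfies v = R + γPv:

import numpy as np

P = np.array([[0.8, 0.2], [0.3, 0.7]])   # same made-up MRP as in the sketch above
r = np.array([1.0, -1.0])
gamma = 0.9

v = np.linalg.solve(np.eye(2) - gamma * P, r)
print(v)                        # ~[ 3.45, -0.18]
print(r + gamma * (P @ v))      # the same vector: v = R + gamma * P v holds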

Hope it makes sense to you :)

I found a video on the aforementioned concept: https://www.youtube.com/watch?v=6EauZqeAxcM
