
Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space, let $X$ be an integrable random variable on this space, and let $\mathcal{G} \subset \mathcal{F} \subset \mathcal{A}$ be sigma algebras.

Then by the tower property, we have that $\mathbb{E}[\mathbb{E}[X|\mathcal{F}]| \mathcal{G}] = \mathbb{E}[\mathbb{E}[X|\mathcal{G}]| \mathcal{F}] = \mathbb{E}[X|\mathcal{G}]$.

Now, in some texts on Markov decision processes, people write, with $s_{0}$ denoting the state of the system at time $0$ and $a_{0}$ the action taken at time $0$, things like $$\sum_{a' \in \mathcal{A}} \mathbb{P}(a_{0} = a'|s_{0}=s')\,\mathbb{E}[X|s_{0}=s', a_{0}=a'] = \mathbb{E}\big[\mathbb{E}[X|s_{0}=s', a_{0}=a']\,\big|\,s_{0}=s'\big] = \mathbb{E}[X|s_{0}=s'].$$

It seems like the tower property is used here for the last equality. However, $(s_{0}=s', a_{0}=a')$ and $(s_{0}=s')$ are events rather than sigma algebras, so why can we apply the tower property? Recall that, unlike $\mathbb{E}[X|\mathcal{G}]$ where $\mathcal{G}$ is a sigma algebra, $\mathbb{E}[X|A]$ is defined in another way when $A$ is an event, namely as $\int X\,\mathbb{I}(A)\, d\mathbb{P}/\mathbb{P}(A)$.
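(As a sanity check, here is a minimal Monte Carlo sketch in Python of the displayed identity, using a hypothetical two-state, two-action toy example; the tables `p_s0`, `pi`, and `X_table` are arbitrary numbers I made up, and both sides are computed with the event definition of $\mathbb{E}[X|A]$ above. The two estimates agree up to sampling error, so my question is not whether the identity holds, but how to justify it.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy example (all numbers made up): two states, two actions,
# an initial distribution for s0, a policy pi(a|s), and a payoff X(s0, a0).
p_s0 = np.array([0.6, 0.4])          # P(s0 = s)
pi = np.array([[0.7, 0.3],           # pi[s, a] = P(a0 = a | s0 = s)
               [0.2, 0.8]])
X_table = np.array([[1.0, -2.0],
                    [0.5,  3.0]])    # X as a (deterministic) function of (s0, a0)

# Sample the joint distribution of (s0, a0, X).
n = 500_000
s0 = rng.choice([0, 1], size=n, p=p_s0)
a0 = (rng.random(n) < pi[s0, 1]).astype(int)   # P(a0 = 1 | s0) = pi[s0, 1]
X = X_table[s0, a0]

s_prime = 0
in_s = (s0 == s_prime)

# Right-hand side: E[X | s0 = s'] via the event definition E[X | A] = E[X 1_A] / P(A).
rhs = X[in_s].mean()

# Left-hand side: sum over a' of P(a0 = a' | s0 = s') * E[X | s0 = s', a0 = a'].
lhs = sum((in_s & (a0 == a)).sum() / in_s.sum() * X[in_s & (a0 == a)].mean()
          for a in [0, 1])

print(lhs, rhs)   # the two agree up to Monte Carlo error
```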

Tom
  • Mathematically, this leaves something to be desired, but, given that the process is a Markov decision process, in order to have all information at time $n$, all one needs to know is the current state at time $n$. So, any sigma algebra reduces to knowing that state. – mark leeds Sep 26 '23 at 07:13
  • @markleeds Are you saying I can safely view an event $A$ after a vertical bar (in a conditional probability or expectation) as the same as putting a sigma algebra there, in the setting of Markov chains? Or are you saying that originally the thing after the vertical bar should be some kind of sigma algebra, but since it's a Markov chain, we simply reduce the notation to an event (e.g., if the thing before the vertical bar is about "the future" then the sigma algebra reduces to an event about "the present")? – Tom Sep 26 '23 at 14:03
  • Hi Tom: I'm not a mathematician, but the tower property applies to both sigma algebras and random variables. So, if $X$ and $Y$ are random variables, the tower property says $E(E(Y|X)) = E(Y)$. That's true regardless of whether the $Y$ process is Markov. But, since the process is Markov, knowing the current state is all of the information, so the current state kind of "is" the sigma algebra. But it doesn't necessarily have to be. To answer your question: it reduces to an event, but it doesn't have to. It could have been an event to begin with. – mark leeds Sep 26 '23 at 19:19
  • Let me see if I can find something useful on the net because I'm not being all that effective in my explanation. – mark leeds Sep 26 '23 at 19:19
  • Hi Tom: Go to this link and then, when you're there, click on the link labelled as "post" by Kevin Kim. That points to a link where a few people have given really useful takes on the tower property concept. I just glanced, but I'm familiar with some of the people who answered and they will have nicer explanations than I gave. See if that link is helpful. https://math.stackexchange.com/questions/41536/intuitive-explanation-of-the-tower-property-of-conditional-expectation – mark leeds Sep 26 '23 at 19:25
  • @markleeds Hi, I really appreciate your time and help! Thank you for posting the resources here. I did check the post by Kevin Kim, and it still explains the conditional expectation in terms of sigma algebras. Translating into my language in this question, it says $\mathbb{E}[X|s_{0}] = \mathbb{E}[X|\sigma(s_{0})]$, i.e., a random variable (a measurable map on the probability space) after the vertical bar means the sigma algebra generated by that r.v., or the smallest sigma algebra w.r.t. which the r.v. is measurable. That is, however, still different from $s_{0} = s'$ after the bar. – Tom Sep 26 '23 at 19:41
  • After the vertical bar, when we put $\sigma(s_{0})$, the thing is a sigma algebra. When we put $s_{0}$, which is a random variable, we actually still mean a sigma algebra. But $s_{0} = s'$ is an event, and this is what's confusing me – Tom Sep 26 '23 at 19:43
  • Hi Tom: Definitely conditional expectation and the tower property can use random variables (events, if you will) or sigma algebras. Even though, since it's a Markov process, the current state kind of "is" the sigma algebra, I wouldn't think of it that way because that's just making things more confusing. This is my fault. – mark leeds Sep 27 '23 at 16:54
  • When we have $s_{0} = s^{\prime}$, it is true that it kind of is ALL of the information because of the Markov nature of the process, but forget that I said that (about the event kind of being the sigma algebra). Let me see if I can find something that deals just with random variables and never mentions sigma algebras. Like I said, conditioning is applicable in either case but, IMHO, the sigma algebra case kind of helps one to lose the intuition for what conditioning means. – mark leeds Sep 27 '23 at 16:58
  • Tom: Even though the topic is "martingales", I think this does a pretty good job of showing how conditional expectation can use random variables rather than sigma algebras. The idea is that as long as the resulting RV (the conditional one) is measurable with respect to the sigma algebra GENERATED by the thing being conditioned upon, then this is also a conditional expectation. I'm still looking for something that talks about why one can go either way: RVs or sigma algebras. https://www.math.uwaterloo.ca/%7Edlmcleis/book/appendixa2.pdf – mark leeds Sep 27 '23 at 17:09
  • Tom: This document differentiates nicely between the case of partitions (i.e., RVs) and the case of sigma algebras. I hope this helps because, otherwise, I'm sort of at a loss for how to explain why one can use both. Maybe someone else can help? https://www.statlect.com/fundamentals-of-probability/conditional-probability-as-a-random-variable – mark leeds Sep 27 '23 at 17:22
  • @markleeds I agree with you that Markov property makes it feel acceptable to just put an event after the vertical bar. I once studied a book Measure Theory by Donald Cohn, and the last chapter is about probability. I now kinda know why he included martingale, Brownian motion but didn't include Markov chain in that chapter as it can get nasty and hard to explain in the rigorous mathematical language. Also, THANK YOU for your time and help! – Tom Sep 27 '23 at 21:08
  • Hi Tom: I was trying to provide some intuition as to why one can condition on an event in the Markov chain and think of that event as the sigma algebra. But, even though there may be something there, I wouldn't go that route for understanding conditional expectation and the tower property. I haven't done it yet myself, but I highly recommend going over that statlect.com link thoroughly. It differentiates between the RV viewpoint and the sigma algebra viewpoint. It might clarify why RVs and sigma algebras can be conditioned upon. It's a great question that I've always wondered about myself. – mark leeds Sep 29 '23 at 03:46
  • Tom: Just one more thing: notice that, once you understand conditioning in the RV case, the Markov chain understanding is almost there. So, it's best to think of your question as NOT involving the Markov chain, because including it in the question kind of confuses the issue. It's best to get conditioning and the tower property understood first, independently of Markov chains, and THEN attack Markov decision processes. At least that's my advice. All the best. Oh, I was hoping a math person would chime in with a formal mathematical explanation, but so far not. – mark leeds Sep 29 '23 at 03:54
  • Tom: The last answer in the link below kind of made it click for me. It's the clearest explanation that I have seen. https://math.stackexchange.com/questions/3130155/conditioning-on-an-event-vs-a-sigma-algebra. The statlect link didn't live up to my expectations. – mark leeds Sep 29 '23 at 05:09
  • @markleeds Thank you very much for your help! – Tom Sep 30 '23 at 20:25
  • Tom: You're welcome. I didn't help much, but hopefully the various links might, especially the last one. All the best. – mark leeds Oct 01 '23 at 21:28
  • It's just a different notation. It's not actually conditioning on an event, it's still conditioning on a sigma algebra; that's why you can apply iterated expectation here. Though I do agree, it's quite poor notation. – Andrew Oct 03 '23 at 01:03
  • @markleeds Do you think my answer now makes any sense? – Tom Oct 03 '23 at 01:04
  • @Andrew Could you please check my answer? I do agree that it's just a notation, which I had to accept when I reached the step where $A_{1}$ doesn't belong to the sigma algebra $2^{\mathcal{B}}$ and so cannot be an event... – Tom Oct 03 '23 at 01:06
  • Hi Tom: I'll print it out, read it carefully, and see if I can respond with anything useful. It may take me some time. – mark leeds Oct 04 '23 at 04:52
  • @markleeds If you find my answer hard to follow just forget about it. I really appreciate your help and don't want to bother you more. What I'm basically saying in the answer is as Andrew said, in some cases the thing after the vertical bar is just a notation, not something being conditioned on – Tom Oct 04 '23 at 14:36
  • You're welcome, Tom, but I really want to understand it. Just a lot of things going on right now. What do they say: "I had a plan and then life happened!!!" All the best, and I hope you improved your knowledge, but I'm still gonna try to improve mine even if it means emailing some probability theorist. Sometimes they respond!!!! – mark leeds Oct 05 '23 at 19:21

1 Answer


Suppose we are in a Markov decision process setting. $S, U$ are the spaces of states and actions/controls, respectively. Assume $U$ is discrete and the policy is stationary, i.e., $u_{t}$ depends on the value of $s_{t}$ but not on the time $t$ or on anything that happened before $t$. How should we justify the iterated expectation formula (with $s^{1}$ known) $$\mathbf{E}[\mathbf{E}[X|s_{t} = s^{1}, u_{t}]\,|\,s_{t} = s^{1}] = \sum_{u \in U} \mathbf{P}(u_{t} = u|s_{t} = s^{1})\,\mathbf{E}[X|s_{t} = s^{1}, u_{t} = u],$$ if we want to think of $s_{t} = s^{1}, u_{t} = u$ as an event?

Let $\mathcal{F}$ be the sigma algebra on the space of all possible trajectories of the MDP, i.e., the sigma algebra involved when we calculate the expectation of any measurable function of the process. Let $\mathcal{A}, \mathcal{B} \subset \mathcal{F}$ be subcollections of events, and assume $\mathcal{B} = \{B_{1}, B_{2}, \dots\}$ is countable. Think of a generic $B \in \mathcal{B}$ as answering the question "$u_{t} = ?$"; for instance, $B_{1}$ is the event $\{u_{t} = u^{1}\} = \{\omega : u_{t}(\omega) = u^{1}\}$ for some $u^{1} \in U$, and $A_{i}$ is the event $\{s_{t} = s^{i}\}$, etc. For each $A_{i} \in \mathcal{A}$, we have $\mathbf{P}[B_{j}|A_{i}]$ for each index $j$. Therefore, we can define a probability space $(\mathcal{B}, 2^{\mathcal{B}}, \mu_{A_{i}})$, where $2^{\mathcal{B}}$ denotes the power set of $\mathcal{B}$ and $\mu_{A_{i}}(\{B_{j}\}) := \mathbf{P}[B_{j}|A_{i}]$.

Consider $(*) := \mathbf{E}[\mathbf{E}[X|A_{1}, B] \,|\, A_{1}] = \mathbf{E}[\mathbf{E}[X|s_{t} = s^{1}, u_{t}]\,|\,s_{t} = s^{1}]$. Notice that the inner expectation is taken with respect to $\omega$ and is actually a function of $B$, which means the outer expectation is taken w.r.t. $B$. So $(*)$ can be viewed as $\mathbf{E}[f(B) | A_{1}]$. However, here we are unable to view $A_{1}$ as a sigma algebra. We cannot view it as an event either, because when $B \in \mathcal{B}$ is the variable, "events" means elements of the sigma algebra $2^{\mathcal{B}}$, and $A_{1} \in \mathcal{F}$ but $A_{1} \notin 2^{\mathcal{B}}$. Instead, we should treat the notation $\mathbf{E}[f(B) | A_{1}]$ as $\mathbf{E}_{A_{1}}[f(B)] = \mathbf{E}_{A_{1}, M}[f(B)]$, where the subscript $M$, which is often omitted, refers to the randomness of the transitions of the Markov process, and the subscript $A_{1}$ means that the probability measure $\mu_{A_{1}}$ with respect to which we integrate $f(B)$ depends on $A_{1}$. This subscripting is reasonable because there are two types of randomness in the process that generates $s_{t}, u_{t}$ as time $t$ goes on: $u_{t}$ is determined by our random policy based on $s_{t}$, and $s_{t+1}$ is determined by $(s_{t}, u_{t})$ together with the randomness of the transition from $(s_{t}, u_{t})$ imposed by the Markov process.

Therefore,
$$
\begin{aligned}
\mathbf{E}[f(B)|A_{1}] &:= \mathbf{E}_{A_{1}}[f(B)] = \int f(B)\, d\mu_{A_{1}}(B) = \sum_{B_{i} \in \mathcal{B}} \mu_{A_{1}}(\{B_{i}\})\, f(B_{i}) \\
&= \sum_{B_{i} \in \mathcal{B}} \mu_{A_{1}}(\{B_{i}\})\, \mathbf{E}[X|A_{1}, B_{i}] \\
&= \sum_{B_{i} \in \mathcal{B}} \mathbf{P}(B_{i}|A_{1}) \frac{1}{\mathbf{P}(A_{1} \cap B_{i})} \int 1_{A_{1}\cap B_{i}}(\omega)\, X(\omega)\, d\mathbf{P}(\omega) \\
&= \sum_{B_{i} \in \mathcal{B}} \frac{1}{\mathbf{P}(A_{1})} \int 1_{A_{1}\cap B_{i}}(\omega)\, X(\omega)\, d\mathbf{P}(\omega) \qquad \text{(as } \mathbf{P}(B_{i}|A_{1}) = \mathbf{P}(A_{1} \cap B_{i})/\mathbf{P}(A_{1})\text{)} \\
&= \frac{1}{\mathbf{P}(A_{1})} \int 1_{A_{1}}(\omega)\, X(\omega)\, d\mathbf{P}(\omega) \qquad \text{(since the } B_{i} \text{ partition } \Omega\text{)} \\
&= \mathbf{E}[X|A_{1}].
\end{aligned}
$$
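As a sanity check on this chain of equalities, here is a small exact computation in Python (a sketch over a hypothetical four-point sample space $\omega = (s_{t}, u_{t})$ with made-up probabilities; none of these numbers come from the discussion above). It builds $\mu_{A_{1}}$ from the conditional probabilities $\mathbf{P}[B_{j}|A_{1}]$ and confirms that $\mathbf{E}_{A_{1}}[f(B)]$ equals $\mathbf{E}[X|A_{1}]$ computed directly from the event definition:

```python
from itertools import product

# Hypothetical four-point sample space omega = (s_t, u_t), with made-up joint
# probabilities; any valid joint distribution would do for this check.
P = dict(zip(product([0, 1], [0, 1]), [0.30, 0.20, 0.10, 0.40]))
X = {(0, 0): 1.0, (0, 1): -2.0, (1, 0): 0.5, (1, 1): 3.0}   # arbitrary payoff

def prob(event):
    """P(event), where an event is a set of sample points."""
    return sum(P[w] for w in event)

def cond_exp(event):
    """E[X | event] via the event definition: (integral of X over the event) / P(event)."""
    return sum(P[w] * X[w] for w in event) / prob(event)

A1 = {w for w in P if w[0] == 0}                         # A_1 = {s_t = s^1}, here s^1 = 0
B  = {u: {w for w in P if w[1] == u} for u in (0, 1)}    # B_i = {u_t = u^i}

mu_A1 = {u: prob(B[u] & A1) / prob(A1) for u in B}       # mu_{A_1}({B_i}) := P[B_i | A_1]
f     = {u: cond_exp(A1 & B[u]) for u in B}              # f(B_i) := E[X | A_1, B_i]

lhs = sum(mu_A1[u] * f[u] for u in B)                    # E_{A_1}[f(B)]
rhs = cond_exp(A1)                                       # E[X | A_1]
print(lhs, rhs)                                          # both are -0.2, up to floating point
```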

Tom
  • Hi Tom: I read your answer and I think you're on the right track, but I can't say anything possibly useful until I understand it myself!!!! So, I'm going through some of the links I put out earlier and seeing if any of those help me. Hopefully it won't take me too long, but I'm also hoping one of the people on this list could provide a good, formal explanation of conditioning on events versus conditioning on sigma algebras. – mark leeds Oct 05 '23 at 07:54