Suppose we are in a Markov decision process (MDP) setting: $S$ and $U$ are the spaces of states and actions/controls, respectively. Assume $U$ is discrete, and the policy is stationary, i.e., $u_{t}$ depends on the value of $s_{t}$ but not on the time $t$ or on anything that happened before $t$. How should we justify the iterated expectation formula (with $s^{1}$ known) $\mathbf{E}[\mathbf{E}[X|s_{t} = s^{1}, u_{t}]|s_{t} = s^{1}] = \sum_{u \in U} \mathbf{P}(u_{t} = u|s_{t} = s^{1})\,\mathbf{E}[X|s_{t} = s^{1}, u_{t} = u],$ if we want to think of $\{s_{t} = s^{1}, u_{t} = u\}$ as an event?
Let $\mathcal{F}$ be the sigma algebra generated by all possible trajectories of the MDP, i.e., the sigma algebra used whenever we compute the expectation of a measurable function of the process. Let $\mathcal{A}, \mathcal{B} \subset \mathcal{F}$ be subcollections of events, and assume $\mathcal{B} = \{B_{1}, B_{2}, \dots\}$ is countable. Think of $\mathcal{B}$ as asking "$u_{t} = ?$", with a generic element $B$ as one possible answer: $B_{1}$ is the event $\{u_{t} = u^{1}\} = \{\omega: u_{t}(\omega) = u^{1}\}$ for some $u^{1} \in U$, $A_{i}$ is the event $\{s_{t} = s^{i}\}$, etc. For each $A_{i} \in \mathcal{A}$, we have $\mathbf{P}[B_{j}|A_{i}]$ for each index $j$, and since the events $\{u_{t} = u\}$, $u \in U$, are disjoint and exhaust the sample space, these conditional probabilities sum to $1$. Therefore, we can define a probability space $(\mathcal{B}, 2^{\mathcal{B}}, \mu_{A_{i}})$, where $2^{\mathcal{B}}$ denotes the power set of $\mathcal{B}$ and $\mu_{A_{i}}(\{B_{j}\}) := \mathbf{P}[B_{j}|A_{i}]$.
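For instance (a made-up two-action illustration, with $\pi$ denoting the stationary policy, notation introduced here only for this example): if $U = \{u^{1}, u^{2}\}$, then $\mathcal{B} = \{B_{1}, B_{2}\}$, and the measure $\mu_{A_{1}}$ is just the policy's action distribution at $s^{1}$:
$\mu_{A_{1}}(\{B_{1}\}) = \mathbf{P}[u_{t} = u^{1}|s_{t} = s^{1}] = \pi(u^{1}|s^{1}), \quad \mu_{A_{1}}(\{B_{2}\}) = \pi(u^{2}|s^{1}), \quad \mu_{A_{1}}(\{B_{1}, B_{2}\}) = \pi(u^{1}|s^{1}) + \pi(u^{2}|s^{1}) = 1.$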
Consider $(*) := \mathbf{E}[\mathbf{E}[X|A_{1}, B] | A_{1}] = \mathbf{E}[\mathbf{E}[X|s_{t} = s^{1}, u_{t}]|s_{t} = s^{1}]$. Notice that the inner expectation is taken with respect to $\omega$, and the result is actually a function of $B$, which means the outer expectation is taken with respect to $B$. So $(*)$ can be viewed as $\mathbf{E}[f(B) | A_{1}].$ However, here we cannot view $A_{1}$ as a sigma algebra to condition on. We cannot view it as an event either, because when $B \in \mathcal{B}$ is the variable, "events" means elements of the sigma algebra $2^{\mathcal{B}}$, and $A_{1} \in \mathcal{F}$ but $A_{1} \notin 2^{\mathcal{B}}$. Instead, we should read the notation $\mathbf{E}[f(B) | A_{1}]$ as $\mathbf{E}_{A_{1}}[f(B)] = \mathbf{E}_{A_{1}, M}[f(B)],$ where the subscript $M$, which is often omitted, refers to the randomness of the transitions of the Markov process, and the subscript $A_{1}$ indicates that the underlying probability measure $\mu_{A_{1}}$, with respect to which we integrate $f(B)$, depends on $A_{1}$. This subscripting is reasonable because there are two types of randomness in the process generating $s_{t}, u_{t}$ as time $t$ goes on: $u_{t}$ is drawn by our random policy based on $s_{t}$, and $s_{t+1}$ is determined by $(s_{t}, u_{t})$ together with the randomness of the transition from $(s_{t}, u_{t})$ imposed by the Markov process.
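Spelled out from the definitions above, the function $f$ appearing here is
$f : \mathcal{B} \to \mathbb{R}, \qquad f(B_{j}) := \mathbf{E}[X|A_{1}, B_{j}] = \frac{1}{\mathbf{P}(A_{1} \cap B_{j})} \int 1_{A_{1} \cap B_{j}}(\omega)\, X(\omega)\, d\mathbf{P}(\omega),$
i.e., for each fixed action event $B_{j}$ the inner expectation is an ordinary conditional expectation given the event $A_{1} \cap B_{j} \in \mathcal{F}$, and it is this real-valued function of $B_{j}$ that the outer expectation integrates against $\mu_{A_{1}}$.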
Therefore,
$\begin{aligned}
\mathbf{E}[f(B)|A_{1}] &:= \mathbf{E}_{A_{1}}[f(B)] = \int f(B)\, d\mu_{A_{1}}(B) = \sum_{B_{i} \in \mathcal{B}} \mu_{A_{1}}(\{B_{i}\})\, f(B_{i}) = \sum_{B_{i} \in \mathcal{B}} \mu_{A_{1}}(\{B_{i}\})\, \mathbf{E}[X|A_{1}, B_{i}] \\
&= \sum_{B_{i} \in \mathcal{B}} \mathbf{P}(B_{i}|A_{1}) \frac{1}{\mathbf{P}(A_{1} \cap B_{i})} \int 1_{A_{1}\cap B_{i}}(\omega)\, X(\omega)\, d\mathbf{P}(\omega) = \sum_{B_{i} \in \mathcal{B}} \frac{1}{\mathbf{P}(A_{1})} \int 1_{A_{1}\cap B_{i}}(\omega)\, X(\omega)\, d\mathbf{P}(\omega) \\
&= \frac{1}{\mathbf{P}(A_{1})} \int 1_{A_{1}}(\omega)\, X(\omega)\, d\mathbf{P}(\omega) = \mathbf{E}[X|A_{1}],
\end{aligned}$
where the second-to-last equality uses $\sum_{B_{i} \in \mathcal{B}} 1_{A_{1} \cap B_{i}} = 1_{A_{1}}$, which holds because the events $B_{i}$ are disjoint and their union has probability one.
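As a numerical sanity check (a minimal sketch, not part of the argument above: the two states, two actions, policy `pi`, transition kernel `P`, and trajectory functional `X` below are all made up for illustration), the identity $\mathbf{E}[\mathbf{E}[X|s_{t} = s^{1}, u_{t}]|s_{t} = s^{1}] = \mathbf{E}[X|s_{t} = s^{1}]$ can be verified by exact enumeration on a one-step toy MDP:

```python
# Toy one-step MDP: two states, two actions, all numbers illustrative.
states = ["s1", "s2"]
actions = ["u1", "u2"]

pi = {  # pi[s][u] = P(u_t = u | s_t = s), the stationary policy
    "s1": {"u1": 0.3, "u2": 0.7},
    "s2": {"u1": 0.6, "u2": 0.4},
}
P = {  # P[(s, u)][s'] = P(s_{t+1} = s' | s_t = s, u_t = u)
    ("s1", "u1"): {"s1": 0.8, "s2": 0.2},
    ("s1", "u2"): {"s1": 0.1, "s2": 0.9},
    ("s2", "u1"): {"s1": 0.5, "s2": 0.5},
    ("s2", "u2"): {"s1": 0.4, "s2": 0.6},
}

def X(s, u, s_next):
    # an arbitrary measurable function of the one-step trajectory (s_t, u_t, s_{t+1})
    return 1.0 if s_next == "s1" else -2.0 if u == "u2" else 0.5

s1 = "s1"

def f(u):
    # inner expectation: f(u) = E[X | s_t = s1, u_t = u], a function of the action event
    return sum(P[(s1, u)][sp] * X(s1, u, sp) for sp in states)

# left-hand side: E[E[X | s_t = s1, u_t] | s_t = s1] = sum_u P(u_t = u | s_t = s1) * f(u)
lhs = sum(pi[s1][u] * f(u) for u in actions)

# right-hand side: E[X | s_t = s1], computed directly over all one-step trajectories
rhs = sum(pi[s1][u] * P[(s1, u)][sp] * X(s1, u, sp) for u in actions for sp in states)

assert abs(lhs - rhs) < 1e-12
print(lhs, rhs)
```

Both sums evaluate to the same number, which is exactly the collapse of the double sum in the display above.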