I stumbled across this seemingly elementary question while studying the theory of Markov Decision Processes. Suppose $\mathcal{M} = (\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R})$ is an MDP and fix an action $a \in \mathcal{A}$. Does $(i, j) \mapsto \mathcal{P}(i, a, j)$ define a transition matrix? If so, how should one accommodate the possibility that only a strict subset of $\mathcal{A}$ is available from a given state, i.e. what is $\mathcal{P}(i, a, j)$ for $a \notin A(i)$?
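To make the question concrete, here is a minimal sketch in NumPy using a hypothetical 3-state, 2-action MDP. Under the assumption that every action is available in every state, slicing the tensor $\mathcal{P}$ at a fixed action $a$ gives a matrix whose rows each sum to 1, i.e. a row-stochastic (transition) matrix:

```python
import numpy as np

# Hypothetical 3-state, 2-action MDP transition tensor P[s, a, s'].
# Assumption: every action is available in every state, so for each
# fixed a the slice P[:, a, :] should be row-stochastic.
P = np.array([
    [[0.5, 0.5, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 1.0, 0.0], [0.2, 0.3, 0.5]],
    [[0.0, 0.0, 1.0], [0.0, 0.5, 0.5]],
])

a = 0
P_a = P[:, a, :]  # the map (i, j) -> P(i, a, j) as a 3x3 matrix

# Each row of P_a is a probability distribution over next states.
print(np.allclose(P_a.sum(axis=1), 1.0))  # True
```

When some action $a \notin A(i)$, the entries $\mathcal{P}(i, a, \cdot)$ are simply not part of the model; one common bookkeeping convention (an assumption here, not something fixed by the definition of an MDP) is to pad with a self-loop, $\mathcal{P}(i, a, i) = 1$, so every slice remains stochastic.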
Comments:

- The question could be clearer. For example, what are $(i, j)$? – user58136 Jan 14 '23 at 11:57
- @user58136 These are two states; I believe this is standard notation. – Othman El Hammouchi Jan 14 '23 at 21:37