I stumbled across this seemingly elementary question while studying the theory of Markov Decision Processes. Suppose $\mathcal{M} = (\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R})$ is an MDP and fix an action $a \in \mathcal{A}$. Does $(i, j) \mapsto \mathcal{P}(i, a, j)$ define a transition matrix? If so, how does one accommodate the possibility that only a strict subset $\mathcal{A}(i) \subsetneq \mathcal{A}$ of actions is available from a given state $i$, i.e. what is $\mathcal{P}(i, a, j)$ for $a \notin \mathcal{A}(i)$?
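For concreteness, here is a minimal numerical sketch of what I mean, assuming a finite MDP with made-up states, actions, and transition probabilities: if $\mathcal{P}$ is stored as a three-dimensional array and its rows are filled in only for available state-action pairs, then the slice $(i, j) \mapsto \mathcal{P}(i, a, j)$ need not be row-stochastic.

```python
import numpy as np

# Minimal sketch: a finite MDP with 3 states and 2 actions (all numbers are made up).
# P[i, a, j] = probability of moving from state i to state j under action a.
n_states, n_actions = 3, 2
P = np.zeros((n_states, n_actions, n_states))

# Available actions per state: action 1 is (hypothetically) unavailable in state 2.
available = {0: {0, 1}, 1: {0, 1}, 2: {0}}

# Fill rows only for available (state, action) pairs.
P[0, 0] = [0.5, 0.5, 0.0]
P[0, 1] = [0.0, 1.0, 0.0]
P[1, 0] = [0.2, 0.3, 0.5]
P[1, 1] = [1.0, 0.0, 0.0]
P[2, 0] = [0.0, 0.0, 1.0]
# P[2, 1] is left as all zeros: that row is not a probability distribution,
# so the slice P[:, 1, :] is not a stochastic matrix.

for a in range(n_actions):
    for i in range(n_states):
        row_sum = P[i, a].sum()
        status = "stochastic" if np.isclose(row_sum, 1.0) else "NOT stochastic"
        print(f"state {i}, action {a} (available={a in available[i]}): row sum {row_sum:.1f} -> {status}")
```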
