Given measurable spaces $(X, \mathcal{X})$ and $(Y, \mathcal{Y})$, we define a measure kernel as a map $\pi : \mathcal{X} \times Y \to [0,\infty]$ such that $\pi(\cdot|y)$ is a measure on $\mathcal{X}$ for every $y \in Y$ and $\pi(A|\cdot)$ is $\mathcal{Y}$-measurable for every $A \in \mathcal{X}$. A probability kernel is then a measure kernel with $\pi(X|y) = 1$ for every $y \in Y$.
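To check that I am reading the definition correctly, the concrete example I have in mind (my own, so please correct me if it is off) is the Gaussian transition kernel on $X = Y = \mathbb{R}$ with the Borel $\sigma$-algebras:
$$\pi(A|y) = \int_A \frac{1}{\sqrt{2\pi}}\, e^{-(x-y)^2/2}\,\mathrm{d}x, \qquad A \in \mathcal{B}(\mathbb{R}),\ y \in \mathbb{R}.$$
For each fixed $y$ this is the law of $\mathcal{N}(y,1)$, hence a probability measure on $\mathbb{R}$, and for each fixed $A$ the map $y \mapsto \pi(A|y)$ is measurable, so both conditions above seem to be satisfied.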
Now, this definition is a little abstruse, so I'd like to gain some intuition about it. I can see similarities with the usual integral kernels of operators (since a measure is, at least naively, a generalization of a function). I can also think of the kernel as a collection of measures indexed by $y \in Y$. But I am not sure which of these two views (if either) gives much insight into why this concept is natural and useful.
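To spell out the operator analogy I have in mind (again, just my attempt), the kernel seems to act on bounded measurable functions $f$ on $X$ via
$$(\pi f)(y) = \int_X f(x)\, \pi(\mathrm{d}x | y),$$
which looks formally like an integral operator with kernel $\pi$, except that the "kernel function" in $x$ has been replaced by a measure.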
- What is the intuition behind the probability kernels?
- What are some applications that show their usefulness?
To be more precise about what I am after: there is a definition of a group action $\rho$ of a group $G$ on a set $M$ as a map $\rho : G \times M \to M$ satisfying certain axioms. But this doesn't really give me any insight. If, however, someone told me that a group action is actually nothing other than a homomorphism $\rho : G \to {\rm Aut}(M)$, then I can immediately see the usefulness (given that I know enough group theory, of course). Is there something similar behind probability kernels too?