
Let $P$ be a stochastic matrix (of an irreducible Markov Chain) with stationary distribution $\pi^T$ (i.e. $\pi^T P = \pi^T$) and let further $E$ be the matrix of all $1$'s.

Given an $\alpha \in [0,1]$, is it possible to find an expression for the stationary distribution of $$\alpha P + \frac{(1-\alpha)}{n}E,$$ depending on $\pi$ and $\frac{1}{n}\mathbb{1}$, where $\mathbb{1}$ is the vector of all $1$'s?

More generally: given two transition matrices of irreducible Markov chains $P_1$ and $P_2$ with stationary distributions $\pi_1^T$ and $\pi_2^T$, respectively, can one find a general formula for the stationary distribution of $$\alpha P_1 + (1-\alpha)P_2,$$ for $\alpha \in [0,1]$?

1 Answer


If all you know is the stationary distributions, then there is no general formula.

Here is a simple example:

Take $P$, $Q$ to be irreducible, and let $A$ be the everywhere-self-loop chain (the identity matrix). For $e, f \in (0,1)$, the matrices $P' := eP + (1-e) A$ and $Q' := fQ + (1-f) A$ are irreducible with the same stationary distributions as $P$ and $Q$. $P'$ and $Q'$ are basically just lazy versions of $P$ and $Q$.

Moreover, for every fixed $\lambda \in (0,1)$, you can pick different $e,f \in (0,1)$ to make $$X_{\lambda}(e,f) := \lambda P' + (1 - \lambda) Q' = \lambda e P + (1-\lambda) f Q + \bigl(\lambda(1-e) + (1-\lambda)(1-f)\bigr) A$$ have extremely different stationary distributions.

Specifically, you can show that for every fixed $\lambda$, the stationary distribution of $X_{\lambda}(e,f)$ converges to $\pi_P = \pi_{P'}$ as $f \to 0$, and converges to $\pi_Q = \pi_{Q'}$ as $e \to 0$.


This is a lot of words, but the point is very intuitive: if $P'$ is extremely lazy, then mixing it into $Q'$ may barely move the stationary distribution away from that of $Q'$. If $P'$ is so lazy that it only steps to a new state once in a million years, while $Q'$ energetically steps to new states every second, then alternating between $P'$ and $Q'$ is basically indistinguishable from a lazier version of $Q'$.
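The lazy-chain example above is easy to check numerically. Here is a minimal sketch; the matrices $P$ and $Q$ and the helper `stationary` are illustrative choices, not part of the original argument:

```python
import numpy as np

def stationary(M):
    """Stationary distribution of a row-stochastic matrix M,
    taken as the left eigenvector for the eigenvalue closest to 1."""
    vals, vecs = np.linalg.eig(M.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    return v / v.sum()

# Two irreducible 2-state chains with different stationary distributions.
P = np.array([[0.5, 0.5],
              [0.9, 0.1]])
Q = np.array([[0.1, 0.9],
              [0.5, 0.5]])
A = np.eye(2)  # the everywhere-self-loop chain

lam = 0.5
def mixture(e, f):
    Pp = e * P + (1 - e) * A   # lazy version of P, same stationary dist.
    Qp = f * Q + (1 - f) * A   # lazy version of Q, same stationary dist.
    return lam * Pp + (1 - lam) * Qp

# Laziness preserves the stationary distribution...
assert np.allclose(stationary(0.3 * P + 0.7 * A), stationary(P))

# ...but the mixture's stationary distribution depends strongly on e, f:
print(stationary(mixture(0.99, 0.01)))  # close to stationary(P)
print(stationary(mixture(0.01, 0.99)))  # close to stationary(Q)
```

Even with $\lambda$ fixed at $1/2$, tuning the laziness parameters drags the mixture's stationary distribution toward either $\pi_P$ or $\pi_Q$.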


Note that you can talk about continuity of the stationary distribution in $\lambda$: by considering the $1$-eigenspaces of $P$ and $Q$, this amounts to saying that if you have a continuous path of matrices with one-dimensional kernels, then the kernels change continuously. (This is how I would prove the claim about letting $e \to 0$ above.)

Proof: If $A_n \to A$, let $v_n$ generate the kernel of $A_n$. We may assume $\|v_n\| = 1$, so by compactness we can pass to a convergent subsequence, say $v_n \to v$.

Then we have $A_n(v_n) = ( A_n(v_n) - A(v_n) ) + (A(v_n) - A(v)) + A(v)$.

Now $\|A_n(v_n) - A(v_n)\|_2 \leq \|A_n - A\|_2 \to 0$, and $A(v_n) - A(v) \to 0$ as $n \to \infty$. So $A_n(v_n) \to A(v)$ by the triangle inequality. Since $A_n(v_n) = 0$, we obtain $A(v) = 0$, and since $\|v_n\| = 1$, we get $\|v\| = 1$.

Since $A$ has a one-dimensional kernel, that kernel is generated by $v$. (Note that in general the kernel of $A$ could jump in dimension -- this would correspond to the case where $P$ has a unique stationary distribution but $Q$ does not, e.g. $Q$ is not ergodic.)
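The continuity in $\lambda$ can also be seen numerically by tracing the stationary distribution along the interpolation. This is a sketch with the same kind of illustrative 2-state chains as before (`stationary` is a hypothetical helper, not from the original):

```python
import numpy as np

def stationary(M):
    """Left eigenvector of M for the eigenvalue closest to 1, normalized."""
    vals, vecs = np.linalg.eig(M.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    return v / v.sum()

P = np.array([[0.5, 0.5],
              [0.9, 0.1]])
Q = np.array([[0.1, 0.9],
              [0.5, 0.5]])

# Sample the path of stationary distributions of lambda*P + (1-lambda)*Q.
lams = np.linspace(0.0, 1.0, 101)
path = np.array([stationary(lam * P + (1 - lam) * Q) for lam in lams])

# Consecutive points on the path stay close: the stationary distribution
# moves continuously (here, without any jumps) as lambda varies.
steps = np.linalg.norm(np.diff(path, axis=0), axis=1)
print(steps.max())
```

The path starts at $\pi_Q$ (at $\lambda = 0$), ends at $\pi_P$ (at $\lambda = 1$), and the step sizes along the sampled path are uniformly small, consistent with the continuity claim.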


NB: The relationship between $A$ and $\ker(A)$ is actually algebraic. Using $\ker(A) = \operatorname{im}(A^T)^{\perp}$ (an isometric isomorphism of the relevant Grassmannians), it suffices to control the rate of change of the image. For a given matrix $A$ with one-dimensional kernel, let $I$ be a maximal set of indices of linearly independent columns, so that the corresponding columns give a basis for the image. Near $A$, the columns indexed by $I$ still form a basis for the image, by continuity of the determinant. The Plücker coordinates of the image are the determinants of the $(n-1) \times (n-1)$ minors of $A_I$ (the columns of $A$ indexed by $I$), so they change algebraically.

This means you could in principle use the determinant-derivative formula to control how fast the kernel is moving along a path of matrices $A$ (with all one dimensional kernels).

I'm not sure that this could ever be useful in an application where you couldn't already compute the kernel of $A$ (i.e. the stationary distribution of $\lambda P + (1 - \lambda) Q = I + A$).

For example, plugging the case $A_m(t) = [tP + (1 - t) Q]_m$ (a corresponding minor of the mixture) into the determinant-derivative formula $$\frac{d}{dt}\det(A(t)) = \det(A(t))\,\operatorname{tr}\!\Big(A(t)^{-1}\,\frac{d}{dt}A(t)\Big),$$ we get $$\frac{d}{dt}\det(A_m(t)) = \det(A_m(t))\,\operatorname{tr}\!\big(A_m(t)^{-1}\,[P - Q]_m\big).$$ To see how fast $\ker(tP + (1 - t)Q)$ is moving around on the Grassmannian, you compute the vector $\big(\det A_m(t)\big)_{m \in \mathrm{minors}(I)}$ for some local choice of $I$, normalize it to make it a path on the sphere, and then compute the derivative of the resulting path.
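The determinant-derivative (Jacobi) formula can be sanity-checked numerically against a finite difference. This sketch applies it to the full mixture rather than to a minor; the random 3-state chains $P$ and $Q$ are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two random row-stochastic 3x3 matrices.
P = rng.random((3, 3)); P /= P.sum(axis=1, keepdims=True)
Q = rng.random((3, 3)); Q /= Q.sum(axis=1, keepdims=True)

def A(t):
    # The mixture t*P + (1-t)*Q, so dA/dt = P - Q.
    return t * P + (1 - t) * Q

t, h = 0.3, 1e-6
# Central finite difference of det(A(t)).
numeric = (np.linalg.det(A(t + h)) - np.linalg.det(A(t - h))) / (2 * h)
# Jacobi's formula: d/dt det(A) = det(A) * tr(A^{-1} dA/dt).
jacobi = np.linalg.det(A(t)) * np.trace(np.linalg.inv(A(t)) @ (P - Q))
print(numeric, jacobi)  # the two values agree
```

The same check works for any minor $A_m(t)$, as long as the submatrix stays invertible along the path.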

This seems kind of useless, but I guess I'll leave it up.

Elle Najt
  • Nice explanation. I know it has been a while since you've written this answer, but does your statement "If all you know is the stationary distributions, then there is no general formula" imply that there is some characterization of the stationary distributions of the mixture if we know $P$ and $Q$ and not just their stationary distributions? – jonem Mar 12 '21 at 06:19
  • @jonem If you know P and Q (say as matrices), you can find the stationary distribution(s) of $\alpha P + (1 - \alpha) Q$ in the way you would find the stationary distribution of any Markov chain given by a matrix $A$: namely, by solving the corresponding system of linear equations, $Au = u$, as in this example: https://math.stackexchange.com/questions/1171283/find-the-stationary-distribution-of-an-infinite-state-markov-chain . I don't know if that answers your question. What I was writing out near the end seems to be trying to describe the path of stationary distributions as you interpolate. – Elle Najt Mar 12 '21 at 07:04
  • I'd be interested in knowing what the path of stationary distributions looks like as you interpolate. Shouldn't be too hard to code up in some small cases and just see what happens. Maybe it is simpler than the circles I started running above. – Elle Najt Mar 12 '21 at 07:05