5

This question has arisen from a question about a Markov chain where the transition matrix $T$ is random but structured. The idea being that the transition is composed of two parts, transition $A$ and $B$ which occur in known quantities $n$ and $m$ respectively but unknown order. This means that $T$ is the product of $n$ matrices $A$ and $m$ matrices $B$ in some ordering.

For example when $m=1$ and $n=2$

$$T \in \{ABB, BAB, BBA\}$$

The question is, when each permutation $\begin{pmatrix} m+n \\ m\end{pmatrix}$ of these $n+m$ matrices is equally likely what is the expected transition matrix $T$?

Interestingly this is like considering necklaces with fixed content where the colors are matrices. I say this because in the sense that I'm interested in the behavior of $T$ as it is applied to itself, if this problem could be solved invariant to rotations of the ordering of $A,B$ that would also be helpful.

My thinking:

My initial hope was that $$E[T] = \left(\frac{n}{n+m}A +\frac{m}{n+m} B\right)^{n+m}$$

But that does not seem to be the case by numerical experiments for small $n,m$ where the expected value can be calculated directed by summing over every permutation.

My second hope was to be able to "factor out" the matrices $A,B$ in the expected value expression. By letting $D = diag(rep(A,n),rep(B,m))$ the (n+m)*size(A) block-diagonal matrix we can consider constructing one order from some permutation $\Pi_1$.

\begin{align*} T(\Pi_1) &= \prod_i^{n+m} (e_i^T \otimes I) (\Pi_1\otimes I) D (\Pi_1\otimes I)^T (e_i \otimes I)\\ &= \prod_i^{n+m} (e_i^T \Pi_1\otimes I)D (\Pi_1^T e_i \otimes I) \end{align*}

each term of this product permutes the diagonal blocks of $D$ is the same way, then $(e_i^T \otimes I) \cdots (e_i \otimes I) $ selects off the $i$th block. This way each of the $n+m$ blocks of $A,B$ occur in some order determined by $\Pi_1$.

My hope from this formulation was that in this form each $\Pi$ would factor out in the expected value of $T$: $$E[T] = \frac{1}{(\begin{pmatrix} m+n \\ m\end{pmatrix}} \sum_{\Pi} T(\Pi)$$ by using vectorizations but it doesn't seem to work. Notably for each term in the product of $T(\Pi)$ we have

$$vec\left((e_i^T \Pi_1\otimes I)D (\Pi_1^T e_i \otimes I)\right) = (\Pi_1 e_i^T \otimes I)\otimes(e_i^T \Pi_1\otimes I) vec(D) $$

The problem here is that despite the data matrix $D$ factoring out nicely from one term it does not help us simplify the product as far as I can tell.

Thank you for any of your thoughts!

User
  • 91
  • I have no idea how to do this, or if it can be done, but your guesses have to work in the simplest case, when $A$ and $B$ commute. – saulspatz Aug 16 '20 at 15:38
  • 1
    note that this is $\binom{n+m}{m}^{-1}$ times the coefficient of $x^m$ in $P(x) = (A + xB)^{n+m}$; you can extract this coefficient using a standard roots of unity filter, e.g. one expression is $\mathbb{E}[T] = \frac{1}{(n+m+1)\binom{n+m}{m}} \sum_{w} w^{-m} P(w)$, where the sum is over all $(n+m+1)$-th roots of unity – user125932 Aug 16 '20 at 16:25
  • @user125932 From what I'm seeing this is very close to correct - there seems to be some slight issue but I may not be equipped to understand just yet. Could you expand on the reasons for this so I know where to look for get it right? – User Aug 16 '20 at 17:03
  • 1
    you can find some information about the technique here and here; you can also find more by searching "roots of unity filter". let me know if there's something I can clarify further – user125932 Aug 16 '20 at 17:08
  • 1
    one issue is that the expression I gave might not be very "numerically stable," i.e. that the numerical errors become large under exponentiation, and so the sum doesn't come out quite right when evaluated by a computer. if this is the issue you're having, and your goal is to find a quick way to compute $\mathbb{E}[T]$ using a computer, a better approach might be to actually compute the entire matrix polynomial $(A + xB)^{n+m}$, via e.g. dynamic programming, then extract the $m$-th coefficient at the end – user125932 Aug 16 '20 at 17:28
  • Well, one issue of note is that the row sums of this expression are not 1 so there appears to be a normalization issue rather than one that seems related to being computationally stable. My thought was that it may be an "off by one" type issue. Again I'm verify this by computing directly E[T] when there are few enough permutations – User Aug 16 '20 at 17:30
  • 1
    I'm not sure where the error could be coming from, as I don't think either of the steps in my analysis is wrong. The expression can also be written as $\mathbb{E}[T] = \frac{1}{q\binom{n+m}{m}} \sum_{k=0}^{q-1} w^{-mk} P(w^k)$, where $w = e^{2\pi i/q}$, for any $q > n+m$. – user125932 Aug 16 '20 at 18:15
  • @user125932 so the answer is that this is uniform over the the $\begin{pmatrix} n+m \ m\end{pmatrix}$ unique permutations of the two objects rather than the $(n+m)!$ permutations. In honesty it is unclear which is more meaningful to my application but this is extremely valuable, thank you! I summarized your answer below – User Aug 16 '20 at 19:50
  • 1
    I'm glad if it was helpful. Though note that those two distributions are actually the same, since each "unique permutation" (i.e. a choice of $m$ locations for $B$ in the product) corresponds to exactly $n!m!$ of the $(n+m)!$ permutations. – user125932 Aug 16 '20 at 20:20

1 Answers1

1

Okay, I'm compiling information from @user125932.

WLOG let $m>n$, then let $P(x) = (A+xB)^{n+m}$. Note that the sum of the $\begin{pmatrix} m+n \\ m\end{pmatrix}$ terms of $P$ which are degree $m$ give $$ \sum_{\Pi} T(\Pi)$$.

Then for this polynomial $P(x) = \sum_k c_k x^k$ we are interested in $c_m$ as $E[T] = c_m/ \begin{pmatrix} m+n \\ m\end{pmatrix}$.

Theorem 1 here (or see here) describes one method of obtaining this coefficient through the Root of Unity Filter. For $\omega = e^{2\pi i /m}$ we have

$$c_0 + c_m =\frac{1}{m} \sum_{k=0}^{\infty} c_{k\cdot m} = \frac{1}{m} \sum_{k=0}^{m-1} P(\omega^k)$$

where further terms do not exist because $2m > m+n = deg(P)$, (this is by assumption $m>n$). Thus

$$c_m = \frac{1}{m} \sum_{k=0}^{m-1} P(\omega^k) - c_0 = \frac{1}{m} \sum_{k=0}^{m-1} P(\omega^k) - A^{m+n} $$

Then finally

$E[T] = \frac{1}{\begin{pmatrix} m+n \\ m\end{pmatrix}} \left(\left(\frac{1}{m}\sum_{k=0}^{m-1} P(\omega^k) \right)- A^{m+n} \right)$

User
  • 91