This question has arisen from a question about a Markov chain where the transition matrix $T$ is random but structured. The idea being that the transition is composed of two parts, transition $A$ and $B$ which occur in known quantities $n$ and $m$ respectively but unknown order. This means that $T$ is the product of $n$ matrices $A$ and $m$ matrices $B$ in some ordering.
For example when $m=1$ and $n=2$
$$T \in \{ABB, BAB, BBA\}$$
The question is, when each permutation $\begin{pmatrix} m+n \\ m\end{pmatrix}$ of these $n+m$ matrices is equally likely what is the expected transition matrix $T$?
Interestingly this is like considering necklaces with fixed content where the colors are matrices. I say this because in the sense that I'm interested in the behavior of $T$ as it is applied to itself, if this problem could be solved invariant to rotations of the ordering of $A,B$ that would also be helpful.
My thinking:
My initial hope was that $$E[T] = \left(\frac{n}{n+m}A +\frac{m}{n+m} B\right)^{n+m}$$
But that does not seem to be the case by numerical experiments for small $n,m$ where the expected value can be calculated directed by summing over every permutation.
My second hope was to be able to "factor out" the matrices $A,B$ in the expected value expression. By letting $D = diag(rep(A,n),rep(B,m))$ the (n+m)*size(A) block-diagonal matrix we can consider constructing one order from some permutation $\Pi_1$.
\begin{align*} T(\Pi_1) &= \prod_i^{n+m} (e_i^T \otimes I) (\Pi_1\otimes I) D (\Pi_1\otimes I)^T (e_i \otimes I)\\ &= \prod_i^{n+m} (e_i^T \Pi_1\otimes I)D (\Pi_1^T e_i \otimes I) \end{align*}
each term of this product permutes the diagonal blocks of $D$ is the same way, then $(e_i^T \otimes I) \cdots (e_i \otimes I) $ selects off the $i$th block. This way each of the $n+m$ blocks of $A,B$ occur in some order determined by $\Pi_1$.
My hope from this formulation was that in this form each $\Pi$ would factor out in the expected value of $T$: $$E[T] = \frac{1}{(\begin{pmatrix} m+n \\ m\end{pmatrix}} \sum_{\Pi} T(\Pi)$$ by using vectorizations but it doesn't seem to work. Notably for each term in the product of $T(\Pi)$ we have
$$vec\left((e_i^T \Pi_1\otimes I)D (\Pi_1^T e_i \otimes I)\right) = (\Pi_1 e_i^T \otimes I)\otimes(e_i^T \Pi_1\otimes I) vec(D) $$
The problem here is that despite the data matrix $D$ factoring out nicely from one term it does not help us simplify the product as far as I can tell.
Thank you for any of your thoughts!