7

Let $A, X\in\mathbb{R}^{n\times n}$. The scalar objective function is $$J=\mathrm{tr}(AX).$$ If there are no constraints, setting the derivative of $J$ with respect to $X$ to zero gives $$A=0.$$ Since $A$ is itself a complicated function, I can calculate further from $A=0$.

My question is: what if $X$ is required to be an orthogonal matrix, i.e., $X^TX=I$? Then it becomes a constrained optimization problem. Can I still use matrix differential techniques to derive the minimizer? Thanks.

Shiyu
  • 5,228

2 Answers

7

The method of Lagrange multipliers yields the solution.

One studies the (real-valued) function $F(X)=\text{trace}(AX)$ under the (matrix) constraint $C(X)=I$, where $C(X)=X^TX$; hence $$ F(X)=\sum\limits_{ij}A_{ji}X_{ij},\quad C_{ij}(X)=\sum\limits_{k}X_{ki}X_{kj}. $$ The gradients of $F$ and $C$ are given by $\partial_{ij}F(X)=A_{ji}$ and $$ \partial_{ij}C_{k\ell}(X)=X_{i\ell}[k=j]+X_{ik}[\ell=j]. $$

One wants there to exist multipliers $\lambda_{ij}$ such that, for every $i$ and $j$, $$ \partial_{ij}F(X)=\sum\limits_{k\ell}\lambda_{k\ell}\partial_{ij}C_{k\ell}(X). $$ This condition reads $$ A_{ji}=\sum\limits_{k,\ell}\lambda_{k\ell}\left(X_{i\ell}[k=j]+X_{ik}[\ell=j]\right)=\sum\limits_{\ell}\lambda_{j\ell}X_{i\ell}+\sum\limits_{k}\lambda_{kj}X_{ik}, $$ or, equivalently, introducing the matrix $\Lambda=(\lambda_{ij})$, $$ A^T=X\Lambda^T+X\Lambda. $$

The matrix $M=\Lambda^T+\Lambda$ satisfies $M^T=M$ and $XM=A^T$; since $X^TX=I$, one gets $M^2=M^TM=(XM)^T(XM)=AA^T$.
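
As a quick sanity check (a minimal sketch, not from the original answer), the gradient formulas above can be compared against central finite differences with NumPy for random $A$ and $X$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
X = rng.standard_normal((n, n))
eps = 1e-6

def F(Y):           # F(X) = trace(AX)
    return np.trace(A @ Y)

def C(Y):           # C(X) = X^T X
    return Y.T @ Y

# dF/dX_ij should equal A_ji, i.e. the gradient of F is A^T
gradF = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n)); E[i, j] = eps
        gradF[i, j] = (F(X + E) - F(X - E)) / (2 * eps)
print(np.allclose(gradF, A.T, atol=1e-6))                        # True

# dC_kl/dX_ij should equal X_il*[k==j] + X_ik*[l==j]
i, j, k, l = 1, 2, 0, 2
E = np.zeros((n, n)); E[i, j] = eps
fd = (C(X + E)[k, l] - C(X - E)[k, l]) / (2 * eps)
print(np.isclose(fd, X[i, l] * (k == j) + X[i, k] * (l == j)))   # True
```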


When $A$ is invertible, one can use the usual definition of the square root of a symmetric matrix to find $M$. The maximizing and minimizing matrices $X$ and the maximum and minimum values of $F$ are $$ X_0=A^TM^{-1}=\pm A^T(AA^T)^{-1/2},\quad F(X_0)=\pm\mbox{trace}((AA^T)^{1/2}). $$ When $A$ is not invertible, the formula $X_0=A^TM^{-1}$, whichever meaning one gives to the notation $M^{-1}$, cannot yield an orthogonal matrix $X_0$ since, if $A$ is not invertible, neither is $A^T$, nor is $A^T$ times any matrix. On the other hand, the formula $F(X_0)=\pm\mbox{trace}((AA^T)^{1/2})$ might still be valid.
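
For invertible $A$, here is a minimal NumPy sketch of this closed form: it builds $M=(AA^T)^{1/2}$ from an eigendecomposition, checks that $X_0=A^TM^{-1}$ is orthogonal, and checks that a random orthogonal $Q$ does not beat $F(X_0)=\mathrm{trace}((AA^T)^{1/2})$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))            # generically invertible

# symmetric square root M = (AA^T)^{1/2} via an eigendecomposition
d, P = np.linalg.eigh(A @ A.T)
M = P @ np.diag(np.sqrt(d)) @ P.T

X0 = A.T @ np.linalg.inv(M)                # candidate maximizer X_0 = A^T M^{-1}
print(np.allclose(X0.T @ X0, np.eye(n)))                 # True: X_0 is orthogonal
print(np.isclose(np.trace(A @ X0), np.trace(M)))         # True: F(X_0) = trace((AA^T)^{1/2})

# a random orthogonal Q should not do better than X_0
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
print(np.trace(A @ Q) <= np.trace(A @ X0) + 1e-12)       # True
```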

Did
  • 279,727
  • Nice Solution! Thanks a lot! – Shiyu Sep 20 '11 at 12:19
  • Hi Didier -- I need this solution for the answer to this question, but I don't understand the part about the pseudoinverses. In that case, $AA^T$ (in your notation, not the $A$ from the other question) may not be invertible; e.g. for one example I have

    $$A=\pmatrix{0&1\\0&0}\;,$$

    and it seems that in that case I can't get $X_0$ in the way you've written it, even using the pseudoinverse, because the resulting matrix isn't orthogonal?

    – joriki Oct 30 '12 at 17:59
  • I found another solution and used it in the answer but also linked to this question here; I'd be interested to know if what you had in mind with the pseudoinverse is equivalent to the singular-value decomposition used there. – joriki Oct 30 '12 at 18:42
  • @joriki It has been a long time since I wrote this... It seems that here $AA^T$ is symmetric nonnegative hence diagonalizable as $AA^T=PDP^T$ with $PP^T=I$ and $D$ a diagonal matrix with a diagonal $(d_k)_k$, $d_k\geqslant0$. Then $(AA^T)^{-1/2}=PFP^T$ where $F$ is the diagonal matrix with the diagonal $(f_k)_k$, $f_k=1/\sqrt{d_k}$ if $d_k\gt0$, $f_k=0$ if $d_k=0$. But maybe this was not your question? – Did Oct 30 '12 at 18:42
  • @did: Thanks. My question was: For $A$ as in my example above, we have

    $$ AA^T=\pmatrix{1&0\\0&0}\;, $$

    so your prescription gives

    $$ (AA^T)^{-1/2}=\pmatrix{1&0\\0&0}\;, $$

    so we get

    $$X_0=\pm\pmatrix{0&0\\1&0}\;,$$

    which isn't orthogonal.

    – joriki Oct 30 '12 at 18:46
  • 1
    @joriki Anyway, $X_0=\pm A^T\cdot$anything cannot be orthogonal when $A$ is not invertible, hence this case of the proof does not hold. Hmmmm... :-( – Did Oct 30 '12 at 20:02
  • Thanks for looking into this. In the meantime an answer has been added that derives the solution I'd found for the answer to the other question. – joriki Oct 30 '12 at 23:54
  • @Did and joriki: The polar decomposition theorem for $A$ basically states that you can always factorize $A = RU$ where $U$ is the square root of $A^TA$ and $R$ is orthogonal. If I remember right, the invertibility of $A$ gives you unique orthogonal $R$. Thus, in the given problem what it means is that the extremizer $Q$ is not unique, but the value of the extremum is the same. – me10240 Nov 05 '17 at 18:16
4

Another way to get Did's answer is to use the SVD to reduce to the case where $A$ is diagonal: We have $A = USV'$, where $U$ and $V$ are orthogonal and $S$ is diagonal with non-negative entries, and so

$$\def\Tr{\operatorname{Tr}}\Tr(AX) = \Tr(SV'XU)\;.$$

Since $X\to V'XU$ is a bijection on the set of orthogonal matrices,

$$\min\{\Tr(AX)\mid X\text{ orthogonal}\} = \min\{\Tr(SW)\mid W\text{ orthogonal}\}\;.$$

But $\Tr(SW) = \sum_iS_{ii}W_{ii}$, and all the $S_{ii}$ are non-negative and $-1\le W_{ii}\le1$, so that $\Tr(SW)\ge\Tr(S(-I))$, and so the minimum occurs at $W=-I$ (or $X=-VU'$) and is $-\sum_iS_{ii}$.
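
Here is a minimal NumPy sketch of this recipe (the helper name `min_trace_orthogonal` is just for illustration): it builds the minimizer $X=-VU'$ from the SVD and checks it on the singular example $A$ from the comments above, where the closed form $A^T(AA^T)^{-1/2}$ fails to be orthogonal.

```python
import numpy as np

def min_trace_orthogonal(A):
    """Minimize trace(A @ X) over orthogonal X via the SVD (illustrative helper)."""
    U, S, Vt = np.linalg.svd(A)        # A = U @ diag(S) @ Vt, with S >= 0
    X = -Vt.T @ U.T                    # X = -V U'
    return X, -S.sum()                 # minimum value is -sum of singular values

# the singular example from the comments: A = [[0, 1], [0, 0]]
A = np.array([[0.0, 1.0], [0.0, 0.0]])
X, val = min_trace_orthogonal(A)
print(np.allclose(X.T @ X, np.eye(2)))     # True: X is orthogonal even though A is singular
print(np.isclose(np.trace(A @ X), val))    # True: the minimum is -1
```

Replacing $-VU'$ by $VU'$ gives the maximizer, with maximum value $\sum_iS_{ii}=\mathrm{trace}((AA^T)^{1/2})$, matching Did's answer.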

joriki
  • 238,052
dmuir
  • 436
  • I formatted your answer because I thought it's important to have this solution worked out here; but it was quite a bit of work; I encourage you to take a look at the formatting FAQ; it's not as hard as it might seem. – joriki Oct 30 '12 at 23:49