Is there a similarity transformation rendering all diagonal elements of a matrix equal?

Question

I'm especially interested in SL$(2,\mathbb C)$, i.e. $2\times2$ matrices with determinant one, in which case I'm looking for a transformation from $\begin{pmatrix}a&b\\c&d\end{pmatrix}$ to $\begin{pmatrix}\frac{a+d}2&x\\ y&\frac{a+d}2\end{pmatrix}$ (the trace is conserved). Does such a similarity transformation exist? What about general $n\times n$ matrices?

Bonus points for an analytical formula (even if only for the 2x2 case).

It's generally possible for real $2\times2$ matrices, since you can rotate the basis vectors into each other and swap $a$ and $d$, and since this is a continuous transformation they have to be equal somewhere along the way. — joriki, Oct 12 '11 at 22:42
@joriki thanks, that's a good point. Is there an analytical formula for that rotation? (Otherwise I'll try figuring it out tomorrow) — Tobias Kienzler, Oct 12 '11 at 22:46
For $2\times 2$ matrices with complex conjugate eigenvalues, there is an orthogonal similarity transformation that renders the two diagonal elements equal to the real part of the two conjugate eigenvalues. — J. M. ain't a mathematician, Oct 13 '11 at 01:43
@Tobias: I had to improve my solution which ran sometimes into local minima and did not arrive at the correct result. A simple correction of the code was seemingly enough, see my improved answer — Gottfried Helms, Feb 26 '13 at 01:25

score 4 · Answer 1 · edited Apr 13 '17 at 12:20

I will present the final algorithm first, with follow up descriptions. Use the matrix $$\pmatrix{d_0 & x \\ y & d_1}$$ To obtain the complex $c$ and $s$ values, follow these steps (the text hopefully becomes clear with the later descriptions): \begin{align} y_s &= y - \bar{x} \quad&\text{ from the skew part}\\ \delta &= d_1 - d_0 \\ \delta_s &= \operatorname{Imaginary}\{\delta \} \\ \Delta &= \delta_s^2 + y_s\bar{y}_s & \text{the discriminant in the formulation for the sine} \\ s_0 &= j(\delta_s + \sqrt{\Delta}) & \text{the scaled sine value to diagonalize the skew part} \\ x_0 &= \delta y_s s_0 + y_s^2 x - s_0^2 y & \text{the first rotation's (scaled) result}\\ k_2c_2 &= j|x_0|y_s + x_0 \bar{s}_0 & \text{ the final (scaled) rotation values} \\ k_2s_2 &= j|x_0|s_0 - x_0\bar{y}_s \\ \end{align}

$y_s$ and $\delta_s$ are all the information needed from the skew Hermitian portion of the matrix, needed to diagonalize the skew Hermitian component to give a result of zero (for the new $x$ and $y$) in the skew Hermitian part of the result. This means that $x=\bar{y}$ in the result (first step). $y_s$ is not divided by $2$ as it would normally be in the calculation of the skew portion, since it is an unnecessary scale factor that is removed in the final scaling for $c$ and $s$.

Solving a particular quadratic (see this previous question) gives $k_0c_0=y_s$ and $k_0s_0 = j(\delta_s + \sqrt{\Delta})$, as described. The $k_0$ need never be found; it is a consistent scale factor that is removed in the final result (thus not included as a variable).

After $c_0$ and $s_0$ are applied to the matrix, the only necessary information from that is the $x$ value, named $x_0$ here. Thus it is the only calculated intermediate similarity value. From it, the complex phase is required, and $c_2$, $s_2$ are the final result. Find $k_2$ to scale these for unitary action, and the result gives equal diagonal values. See the python code at the end for more mundane details regarding cases where $c,s=0,0$ occur.

The algorithm tests well and works, here is a hopefully enlightening description.

First for reference, the unitary (complex Givens) rotation on a 2x2 matrix gives:

\begin{align} & \pmatrix{c & s \\ -\bar{s} & \bar{c} \\ } \pmatrix{d_0 & x \\ y & d_1 \\ } \pmatrix{\bar{c} & -s \\ \bar{s} & c \\ } \\ =& \pmatrix{c & s \\ -\bar{s} & \bar{c} \\ } \pmatrix{d_0\bar{c}+x\bar{s} & -d_0 s + x c \\ y \bar{c} + d_1 \bar{s} & -y s + d_1 c \\ } \\ =& \pmatrix{d_0c\bar{c} + d_1s\bar{s} + c\bar{s}x + \bar{c}sy & (d_1 - d_0)cs + x c^2 - ys^2 \\ (d_1 -d_0)\bar{c}\bar{s} -x\bar{s}^2 + y\bar{c}^2 & d_0s\bar{s} + d_1c\bar{c} - c\bar{s}x - \bar{c}sy } \end{align}

To set $d_1 = d_0 $ we solve $$d_0c\bar{c} + d_1s\bar{s} + c\bar{s}x + \bar{c}sy = d_0s\bar{s} + d_1c\bar{c} - c\bar{s}x - \bar{c}sy $$ or $$\overbrace{(d_1 - d_0)}^{\delta}(s\bar{s} - c\bar{c}) + c\bar{s}(2x) +\bar{c}s(2y) = 0 \tag{1}$$ Without loss of generality, use real $c\in [0,1]$, and the fact $c\bar{c} + s\bar{s} =1$ (from the unitary form of the similarity) to have $$s\bar{s} - c\bar{c} = (1 - c\bar{c}) - c\bar{c} = 1 - 2c\bar{c}$$ and $$s = (1 - c^2)^{\frac{1}{2}}e^{j\beta}$$

Then (1) becomes $$\delta(1-2c^2) + 2c(1 - c^2)^{\frac{1}{2}}\left[e^{-j\beta}x +e^{j\beta}y\right]=0 \tag{2}$$

From here it seems best to perform an intermediate similarity to achieve $|x| = |y|$. When that is true, the ellipse term $e^{-j\beta}x +e^{j\beta}y$ is easier to deal with. To achieve it, diagonalize the skew portion of the matrix, giving $y = \bar{x}$. This then causes the ellipse to "collapse" to a line on the real axis. The phase angle $\measuredangle xe^{j\beta} = \pm\frac{\pi}{2}$ will give the ellipse term as zero. So with $\beta = \frac{\pi}{2} + \measuredangle x$ we have the phase of $s$. For the magnitude, use the now reduced equation (2)

$$\delta(1-2c^2)=0 \tag{3}$$ We see here that $c=\frac{1}{\sqrt{2}}$ solves it, or using the scaled forms, $|c| = |s|$: \begin{align} c &= |x| \\ s &= j|x|e^{j\beta} = x\\ \end{align}

Here is the complete python code:


def hollow(d0, x, y, d1):
  ''' return c,s for a 2x2 unitary (complex Givens) with result of diagonals equal '''
  ys=y - x.conjugate()
  if ys==0: # if the matrix's skew portion is already diagonal
    c = complex(0, abs(x))
    s = x # s has the same phase as x and |c| = |s|
  else:
    d = d1 - d0
    ds =  d.imag
    ys2 = (ys*ys.conjugate()).real
    rad = ds*ds + ys2
    s0 = complex(0,ds + math.sqrt(rad)) # can do plus or minus square root here
    x0 = d*ys*s0 + ys*ys*x - s0*s0*y # the x after diagonalizing the skew portion
    if x0==0: c,s = ys, s0
    else:
      ax0 = abs(x0)
      s1 = complex(0,abs(x0))
      c = ys*s1 + x0*s0.conjugate()
      s = s0*s1 - x0*ys.conjugate()
  n = abs(complex(abs(s),abs(c))) # the square root of the sum of the norm squared of c and s
  c=c/n
  s=s/n
  return c,s

The function is called hollow since if the trace is zero, the result is a zero diagonal matrix, also known as a hollow matrix.

Wow, great job! (And I'd +2 you for adding Python if I could...) — Tobias Kienzler, Feb 26 '13 at 05:50
Updated the code, it checked x0 and gave a case for when the original matrix is already skew. Otherwise c,s=0,0 would have been the return value. — adam W, Feb 26 '13 at 13:19

Gottfried Helms · Accepted Answer · 2013-02-26T01:19:29.580

[update]: adapted the symbols S, b and c to the convention in the OP, sign error corrected [/update]

What I get for an orthogonal similarity transformation is (using r for $\small \cos(x) $ and s for $\small \sin(x) $ with some rotation-angle x and $\small S^{-1} \cdot A S $ for the matrix-multiplication

$ \small \begin{array} {rr|rr} & & r & s \\ & & -s & r \\ \hline \\ a & b & -s b+ar & rb+as \\ c & d & -sd+cr & rd+cs \\ \hline \\ r & -s & -srb+s^2d+ar^2-csr & r^2b-srd+asr-cs^2 \\ s & r & -s^2 b-srd+cr^2+asr & srb+r^2d+csr+as^2 \end{array} $

Using the abbreviations $\small r_2 = r^2-s^2=\cos(x)^2-\sin(x)^2 = \cos(2 x)$ and $\small s_2 = 2 r s= 2 \cos(x) \sin(x) = \sin(2 x)$ for the angle-duplication then I get for the resulting matrix

$ \small S^{-1} \cdot A \cdot S = \begin{array} {r|r|} (a+d)+(a-d)r_2-(b+c)s_2 & (b-c)+(a-d)s_2 + (b+c)r_2 \\ \hline \\ -(b-c) + (a-d)s_2 + (b+c)r_2 & (a+d) - (a-d)r_2 + (b+c)s_2 \\ \end{array} \cdot {1 \over 2} $

Then to have the diagonal-entries equal, the term $\small (b+c)s_2 -(a-d)r_2 $ must be zero.

[update]

The generalization for higher n seems obvious. Assume the diagonalelements $\small d_1,d_2,d_3 $, then each similarity-rotation on one pair of columns / rows modifies only two of that elements. If we denote one transformation between the columns/rows $\small T_{c_1,c_2} $ and $\small A_{c_1,c_2} = T_{c_1,c_2}^{-1} \cdot A \cdot T_{c_1,c_2} $ then the diagonal-elements behave like this over the iteration of transformations T :

$\small \begin{array} {rrr} T_{1,2}: & (d_1+d_2)/2 &, (d_1+d_2)/2 &, d_3 \\ T_{1,3}: & (d_1+d_2)/4+d_3/2 &, (d_1+d_2)/2 &, (d_1+d_2)/4+d_3/2 \\ T_{2,3}: & (d_1+d_2)/4+d_3/2 &, (d_1+d_2)3/8+d_3/4 &,(d_1+d_2)3/8+ d_3/4 \\ \ldots \end{array} $
and I think this is not too difficult to show, that iterations of this converge.

I've done an example using my (somehow primitive) MatMate-program. But I think the code will be selfexplaining enough to be translated to some other programming language.

[Update 3]: The macro had to be updated to overcome the local-minimum-problem.
We introduce a dynamic selection of the x,y-axes according to the smallest and greatest element in the diagonal of the currently iterated matrix

// Macro definitions
macrodef rotpair // rotates matrix M in one plane (=x,y) using rotation-matrix t1
m1 = t'*m*t                           // get a temporary working copy

                                     // get values a,b,c,d from submatrix
 a,b,c,d = v(m1[x,x]),v(m1[x,y]),v(m1[y,x]),v(m1[y,y])
 s_2,c_2 = a - d, b + c              // determine cos(2 phi) and sin(2 phi)
 phi = -arccs(c_2,s_2)/2             // determine required rotation-angle phi
 t1 = rotsp(einh(n),x,y,cos(phi),sin(phi)) // create rotation-matrix 
                                     //    for one x/y plane-rotation
 m2 = t1' * m1 * t1                    // do similarity-rotation
 t = t*t1                              // append current rotation to accumulator
macroend

macrodef init
 set randomstart=41
 m = (randomu(n,n,-10,10))  // create some randommatrix of size n x n
 t = einh(n)   // rotation-matrix, accumulates all rotations while iterating
 dg = diag(m) '  // get the diag of the initial matrix
 protocol = dg   // initialize some protocol for the documentation of the
                 // diagonal elements
macroend         

macrodef run  // pairwise rotations over all pairs of coordinates
  x,y = v(iminzl(dg)), v(imaxzl(dg))  // store indexes of smallest and largest 
                                      // diagonal-element into x and y-"coordinates"
  macroexec rotpair                   // do rotation
  dg = diag(m2) '
  protocol = {protocol,  dg}     // append current diagonal to protocol
 macroend

 // commands in dialog:
    n=5  // use matrix-size n=5
    macroexec init

 macroexec run    // repeat this until convergence

 // commands in dialog:
    n=10  // use matrix-size n=10
    macroexec init

 macroexec run   // repeat this until convergence

Results

 // result: (n=5 size=5x5)
  -4.2684,  7.7193,  3.0452, -0.8330, -9.9136
  -4.2684, -1.0972,  3.0452, -0.8330, -1.0972
  -0.6116, -1.0972, -0.6116, -0.8330, -1.0972
  -0.6116, -0.8544, -0.8544, -0.8330, -1.0972
  -0.8544, -0.8544, -0.8544, -0.8330, -0.8544
  -0.8544, -0.8437, -0.8544, -0.8437, -0.8544
  -0.8544, -0.8490, -0.8544, -0.8437, -0.8490
  -0.8490, -0.8490, -0.8544, -0.8490, -0.8490
  -0.8490, -0.8490, -0.8517, -0.8490, -0.8517
  -0.8490, -0.8490, -0.8504, -0.8504, -0.8517
  -0.8504, -0.8490, -0.8504, -0.8504, -0.8504
  -0.8504, -0.8497, -0.8504, -0.8497, -0.8504
  -0.8504, -0.8497, -0.8504, -0.8500, -0.8500
  -0.8500, -0.8500, -0.8504, -0.8500, -0.8500
  -0.8500, -0.8500, -0.8502, -0.8500, -0.8502
  -0.8501, -0.8500, -0.8501, -0.8500, -0.8502
  -0.8501, -0.8501, -0.8501, -0.8500, -0.8501
  -0.8501, -0.8501, -0.8501, -0.8501, -0.8501

  // result: (n=10 size=10x10, 40 iterations)       
   -4.2684,  8.0316,  7.0596, -6.1777,  3.5646, -6.5801,  9.0093,  5.8538,  2.9013, -9.6980
   -4.2684,  8.0316,  7.0596, -6.1777,  3.5646, -6.5801, -0.3444,  5.8538,  2.9013, -0.3444
   -4.2684,  0.7257,  7.0596, -6.1777,  3.5646,  0.7257, -0.3444,  5.8538,  2.9013, -0.3444
   -4.2684,  0.7257,  0.4410,  0.4410,  3.5646,  0.7257, -0.3444,  5.8538,  2.9013, -0.3444
  ...
  ...
    0.9696,  0.9696,  0.9695,  0.9696,  0.9695,  0.9698,  0.9695,  0.9696,  0.9696,  0.9696
    0.9696,  0.9696,  0.9696,  0.9696,  0.9695,  0.9696,  0.9695,  0.9696,  0.9696,  0.9696
    0.9696,  0.9696,  0.9696,  0.9696,  0.9695,  0.9696,  0.9695,  0.9696,  0.9696,  0.9695
    0.9696,  0.9696,  0.9696,  0.9696,  0.9695,  0.9695,  0.9695,  0.9696,  0.9696,  0.9695

[Update 2]:[obsolete, I found a better solution] I tried the same routine simply using n=10 instead of n=5, and with the same initializing of the randomnumber generator, so the solution should be reproducable. Unfortunately the iteration seems to run into a local minimum, such that the process converges to a non-equal solution. Here is the result near the limit:

// result
...
1.6419, 1.6419, 1.6419, 1.6419, 1.6419,-6.5801, 9.0093, 5.8538, 2.9013,-9.6980

I think the bottom right block should have a minus in front of $(a-d)r_2$, yielding the requirement $(b+c)s_2-(a-d)r_2=0$ and therefore $\tan(2x)=\frac{a-d}{b+c}$. I think that answer was also posted as a comment on my question but has been deleted since. By allowing $x\in\mathbb C$ there's probably no reason why this is not valid in general (apart from the usual $\tan^{-1}$ ambiguity). So you basically solved the 2x2 case, thanks! — Tobias Kienzler, Oct 13 '11 at 11:24
Hmm, I recalculated all entries manually and came again to the same result(no negative sign at (a-d) in the right-bottom entry). Still I might be wrong... (Also I'll update my notation to adapt yours, unfortunately the A matrix was also transposed...) — Gottfried Helms, Oct 13 '11 at 11:42
In the second-to-last step you use $2r^2=r^2+(1-s^2)=1+r_2$ and $2s^2=s^2+(1-c^2)=1-r_2$. In the top left block, there's $ar^2$ and $ds^2$, but in the bottom right one it's $as^2$ and $dr^2$, swapping the order of $(a-d)$ and yielding that missing sign — Tobias Kienzler, Oct 13 '11 at 11:49
great, accepted (unless someone shows results for general $SL(n)$...). So the final matrix is $\frac12\begin{pmatrix}a+d & (b-c) + \frac{b+c}{r_2} \ -(b-c) + \frac{b+c}{r_2} & a+d\end{pmatrix}$. (though $r_2$ can be expressed in terms of $a,b,c,d$ as well) — Tobias Kienzler, Oct 13 '11 at 12:05
For the cases n>2 I assume the proof must go via a convergence-proof of iterations. Consider a third row/column in the matrix of your comment, containing an e in the diagonal. Then this e is unaffected by the similarity-rotation done so far. Now you have to apply the same scheme additionally to the submatrices [1'3,1'3] and [2'3,2'3] where the "'" means concatenation of column/row indexes. Call all that three steps one "run", then this run will not give the final result. But I guess, that the iteration of runs gives a converging sequence (somehow averaging of means) — Gottfried Helms, Oct 13 '11 at 12:16
Nice answer! I was wondering though if you might have solved or come across a general complex solution? I have not seen it in the literature, but it turns out if you do a pre-conditioning step of diagonalizing the Hermitian portion (or skew would work also), from that point it is easier to solve. — adam W, Feb 25 '13 at 19:35
@adamW: unfortunately not, I just "invented" it from the scratch. With complex numbers I'd have to translate this to Pari/GP - perhaps someone else can put his/her time in it? — Gottfried Helms, Feb 26 '13 at 00:22
@adamW : I encountered the problem of local minima, but could improve the routine. See the changes in my answer. — Gottfried Helms, Feb 26 '13 at 01:22
Well, I thought that the diagonal should smooth out with that strategy. I am not sure I even understand how a local minimum is possible, since at each step the largest combines with the smallest and neither is ever to be seen again. — adam W, Feb 26 '13 at 02:22
@adamW: Well, I don't know exactly the reaon what form that "local minima" have. But I came across a very similar question in the well known varimax-rotation in factor analysis, where some authors found that also the varimax-rotation can fall into such suboptimal local optimae. I don't have the exact citation at hand, but you might google for "ten Berge" "varimax" "stationary". This shall give some links to an article of ten Berge first and then to some other material. — Gottfried Helms, Feb 26 '13 at 09:02
@Gottfried My function hollow() may be extended; not to have the least norm squared in the diagonal (as it works now) but to have the most possible using a unitary similarity. Thus finding the nearest normal matrix in the $2 \times 2$ case. Would this be the varimax, or maybe infomax in two dimensions? I am more familiar with eigen-decomposition than I am with statistical analysis. — adam W, Feb 26 '13 at 14:57

score 1 · Answer 3 · answered Oct 13 '11 at 09:15

1

$ \left( \begin{array}{ccc} p & q \\ r & s \end{array} \right) $ $ \left( \begin{array}{ccc} a & b \\ c & d \end{array} \right) $ $ =\left( \begin{array}{ccc} \frac{a+d}{2} & x \\ y & \frac{a+d}{2} \end{array} \right) $

$2pa+2qc=a+d$

$pb+qd=x$

$ra+sc=y$

$2rb+2sd=a+d$

$2bra+2sda=a^{2}+da$

$2bra+2bsc=2by$

$s=\frac{2by-a^{2}-da}{2bc-2da}$

$r=\frac{by-scb}{ab}$

$p=\frac{2ax-b^{2}-cb}{2ad-2bc}$

$q=\frac{x-pb}{d}$

answered Oct 13 '11 at 09:15

Angela Pretorius

10,200

Thanks for that matrix, but I'm looking for a similarity transformation, i.e. a matrix $S$ such that $S^{-1}\begin{pmatrix}a&b\c&d\end{pmatrix}S=\begin{pmatrix}\frac{a+d}{2}&x\y&\frac{a+d}{2}\end{pmatrix}$. As joriki pointed out, for real valued $a,b,c,d$ a Rotation Matrix $\begin{pmatrix}\cos\theta & -\sin\theta\\sin\theta & \cos\theta\end{pmatrix}$ can be used as $S$, leaving only one unknown $\theta$. – Tobias Kienzler Oct 13 '11 at 09:51
1

I understood the question that a similarity-transformation was sought, but I don't see, that the above is a similarity-transformation. (Such transformation requires $\small A \cdot P \cdot A^{-1} $ and in case $\small A$ should be orthogonal(a rotation T) then $\small T \cdot P \cdot T^\tau $ – Gottfried Helms Oct 13 '11 at 09:59
If a rotation-matrix shall suffice, then $\small b+c=0$ is required. – Gottfried Helms Oct 13 '11 at 10:00
@GottriedHelms: wouldn't that contradict joriki's point? – Tobias Kienzler Oct 13 '11 at 10:04
@Tobias : maybe I've an sign-error. I'll check my derivation again – Gottfried Helms Oct 13 '11 at 11:00

Is there a similarity transformation rendering all diagonal elements of a matrix equal?

3 Answers3

Linked