6

Suppose we have a symmetric matrix $A \in \mathbb R^{n \times n}$ (for example, the matrix corresponding to a quadratic form) which we want to diagonalize.

Now the usual way to do this is to find an orthonormal basis of $\mathbb R^n$ consisting of eigenvectors of $A$ (the spectral theorem guarantees that such a basis always exists); the resulting matrix $Q \in O_n(\mathbb R)$ is then such that $QAQ^{-1}$ is diagonal.
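
(For concreteness, here is a minimal NumPy sketch of this first procedure; the matrix `A` below is just an arbitrary symmetric example, and the rows of `Q` are taken to be the eigenvectors, so that $QAQ^{-1} = QAQ^T$ comes out diagonal.)

```python
import numpy as np

# Arbitrary symmetric example matrix.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh is meant for symmetric/Hermitian matrices: it returns real
# eigenvalues and an orthonormal basis of eigenvectors (as columns of V).
eigenvalues, V = np.linalg.eigh(A)
Q = V.T   # rows of Q are the eigenvectors, so Q is orthogonal

print(np.allclose(Q @ Q.T, np.eye(3)))                  # Q is orthogonal
print(np.allclose(Q @ A @ Q.T, np.diag(eigenvalues)))   # Q A Q^{-1} is diagonal
```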

However, sometimes the task is to diagonalize a quadratic form not in the way I just described, but by performing simultaneous column and row transformations (we can do that due to Sylvester's law of inertia).

Now my questions: Are these procedures completely different from each other or are they doing the same thing? Why don't I use the spectral theorem to diagonalize a quadratic form or why don't I use the $2$nd procedure to diagonalize a general symmetric matrix?

I suppose this has to do with the type of transformation, i.e. orthogonal or not, but I would like to have some good explanation.

Staki42
  • 2,825
  • If the original matrix is all integers (or all rational), the operation of simultaneous row and column operations always involves rational numbers, and you arrive at $RAR^T = D$ where all entries of $R,A,D$ are rational. SEE http://math.stackexchange.com/questions/1388421/reference-for-linear-algebra-books-that-teach-reverse-hermite-method-for-symmetr – Will Jagy Oct 07 '17 at 17:03
  • a 4 by 4 example with ugly eigenvalues http://math.stackexchange.com/questions/395634/given-a-4-times-4-symmetric-matrix-is-there-an-efficient-way-to-find-its-eige/1392600#1392600 – Will Jagy Oct 07 '17 at 17:08
  • 5 by 5 https://math.stackexchange.com/questions/2409232/simplify-quadratic-form-in-order-to-get-signature/2409277#2409277 – Will Jagy Oct 07 '17 at 17:15
  • What you describe as 'simultaneous column and row transformations' is sometimes formally called a congruence operation on a matrix. This method is essentially the same as diagonalising a quadratic form by repeatedly completing squares, i.e. the Lagrange method of diagonalisation. – StubbornAtom Oct 07 '17 at 18:07
  • If you use the spectral theorem for diagonalisation, the entries of the diagonal matrix are the eigenvalues, but not necessarily so if you use congruence operations. – StubbornAtom Oct 07 '17 at 18:14
  • @StubbornAtom but is that a problem? We don't care so much about the eigenvalues themselves as about the signature and the diagonalization itself, do we? – Staki42 Oct 07 '17 at 18:51
  • It was never a problem. I was talking about the difference of the two methods. – StubbornAtom Oct 07 '17 at 19:03
  • So in essence we have accomplished the same orthogonal basis transformation, the only difference being the values on the diagonal? – Staki42 Oct 08 '17 at 16:33

2 Answers

6

This is a good question. From a more abstract and coordinate independent point of view, the two procedures are actually applied to two different objects and result in something different. The two scenarios are:

  1. You have a finite dimensional real inner product space $(V, \left< \cdot, \cdot \right>)$ and a self-adjoint operator $T \colon V \rightarrow V$. You want to find an orthonormal basis $e_1,\dots,e_n$ of $V$ (that is, $\left< e_i, e_j \right> = \delta_{ij}$) such that $Te_i = \lambda_i e_i$ for some $\lambda_i \in \mathbb{R}$. In other words, you want to find an orthonormal basis of eigenvectors of $T$.
  2. You have a finite dimensional vector space $V$ over a field $\mathbb{F}$ of characteristic $\neq 2$ and a symmetric bilinear form $B \colon V \times V \rightarrow \mathbb{F}$. You want to find a basis $e_1,\dots,e_n$ of $V$ such that $B(e_i, e_j) = 0$ for $i \neq j$. Such a basis is called $B$-orthogonal.

This becomes confusing when you represent the objects involved as matrices because, unfortunately, both symmetric bilinear forms and self-adjoint operators are represented by the same kind of object: a symmetric matrix. More precisely:

  1. A symmetric operator $T$ is represented with respect to an orthonormal basis $\beta$ by a symmetric matrix $A = [T]_{\beta} \in M_n(\mathbb{R})$. With respect to a different orthonormal basis $\beta'$, it will be represented by a different matrix $[T]_{\beta'} = Q^{-1} A Q$ where $Q$ is the change of basis matrix between $\beta$ and $\beta'$. Since both bases are orthonormal, $Q$ is actually orthogonal so we have $Q^{-1} A Q = Q^T A Q$. If $\beta'$ consists of eigenvectors of $T$, we have $Q^T A Q = D$ where $D$ is diagonal.

    From the matrix point of view, given a symmetric matrix $A \in M_n(\mathbb{R})$, to orthogonally diagonalize $A$ we must find an orthogonal matrix $Q$ such that $Q^T A Q$ is diagonal. Here, there will be many different orthogonal matrices $Q$ such that $Q^T A Q$ is diagonal but, up to possible reordering, the diagonal elements will always be the same (they are the eigenvalues of $A$).

  2. A symmetric bilinear form $B$ is represented with respect to an arbitrary basis $\beta$ of $V$ also by a symmetric matrix $A = [B]_{\beta}$. With respect to a different basis $\beta'$, it will be represented by a different matrix $[B]_{\beta'} = Q^T A Q$ where $Q$ is the change of basis matrix between $\beta$ and $\beta'$.

    From the matrix point of view, given a symmetric matrix $A \in M_n(\mathbb{F})$, to diagonalize $A$ by congruence we must find an invertible matrix $Q$ such that $Q^T A Q$ is diagonal. Here, there will be many invertible matrices $Q$ such that $Q^T A Q$ is diagonal but the diagonal entries won't be the same, even up to reordering.
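
To make the contrast concrete, here is a small NumPy sketch (the matrix and the congruence matrices below are arbitrary, made-up examples): the orthogonal diagonalization always reproduces the eigenvalues on the diagonal, while different congruences produce different diagonals.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])            # arbitrary symmetric example; eigenvalues 1 and 3

# Orthogonal diagonalization: the diagonal is forced to be the spectrum.
w, Q = np.linalg.eigh(A)
print(np.round(Q.T @ A @ Q, 10))      # diag(1, 3)

# Diagonalization by congruence: any invertible Q works,
# and the diagonal entries depend on the choice.
Q1 = np.array([[1.0, -0.5],
               [0.0,  1.0]])          # clears the off-diagonal entry
print(Q1.T @ A @ Q1)                  # diag(2, 1.5)

Q2 = 2 * Q1                           # still invertible, different diagonal
print(Q2.T @ A @ Q2)                  # diag(8, 6)
```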

Finally, let's answer your questions.

  1. Why don't I use the spectral theorem to diagonalize a quadratic form?

    First, the algorithm for diagonalizing a quadratic form works over any field of characteristic $\neq 2$ while the spectral theorem works only over $\mathbb{R}$, so you can't use the spectral theorem unless you are working with real matrices. Even if you do work with real matrices, using the spectral theorem to diagonalize a quadratic form is overkill. Indeed, the spectral theorem gives you an orthogonal matrix $Q$ such that $Q^{-1} A Q = Q^T A Q = D$, but to diagonalize the quadratic form you only need an invertible $Q$, not an orthogonal one.

    If you want to use the spectral theorem, you will have to start by finding the eigenvalues of $A$, which involves finding the roots of a degree-$n$ polynomial. Since there is, in general, no closed formula for the roots, you'll have to approximate them and then find the eigenvectors, which will also be approximations, and so on. But if you use simultaneous row/column operations, you don't need to find the eigenvalues of $A$ at all! If $A$ has rational entries, you'll get an invertible matrix $Q \in M_n(\mathbb{Q})$ which can be computed exactly (assuming you can do perfect rational arithmetic) such that $Q^T A Q$ is diagonal (see the sketch after this list). Of course, $Q$ won't necessarily be orthogonal, but that doesn't matter for the purpose of diagonalizing a quadratic form by congruence.

    The only reason to use the spectral theorem to diagonalize a quadratic form is that you are given a symmetric form $B$ on an inner product space $(V, \left< \cdot, \cdot \right>)$ and you want to find a basis $(e_1,\dots,e_n)$ of $V$ which is both $B$-orthogonal and $\left< \cdot, \cdot \right>$-orthonormal. Finding only a $B$-orthogonal basis is much easier using simultaneous row/column operations.

  2. Why don't I use the 2nd procedure to (orthogonally) diagonalize a general symmetric matrix?

    The second procedure will only give you an invertible matrix $Q$ such that $Q^T A Q = D$. It won't give you an orthogonal matrix so $Q^T \neq Q^{-1}$ and this is not orthogonal diagonalization (and not even regular diagonalization). This is not surprising because to diagonalize $A$, you must know the eigenvalues of $A$ and the row/column operations procedure just doesn't give you that.
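
The following is a minimal SymPy sketch of both points (the matrix is an arbitrary integer example, and the loop assumes that no zero pivot ever shows up, so it is not a complete algorithm): the congruence diagonalization is carried out in exact rational arithmetic and never touches the eigenvalues.

```python
from sympy import Matrix

# Arbitrary symmetric matrix with integer entries.
A = Matrix([[1, 2, 0],
            [2, 1, 3],
            [0, 3, 2]])

n = A.shape[0]
Q = Matrix.eye(n)     # accumulates the column operations
D = A.copy()

# Symmetric Gaussian elimination: every row operation is mirrored by the
# matching column operation, so D stays congruent to A throughout.
# (Sketch only: assumes every pivot D[i, i] is nonzero when it is needed.)
for i in range(n):
    for j in range(i + 1, n):
        c = D[j, i] / D[i, i]            # exact rational arithmetic
        D[j, :] = D[j, :] - c * D[i, :]  # row operation
        D[:, j] = D[:, j] - c * D[:, i]  # matching column operation
        Q[:, j] = Q[:, j] - c * Q[:, i]  # record it, so that D = Q^T * A * Q

print(D)                  # diag(1, -3, 5): exact, but not the eigenvalues of A
print(Q.T * A * Q == D)   # True
print(A.eigenvals())      # 5 and (-1 +- sqrt(13))/2: two of them are irrational
```

The diagonal $(1, -3, 5)$ is not the spectrum of $A$, but it has the same signature as the spectrum, which is exactly what Sylvester's law of inertia promises.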

levap
  • 65,634
  • Awesome answer, that is what I was looking for. Thanks for your time. Side note: the spectral theorem also works for $\mathbb C$, doesn't it? – Staki42 Oct 10 '17 at 10:02
  • @Staki42 Yes, the spectral theorem also works over $\mathbb{C}$, but a few minor details need to change in that case. The matrix representation of a self-adjoint operator on a complex vector space is a Hermitian rather than a symmetric matrix. Likewise the form must be Hermitian (sesquilinear) rather than symmetric bilinear. The orthogonal matrix $Q$ must become a unitary matrix $U$, etc. – tparker Jan 15 '23 at 05:38
0

levap's excellent answer covers the case of real symmetric matrices that the OP asks about. But I think that considering only real symmetric (or more generally Hermitian) matrices actually confuses the matter somewhat, because the fact that the eigenvectors are orthogonal muddies whether we should be thinking about $Q^T A Q$ or $Q^{-1} A Q$ (which in the orthogonal case are equivalent).

A general (not necessarily real-symmetric or Hermitian) linear operator $A$ on a finite-dimensional vector space $V$ over a field $\mathbb{F}$ cannot necessarily be diagonalized. (But if the field is the complex numbers, then $A$ can almost always be diagonalized, i.e. with probability 1 over most standard ensembles of random matrices.) If it can be diagonalized by a change-of-basis matrix $S$, then we have $D = S^{-1} A S$, where $D$ is diagonal. That is, the matrix $A$ is similar to a diagonal matrix, and we flank $A$ by a matrix and its inverse.
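
(A minimal NumPy sketch of this, with an arbitrary non-symmetric but diagonalizable example; `eig` is the general-purpose routine, as opposed to `eigh` for the Hermitian case.)

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])            # arbitrary non-symmetric, diagonalizable example

eigenvalues, S = np.linalg.eig(A)     # columns of S are (not necessarily orthogonal) eigenvectors
D = np.linalg.inv(S) @ A @ S          # similarity: flank A by S^{-1} and S

print(np.allclose(D, np.diag(eigenvalues)))              # True
print(np.allclose(S.T @ A @ S, np.diag(eigenvalues)))    # False: S is not orthogonal,
                                                         # so S.T is no substitute for the inverse
```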

If $V$ is endowed with an inner product and $A$ happens to be self-adjoint, then the spectral theorem guarantees that this is always possible, that the eigenvalues are all real, and that the change-of-basis matrix can be chosen to be unitary/orthogonal. In terms of matrices, this means that if the matrix $A$ is Hermitian, then there will always be a matrix $S$ such that $S^{-1} A S$ is diagonal, and moreover $S$ can always be chosen to be unitary, so that the matrix to the left of $A$ can be thought of equivalently as $S^{-1}$ or $S^\dagger$.
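
(A small sketch of the Hermitian case, again with an arbitrary example matrix.)

```python
import numpy as np

A = np.array([[2.0, 1.0 - 1.0j],
              [1.0 + 1.0j, 3.0]])     # arbitrary Hermitian example

eigenvalues, U = np.linalg.eigh(A)    # for Hermitian input: real eigenvalues, unitary U

print(eigenvalues)                                             # [1. 4.] -- real
print(np.allclose(U.conj().T @ U, np.eye(2)))                  # U is unitary
# Because U is unitary, U^{-1} and U^dagger give the same diagonalization:
print(np.allclose(U.conj().T @ A @ U, np.diag(eigenvalues)))   # True
```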

By contrast, for a general bilinear form $B$ on a finite-dimensional real vector space, "diagonalizing" means finding an invertible change-of-basis matrix $S$ such that $S^T B S = D$, where $D$ is diagonal. That is, $B$ is congruent to a diagonal matrix. Since $D$ is symmetric, it's clear that $B$ must be as well, so only symmetric real bilinear forms can be diagonalized. Since $S$ is only required to be invertible and not orthogonal, this is a much easier task, and (as levap said) it can be done using only row addition and scalar multiplication operations, without needing to solve any polynomials. The diagonal entries can always be rescaled to $0$ or $\pm 1$, and Sylvester's law of inertia says that the number of entries of each sign does not depend on the choice of $S$. I'm not quite sure how things work for bilinear or sesquilinear forms over complex Hilbert spaces.

In terms of matrices, this means that for any real symmetric matrix $A$, there is a real invertible matrix $S$ such that $S^T A S$ is diagonal, and indeed has only the entries $0$, $1$, and $-1$ on the diagonal. (And unlike in the previous case, where we used inverses, we can find $S$ and $D$ by doing only arithmetic operations on the entries of $A$.) For a non-symmetric real matrix $A$, no invertible $S$ with $S^T A S$ diagonal can exist, since a diagonal matrix is symmetric and $S^T A S$ being symmetric (with $S$ invertible) would force $A$ to be symmetric.
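
(Here is a small SymPy sketch of that normalization, with an arbitrary example: one congruence step diagonalizes the matrix, and rescaling the basis vectors by $1/\sqrt{|d_i|}$ brings the nonzero diagonal entries to $\pm 1$.)

```python
from sympy import Matrix, sqrt

A = Matrix([[2, 2],
            [2, -1]])          # arbitrary real symmetric example

# One congruence step (a column operation mirrored by the matching row operation):
E = Matrix([[1, -1],
            [0,  1]])          # adds -1 times column 1 to column 2
D = E.T * A * E                # diag(2, -3)

# Rescale each basis vector by 1/sqrt(|d_i|) to normalize the diagonal:
N = Matrix.diag(1 / sqrt(2), 1 / sqrt(3))
S = E * N
print(S.T * A * S)             # diag(1, -1)
```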

Neither matrix similarity nor matrix congruence should be confused with matrix equivalence, which is much more general because $A$ can be rectangular and you can use two totally unrelated invertible matrices on the two sides of $A$. Rectangular matrices of the same size are equivalent iff they have the same rank.
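
(A tiny NumPy sketch of equivalence, with arbitrary made-up matrices: the two rectangular matrices below have the same rank, and explicit invertible matrices $P$ and $Q$ realizing the equivalence are written down by hand.)

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])       # rank 1
B = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])       # rank 1

print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(B))   # True, so A and B are equivalent

# Equivalence uses two unrelated invertible matrices, one on each side:
P = np.array([[ 1.0, 0.0],
              [-2.0, 1.0]])
Q = np.array([[1.0, -2.0, -3.0],
              [0.0,  1.0,  0.0],
              [0.0,  0.0,  1.0]])
print(np.allclose(P @ A @ Q, B))      # True
```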

tparker
  • 6,219