Why is a form positive only if its matrix in some ordered basis is a positive matrix?

Question

I'm reading Chapter 9, "Operators on Inner Product Spaces", of Hoffman's "Linear Algebra" and got lost at the notion of positivity for (sesquilinear) forms, operators and matrices.

The confusion comes from the fact that the definition of "positive" for matrices differs from the one for forms or operators.

Definitions.

A form $f$ on a real or complex vector space $V$ is called Hermitian if $f(\alpha, \beta) = \overline {f(\beta, \alpha)}$ for all $\alpha$ and $\beta$ in $V$.

A form $f$ on a real or complex vector space $V$ is positive if $f$ is Hermitian and $f(\alpha, \alpha) > 0$ for every $\alpha \ne 0$ in $V$.

If $A$ is an $n \times n$ matrix with complex entries and if $A$ satisfies $$\tag{9.9} X^\intercal A X > 0, \forall X \in \mathbb R^n, X \ne 0$$ we shall call $A$ a positive matrix.

A linear operator $T$ on a finite-dimensional inner product space $V$ is positive if $T=T^*$ and $\langle T\alpha, \alpha \rangle > 0$ for all $\alpha \ne 0$ in $V$.

Notice that "positive" for both forms and operators is defined using the conjugate transpose, whereas "positive" for a matrix is defined using the plain transpose. Also, positivity of forms and operators is defined on "real or complex" vector spaces $V$, with $\alpha \in V$, while positivity of a matrix is defined for matrices with complex entries but with $X$ ranging only over the real space $\mathbb R^n$.

Then I got lost at the claim on page 329:

In either the real or complex case, a form $f$ is positive if and only if its matrix in some (in fact, every) ordered basis is a positive matrix.

Let me break this into 4 arguments:

(1) real vector space, $f$ is a positive form, then $[f]_\mathcal B$ is a positive matrix.

(2) real vector space, $[f]_\mathcal B$ is a positive matrix, then $f$ is a positive form.

(3) complex vector space, $f$ is a positive form, then $[f]_\mathcal B$ is a positive matrix.

(4) complex vector space, $[f]_\mathcal B$ is a positive matrix, then $f$ is a positive form.

(1) and (3) seem fine, but I'm lost on (2) and (4): how can they be proved?

(1): it is saying --

Let $f$ be a form on a real vector space, $\mathcal B$ an ordered basis, and $[f]_\mathcal B$ the matrix of $f$ under basis $\mathcal B$.

$f$ is a positive form, or, by definition, i) $f$ is Hermitian : $f(\alpha, \beta) = {f(\beta, \alpha)}$ for all $\alpha$ and $\beta$ in $\mathbb R^n$, and ii) $\forall \alpha \in \mathbb R^n, \alpha\ne 0$, $f(\alpha, \alpha) > 0$.

Then $[f]_\mathcal B$ is a positive matrix, or, by definition, $X^\intercal [f]_\mathcal B X > 0, \forall X \in \mathbb R^n, X \ne 0$.

This is easy to prove: $\forall \alpha \in \mathbb R^n, \alpha\ne 0, f(\alpha, \alpha)>0$ directly leads to $X^\intercal [f]_\mathcal B X >0, \forall X\in \mathbb R^n, X\ne 0$.

(3): it is saying --

Let $f$ be a form on a complex vector space, $\mathcal B$ an ordered basis, and $[f]_\mathcal B$ the matrix of $f$ under basis $\mathcal B$.

$f$ is a positive form, or, by definition, i) $f$ is Hermitian : $f(\alpha, \beta) = \overline {f(\beta, \alpha)}$ for all $\alpha$ and $\beta$ in $\mathbb C^n$, and ii) $\forall \alpha \in \mathbb C^n, \alpha\ne 0$, $f(\alpha, \alpha) > 0$.

Then $[f]_\mathcal B$ is a positive matrix, or, by definition, $X^\intercal [f]_\mathcal B X > 0, \forall X \in \mathbb R^n, X \ne 0$.

This is easy to prove: $\forall \alpha \in \mathbb C^n, \alpha\ne 0, f(\alpha, \alpha)>0$ directly leads to $X^\intercal [f]_\mathcal B X >0, \forall X\in \mathbb R^n, X\ne 0$, since real coordinate vectors are a special case of complex ones.
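To spell out the step used in (1) and (3), here is a sketch of the computation. It assumes the convention that the matrix $A=[f]_\mathcal B$ of $f$ in the ordered basis $\mathcal B=\{\alpha_1,\dots,\alpha_n\}$ has entries $A_{jk}=f(\alpha_k,\alpha_j)$, with $f$ linear in the first argument and conjugate-linear in the second; the coordinates $x_k$ of $\alpha$ are notation introduced here for illustration.

$$\alpha=\sum_k x_k\alpha_k \quad\Longrightarrow\quad f(\alpha,\alpha)=\sum_{j,k}\overline{x_j}\,x_k\,f(\alpha_k,\alpha_j)=\sum_{j,k}\overline{x_j}\,A_{jk}\,x_k=X^*AX .$$

When the coordinate vector $X$ happens to be real, $X^*=X^\intercal$, so $X^\intercal A X=f(\alpha,\alpha)>0$ for every nonzero $X\in\mathbb R^n$. This only uses part ii) of the definition of a positive form; the difficulty in (2) and (4) is going the other way.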

(4): I have trouble proving this one, which says --

Let $f$ be a form on a complex vector space, $\mathcal B$ an ordered basis, and $[f]_\mathcal B$ the matrix of $f$ under basis $\mathcal B$.

$[f]_\mathcal B$ is a positive matrix, or, by definition, $X^\intercal [f]_\mathcal B X > 0, \forall X \in \mathbb R^n, X \ne 0$.

Then $f$ is a positive form, or, by definition, i) $f$ is Hermitian : $f(\alpha, \beta) = \overline {f(\beta, \alpha)}$ for all $\alpha$ and $\beta$ in $\mathbb C^n$, and ii) $\forall \alpha \in \mathbb C^n, \alpha\ne 0$, $f(\alpha, \alpha) > 0$.

The proof is hinted on Hoffman's page 329, that:

$\forall X, Y \in \mathbb R^n$, let $Z = X + iY$, then $Z\in \mathbb C^n$, and: $Z^*A Z = (X+iY)^*A (X+iY) = (X^\intercal - iY^\intercal)A(X+iY)$ $= X^\intercal A X + Y^\intercal A Y + i(X^\intercal A Y - Y^\intercal A X)$.

If $A\in\mathbb R^{n\times n}$, $A = A^\intercal$, then $Y^\intercal A X = X^\intercal A Y$, furthermore, from $X^\intercal AX>0, \forall X\in \mathbb R^n, X\ne 0$ one can derive that $Z^*A Z>0$, $\forall Z \in \mathbb C^n, Z\ne 0$.
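As a sanity check of the expansion above, here is a minimal pure-Python sketch. The matrix and vectors are hypothetical sample data chosen only for illustration, and the helper names `bilinear` and `hermitian_form` are not from the book.

```python
# Sketch: check the expansion of Z* A Z for Z = X + iY (pure Python, no libraries).

def bilinear(u, A, v):
    """Return u^T A v (no conjugation) for vectors u, v and a nested-list matrix A."""
    n = len(u)
    return sum(u[j] * A[j][k] * v[k] for j in range(n) for k in range(n))

def hermitian_form(z, A):
    """Return z^* A z, i.e. the left-hand vector is conjugated."""
    return bilinear([complex(c).conjugate() for c in z], A, z)

# Hypothetical sample data: a real symmetric matrix and two real vectors.
A = [[2.0, 1.0], [1.0, 2.0]]
X = [1.0, -3.0]
Y = [0.5, 2.0]
Z = [x + 1j * y for x, y in zip(X, Y)]

lhs = hermitian_form(Z, A)
rhs = (bilinear(X, A, X) + bilinear(Y, A, Y)
       + 1j * (bilinear(X, A, Y) - bilinear(Y, A, X)))

# For symmetric real A the cross terms cancel, so Z* A Z is real.
print(lhs, rhs)
```

With a non-symmetric real `A` the same code leaves a nonzero imaginary part in general, which is exactly why the hint needs $A = A^\intercal$.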

But this requires $A\in \mathbb R^{n\times n}$ and $A = A^\intercal$. However, $[f]_\mathcal B$ is in general only known to lie in $\mathbb C^{n\times n}$, so we can't use it as $A$.

Although there is a Principal Axis Theorem:

For every Hermitian form $f$ on $V$, there is an orthonormal basis of $V$ in which $f$ is represented by a diagonal matrix with real entries.

But again, we don't know whether $[f]_\mathcal B$ is Hermitian, so we can't use the Principal Axis Theorem to choose a $\mathcal B$ for which $[f]_\mathcal B$ is a diagonal matrix with real entries.

(2): I also have trouble with this one, which says --

Let $f$ be a form on a real vector space, $\mathcal B$ an ordered basis, and $[f]_\mathcal B$ the matrix of $f$ under basis $\mathcal B$.

$[f]_\mathcal B$ is a positive matrix, or, by definition, $X^\intercal [f]_\mathcal B X > 0, \forall X \in \mathbb R^n, X \ne 0$.

Then $f$ is a positive form, or, by definition, i) $f$ is Hermitian : $f(\alpha, \beta) = {f(\beta, \alpha)}$ for all $\alpha$ and $\beta$ in $\mathbb R^n$, and ii) $\forall \alpha \in \mathbb R^n, \alpha\ne 0$, $f(\alpha, \alpha) > 0$.

It seems that i), the Hermitian property, cannot be proved?

Actually Hoffman's book mentioned earlier on page 329 that:

If a real matrix $A$ satisfies (9-9), it does not follow that $A = A^\intercal$.

This is reasonable: choose $A = \begin{bmatrix} 1 & 0.3 \\ 0.1 & 1 \end{bmatrix}$; then for every nonzero $X = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \in \mathbb R^2$, $X^\intercal A X = x_1^2 + 0.4x_1x_2 + x_2^2 = (x_1+0.2x_2)^2 + 0.96x_2^2 > 0$, but $A\ne A^\intercal$.
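The example can be checked numerically with a short pure-Python sketch. The sampling grid is an arbitrary choice made here for illustration; it spot-checks the inequality and the completed square rather than proving them.

```python
# Sketch: spot-check that X^T A X > 0 on a grid of real vectors while A != A^T.

def quad(A, x):
    """Return x^T A x for a real vector x and a nested-list matrix A."""
    n = len(x)
    return sum(x[j] * A[j][k] * x[k] for j in range(n) for k in range(n))

A = [[1.0, 0.3], [0.1, 1.0]]

# Arbitrary sample grid on [-2, 2]^2 with the zero vector removed.
grid = [i / 4 for i in range(-8, 9)]
samples = [(x1, x2) for x1 in grid for x2 in grid if (x1, x2) != (0.0, 0.0)]

all_positive = all(quad(A, x) > 0 for x in samples)

# Compare with the completed square (x1 + 0.2 x2)^2 + 0.96 x2^2.
matches_square = all(
    abs(quad(A, (x1, x2)) - ((x1 + 0.2 * x2) ** 2 + 0.96 * x2 ** 2)) < 1e-12
    for x1, x2 in samples
)

is_symmetric = A[0][1] == A[1][0]
print(all_positive, matches_square, is_symmetric)
```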

I'm lost here. Why is "positive" for a matrix defined not in terms of a "real or complex" vector space, but for matrices with complex entries tested only against real $X$? Do (2) and (4) hold?

There are several equivalent definitions; in my course a matrix $A$ is positive iff $z^*Az>0$ for any non-zero $z\in \Bbb C^n$. It would be logical to prove that this definition and 9.9 are equivalent. One can start with saying that $z=x+iy$ with $x,y\in\Bbb R^n$ and $z^*=x^T-iy^T$. — TZakrevskiy, Apr 29 '15 at 20:01
@TZakrevskiy That sounds like another approach to setting up the theory? It doesn't seem to help prove (2) or (4)? — athos, Apr 30 '15 at 14:09

user126154 · Answer 1 · 2015-05-05T15:53:02.717

In my opinion there is a misprint, either in the book or in your "cut-and-paste". Indeed, the condition $$X^TAX>0\quad \forall X\in\mathbb R^n,\ X\ne 0$$

gives information only on the symmetric part of $A$. In particular, you can add any antisymmetric matrix you want and the property is preserved. For instance,

$$A=\begin{pmatrix}1&10i\\-10i&1\end{pmatrix}$$

is a matrix such that $X^TAX>0$ for every nonzero $X\in\mathbb R^2$, and the form it defines is Hermitian but not positive. For instance, if $X=(1,i)$ you have $$X^*AX=-18.$$
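A quick pure-Python check of this counterexample, offered as a sketch. Note that the value $-18$ comes from the conjugated form $X^*AX$, which is the one used in the definition of a positive form; the helper name `form` is introduced here for illustration.

```python
# Sketch: A is Hermitian, X^T A X > 0 on real X, yet X^* A X < 0 at X = (1, i).

def form(z, A):
    """Return z^* A z for a (possibly complex) vector z."""
    n = len(z)
    return sum(
        complex(z[j]).conjugate() * A[j][k] * z[k]
        for j in range(n) for k in range(n)
    )

A = [[1, 10j], [-10j, 1]]

# Hermitian check: A[j][k] equals the conjugate of A[k][j].
is_hermitian = all(
    A[j][k] == complex(A[k][j]).conjugate()
    for j in range(2) for k in range(2)
)

# On real vectors the off-diagonal terms cancel and the value is x1^2 + x2^2.
real_values = [form([x1, x2], A) for x1 in range(-3, 4) for x2 in range(-3, 4)
               if (x1, x2) != (0, 0)]
positive_on_reals = all(v.imag == 0 and v.real > 0 for v in real_values)

value_at_1_i = form([1, 1j], A)
print(is_hermitian, positive_on_reals, value_at_1_i)
```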

This provides a counterexample to $(4)$. For a counterexample to $(2)$, take a symmetric positive definite matrix such as $\begin{pmatrix}2&1\\1&2\end{pmatrix}$ and add an antisymmetric matrix, for instance $\begin{pmatrix}0&1\\-1&0\end{pmatrix}$; you get $$B=\begin{pmatrix}2&2\\0&2\end{pmatrix}$$

If $X=(x,y)\ne 0$ you have $X^TBX=2x^2+2xy+2y^2=x^2+y^2+(x+y)^2> 0$, so $B$ is positive, but the form it defines is not symmetric. For instance, if $V=(1,0)$ and $W=(0,1)$ we have $$V^TBW=2\qquad W^TBV=0.$$
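The same kind of sketch works for the real counterexample. The names `S` and `K` for the symmetric and antisymmetric parts are introduced here for illustration, and the integer grid is an arbitrary spot check.

```python
# Sketch: B = S + K satisfies X^T B X > 0 on real X, but V^T B W != W^T B V.

def bilinear(u, A, v):
    """Return u^T A v for real vectors u, v and a nested-list matrix A."""
    n = len(u)
    return sum(u[j] * A[j][k] * v[k] for j in range(n) for k in range(n))

S = [[2, 1], [1, 2]]    # symmetric positive definite part
K = [[0, 1], [-1, 0]]   # antisymmetric part, contributes nothing to X^T B X
B = [[S[j][k] + K[j][k] for k in range(2)] for j in range(2)]

# Spot-check positivity on an integer grid, excluding the zero vector.
points = [(x, y) for x in range(-5, 6) for y in range(-5, 6) if (x, y) != (0, 0)]
positive_on_reals = all(bilinear(p, B, p) > 0 for p in points)

# The antisymmetric part drops out of the quadratic form.
same_quadratic_form = all(bilinear(p, B, p) == bilinear(p, S, p) for p in points)

V, W = [1, 0], [0, 1]
print(B, bilinear(V, B, W), bilinear(W, B, V))
```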

In other words, with the definitions as written in the question, both $(2)$ and $(4)$ are false.

score 0 · Answer 2 · edited Apr 13 '17 at 12:21

A clue is given in the statement: "... its matrix in some..."

See also this post: Are positive definite matrices necessarily diagonalizable and when does the famous eigenvalue criterion apply? , especially the last paragraph of the accepted answer.

edit 1: Note that change of basis when referring to the "matrix of a (sesqui-linear) form" is obtained through hermitian congruence. So $A=P^*BP$ means $A$ and $B$ are hermitian-congruent (Note $P^*$ is not necessarily the inverse of $P$). So what you need to prove is this: if a matrix $A$ is positive, does that imply it is hermitian-congruent to a hermitian matrix, which is then $[f]_{\mathcal{B}}$ for some positive form $f$?

edit 2: Let's take the matrix you use as an example at the end, there is a basis: $\{(-1/\sqrt{2},1/\sqrt{2}),(i/\sqrt{2},-i/\sqrt{2})\}$ relative to which the matrix of $A$ is \begin{equation}\begin{bmatrix}4/5 & -4i/5 \\ 4i/5 & 4/5 \end{bmatrix}, \end{equation} which is clearly Hermitian. So if we take (2) - you had a positive matrix $A$ - it is congruent to the matrix above which is hermitian and also positive: so there is some ordered basis in which we can see that it is the matrix of some positive form $f$. This is the basic principle...so if Hoffman's claim is true, then this is always true: for every positive matrix we can find a basis so that it is congruent to a hermitian matrix. If Hoffman's claim is false you would be able to find some counter example. Unfortunately I do not have a reference where someone has proved this, and unfortunately I cannot spend more time on this now - maybe someone else here can do it for you - but you will learn and remember most if you work it out for yourself.
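The numbers in edit 2 can be checked with a short pure-Python sketch. It assumes the change of basis is $P^*AP$ with the two listed vectors as the columns of $P$ (the names `v`, `w`, `G` are introduced here). The computation reproduces the quoted matrix, but the determinant of $P$ comes out as zero: the second vector is $-i$ times the first, so the two vectors are linearly dependent and do not span $\mathbb C^2$.

```python
# Sketch: recompute P^* A P for the two vectors listed in edit 2.

s = 2 ** -0.5
A = [[1.0, 0.3], [0.1, 1.0]]   # example matrix from the question
v = [-s, s]                    # first listed vector
w = [1j * s, -1j * s]          # second listed vector
cols = [v, w]                  # columns of P

def entry(a, b):
    """Return (P^* A P)[a][b], i.e. cols[a]^* A cols[b]."""
    return sum(
        complex(cols[a][j]).conjugate() * A[j][k] * cols[b][k]
        for j in range(2) for k in range(2)
    )

G = [[entry(a, b) for b in range(2)] for a in range(2)]

# Determinant of P, whose columns are v and w.
det_P = v[0] * w[1] - v[1] * w[0]
# Determinant of the resulting 2x2 matrix.
det_G = G[0][0] * G[1][1] - G[0][1] * G[1][0]

print(G, det_P, det_G)
```

Since $P$ is singular here, this particular computation is not a congruence and does not by itself settle whether a suitable basis exists.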

Furthermore, referring to your last question, a sesquilinear form is, to put it casually, "geared towards" complex vector space as it defines the inner product on complex space properly...a normal symmetric bilinear form does "not work" as it does for real space - you need the "conjugate linearity" in one term. And the sesquilinear form used to define inner products on complex space reduces to the normal dot product on real spaces, so it is a proper generalization for the cause...so I think the proper context for the chapters you are reading is in fact complex vector space.

I don't quite understand your opinion. do you think (2) or (4) hold? — athos, Apr 30 '15 at 14:50
are you sure? Let $B = \begin{bmatrix} 1+i/10 & i/5 \\ -i/5 & 1-i/10 \end{bmatrix}$, is $B$ Hermitian? $B^* = \begin{bmatrix} 1-i/10 & i/5 \\ -i/5 & 1+i/10 \end{bmatrix}$, apparently $B^* \ne B$. — athos, May 05 '15 at 05:01
Could you please answer the question directly: do you think (2) or (4) hold or not? — athos, May 05 '15 at 05:34
I can say that I "think" (2) and (4) both hold, as in my testing of a few examples I could not find a counterexample. I also believe that it is most probable that both hold, since change of basis, which in this case is (Hermitian) matrix congruence, preserves positive definiteness (there are proofs available for this), and I think it is highly likely that one can find a Hermitian matrix which is congruent to a non-Hermitian positive matrix: in all the examples I tried, it worked. But I cannot assert that it is true for all cases without a proof...the aim of my answer was to guide you with what I do know. — Christiaan Hattingh, May 05 '15 at 07:58

score 0 · Accepted Answer · answered May 18 '15 at 02:51

I'm answering my own question after reading through that section again last weekend.

I believe Hoffman's book has a typo in its definition of a positive matrix:

If $A$ is an $n \times n$ matrix with complex entries and if $A$ satisfies (9.9) we shall call $A$ a positive matrix.

It is actually referring to (9.8), which is $$\tag{9.8} X^* A X > 0, \forall X \in \mathbb C^n, X \ne 0$$

The reasoning is as follows, where I use $\mathbb F$ to denote either $\mathbb R$ or $\mathbb C$, assume all $X\ne 0$, write $f>0$ for a positive form, and write $A>0$ for a positive matrix.

$f_{\mathbb F}>0$ is defined as $X_{\mathbb F}^*A_{\mathbb F}X_{\mathbb F}>0$ and $A_{\mathbb F}^* = A_{\mathbb F}$.

In $\mathbb C$ ---

$f_{\mathbb C}>0$ definition translates into $X_{\mathbb C}^* A_{\mathbb C} X_{\mathbb C} > 0$, as it implies $A_{\mathbb C}^* = A_{\mathbb C}$ already.

So $A_{\mathbb C}>0$ can be defined as $X_{\mathbb C}^*A_{\mathbb C}X_{\mathbb C}>0$, as it implies $A_{\mathbb C}^* = A_{\mathbb C}$.

In $\mathbb R$ ---

$f_{\mathbb R}>0$ definition translates into $X_{\mathbb R}^\intercal A_{\mathbb R} X_{\mathbb R} > 0$ and $A_{\mathbb R}^\intercal = A_{\mathbb R}$. Notice $X_{\mathbb R}^\intercal A_{\mathbb R} X_{\mathbb R} > 0$ $\nRightarrow$ $A_{\mathbb R}^\intercal = A_{\mathbb R}$.

However $X_{\mathbb R}^\intercal A_{\mathbb R} X_{\mathbb R} > 0$ and $A_{\mathbb R}^\intercal = A_{\mathbb R}$ imply $X_{\mathbb C}^* A_{\mathbb R} X_{\mathbb C} > 0$. So $A_{\mathbb R}>0$ can be defined as $X_{\mathbb C}^*A_{\mathbb R}X_{\mathbb C}>0$, as it implies $X_{\mathbb R}^\intercal A_{\mathbb R} X_{\mathbb R}>0$ and $A_{\mathbb R}^\intercal = A_{\mathbb R}$.

--- Hence for $\mathbb F$ either $\mathbb R$ or $\mathbb C$, $A_{\mathbb F}>0$ can be defined as $X_{\mathbb C}^* A_{\mathbb F} X_{\mathbb C}>0$. And $A_{\mathbb F}>0$ iff $f_{\mathbb F}>0$.
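The step "it implies $A^* = A$" used above deserves a line of justification. The following is a sketch of the standard polarization argument, with $C$ and the standard basis vectors $e_j$ as notation introduced here. If $X^*AX$ is real for every $X\in\mathbb C^n$, then $X^*AX=\overline{X^*AX}=X^*A^*X$, so $C=A-A^*$ satisfies $X^*CX=0$ for all $X$. Taking $X=e_j$ gives $C_{jj}=0$, and then

$$X=e_j+e_k:\quad C_{jk}+C_{kj}=0, \qquad\qquad X=e_j+ie_k:\quad i\,(C_{jk}-C_{kj})=0 .$$

Together these give $C_{jk}=0$ for all $j,k$, that is, $A=A^*$. The second choice of $X$ needs complex vectors, which is why the real condition (9.9) alone cannot force symmetry.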
