
I've been reading a proof that the reduced row-echelon form of a given matrix is unique, but there was one part that made me wonder.

This step of the proof shows that if $B$ and $C$ are row-equivalent and in reduced row-echelon form, where $r$ is the number of non-zero rows in $B$ and $r'$ is the same for $C$, then $r = r'$. Note that $d_k$ is the column of the $k^{th}$ pivot in $B$, and $d'_k$ is the same in $C$. Previously, it was proved that $d_k = d'_k$, which makes sense to me - this proof assumes that.

Without further ado, the proof (paraphrased from the University of Puget Sound's free textbook on Linear Algebra). If my annotations below are too convoluted, refer to page 34 here.

Suppose $r' < r$. For $1 \leq \mathscr{l} \leq r'$, we have $[B]_{rd_\mathscr{l}} = 0$, since $[B]_{kd_\mathscr{l}} = 0$ unless $k = \mathscr{l}$, and $r \neq \mathscr{l}$ because $r > r' \geq \mathscr{l}$. Because each row of $B$ (including row $r$) is a linear combination of the rows of $C$, we have $0 = [B]_{rd_\mathscr{l}} = \sum_{k=1}^{m} \delta_{rk} [C]_{kd_\mathscr{l}}$, where $\delta_{ik}$ is the coefficient by which row $k$ of $C$ is multiplied to contribute to row $i$ of $B$.
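Equivalently, collecting the coefficients into a matrix $\Delta = (\delta_{ik})$ (a label I'm introducing here; the book doesn't name it), the row-combination statement is just $B = \Delta C$, and the displayed sum is the $(r, d_\mathscr{l})$ entry of that product:

$$[B]_{rd_\mathscr{l}} = [\Delta C]_{rd_\mathscr{l}} = \sum_{k=1}^{m} \delta_{rk} [C]_{kd_\mathscr{l}}.$$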

We can decompose this sum as $\sum_{k=1}^{r'} \delta_{rk} [C]_{kd_\mathscr{l}} + \sum_{k=r' + 1}^{m} \delta_{rk} [C]_{kd_\mathscr{l}}$, and since $[C]_{kd_\mathscr{l}} = 0$ for $k > r' \geq \mathscr{l}$, we can drop the second sum, leaving $\sum_{k=1}^{r'} \delta_{rk} [C]_{kd_\mathscr{l}}$.

Since we know $d_k = d'_k$, this becomes $\sum_{k=1}^{r'} \delta_{rk} [C]_{kd'_\mathscr{l}}$. Pulling out a single term, we have $\delta_{r\mathscr{l}}[C]_{\mathscr{l}d'_\mathscr{l}} + \sum_{k=1, k \neq \mathscr{l}}^{r'} \delta_{rk} [C]_{kd'_\mathscr{l}}$. Because $[C]_{kd'_\mathscr{l}} = 1$ if $k = \mathscr{l}$ and equals $0$ otherwise, the previous expression reduces to $\delta_{r\mathscr{l}}(1) + \sum_{k=1, k \neq \mathscr{l}}^{r'} \delta_{rk}(0) = \delta_{r\mathscr{l}}.$ Thus, $\delta_{r\mathscr{l}} = 0$...

This proof makes sense to me (especially after drawing a diagram), but I wonder why we pull out the sum from $r' + 1$ to $m$ in the second paragraph. It's perfectly fine, but wouldn't the proof be shorter like this?

Suppose $r' < r$. For $1 \leq \mathscr{l} \leq r'$, we have $[B]_{rd_\mathscr{l}} = 0$, since $[B]_{kd_\mathscr{l}} = 0$ unless $k = \mathscr{l}$, and $r \neq \mathscr{l}$ because $r > r' \geq \mathscr{l}$. Because each row of $B$ (including row $r$) is a linear combination of the rows of $C$, we have $0 = [B]_{rd_\mathscr{l}} = \sum_{k=1}^{m} \delta_{rk} [C]_{kd_\mathscr{l}}$, where $\delta_{ik}$ is the coefficient by which row $k$ of $C$ is multiplied to contribute to row $i$ of $B$.

Since we know $d_k = d'_k$, this becomes $\sum_{k=1}^{m} \delta_{rk} [C]_{kd'_\mathscr{l}}$. Pulling out a single term, we have $\delta_{r\mathscr{l}}[C]_{\mathscr{l}d'_\mathscr{l}} + \sum_{k=1, k \neq \mathscr{l}}^{m} \delta_{rk} [C]_{kd'_\mathscr{l}}$. Because $[C]_{kd'_\mathscr{l}} = 1$ if $k = \mathscr{l}$ and equals $0$ otherwise (for $k \leq r'$ this is the pivot-column property, and for $k > r'$ all of row $k$ of $C$ is zero), the previous expression reduces to $\delta_{r\mathscr{l}}(1) + \sum_{k=1, k \neq \mathscr{l}}^{m} \delta_{rk}(0) = \delta_{r\mathscr{l}}.$ Thus, $\delta_{r\mathscr{l}} = 0$...

Note that the second paragraph is now gone, and in the third paragraph $r'$ in the upper limit of sums has been replaced with $m$. My question is ultimately: is my shorter proof correct? If so, why wouldn't the proof be presented this way in the first place?
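As a quick empirical sanity check of uniqueness itself (my addition, not part of either proof, using sympy on a made-up matrix): row-equivalent matrices do reduce to the same RREF.

```python
from sympy import Matrix

# P is invertible, so P*A is row-equivalent to A.
A = Matrix([[1, 2, 3],
            [2, 4, 8],
            [3, 6, 11]])
P = Matrix([[0, 1, 0],
            [1, 0, 2],
            [3, 0, 1]])
assert P.det() != 0

# rref() returns (rref_matrix, pivot_columns); uniqueness says
# the first components must agree.
assert A.rref()[0] == (P * A).rref()[0]
```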

  • Tangential comment: I don't think it's very important to know that the RREF of a matrix is unique. Do we need this fact for anything? – littleO Aug 30 '16 at 21:44
  • A couple of notational oddities. a) Beezer's $[A]_{mn}$ is not a common notation, as far as I know, for the element at row $m$ and column $n$ of matrix $A$. $A_{mn}$ is more usual. b) You use $\mathscr{l}$ instead of Beezer's $\ell$. c) It is common for $\delta_{ik}$ to be used to mean $1$ for $i=k$ and $0$ otherwise, but Beezer isn't using that notation; he gives $\delta_{ik}$ a different meaning. – ForgotALot Sep 03 '16 at 22:59
  • I hesitate to say this, but in my opinion there is no way Beezer's long proof is suited for the early pages of a first course in linear algebra. This is of course my personal taste speaking. I like Lang's Linear Algebra. – ForgotALot Sep 03 '16 at 23:20
  • @ForgotALot - thank you so much! I'll dig into that - the notation is quite bizarre. –  Sep 04 '16 at 01:37
  • @littleO: the fact that the reduced form is unique gives a unique representation for the orbit of row-equivalent matrices, and thus for the set of matrices with the same null space, and thus sets up a one-to-one correspondence between all reduced forms of given rank and all subspaces of a given dimension, i.e. they allow one to coordinatize the "Grassmann" variety of p-dimensional subspaces of n-space. In fact the choice of which columns to have as pivot columns, say for a rank 2, 2 by 4 matrix, defines the stratification by Schubert cells. Fix a plane ∏, line L in ∏, and point P on L, in P^3. – roy smith Feb 10 '17 at 21:35
  • Then the reduced matrices with first two columns as pivots correspond to lines in P^3 that do not meet the line L, and these are thus seen to form an affine 4-dimensional subset of the Grassmannian.... So one gets both an open affine cover of the Grassmann manifold, by relaxing the echelon condition but keeping the reduced condition, and, by keeping the reduced echelon condition, a stratification, i.e. a disjoint cover, by affine sets of varying dimensions. This generalizes to all Grassmannians. – roy smith Feb 10 '17 at 21:39
  • Counting the number of free variables in a 2 by 4 reduced echelon matrix of rank 2 for each choice of the 2 pivot columns, e.g., shows that G(2,4) has a stratification with one 4-cell, one 3-cell, two 2-cells, one 1-cell and one 0-cell. It follows immediately that the Euler characteristic of G(2,4) is 2, the alternating sum of the number of cells of various dimensions. (I don't see immediately how to compute the attaching maps and hence the homology groups.) – roy smith Feb 11 '17 at 22:46

1 Answer


Here's a proof I made up when teaching the class. Maybe it will interest someone. Basically, the matrix A determines its null space, which in turn equals the graph of a linear transformation whose matrix is the negative of the unknown part of the reduced echelon form. QED.

More detail:

Conceptual proof of uniqueness of reduced echelon form:

It is fundamental that a matrix and its reduced echelon form have the same null space. Indeed, that is the reason reduced echelon forms are useful for finding the null space of the original matrix. I claim that the null space entirely determines the reduced echelon form. For simplicity, take the case where the “pivot” columns all appear first, followed by the non pivot columns. (Note that a column is a pivot column if and only if it does not depend linearly on earlier columns. This is obvious for a reduced echelon matrix and hence also true for the original matrix, since the columns of both matrices satisfy exactly the same relations.)
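A small sympy illustration of that parenthetical claim, on a made-up matrix (the example is mine, not the answer's): rref() reports the pivot columns, and they are exactly the columns that do not depend linearly on earlier ones.

```python
from sympy import Matrix

A = Matrix([[1, 2, 0,   3],
            [2, 4, 1,   1],
            [0, 0, 3, -15]])
R, pivots = A.rref()
assert pivots == (0, 2)  # columns 0 and 2 are the pivot columns

# Columns 1 and 3 depend linearly on earlier columns, so they are not pivots.
assert A.col(1) == 2 * A.col(0)
assert A.col(3) == 3 * A.col(0) - 5 * A.col(2)
```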

If the matrix A is n by m with rank r, then the reduced echelon form has its last n-r rows equal to zero and its upper left r by r block equal to an identity matrix. Thus it suffices to show that the null space characterizes the remaining upper right r by (m-r) block. Such a block of course determines, and is determined by, a unique linear transformation from (m-r)-space to r-space. Further, that linear transformation is determined by its graph, an (m-r)-dimensional linear subspace of m-space.
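In symbols, with the simplifying column order above and writing N for that upper right block (N is a label I'm adding; the answer leaves it unnamed), the reduced echelon form is

$$R = \begin{pmatrix} I_r & N \\ 0 & 0 \end{pmatrix},$$

an n by m matrix in which N has size r by (m-r).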

That subspace, i.e. that graph, except for a minus sign, is precisely the null space. I.e. from looking at the reduced echelon form one can see that the negative of the upper right r by (m-r) block is exactly the matrix of the linear map whose graph is the null space. Looked at another way, if A is the given matrix, the equation AX = 0 determines implicitly a linear function from (m-r)-space to r-space whose matrix is the negative of the upper right r by (m-r) block of the reduced echelon form of A.
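Concretely, splitting a vector as $(x, t)$ with $x$ in $\mathbb{R}^r$ and $t$ in $\mathbb{R}^{m-r}$, the equation $RX = 0$ for the block form above reads $x + Nt = 0$, i.e. $x = -Nt$, so

$$\operatorname{null}(R) = \left\{ \begin{pmatrix} -Nt \\ t \end{pmatrix} : t \in \mathbb{R}^{m-r} \right\},$$

which is exactly the graph of $t \mapsto -Nt$, written in the $(f(t), t)$ order of the note below.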

Note: if f is my function, I have written the graph entries here in the order (f(t),t) rather than (t, f(t)).

In general, the null space determines a linear map from the coordinate subspace spanned by the non pivot variables to that spanned by the pivot variables, whose matrix columns are (when augmented at the bottom by zeroes) exactly minus the sequence of non pivot columns of the reduced echelon form of A. Since both the location and the content of the pivot columns are known, the reduced form is determined by the null space.

(I hope I said this right, I'm doing this in my head.)
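Here is a rough sympy sketch of that determination, over the rationals (the reconstruction recipe is mine, not spelled out in the answer): since the row space of A is the orthogonal complement of its null space, a basis of the null space is enough to rebuild the nonzero rows of the reduced echelon form.

```python
from sympy import Matrix

A = Matrix([[1, 0, 3, 4],
            [0, 1, 5, 2],
            [1, 1, 8, 6]])  # third row = first + second, so rank 2

K = Matrix.hstack(*A.nullspace())         # columns of K span null(A)
rows = Matrix.hstack(*K.T.nullspace()).T  # rows span null(A)-perp = row(A)

R = rows.rref()[0]                        # rebuilt from the null space alone
assert R == A.rref()[0][:R.rows, :]       # the nonzero rows of rref(A)
```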

E.g. if the reduced form has two non zero rows (1 0 3 4), (0 1 5 2)

this says the upper right 2 by 2 part with rows (3 4), (5 2), is minus the matrix of the linear map (x,y) = f(z,w), where x = -3z -4w, y = -5z -2w. The graph of this map f is the plane in 4 space spanned by the vectors (-3, -5, 1, 0), (-4, -2, 0, 1), which is the null space of the matrix.
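The same example can be checked mechanically (my addition): sympy's null space basis for this matrix is exactly those two spanning vectors.

```python
from sympy import Matrix

A = Matrix([[1, 0, 3, 4],
            [0, 1, 5, 2]])
assert A.nullspace() == [Matrix([-3, -5, 1, 0]),
                         Matrix([-4, -2, 0, 1])]
```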

Oops, I see my definition of "graph" is backwards from the usual, in that my entries appear in the opposite order, i.e. my graph of (x,y) = f(z,w) has entries (f(z,w), (z,w)) = ((x,y), (z,w)), instead of ((z,w), f(z,w)).

– roy smith