15

The simple proof goes:

Let B be the left inverse of A, C the right inverse.

C = (BA)C = B(AC) = B

This proof relies on associativity yet does not give any insight as to why this surprising fact about matrices is true.

AC means a bunch of linear combinations of columns of A, while CA means a bunch of linear combinations of rows of A. Completely different numbers get multiplied in each case.

The proof above is just a bunch of syntactic steps that do not have much to do with matrices directly; I cannot see from it why CA = AC = I.
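
To make the puzzle concrete, here is a quick numerical check (a NumPy sketch; the 4x4 matrix is just a random example, nothing special about it):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))   # a random (generically invertible) 4x4 matrix
C = np.linalg.inv(A)

# AC combines columns of A, CA combines rows of A -- completely different numbers
# get multiplied -- and yet both products come out as the identity.
print(np.allclose(A @ C, np.eye(4)))  # True
print(np.allclose(C @ A, np.eye(4)))  # True
```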

Can anyone shed some light on this?

Leo
  • 1,539
  • My point exactly. This general property hides the reason it works for matrices specifically. There must be a way to look at this from a matrix perspective. – Leo Oct 20 '11 at 18:57
  • You have your "left" and "right" reversed. If $B$ is the right inverse of $A$, then $AB=I$; if $C$ is the left inverse of $A$, then $CA = I$. – Arturo Magidin Oct 20 '11 at 19:03
  • 12
    No, you are missing the forest for the trees: It works for matrices specifically because it works for any function, and matrices are just one kind of function. Matrices are a special case, so it works "for matrices specifically" because it always works for functions. It's not "hiding the reason it works for matrices", it's explaining the reason it works for matrices: because matrices are "really" functions, and this is a property of functions. – Arturo Magidin Oct 20 '11 at 19:04
  • 14
    More interesting is why a square matrix is left-invertible iff it is right-invertible. – Yuval Filmus Oct 20 '11 at 19:09
  • @Arturo Magidin - This sheds no light on why the same "recipe" works both on rows and columns of A to create I. I'm not asking about the forest, but about the tree. Can you honestly say you "understand" why the inverse C combines both rows and columns of A into I? All you have right now is a mechanical syntactic proof. – Leo Oct 20 '11 at 19:06
  • 3
    You keep missing my point: I "understand" because I know that $A$ is really a function, so as soon as you know that it has inverses on both sides, the inverses have to be the same one; it's not about "recipes", it's about correspondences: the matrix $A$ is a one-to-one onto correspondence; the matrix $C$ is the exact same correspondence "pointing the other way". So of course if you first go and then come back along the same route you will not have moved, and likewise, if you first come and then go back along the same route you will not have moved. – Arturo Magidin Oct 20 '11 at 19:17
  • What you describe sounds like commutativity, which isn't true for matrices. You should talk in terms of associativity. – Leo Oct 20 '11 at 19:23
  • 1
    @Leo: To answer your question, to understand this you should first justify why matrix multiplication is associative and why matrices with non-zero determinant have an inverse in the first place. If you work these out, then it will be clear to you why the left and right inverses are the same. –  Oct 20 '11 at 19:25
  • I believe I have them well understood, but I can't seem to "forge" the 2 understandings into this left/right equality. Can you give me a push? – Leo Oct 20 '11 at 19:27
  • 7
    @Leo: If you think I'm talking commutativity, then you aren't understanding what I'm saying. – Arturo Magidin Oct 20 '11 at 19:31
  • 2
    FWIW: I would love this question if it asked why (intuitively) a left inverse of a square matrix implies a right inverse. That left and right inverses are equal (if they both exist) definitely strikes me as a function thing. – Jack Schmidt Oct 20 '11 at 19:43
  • @JackSchmidt You'd probably be interested in http://math.stackexchange.com/questions/110336/proving-that-a-right-or-left-inverse-of-a-square-matrix-is-unique-using-only-b#comment257858_110336 – mlvljr Feb 18 '12 at 07:17

2 Answers

23

In fact, this isn't about matrices per se, but about inverses in general, and perhaps more specifically about inverses of functions. The same argument works for any function that has a left and a right inverse (and for elements of a monoid or ring, though these can also be interpreted as "functions" via an appropriate setting).

If you really want to try to understand the proof in terms of "meaning", then you should not think of matrices as a bunch of columns or a bunch of numbers, but rather as functions, i.e., as linear transformations.

Say $A$ is an $m\times n$ matrix; then $A$ is "really" a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$: the columns of $A$ are the images of the standard basis vectors of $\mathbb{R}^n$ under the transformation. If $B$ is a right inverse of $A$, then $B$ is $n\times m$, and $AB$ acts like the identity transformation on $\mathbb{R}^m$. In particular, $AB$ has to be onto, so the rank of $AB$ is $m$; since the rank of $AB$ is at most the rank of $A$, the rank of $A$ has to be $m$; and since the rank of $A$ is at most $\min(m,n)$, we get $m\leq n$. This tells us that $A$ is onto (full rank), and that it has at least as many columns as it has rows.

If $C$ is a left inverse of $A$, then $C$ must be an $n\times m$ matrix, and $CA$ acts like the identity on $\mathbb{R}^n$. Because $CA$ is one-to-one, $A$ has to be one-to-one. In particular, its nullspace is trivial. That means that it cannot have more columns than rows (that would require a nontrivial nullspace, by the Rank-Nullity Theorem); since it has at least as many columns as it has rows, $A$ has exactly the same number of columns as rows, so $m=n$.
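
A minimal numerical sketch of these two halves of the argument (the particular 2x3 matrix below is just an illustrative example): a wide matrix can have a right inverse because it is onto, but it cannot have a left inverse because it is not one-to-one.

```python
import numpy as np

# A is 2x3: it maps R^3 onto R^2, but it kills e_3, so it is not one-to-one.
A = np.array([[1., 0., 0.],
              [0., 1., 0.]])
print(A @ np.array([0., 0., 1.]))     # [0. 0.]  -- e_3 lies in the nullspace of A

# Because A is onto, a right inverse B (3x2) exists: AB = I_2.
B = np.array([[1., 0.],
              [0., 1.],
              [0., 0.]])
print(np.allclose(A @ B, np.eye(2)))  # True

# But no left inverse C can exist: C @ (A @ e_3) = C @ 0 = 0, never e_3.
# In particular, B @ A is not the 3x3 identity:
print(B @ A)                          # [[1. 0. 0.] [0. 1. 0.] [0. 0. 0.]]
```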

Moreover, $A$ is now one-to-one and onto. So it is in fact bijective. So it is in fact invertible. Invertible matrices have unique inverses by definition, so $B$ and $C$ have to be equal: they have no choice in the matter. It isn't about the details of the "recipe", it's about the properties of functions: once you have a function that is one-to-one and onto, it has an inverse and the inverse is unique.


I honestly think that trying to puzzle out the details of the "recipe" is not insightful here: it is staring at the bark of a single tree instead of trying to appreciate the forest.

But if you must (and I really think you shouldn't), then you want to realize that $AC$ and $CA$ are talking in different languages: the columns of $A$ specify a basis, $\gamma$, and tell you how to express the elements of $\gamma$ in terms of the standard basis $\beta$; it provides a "translation" from $\gamma$ to $\beta$. That is, $A=[\mathrm{Id}]_{\gamma}^{\beta}$. The inverse, $C$, explains how to express the elements of the standard basis $\beta$ in terms of the vectors in $\gamma$, $C=[\mathrm{Id}]_{\beta}^{\gamma}$. $AC$ talks in the language of the standard basis $\beta$, $CA$ talks in the language of $\gamma$. Then it becomes clear why "the same recipe" (not really) should work. It's not really the same recipe, because in $CA$ you "hear" vectors in $\gamma$, translate them into $\beta$, and then translate them back into $\gamma$. But in $AC$ you "hear" vectors in $\beta$, translate them into $\gamma$, and then back into $\beta$. The "translation recipes" are the same, whether you do $\beta\to\gamma$ first or you do it second (translating English to Russian is the same, whether you are translating something written originally in English into Russian, or something that was translated into English first).

$A$ establishes a bijection between the vectors expressed in terms of $\gamma$, and the vectors expressed in terms of $\beta$. $C$ is the bijection going "the other way". Both $AC$ and $CA$ are the identity, but they are the identity of slightly different structures: $AC$ is the identity of "$\mathbb{R}^n$-with-basis-$\beta$", and $CA$ is the identity of "$\mathbb{R}^n$-with-basis-$\gamma$." Only when you forget that $AC$ and $CA$ are being interpreted as matrices on vector-spaces-with-basis do you realize that in fact you have the same "formula" for the expressions (i.e., that both matrices are "the" identity matrix).
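
If it helps to see the "two languages" concretely, here is a small sketch (the basis $\gamma$ below is an arbitrary example): the columns of $A$ list the vectors of $\gamma$ in $\beta$-coordinates, $C=A^{-1}$ translates the other way, and both round trips leave the coordinates untouched, one round trip in each language.

```python
import numpy as np

# Columns of A = the vectors of the basis gamma, written in the standard basis beta.
A = np.array([[2., 1.],
              [1., 1.]])       # gamma = { (2,1), (1,1) }
C = np.linalg.inv(A)           # translates beta-coordinates back into gamma-coordinates

v_gamma = np.array([3., -1.])  # a vector given in gamma-coordinates
v_beta = A @ v_gamma           # the same vector, now in beta-coordinates

# CA: gamma -> beta -> gamma.  AC: beta -> gamma -> beta.  Both round trips do nothing.
print(C @ (A @ v_gamma))       # [ 3. -1.]  -- unchanged, in the language of gamma
print(A @ (C @ v_beta))        # [ 5.  2.]  -- unchanged, in the language of beta
```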

If you want to stare at the bark so intently that you must think of matrices in terms of "linear combinations of rows" or "linear combinations of columns", you are going to miss a lot of important properties of matrices. Matrices are really functions; multiplication of matrices is really composition of functions. They aren't a bunch of vectors or a bunch of numbers thrown into a box that you multiply based on some ad hoc rule. They are functions.

Compare how easy, and, yes, intuitive, it is to realize that matrix multiplication is associative because it is just "composition of functions", vs. figuring it out by expanding the double summations of an entry-by-entry expression of $(AB)C$ and $A(BC)$: you are going in the wrong direction for 'intuitive' understanding. Staring at those summations will not tell you why multiplication is associative, it will only tell you it is. Trying to puzzle out why a right inverse is also a left inverse in terms of "adding columns" and "adding rows" is not going to help either: you need to think of it in terms of functions.
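
To illustrate the point, here is a sketch with random example matrices: build the matrix of the composition $S\circ T$ column by column, by applying the two maps to the standard basis vectors, and it comes out equal to the product of the matrices. Associativity is then inherited for free from associativity of composition of functions.

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.standard_normal((3, 3))   # matrix of a linear map S
T = rng.standard_normal((3, 3))   # matrix of a linear map T

# Column i of "the matrix of S∘T" is (S∘T)(e_i) = S(T(e_i)).
composed = np.column_stack([S @ (T @ e) for e in np.eye(3)])

print(np.allclose(composed, S @ T))   # True: the matrix of the composition is the product
```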

Arturo Magidin
  • 398,050
  • 1
    Haha, I think your answer to my question is "It is not intuitive that AC=CA." Unfortunately, I find your 3rd to last paragraph convincing and shall abandon my childhood dreams. – Jack Schmidt Oct 20 '11 at 20:27
  • @Jack: I guess it's the same kind of fudging as we do when we say that two matrices are similar if "they represent the same linear transformation"; if you try to think of them in terms of "linear transformations on $\mathbb{R}^n$" then that doesn't really make sense. – Arturo Magidin Oct 20 '11 at 20:36
  • @Jack: I've been trying to think if I can come up with something. Perhaps one can do it "non-canonically" through the dual space, at least in the real case: if we think of an invertible matrix $A$ as describing, via its columns, how to express vectors of a basis $\gamma$ in terms of the standard basis, then the rows of $A$ will describe how to express vectors of the dual of the standard basis in terms of the dual basis $\gamma^*$. Perhaps it's possible to carry it through in that light. – Arturo Magidin Oct 21 '11 at 03:02
  • But a matrix can represent many other things too: a description of the adjacency relations in a graph, a system of linear equations, the transition states in a Markov process, etc. Does your interpretation make sense when extended to all these other cases? – gary Oct 21 '11 at 03:30
  • @gary: "Transitions in a Markov process" amounts to application of a function, so yes. If you interpret the matrix as being "adjacency relations in a graph", then you also need to interpret what "multiplication of matrices" will represent in that setting. In that setting, two distinct matrices represent two distinct sets of edges, and "multiplying" the matrices $AC$ gives you, in the entry $a_{ij}$, the number of paths from $i$ to $j$ using first the edges in set $C$, then in set $A$. Think about what $AC=I$ would mean, and you'll see it again corresponds to a one-to-one correspondence. – Arturo Magidin Oct 21 '11 at 03:56
  • @gary: (cont): For systems of linear equations, they "really" represent the corresponding linear transformations as well (the linear equations correspond to the "coordinate functions" of the linear transformation), so you are back in the original interpretation of "matrices-are-really-functions". – Arturo Magidin Oct 21 '11 at 03:58
  • @Arturo Magidin - Thank you for your answer. This was the view I was looking for. First a quick question: "Invertible matrices have unique inverses by definition". Can't seem to find that definition for left and right inverses being unique. I think this demands a short proof. Second, I am not going to embrace the "stare at the tree" approach. The function view is indeed an important one and I'll be sure to utilize it. – Leo Oct 23 '11 at 17:32
  • Yet, before you can declare matrices to be functions and matrix multiplication to be composition of functions, you must prove it by expanding those double summations you seek to avoid. You have to pass through the "bunch of numbers and summations" phase, otherwise you have no foundation to call matrices functions and you aren't allowed to build your intuition just yet. Third, though the function view is an important one, it is a generalized theory and you might miss some properties that matrices have as a specific model for that theory. – Leo Oct 23 '11 at 17:32
  • So I believe one shouldn't limit himself to a single view, but should pick the correct one according to need. I'm sure the folks at Matlab, for example, need to dig inside those summations sometimes and get out of the "it's a function" zone to implement their algorithms efficiently and stably. Your comments are welcome. – Leo Oct 23 '11 at 17:32
  • I take it MY answer wasn't much help either,Leo? – Mathemagician1234 Oct 23 '11 at 17:57
  • 2
    Sorry, I was looking for the model-specific explanation. Axioms and theory are what I tried escaping from. Thanks for the effort though. – Leo Oct 23 '11 at 18:11
  • @Leo: I've merged your accounts and moved your post to the comments here. – Zev Chonoles Oct 23 '11 at 18:24
  • @Leo: 1. The definition of "invertible matrix" may vary. In many of my sources, the definition itself includes a uniqueness clause. 2. The fact that matrices correspond to linear transformations does not require any kind of expansion. The fact that matrix multiplication corresponds to composition is done by definition of matrix multiplication. But this definition does not require the kind of expansion and re-indexing of double summations that is required in order to prove associativity of matrix multiplication from the formulas. The definition of multiplication only requires a single sum. – Arturo Magidin Oct 23 '11 at 19:25
  • @Leo. In any case, the point is that understanding why multiplication of matrices is associative comes from recognizing multiplication of matrices as being composition of functions, and not from staring at the manipulation of double-indexed sums and the necessary swap of the order of summation. – Arturo Magidin Oct 23 '11 at 19:27
  • I cannot agree until it is proven that composition is multiplication of matrices. Without further proof you could claim that the subtraction operation can be thought of as a function a(x) = a-x and thus it is associative, but it isn't. To prove matrix multiplication is a composition you must show: for any two matrices A, B and vector x: A(B(x)) = (AB)(x), meaning A(Bx) = (AB)x. But this is exactly associativity with double summation.

    If you can show me a proof that matrix multiplication is composition without double summation I'd be happy to learn.

    – Leo Oct 23 '11 at 21:03
  • @Leo: The "standard" multiplication of matrices is defined precisely so that it corresponds to composition of linear transformations. You first establish how to "translate" from a matrix to a linear transformation and vice-versa, which is: apply the linear transformation to the standard basis, and express the answers in terms of the standard basis (transformation->matrix); and given a matrix, the image of the $i$th vector of the standard basis is the linear combination given by the $i$th column of the matrix. (matrix->transformation). (cont) – Arturo Magidin Oct 23 '11 at 21:11
  • @Leo (cont). Then you take two linear transformations, $T$ and $S$, and express $T(e_i)$ and $S(e_i)$ in terms of the standard basis. Then you see what $S(T(e_i))$ is; this only involves a single summation. Namely, if $T(e_i) = a_{1i}e_1+\cdots+a_{ni}e_n$ and $S(e_j) = b_{1j}e_1+\cdots+b_{nj}e_n$, then the coefficient of $e_j$ in $ST(e_i)$ is $\sum_{k=1}^n b_{jk}a_{ki}$. This is used to define matrix multiplication. You never need to check "A(B(x)) = (AB)x", because AB is defined precisely so this holds. – Arturo Magidin Oct 23 '11 at 21:14
  • @Leo (cont): In other words, you don't define matrix multiplication in a vacuum, you define it in the context of "matrices-are-linear-transformations", and then you never have to check that matrix multiplication "corresponds" to composition: matrix multiplication is composition. – Arturo Magidin Oct 23 '11 at 21:15
  • In the same manner I would like to define the subtraction operation via functions. Instead of looking at numbers a or b as separate entities, I'll say they are transformations and express them in terms of x, but you can imagine the standard basis going in there (being simply the number 1 here): a(x) = a-x, b(x) = b-x. Then I'll define a composition ab(x) = (a-b)-x. Voilà, I looked at numbers+subtraction as functions, defined their composition, and so they are associative. Wrong: I did not prove that my definition equals the actual a(b(x)) function. Neither did you; to prove it you have to double sum. – Leo Oct 23 '11 at 23:33
  • 1
    @Leo: If you really want to do a parallel, then you would have to do the following: for each real you have a function $f_a(x)$, given by $f_a(x) = x-a$. You then want to define an operation on real numbers, $a\Box b$, so that $f_a\circ f_b = f_{a\Box b}$. In order to do this, we compute $f_a\circ f_b(x) = f_a(f_b(x)) = f_a(x-b) = (x-b)-a = x-(b+a)$. Then the operation we define is $a\Box b = b+a$. If we do this, then associativity of $\Box$ follows from the fact that it is defined as composition of functions. – Arturo Magidin Oct 23 '11 at 23:36
  • @Leo: That is what I outlined for multiplication of matrices, not your inaccurate parallel. Now, if you want to discuss what I actually said, instead of a strawman that you will erect to "prove" your point, then let me know. Otherwise, have a nice day. – Arturo Magidin Oct 23 '11 at 23:38
  • Thanks for the quick response. In your example above, prior to defining composition you calculated f1(f2(x)) and then extracted an f3(x) that has the same effect as the composition. I very much agree. But when defining composition of matrices you skipped this step; no verification has been made. Your definition shows up out of the blue. I tried looking at S(T(ei)) but then I found myself double summing to extract your definition [actually proving (AB)ei = A(Bei), which is the same]. So how can you say double summing is avoidable? Without it I have no foundation to trust the definition. – Leo Oct 24 '11 at 00:05
  • @Arturo Magidin - whatever your response is, I don't expect us to continue grinding on these unimportant details. Thank you for your time and effort. Your answer about the "recipe" has been insightful, even though you didn't want me to go there :). – Leo Oct 24 '11 at 00:14
  • @Leo: You have a family of functions which is closed under composition (in your example, the translations $x\mapsto x-a$; in the original setting, linear transformations). You also have a way to assign to each function one and only one parameter (in your example, the real number $a$; in the original setting, the standard matrix). You then define an operation $\Box$ on the set of parameters so that $a\Box b$ is defined as follows: "find the function $f$ corresponding to $a$, the function $g$ corresponding to $b$, and define $a\Box b$ to be the parameter of $f\circ g$." – Arturo Magidin Oct 24 '11 at 01:32
  • @Leo: Under this definition, the fact that $\Box$ is associative is immediate from the definition. In your example, the error/where the analogy breaks down is that you simply asserted that because $f_a$ is defined via "subtraction", then $\Box$ would be subtraction. But in fact, it turns out to be addition. – Arturo Magidin Oct 24 '11 at 01:35
  • @Leo: For linear transformations, the formula for the $(i,j)$ entry of $AB$ is deduced from the fact that we want $AB$ to be the matrix of the composition. You don't "prove" $(AB)x = A(B(x))$, because the definition of "$AB$" is "the only matrix that fits in the equation to make it true". If you want to obtain the summation formula, you don't actually need to play around with double summations the way you do when you prove associativity directly. While trying to evaluate $A(B(e_i))$ can be done by doing a double summation, you don't need to reindex and play around. – Arturo Magidin Oct 24 '11 at 01:38
  • @Leo: Moreover, you don't actually need to do a double summation, because you don't actually need to compute $A(B(e_i))$ to find $(AB)_{ji}$. You only need to figure out the $j$th component of $A(B(e_i))$. To figure this, you need the $j$th component of $A(e_k)$ for each $k$, and the $k$th component of $B(e_i)$ for each $k$. This gives the $\sum\limits_k a_{jk}b_{ki}$ formula. – Arturo Magidin Oct 24 '11 at 01:47
1

Arturo's response above is a very good one, as usual, but here's another way to approach the problem: since the set of invertible n x n matrices is a non-commutative group under matrix multiplication, what you're really asking is why a right inverse of an element in a group is also a left inverse. This property holds for matrices because the invertible matrices form a group, and it is true of groups in general.

If you're unfamiliar with algebra, a group is defined as follows. A group is a set G together with a binary operation • (i.e. a function from GxG into G), called the group law of G, that combines any two elements a and b to form another element, denoted a • b or ab. To qualify as a group, the set and operation (G, •) must satisfy four requirements known as the group axioms:

i) Closure: for all a, b in G, the result of the operation, a • b, is also in G.
ii) Associativity: for all a, b and c in G, (a • b) • c = a • (b • c).
iii) Identity element: there exists an element e in G such that for every element a in G, the equation e • a = a • e = a holds. The identity element of a group G is often written as 1 or 1_G, a notation inherited from the multiplicative identity.
iv) Inverse element: for each a in G, there exists an element b in G such that a • b = b • a = e.

It's pretty easy to prove that the set of all invertible n x n matrices is a group under matrix multiplication. Notice that it's usually assumed in the definition that the identity is unique and two-sided, and that the inverse of each element is two-sided and unique to that element. However, it's not necessary to assume this. One can modify the axioms of a group so that the inverses are not assumed to be two-sided: i.e., define a group G to be a nonempty set closed under an associative binary operation * such that 1) there exists at least one element e such that e*a = a for all elements a in G, and 2) for every element a, there is at least one element b such that b*a = e. (You can, of course, alternatively put the identity and the inverses on the right.) The point here is not to assume the two-sidedness of the identity and the inverses, and to try to prove that every left identity (and every left inverse) must also be a right one. You then prove that each left inverse is unique, and so is each right inverse; therefore they're equal. It's a bit tedious, but I think you'll find that it answers your question in a decisive fashion.
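
For reference, here is a sketch of the argument that exercise asks for, using only the weaker axioms (associativity, a left identity e with e*a = a for all a, and for each a at least one b with b*a = e); nothing in it is specific to matrices. Given such a b, pick c with c*b = e. Then

a*b = e*(a*b) = (c*b)*(a*b) = c*((b*a)*b) = c*(e*b) = c*b = e,

so the left inverse b is also a right inverse of a. From this,

a*e = a*(b*a) = (a*b)*a = e*a = a,

so the left identity e is also a right identity. Finally, if b and b' are both (two-sided) inverses of a, then

b' = b'*e = b'*(a*b) = (b'*a)*b = e*b = b,

so the inverse of a is unique.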

Another way: consider the following. Modify the above definition of a group so that e is a LEFT identity and every element has a RIGHT inverse. Is the resulting set with its binary operation a group? It turns out the answer is NO. For a counterexample, define a*b = b for all a and b. Then any element e can serve as a left identity for all the elements, and any b has the right inverse e, because b*e = e. In the trivial case of a single element e this is clearly a group. BUT if there is more than one element, the result is NOT a group: if x were a two-sided identity, we would need a*x = a and b*x = b; but a*x = x and b*x = x by the definition of the operation, so x = a and x = b, which can't both hold when there is more than one element. There is no single (two-sided) identity element!
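
A tiny brute-force check of this counterexample (a sketch over the two-element set {e, x}, the smallest case where it fails):

```python
# The operation a*b = b on the two-element set {"e", "x"}.
G = ["e", "x"]
op = lambda a, b: b

# "e" is a left identity, and every element has a right inverse (namely "e")...
print(all(op("e", a) == a for a in G))                    # True: e*a = a for all a
print(all(any(op(a, b) == "e" for b in G) for a in G))    # True: each a has b with a*b = e

# ...but no element is a two-sided identity, so (G, *) is not a group.
print(any(all(op(x, a) == a and op(a, x) == a for a in G) for x in G))  # False
```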

Do the exercise above with the alternate definition of a group using the weaker axioms; this, plus the counterexample, should clear the fog of your confusion.

  • 4
    I don't understand your point. What is the definition of invertible matrix you're using? If you want to check the weaker axioms only requiring one-sided inverses (identity and associativity are clear anyway) then you actually have to prove something: if $A$ is a right invertible matrix and $B$ is its right inverse then why does $B$ admit a right inverse itself? I don't think that that's what is being asked here. – t.b. Oct 20 '11 at 21:49
  • @t.b. I was assuming that the set of all n x n invertible matrices formed a group under multiplication (which it does) and then showing why, in general for any group, a unique right inverse element must be equal to a unique left inverse element. In fact, the definition of an invertible matrix in particular doesn't even come up once we assert that the set of all n x n invertible matrices is a group under matrix multiplication. Then the argument is pure group theory. And look at what was asked: "Looking for insightful explanation as to why right inverse equals left inverse for square invt. matrices." – Mathemagician1234 Oct 21 '11 at 02:57
  • @t.b. cont. There,I clarified the argument more with another line added. Is it clear now? – Mathemagician1234 Oct 21 '11 at 03:02
  • 3
    @Mathemagician1234 : I think you are missing the point of the question. The OP is clearly aware that invertible matrices form a group -- in the question statement, he gave the usual group-theoretic proof that a left inverse and a right inverse have to be equal (using associativity). Adding the word "group" to this does not clarify anything. – Adam Smith Oct 21 '11 at 03:13
  • 3
    @Mathemagician1234: This is really just another instance of the situation for functions, since any element of a group/monoid can be seen as a function (via the Cayley representation)... – Arturo Magidin Oct 21 '11 at 03:14
  • @t.b., Adam: Are you guys serious?!? The OP seems to be working strictly from properties of matrices. Are we reading the same question? The fact that he knows about associativity does not imply he even knows about groups; you can define vector spaces or even sets of matrices assuming ordinary matrix multiplication without any knowledge of groups whatsoever. That's how I originally learned linear algebra. Hence his/her discussion of linear combinations, which really has nothing to do with the "sidedness" of inverses and identities, which are strictly group-theoretic properties. – Mathemagician1234 Oct 21 '11 at 03:29
  • @Arturo: One certainly can use group-theoretic properties to explain this, and a function-theoretic explanation can be couched in group theory using the idea of group actions (where the Cayley representation is a special case of a group acting on itself). But as I was telling t.b. and Adam, the language used by the O.P. suggests a student with no knowledge of abstract group theory who is working strictly from the properties of matrices. – Mathemagician1234 Oct 21 '11 at 03:33
  • 4
    @Mathemagician1234 : I have no idea whether the OP knows the word "group", but he knows the "group-theory proof" that left and right inverses agree. I don't see how your explanation adds anything non-linguistic to his understanding. – Adam Smith Oct 21 '11 at 03:34
  • 2
    @Mathemagician1234, Adam: please remove the comments here which are completely off-topic. – Mariano Suárez-Álvarez Oct 21 '11 at 12:16
  • 6
    If you think your comments 'challenge' anything apart from the image of yourself that you choose to present to other users of this site, I think you are mistaken. – Mariano Suárez-Álvarez Oct 21 '11 at 22:46
  • 1
    @Mariano I removed all my comments and they downvoted me again. And I'm supposed to be civil in here, why, exactly...? – Mathemagician1234 Oct 22 '11 at 05:19
  • 3
    @Mathemagician1234 ... because civility is a basic requirement on this website. I understand that you might be upset because of these downvotes but I think you would be in a much better position if you ignored them. Of course, I am not taking sides on this matter but if you do not respond to Adam's comments, then there can be no conflict. Why do you let the comments of other people bother you when you do not even personally know the people? You have received three upvotes which means that there are people who appreciate your post; think about these people rather than the downvoters. – Amitesh Datta Oct 22 '11 at 11:07
  • 5
    @Mathemagician1234: you have to be civil independently of anything. – Mariano Suárez-Álvarez Oct 22 '11 at 12:35
  • 2
    @Mathemagician1234, please stop. Simply ignore him if you think he is doing all that. All this is, of course, still off-topic here... Please limit yourself to mathematical interventions in the site. – Mariano Suárez-Álvarez Oct 22 '11 at 21:30
  • @Mariano I removed the personal comments. Sorry about that. Just felt the need to explain myself. – Mathemagician1234 Oct 23 '11 at 06:10