
I am currently studying linear algebra and I have difficulty understanding what matrices are and what they represent. I've read various books and in general the definition of a matrix comes down to this:

An $m×n$ matrix is a rectangular array of numbers with $m$ rows and $n$ columns.

Alright, an array of numbers. But why would one need such an array of numbers? What can one put in the rows and columns? As I proceeded with my research, I figured out that matrices come from solving linear equations, and as far as I understand

$$ \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ \end{pmatrix} $$

is just a better way of writing

$$ \begin{cases} 1x+2y+3z \\ 4x+5y+6z \\ 7x+8y+9z \\ \end{cases}. $$

But suddenly there is a solution vector that can be written like this:

$$ \begin{pmatrix} x \\ y \\ z \\ \end{pmatrix}. $$

I understood that each row of the matrix is an equation and each column holds the coefficients of one unknown, but why is the solution vector a vertical column-matrix? What do the coefficients $x, y, z$ correspond to in the equations? I personally would write the solution vector like this

$$ \begin{pmatrix} x & y & z\\ \end{pmatrix} $$

and I would understand that $x$ of this solution vector corresponds to $x$ in all of the equations.

Could you please help me by answering all these questions?

P.S. I know there is matrix multiplication involved in this, but I am trying to understand the idea of the matrix before I go on to matrix operations, because without the idea of the matrix I don't understand why I can do these operations the way I can do them.

Alp Uzman
  • It is the other way around. The meaning of a matrix is to represent a linear map. So first we have a linear map $F$, say, with $F(e_1)=e_1+4e_2+7e_3$, and $F(e_2)=2e_1+5e_2+8e_3$, etc., and then I want a matrix to represent my linear map $F$. And the matrix then is just your telephone matrix. And with matrix multiplication $A\cdot (1,0,0)^T=F(e_1)$ etc., with $e_1=(1,0,0)^T$. In general, the matrix $A$ for $F$ is defined by $Ax=F(x)$. – Dietrich Burde Sep 24 '23 at 14:21
  • A matrix can be defined as a rectangular array of numbers, but that is to miss the reason that matrices are so important. A matrix is a representation of a linear mapping with respect to a particular basis (a set of reference vectors). Matrix multiplication is the way it is because it represents the composition of linear mappings. You are being introduced to a particular application of matrices (solving simultaneous equations) as a motivation, but beware of thinking that application is the essence of what a matrix is. – Mark Bennet Sep 24 '23 at 14:29
  • You could totally write your solution vector horizontally. But then you wouldn't be able to write your linear equations as $Ax=b$, where the left-hand side is a matrix product. You're not going to get any sensible answer that doesn't make reference to matrix multiplication. – JonathanZ Sep 24 '23 at 14:30
  • Does this answer your question? What exactly is a matrix? – blargoner Sep 24 '23 at 17:39

2 Answers


Matrices are closely linked to linear functions. A linear function $f:\mathbb{R}\to\mathbb{R}$ is one of the form $f(x)=ax$ for some $a \in \mathbb{R}$. This is the same as saying that $f(x)$ is proportional to $x$, and in this case the function is uniquely determined by the constant $a$. If $f:\mathbb{R}^2\to\mathbb{R}$, then $f(x,y)=ax+by$, and the function is uniquely determined by the constants $a,b$ (and by its domain and codomain, of course). If $f:\mathbb{R}\to\mathbb{R}^2$, then each output is a list of two numbers, $f(x)=(ax,bx)$, and the function is again determined by the constants $a,b$ (and by its domain and codomain, which differ from those of the previous example). In general, a linear function $f:\mathbb{R}^n\to\mathbb{R}^m$ is characterized by $m\times n$ numbers, and that is why we use the notation $a_{ij}$ to represent them, where $1 \leq i \leq m$ and $1 \leq j \leq n$. The way we choose to arrange these numbers is in a matrix $A$ such that

$$ A=\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn}\\ \end{bmatrix}\,. $$
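As a concrete illustration (the particular map and the numbers below are my own, not from the question), here is a minimal Python sketch of how the constants $a_{ij}$ of a linear map can be read off by applying the map to the standard basis vectors; each image becomes one column of $A$:

```python
# A linear map f: R^2 -> R^3 is determined by six constants a_ij.
# Here f is an arbitrary example; its matrix is read off column by column
# by applying f to the standard basis vectors (1, 0) and (0, 1).

def f(x1, x2):
    # an arbitrary linear map R^2 -> R^3
    return (1*x1 + 2*x2,
            3*x1 + 4*x2,
            5*x1 + 6*x2)

col1 = f(1, 0)   # first column of A: (1, 3, 5)
col2 = f(0, 1)   # second column of A: (2, 4, 6)
A = [[col1[i], col2[i]] for i in range(3)]
print(A)  # [[1, 2], [3, 4], [5, 6]]
```

Reading the columns off the basis vectors is exactly why the matrix determines the map: every input is a combination of the basis vectors, and linearity does the rest.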

So, this matrix uniquely determines the function $f$. Matrix multiplication is worth studying at this point, because it is defined precisely so that the following property holds: $f(x_1,x_2,\dots,x_n)=Ax$, where $A$ is the matrix representing the function and $x$ is a column matrix with entries $x_1,x_2,\dots,x_n$. One could define the multiplication differently and take $x$ to be a row matrix; that would not be a problem, it is a convention. With this multiplication, taking for example $n=2$ and $m=3$, the product has the form

$$ Ax= \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\\ a_{31} & a_{32} \end{bmatrix} \begin{bmatrix} x_1\\ x_2 \end{bmatrix} = \begin{bmatrix} a_{11}x_1+ a_{12}x_2\\ a_{21}x_1+ a_{22}x_2\\ a_{31}x_1 +a_{32}x_2 \end{bmatrix} = \begin{bmatrix} y_1\\ y_2\\ y_3 \end{bmatrix}\,. $$
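The entrywise formula above can be sketched in a few lines of Python (the helper name `matvec` and the numbers are my own illustration):

```python
# Matrix-vector multiplication for the 3x2 case displayed above,
# written out entrywise: (Ax)_i = a_i1 * x_1 + a_i2 * x_2.

def matvec(A, x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

A = [[1, 2],
     [3, 4],
     [5, 6]]
x = [10, 100]
print(matvec(A, x))  # [210, 430, 650]
```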

This is the same as writing $f(x)=y$, where $x=(x_1,x_2)$ and $y=(y_1,y_2,y_3)$. Solving a linear system is then the inverse problem: given $y$, which $x$ satisfies $f(x)=y$? If the function $f$ is surjective, there is always at least one solution; if it is injective, a solution, when it exists, is unique; and if it is bijective (invertible), there is always exactly one solution. So, in the context of linear algebra, you can think of the matrix simply as a table that defines, or represents, the function $f$, while matrix multiplication represents the action of $f$ on an $n$-tuple $x=(x_1,x_2,\dots,x_n)$. It is not really possible to understand a matrix in this sense outside the context of matrix multiplication.

That's why the solution vector is a column matrix: that is how we defined the multiplication in the first place. If you want it to be a row vector, you either have to change how the multiplication works or avoid using matrix multiplication to express the system. Remember that the column matrix is not the same object as the $n$-tuple, even though they may look similar. We mapped the problem of linear systems, and the application of linear functions, into the world of matrices for convenience.
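To make the inverse problem concrete, here is a small Python sketch for the invertible $2\times 2$ case, using the classical inverse formula (the helper name `solve2` and the numbers are my own; real code would use a linear-algebra library and handle the determinant more carefully):

```python
# Solving f(x) = y when f is invertible: for a 2x2 matrix A the unique
# solution of A x = y can be written down with the classical inverse formula
#   A^{-1} = (1/det) * [[d, -b], [-c, a]].

def solve2(A, y):
    (a, b), (c, d) = A
    det = a*d - b*c
    assert det != 0, "f is not invertible"
    return ((d*y[0] - b*y[1]) / det, (-c*y[0] + a*y[1]) / det)

A = [[1, 2],
     [3, 4]]
x = solve2(A, (5, 11))
print(x)  # (1.0, 2.0)  -- indeed 1*1 + 2*2 = 5 and 3*1 + 4*2 = 11
```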


Your questions are valid and deep, but in my humble opinion at this stage it might be better to just get accustomed to the syntax of matrix algebra. There is an analogous situation when one first learns elementary arithmetic. For instance consider

$$ e^{\sqrt{\pi}+e} = e^{\sqrt{\pi}}e^e. $$

Without having any answers to the questions "What do these numbers/symbols mean?" and "What do they represent?", one can certainly get accustomed to the governing rule ("exponentials turn addition into multiplication"). Similarly, when one learns basic arithmetic for the first time, often one is instructed to focus on how the symbol $0$ works with other symbols like $+$, $\times$, $\div$, with possibly some "intuitive" explanations; as opposed to going into what $0$ really is or represents.

One can ask similarly valid questions when one is first learning a new language, e.g. "Why is it the case that in Turkish the common sentence structure is subject-object-verb, despite the fact that apparently any other ordering is also grammatical?" (this contributor's native language is Turkish).


Here are now some slightly more specific answers:

But why would one need such an array of numbers? What can one put in the rows and columns?

Ultimately, using a rectangular array to store multiple numbers is a convenience. Right from the get-go it allows one to treat multiple numbers as a single entity, and moreover, if one is careful with the width and the height of the array, there emerges some syntax (matrix multiplication) that makes calculations more convenient. One can also speculate that the choice of rows and columns (say, as opposed to writing the numbers in a circle) is influenced by the fact that most writing systems (and textbook reproduction schemes) run along verticals and horizontals.

[...] why is the solution vector a vertical column-matrix?

Perhaps we can answer this question by doing the conversion between matrices and linear equations (or expressions) that you suggested a bit slower. Say you start with the system

$$ \begin{cases} 11x+12y+13z =14\\ 21x+22y+23z =24\\ 31x+32y+33z=34 \\ \end{cases}. $$

First one can convert this to an equality of tuples:

$$ (11x+12y+13z,21x+22y+23z,31x+32y+33z) = (14,24,34). $$

This allows one to have one left-hand side and one right-hand side, as opposed to multiple. But then this is not quite as convenient, because in order to see what the first component on the LHS is equal to one ends up needing to skip a few components, so it might be better ($\ast$) to write the tuples vertically:

$$ \begin{pmatrix} 11x+12y+13z\\ 21x+22y+23z\\ 31x+32y+33z \end{pmatrix} = \begin{pmatrix} 14\\ 24\\ 34 \end{pmatrix}. $$

Here we still keep the parens to connote that we are equating one thing to another one thing, but we drop the commas as different lines already separate the entries. (Of course one could instead use tables, or square brackets, or even merely record somewhere else where each entry is supposed to be; these are stylistic choices.)

But then, after defining matrix multiplication the way most people would do nowadays, we can further factor out the LHS:

$$ \begin{pmatrix} 11&12&13\\ 21&22&23\\ 31&32&33 \end{pmatrix} \begin{pmatrix} x\\ y\\ z \end{pmatrix} = \begin{pmatrix} 14\\ 24\\ 34 \end{pmatrix}. $$

This now is even more convenient as it allows one to tap into the intuition coming from elementary arithmetic: one known thing (the square matrix on the LHS) is multiplied by one unknown thing (the column matrix on the LHS whose entries are letters), and the end result is one known thing (the column matrix on the RHS). Abbreviating, we have the equation

$$ AX=B. $$

One can then multiply both sides by the same thing(s) to simplify the equation (row reduction), or even multiply both sides from the left by the multiplicative inverse $A^{-1}$ (in this particular case it turns out that there is no $A^{-1}$).
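That this particular $A$ has no inverse can be checked directly; in the Python sketch below (the helper name `det3` is mine) the determinant comes out to $0$, yet the system still has solutions, e.g. $(x,y,z)=(-2,3,0)$:

```python
# The coefficient matrix of the system above is singular (each row differs
# from the previous one by (10, 10, 10)), so A^{-1} does not exist; still,
# the system A X = B has solutions, e.g. X = (-2, 3, 0).

A = [[11, 12, 13],
     [21, 22, 23],
     [31, 32, 33]]
B = [14, 24, 34]

def det3(M):
    # cofactor expansion along the first row
    (a, b, c), (d, e, f), (g, h, i) = M
    return a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)

print(det3(A))  # 0  -> A is not invertible
X = [-2, 3, 0]
print([sum(a*x for a, x in zip(row, X)) for row in A])  # [14, 24, 34]
```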

On the other hand even if one decides to use matrix multiplication to write the system of linear equations in a more compact manner, there is some choice involved (around ($\ast$) above). The above notation is arguably the more common one, but if one wanted one could have stored the unknowns in a row matrix like so:

$$ \begin{pmatrix} x& y& z \end{pmatrix} \begin{pmatrix} 11&21&31\\ 12&22&32\\ 13&23&33 \end{pmatrix} = \begin{pmatrix} 14& 24& 34 \end{pmatrix}. $$

Note that to be consistent with the matrix multiplication convention, we needed to modify the rest of the equation also: assuming that one wants to have a consistent matrix multiplication, the cost of using a row matrix to store the unknowns instead of a column matrix is changing the order of multiplication and swapping (or flipping) rows and columns throughout the matrix equation. The term of art for this swap is "taking the transpose", and indeed one can extend the abbreviated equation consistently by writing:

$$X^tA^t = B^t.$$
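One can also check the transpose bookkeeping numerically; in this Python sketch (variable names are my own) the row-vector product $X^t A^t$ produces the same three numbers as the column-vector product $AX$:

```python
# Verifying (A X)^t = X^t A^t on the matrix of the system above,
# with X one particular solution of that system.

A = [[11, 12, 13],
     [21, 22, 23],
     [31, 32, 33]]
X = [-2, 3, 0]

# column convention: A times the column vector X
col_result = [sum(a*x for a, x in zip(row, X)) for row in A]

# row convention: the row vector X^t times the transpose A^t
At = [[A[j][i] for j in range(3)] for i in range(3)]
row_result = [sum(x*At[i][j] for i, x in enumerate(X)) for j in range(3)]

print(col_result, row_result)  # [14, 24, 34] [14, 24, 34]
```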

In elementary arithmetic this swap does not do anything (there are no rows or columns to speak of), so the matrix notation not only allows one to make analogies with elementary arithmetic but also it encodes the extra structure when one considers "higher dimensional numbers", so to speak.

($\dagger$) There is still something fundamentally confusing and mysterious about all this: how all this machinery seems simultaneously to have some arbitrariness and some universality, speculatively coming from different (but perhaps only seemingly different) sources, and whether it can be explained from a purely mathematical (or evolutionary) perspective. This line of thought, although interesting, gets too philosophical and opinion-based for this site in my opinion, hence I will not continue.

What do the coefficients $x, y, z$ correspond to in the equations?

As a first answer, each one of $x,y$ and $z$ stands for an "arbitrary, but fixed" real number. "An", but not necessarily "one". In fact, going further, not necessarily even "a" number: one can use letters as placeholders in equations before having any guarantee that the placeholders can be "filled in" with actual, known numbers (similar to one being able to write $x^2+1=0$ purely based on the fact that it is syntactic, i.e. allowed, to do so; it turns out in this case too that syntacticity signifies something much deeper, a point related to the vagueness of the paragraph marked with ($\dagger$) above). I like to say that they are "anonymous numbers", since the word "anonymous" also connotes something vague about identity, uniqueness, and even existence.

As a second answer, after computing that in this case there are many solutions to the system (and that from a geometric point of view they constitute a line), one could also think of $x,y$ and $z$ as anonymous numbers that one can use to write down relations defining the aggregate of all specific, actual tuples solving the system.
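For this particular system the line of solutions can be written down explicitly; in the Python sketch below the parametrization $(x,y,z)=(t-2,\,3-2t,\,t)$ is my own computation (obtained by eliminating variables by hand), and every value of $t$ gives a solution:

```python
# The solution set of the system above is a line in R^3: the points
# (x, y, z) = (t - 2, 3 - 2*t, t) solve A X = B for every real t.

A = [[11, 12, 13],
     [21, 22, 23],
     [31, 32, 33]]
B = [14, 24, 34]

for t in (-1, 0, 2, 5):
    X = [t - 2, 3 - 2*t, t]
    assert [sum(a*x for a, x in zip(row, X)) for row in A] == B
print("every sampled point on the line solves the system")
```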

Alp Uzman