9

From my limited experience with inner product spaces, it seems like the inner product being linear in the second argument would facilitate smoother notation. For instance, for $x \in H$, we could define $x^* \in H^*$ by $$x^*y = \langle x, y\rangle $$ Then this would generalize the fact that $x^T y = \langle x, y\rangle$ on $\mathbb{R}^n $.

Does linearity in the first argument make for smoother notation in some other aspect of Hilbert space theory?

  • It is just a convention. I always use linearity in the first component. – J. De Ro Jul 20 '21 at 20:43
  • 2
    Inner products are linear in both arguments. – John Douma Jul 20 '21 at 20:49
  • 11
    @JohnDouma Not when the base field is $\Bbb C$. – Arthur Jul 20 '21 at 20:53
  • In my own experience this is completely arbitrary with no particular motivation behind it, although I do agree that math people tend to make the first entry linear as opposed to the second. – leslie townes Jul 20 '21 at 20:55
  • I know quantum mechanicists like their linearity in the second argument. – Arthur Jul 20 '21 at 20:56
  • 1
    Over $\mathbf C$, an inner product is defined as a sesquilinear form (= 1½ linear form) on a complex vector space. – Bernard Jul 20 '21 at 21:11
  • 2
    If anything, and I may be called a heretic in some circles for this, I like the physicist notation better. Oftentimes we want to use $\langle x, - \rangle$ as a functional, and it's always made more sense to me for this to be linear by default. I would be interested to hear why other mathematicians prefer the linear-on-the-left convention, in addition to where this came from historically. – HallaSurvivor Jul 20 '21 at 21:42
  • Also in physics $|x\rangle$ is a vector, so we better have $|ax + by\rangle = a|x\rangle + b|y\rangle$! Surely mathematicians must agree that $\langle x|$ cannot possibly be a vector. – Lars Jul 20 '21 at 21:46
  • 7
    @Lars For a mathematician, there is no need to have the "bra-ket" type of markers around a vector, so the fact that $\langle x|$ "cannot possibly be a vector" is not a particularly convincing argument. – Ben Grossmann Jul 20 '21 at 22:34
  • 1
    Here is one possibility. When Hamilton introduced quaternions and their scalar products he was writing $|a|^2=a\overline{a}$ in this order (using modern notation). It didn't matter for quaternions, as they commute with their conjugates, but perhaps the convention stuck when inner/scalar products were extended to complex vectors. – Conifold Jul 21 '21 at 00:25
  • A Mathematician named Hilbert first defined the inner product around 1905. Though, to be honest, I believe this was the work of von Neumann, who was a student of Hilbert. Hilbert's convention was practically etched in stone by the time Dirac defined bra-ket notation. So, for the most part, Mathematicians have maintained the Hilbert convention. I think it's better to distinguish between the space and its dual, which Dirac notation does. The dual space was barely being formulated when Hilbert defined an inner product. – Disintegrating By Parts Jul 21 '21 at 15:33

2 Answers

11

I have taught linear algebra using both conventions and I agree with your conclusion. I found the "physicist" convention to have more advantages than disadvantages when working over $\mathbb{C}$ (or working simultaneously over $\mathbb{F}$ where $\mathbb{F} \in \left \{ \mathbb{R}, \mathbb{C} \right \}$). Those include:

  1. It is now standard that vectors are identified with column vectors while covectors are identified with row vectors. Thus, the standard inner product on $\mathbb{R}^n$ is written in terms of matrix product as $\vec{x}^T \cdot \vec{y}$ (and cannot be written as $\vec{x} \cdot \vec{y}^T$). By replacing $T$ with $*$, one gets a standard inner product $\vec{x}^{*} \cdot \vec{y}$ on $\mathbb{C}^n$ which generalizes the real case and is naturally anti-linear in the first variable. In order to describe the standard inner product using a linear-in-the-first-variable convention on column vectors, one must define $\left< \vec{x}, \vec{y} \right> = \vec{y}^{*} \cdot \vec{x}$ which is more awkward.
  2. The Riesz anti-isomorphism $V \to V^{*}$ is given by $v \mapsto \left< v, \cdot \right>$. This is consistent with the idea that "$v$ acts on some vector $w$ by $\left< v, w \right>$" and is even clearer with the bra-ket notation, in which a vector $v \in V$ defines a linear functional $\left< v \right|$ by $\left< v \right|(w) := \left< v \, | \, w \right>$. This imposes the requirement that the inner product be linear in the second variable.
  3. The expansion of a vector $v$ in an orthonormal basis $(e_1,\dots,e_n)$ is written as $\sum_{i=1}^n \left< e_i, v \right> e_i$, which is consistent with the dual-space notation $\sum_{i=1}^n e^i(v) \, e_i$, where $e^i$ is the $i$-th element of the dual basis, the functional that gives you the $i$-th coordinate of a vector.
  4. The matrix coefficients of a linear operator $T$ with respect to an orthonormal basis $e_1,\dots,e_n$ are given by $a_{ij} = \left< e_i, T(e_j) \right>$ (as opposed to $a_{ij} = \left< T(e_j), e_i \right>$ in the other convention, which is more awkward), while the matrix coefficients of $T^{*}$ are given by $\left< e_i, T^{*}(e_j) \right> = \overline{\left< e_j, T(e_i) \right>} = \overline{a_{ji}}$.
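
The two conventions in item 1 are easy to compare numerically. Here is a minimal sketch in plain Python (the names `ip_physics` and `ip_math` are mine, not standard):

```python
# Two conventions for the inner product on C^n, using plain Python complex
# numbers: ip_physics is conjugate-linear in the first slot, ip_math in the second.

def ip_physics(x, y):
    """<x, y> = x* . y  (linear in the second argument)."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))

def ip_math(x, y):
    """<x, y> = y* . x  (linear in the first argument)."""
    return sum(yi.conjugate() * xi for xi, yi in zip(x, y))

x = [1 + 2j, 3 - 1j]
y = [2 - 1j, 1j]
a = 2 + 3j

# ip_physics is linear in the second slot, conjugate-linear in the first:
assert ip_physics(x, [a * yi for yi in y]) == a * ip_physics(x, y)
assert ip_physics([a * xi for xi in x], y) == a.conjugate() * ip_physics(x, y)

# The two conventions are complex conjugates of each other:
assert ip_math(x, y) == ip_physics(x, y).conjugate()
```

(For what it's worth, NumPy's `vdot` follows the second-argument-linear convention: it conjugates its first argument.)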

The only mildly annoying thing I noticed with the "physicist" convention is that the defining property of the adjoint operator is naturally written as $\left< T^{*}v, w \right> = \left< v, Tw \right>$, while I was used to the form $\left< Tv, w \right> = \left< v, T^{*}w \right>$. Both forms are equivalent, but if one wants to use the Riesz anti-isomorphism to justify the existence of $T^{*}$, the form $\left< T^{*}v, w \right> = \left< v, Tw \right>$ is more natural, and it takes some getting used to.
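
That defining property can be sanity-checked numerically, with $T^{*}$ realized as the conjugate transpose of the matrix of $T$. A small plain-Python sketch (the helper names are mine):

```python
# Check <T* v, w> = <v, T w> in the second-argument-linear convention,
# where T* is the conjugate transpose. A 2x2 example, no libraries.

def ip(x, y):  # conjugate-linear in the first slot
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))

def apply(T, v):  # matrix-vector product
    return [sum(T[i][j] * v[j] for j in range(len(v))) for i in range(len(T))]

def adjoint(T):  # conjugate transpose
    return [[T[j][i].conjugate() for j in range(len(T))] for i in range(len(T[0]))]

T = [[1 + 1j, 2j], [3, 4 - 1j]]
v = [1 - 1j, 2]
w = [2j, 1 + 1j]

assert ip(apply(adjoint(T), v), w) == ip(v, apply(T, w))
```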

levap
0

I too prefer the physicists' convention ("linearity in the second argument"). However, I can think of one somewhat-compelling argument for "linearity in the first argument". (I don't know the history well enough to say whether this argument is in fact why mathematicians so often use linearity in the first argument.)

Let $X, Y, Z$ be arbitrary sets. There is a well-established convention of writing $Y^{X}$ for the set of all functions $X \to Y$. The process of currying turns each two-argument function $X \times Y \to Z$ into a one-argument function $Y \to Z^{X}$, and this process is reversible. Thus there is a natural bijection of sets $$Z^{X \times Y} \cong (Z^{X})^{Y}.$$ Note the similarity to the equation $z^{xy} = (z^{x})^{y}$ for positive real numbers $x,y,z$. This is a strong sign that the notation $Y^{X}$ is a good choice.

The way the currying bijection works is this: Given a function $\star \in Z^{X \times Y}$ (written using infix notation, so that $(x,y) \mapsto x \star y$), define the corresponding function $\varphi \in (Z^{X})^{Y}$ by \begin{equation}\tag{*}\label{eq:curry} \bigl(\varphi(y)\bigr)(x) = x \star y. \end{equation} It might bother us that the $x$ and the $y$ appear "in reverse order" on the left-hand side of this equation. But if we balk at this "backwardness" and insist on writing $\bigl(\varphi(x)\bigr)(y) = x \star y$, then we end up defining a bijection $Z^{X \times Y} \cong (Z^{Y})^{X}$, so the "backwardness" just reappears here instead. It is at least reasonable to prefer to stick with the bijection $Z^{X \times Y} \cong (Z^{X})^{Y}$ defined by $\bigl(\varphi(y)\bigr)(x) = x \star y$.
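
Equation (*) is easy to sketch in code; a few lines of Python, with the name `curry_second` my own:

```python
# Currying a two-argument function with the ordering of equation (*):
# (phi(y))(x) = x star y, i.e. the SECOND argument is fixed first.

def curry_second(star):
    """Turn star : X x Y -> Z into phi : Y -> (X -> Z)."""
    return lambda y: (lambda x: star(x, y))

concat = lambda x, y: x + y
phi = curry_second(concat)

# (phi("b"))("a") = "a" star "b" -- note the arguments in "reverse order":
assert phi("b")("a") == "ab"
```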

But once this choice is made, "linearity in the first argument" can be made to feel quite compelling. For, we want our inner product to be a two-argument function $\langle \, \cdot \, , \cdot \, \rangle \colon V \times V \to \mathbb{C}$ mapping each ordered pair $(x, y)$ of vectors to a scalar $\langle x, y \rangle$. Now, if we apply Equation \eqref{eq:curry} to the case where $x \star y = \langle x, y \rangle$, it is the second argument $y$ that ends up "becoming" a scalar-valued function $x \mapsto \langle x, y \rangle$ defined on $V$, and it is the first argument $x$ that is fed to that function. Thus we are led to have $y$ correspond to a covector $\langle \, \cdot \, , y \rangle$ and to have $x$ remain in the role of a vector. But covectors should be linear functionals, so at this point we are committed to making $\langle \, \cdot \, , \cdot \, \rangle$ be linear in the first argument.
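
To make the last step concrete, here is the same currying applied to a linear-in-the-first-argument product on $\mathbb{C}^2$ (plain Python; the names are illustrative), verifying that each fixed $y$ does yield a linear functional:

```python
# With <x, y> = sum x_i * conj(y_i) (linear in the FIRST argument),
# currying by (phi(y))(x) = <x, y> makes phi(y) a linear functional of x.

def ip_math(x, y):
    return sum(xi * yi.conjugate() for xi, yi in zip(x, y))

phi = lambda y: (lambda x: ip_math(x, y))

x1 = [1 + 1j, 2]
x2 = [3, 1 - 2j]
y = [1j, 2 - 1j]
a = 1 - 3j

f = phi(y)  # the covector <., y> corresponding to y
assert f([a * u + v for u, v in zip(x1, x2)]) == a * f(x1) + f(x2)
```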

  • The root of the problem is that we normally read function composition right-to-left, so that $f \circ g$ means "first $g$ acts, then $f$ acts," but exponential notation is an exception, because $(z^x)^y$ means "first exponentiate by $x$, then exponentiate by $y$." The ideal solution would probably be to have functions always act on their arguments from the right, so that Equation (*) above becomes $(x)\bigl((y)\varphi\bigr) = x \star y$. Then vectors would be written as rows and covectors as columns, and the inner product of vectors $x$ and $y$ would be $x \bar{y}^T = \langle x, y \rangle$. – Tyrrell McAllister May 11 '23 at 18:35