I too prefer the physicists' convention ("linearity in the
second argument"). However, I can think of
one somewhat-compelling argument for "linearity in the first
argument". (I don't know the history well enough to say whether
this argument is in fact why mathematicians so often use
linearity in the first argument.)

Let $X, Y, Z$ be arbitrary sets. There is a well-established
convention of writing $Y^{X}$ for the set of all functions $X \to
Y$. The process of currying turns each
two-argument function $X \times Y \to Z$ into a one-argument
function $Y \to Z^{X}$, and this process is reversible. Thus
there is a
natural bijection of sets $$Z^{X \times Y} \cong (Z^{X})^{Y}.$$
Note the similarity to the equation $z^{xy} = (z^{x})^{y}$ for
positive real numbers $x,y,z$. This is a strong sign that the
notation $Y^{X}$ is a good choice.
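
For finite sets the similarity is literal: $Y^{X}$ has $|Y|^{|X|}$ elements, so both sides of the bijection have the same size. Here is a small Python sketch that checks this by brute force. The helper `functions` and the sample sets are my own choices for illustration, and each function is simply encoded as a tuple of (input, output) pairs.

```python
from itertools import product


def functions(domain, codomain):
    """List all functions domain -> codomain, each as a tuple of (input, output) pairs."""
    domain = list(domain)
    return [tuple(zip(domain, values))
            for values in product(codomain, repeat=len(domain))]


# Hypothetical small sets, chosen only so the enumeration stays tiny.
X, Y, Z = [0, 1], ["a", "b", "c"], [False, True]

lhs = functions(list(product(X, Y)), Z)  # Z^(X x Y)
rhs = functions(Y, functions(X, Z))      # (Z^X)^Y

assert len(functions(X, Y)) == len(Y) ** len(X)
assert len(lhs) == len(rhs) == len(Z) ** (len(X) * len(Y))
```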

The currying bijection works as follows. Given a function
$\star \in Z^{X \times Y}$ (written using infix notation, so that $(x,y)
\mapsto x \star y$), define the corresponding function $\varphi \in
(Z^{X})^{Y}$ by
\begin{equation}\tag{*}\label{eq:curry}
\bigl(\varphi(y)\bigr)(x) = x \star y.
\end{equation}
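
To make Equation \eqref{eq:curry} concrete, here is a minimal Python sketch of the bijection and its inverse. The names `curry`, `uncurry`, and the sample operation `minus` are chosen for illustration only; they are not taken from any library.

```python
def curry(star):
    """Send star: X x Y -> Z to phi: Y -> Z^X, where phi(y)(x) = star(x, y)."""
    def phi(y):
        def phi_y(x):
            return star(x, y)
        return phi_y
    return phi


def uncurry(phi):
    """Inverse direction: rebuild the two-argument function from phi."""
    def star(x, y):
        return phi(y)(x)
    return star


# Hypothetical sample operation, deliberately not symmetric in x and y.
def minus(x, y):
    return x - y


phi = curry(minus)
assert phi(2)(10) == minus(10, 2)           # (phi(y))(x) = x star y, with x = 10, y = 2
assert uncurry(phi)(10, 2) == minus(10, 2)  # the round trip recovers the original
```

Note that `phi` takes $y$ first and $x$ second, which is exactly the "reverse order" on the left-hand side of the equation.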

It might bother us that the $x$ and the $y$ appear "in reverse
order" on the left-hand side of this equation. But if we balk at
this "backwardness" and insist on writing
$\bigl(\varphi(x)\bigr)(y) = x \star y$, then we end up defining a
bijection $Z^{X \times Y} \cong (Z^{Y})^{X}$, so the
"backwardness" just reappears here instead. It is at least
reasonable to prefer to stick with the bijection $Z^{X \times Y}
\cong (Z^{X})^{Y}$ defined by $\bigl(\varphi(y)\bigr)(x) = x \star
y$.

But once this choice is made, "linearity in the first argument"
can be made to feel quite compelling. After all, we want our inner
product to be a two-argument function $\langle \, \cdot \, , \cdot \,
\rangle \colon V \times V \to \mathbb{C}$ mapping each ordered pair $(x,
y)$ of vectors to a scalar $\langle x, y \rangle$. Now, if we
apply Equation \eqref{eq:curry} to the case where $x \star y =
\langle x, y \rangle$, it is the second argument $y$ that
ends up "becoming" a scalar-valued function $x \mapsto \langle
x, y \rangle$ defined on $V$, and it is the first argument
$x$ that is fed to that function. Thus we are led to have $y$
correspond to a covector $\langle \, \cdot \, , y \rangle$ and to have
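$x$ remain in the role of a vector. But covectors should be
linear functionals, and $\langle \, \cdot \, , y \rangle$ is linear
precisely when the inner product is linear in its first slot, so at
this point we are committed to
making $\langle \, \cdot \, , \cdot \, \rangle$ linear in the
first argument.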
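
As a sanity check, here is a small Python sketch, assuming the standard inner product on $\mathbb{C}^n$ with the conjugate placed on the second argument. That formula, the names `inner` and `covector`, and the sample vectors are my own assumptions for illustration. The entries are Gaussian integers, so the floating-point arithmetic is exact and the equalities can be tested directly.

```python
def inner(x, y):
    """Assumed inner product on C^n: sum of x_i * conj(y_i), linear in the first slot."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def covector(y):
    """Curried form: y becomes the scalar-valued function x -> <x, y>."""
    return lambda x: inner(x, y)


# Hypothetical sample data.
x1 = [1 + 2j, 3 - 1j]
x2 = [0 + 1j, 2 + 0j]
y = [2 - 1j, 1 + 1j]
c = 2 + 3j

f = covector(y)
combo = [c * a + b for a, b in zip(x1, x2)]  # the vector c*x1 + x2

# The covector <., y> is a linear functional of its argument.
assert f(combo) == c * f(x1) + f(x2)

# In the other slot the scalar comes out conjugated.
cy = [c * b for b in y]
assert inner(x1, cy) == c.conjugate() * inner(x1, y)
```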
sesquilinear
form (= 1½ linear form) on a complex vector space. – Bernard Jul 20 '21 at 21:11