I am trying to compute the linear approximation of, say, $f(A)=A^{-1}$ using a result such as http://www.matrixcalculus.org/ that returns $-A^{-1}\otimes A^{-1}$, which is not a linear function but a matrix (a Kronecker product). I guess they mean that the linear approximation is the map that takes a $d\times d$ matrix $X$ and returns $-A^{-1}XA^{-1}$. What are the hidden assumptions in this interpretation?
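Here is a minimal numerical sketch of what I think they mean (my own toy matrices and step size; it assumes the column-major $\operatorname{vec}$ convention, under which $\operatorname{vec}(BXC)=(C^{T}\otimes B)\operatorname{vec}(X)$):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
A = rng.standard_normal((d, d)) + d * np.eye(d)   # a well-conditioned test matrix
X = rng.standard_normal((d, d))                   # perturbation direction
Ainv = np.linalg.inv(A)

# First-order (Frechet) approximation: (A + tX)^{-1} ~ A^{-1} - t A^{-1} X A^{-1}
t = 1e-6
lhs = np.linalg.inv(A + t * X)
rhs = Ainv - t * Ainv @ X @ Ainv
print(np.linalg.norm(lhs - rhs))   # O(t^2): tiny

# Kronecker form: vec(A^{-1} X A^{-1}) = (A^{-T} kron A^{-1}) vec(X),
# with column-major vec. Note the transpose on the left factor; conventions
# differ between sources, and for symmetric A the two forms coincide.
vec = lambda M: M.reshape(-1, order="F")
K = np.kron(Ainv.T, Ainv)
print(np.linalg.norm(K @ vec(X) - vec(Ainv @ X @ Ainv)))  # ~ machine precision
```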
Similarly, here, as in all the other places I have read, it is assumed that the linear approximation uses the Frobenius/standard inner product (trace). Is this the only possible interpretation? How can I derive it myself?
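One way to see where the inner product enters (a sketch with a toy example of my own, $f(A)=\operatorname{tr}(BA)$ and a symmetric positive definite weight matrix $M$, not taken from any of the linked sources): the derivative $Df_A$ is a linear functional, and a *gradient* is its Riesz representative with respect to whichever inner product you fix. Under the Frobenius inner product,
$$Df_A(X)=\operatorname{tr}(BX)=\langle B^{T},X\rangle_{F},\qquad\text{so}\qquad \nabla_{F}f=B^{T},$$
but under the weighted inner product $\langle X,Y\rangle_{M}=\operatorname{tr}(X^{T}MY)$ the same functional is represented by
$$\operatorname{tr}(BX)=\operatorname{tr}\big((M^{-1}B^{T})^{T}MX\big)\qquad\Rightarrow\qquad \nabla_{M}f=M^{-1}B^{T}.$$
The linear map $Df_A$ does not depend on the inner product; only the gradient does.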
I know that the norms $(V,\|\cdot\|_V)$ and $(W,\|\cdot\|_W)$ are equivalent in such finite-dimensional spaces (so we can assume the Frobenius norm w.l.o.g.), and that the derivative (linear function) $D_f:V\to W$ is unique, but only once specific vector spaces $V$ and $W$ are given. In the questions above we may define the (unspecified) domain of $f$ to be any vector space that contains the matrix $A$ (symmetric matrices, kernels, ...), and we also have many options for the image vector space $W$.
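To illustrate the subspace issue concretely, here is a sketch (again my own toy example $f(A)=\operatorname{tr}(BA)$): the derivative as a linear functional restricts to the symmetric subspace unchanged, but its Frobenius representative *within that subspace* is the symmetric projection of $B^T$.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
B = rng.standard_normal((d, d))            # generic (nonsymmetric) coefficient
f = lambda A: np.trace(B @ A)              # f(A) = tr(BA)

# Frobenius gradient on all d x d matrices is B^T; restricted to the
# subspace of symmetric matrices, the representative is sym(B^T).
G_full = B.T
G_sym = (G_full + G_full.T) / 2

S = rng.standard_normal((d, d)); S = (S + S.T) / 2   # symmetric direction
A = np.eye(d)
t = 1e-7
dderiv = (f(A + t * S) - f(A)) / t                   # directional derivative
print(dderiv, np.sum(G_full * S), np.sum(G_sym * S)) # all three agree
```

On symmetric directions the antisymmetric part of $B^T$ is Frobenius-orthogonal to $S$, which is why both representatives give the same value there even though they are different matrices.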
More trivially, in the books they write that the (Fréchet) derivative of $2x$ is $2$, but such a derivative must be a linear function. They of course mean that the derivative of $f:V\to W$ that maps $x$ to $2x$ is, at every point $x$, the linear map $h\mapsto 2h$, identified with the scalar $2$, where $V=W=\mathbb{R}$. Can't we define a different subset $V$ of $\mathbb{R}$ with a different product operation (but still a norm) that would yield another answer? Say, binary numbers and their operations?
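For concreteness, the standard Fréchet condition (just the textbook definition, applied to this example) at a point $x$ reads
$$\lim_{h\to 0}\frac{\|f(x+h)-f(x)-Df_x(h)\|_W}{\|h\|_V}=0,$$
and with $f(x)=2x$ it is satisfied exactly by $Df_x(h)=2h$ at every $x$; "the derivative is $2$" identifies this linear map with its $1\times 1$ matrix.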
The question is also related to the big confusions and mistakes over the years in this forum and in famous books, as explained in https://arxiv.org/abs/1911.06491.
I hope this is clear enough; many thanks in advance.
To clarify: the question is about gradients, which involve a choice of inner product and which depend on that choice, and also about how these choices interact with passing to subspaces such as symmetric matrices.
– Dan Feldman Sep 24 '22 at 16:31