Given a differentiable function $$F:M_{n\times n}(\mathbb{R})\to\mathbb{R},$$ how am I to interpret its first-order approximation?
The derivative of a real-valued function of a matrix is a matrix-valued function of a matrix. So what I did was treat this function as if it were a function $$F:\mathbb{R}^{n\times n}\to\mathbb{R}$$ and form what I hope is the first-order approximation: $$F(X+H)\approx F(X) + \text{tr}(F'(X)H^T)$$
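As a numerical sanity check (my own sketch, not part of the original question), here is a small NumPy experiment with the illustrative choice $F(X)=\sum_{ij}X_{ij}^2$, whose matrix of partial derivatives is $F'(X)=2X$; the approximation error should be second order in $\lVert H\rVert$:

```python
import numpy as np

# Check F(X + H) ≈ F(X) + tr(F'(X) H^T) for F(X) = sum of squared entries,
# whose matrix of partial derivatives is F'(X) = 2X.

rng = np.random.default_rng(0)
n = 4

def F(X):
    return np.sum(X**2)

def F_prime(X):
    # Matrix of partial derivatives dF/dX_ij.
    return 2 * X

X = rng.standard_normal((n, n))
H = 1e-5 * rng.standard_normal((n, n))

exact = F(X + H)
approx = F(X) + np.trace(F_prime(X) @ H.T)

# The residual is second order in ||H|| (here ||H|| ~ 1e-5).
print(abs(exact - approx))
```

Note that $\text{tr}(F'(X)H^T)=\sum_{ij}F'(X)_{ij}H_{ij}$, i.e. the entrywise dot product of the two matrices.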
Is this correct? Would the second-order approximation then involve a higher-order tensor, even though, interpreting $F$ as a function of a vector, the second derivative would only be a matrix (the Hessian)?
What precisely, then, is the difference between the domains $M_{n\times n}(\mathbb{R})$ and $\mathbb{R}^{n\times n}$? They are isomorphic as vector spaces, so should I think of $M_{n\times n}(\mathbb{R})$ as just $\mathbb{R}^{n\times n}$ with some extra multiplicative structure?
Where can I go to learn more about this type of stuff?
Update
Ok, so I spent some time looking at Aloizio's answer and here is my understanding. Aloizio says that the derivative is a continuous linear function from $\mathbb{E}\to\mathbb{F}$, and that made me realize that for functions $\mathbb{R}^n\to\mathbb{R}$, the gradient is not the derivative: the derivative at $x$ is the linear map $v\mapsto\nabla f(x)\cdot v$, which computes the dot product of the gradient with an increment. This is confusing, since there seems to be a tendency in mathematics to conflate the matrix of partial derivatives with the derivative itself.
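To make the "derivative is a linear map, not a vector" point concrete, here is a toy illustration (my own, with the hypothetical choice $f(x)=x\cdot x$, so $\nabla f(x)=2x$ and $Df(x)[v]=2x\cdot v$):

```python
import numpy as np

# For f : R^n -> R, the derivative at x is the linear map
#   Df(x) : v |-> grad f(x) . v,
# not the gradient vector itself.

def f(x):
    return x @ x

def Df(x):
    # Return the derivative at x as a linear function of the increment v.
    return lambda v: 2 * x @ v

x = np.array([1.0, 2.0, 3.0])
v = 1e-6 * np.array([1.0, -1.0, 0.5])

# First-order expansion: f(x + v) ≈ f(x) + Df(x)[v].
err = abs(f(x + v) - (f(x) + Df(x)(v)))
print(err)  # second order in ||v||
```

The gradient is just the matrix of this linear map in the standard basis; the map itself is the derivative.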
Looking at the determinant as an example, we calculate $$\det(A+H) = \det(A) + \text{tr}(Adj_A H) + \epsilon(A,H),$$ where $Adj_A$ is the adjugate (classical adjoint) of $A$ and $\epsilon(A,H) = o(\lVert H\rVert)$. Thus the derivative of the determinant at $A$ is $$\text{tr}(Adj_A\;\cdot\;),$$ which is of course continuous and linear.
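This expansion (Jacobi's formula) can be checked numerically; the sketch below computes the adjugate as $Adj_A = \det(A)\,A^{-1}$, which is valid for invertible $A$:

```python
import numpy as np

# Check det(A + H) = det(A) + tr(Adj_A H) + o(||H||) numerically.

rng = np.random.default_rng(1)
n = 4

A = rng.standard_normal((n, n))
H = 1e-5 * rng.standard_normal((n, n))

# Adjugate via Adj_A = det(A) * A^{-1}, assuming A is invertible.
adj_A = np.linalg.det(A) * np.linalg.inv(A)

exact = np.linalg.det(A + H)
approx = np.linalg.det(A) + np.trace(adj_A @ H)

print(abs(exact - approx))  # second order in ||H||
```

For singular $A$ the adjugate still exists (as the transpose of the cofactor matrix), but it can no longer be computed from the inverse as done here.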
Now for a general differentiable function $M_{n\times n}(\mathbb{R})\to\mathbb{R}$: we see that $\text{tr}(Adj_A\;\cdot\;)$ is just a dot product against a row-stacked version of the increment, and thus the distinction between $M_{n\times n}(\mathbb{R})\to\mathbb{R}$ and $\mathbb{R}^{n\times n}\to\mathbb{R}$ appears rather cosmetic. We should expect every first derivative to take the form $\text{tr}(B\;\cdot\;)$, where $B$ is the transpose of the matrix of partial derivatives (equivalently, $\text{tr}(B^T\;\cdot\;)$ with $B$ the matrix of partials itself, matching the form $\text{tr}(F'(X)H^T)$ above). I also imagine there's some way to write the second derivative using a higher-order tensor (as a bilinear form on matrices it carries four indices, or equivalently it is an $n^2\times n^2$ Hessian of the vectorized function), but I don't know enough about tensors to say for sure.
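The "row-stacked dot product" identity can be verified directly: with row-major flattening, $\text{tr}(BH)$ equals the dot product of $\operatorname{vec}(B^T)$ with $\operatorname{vec}(H)$. A quick check (my own sketch):

```python
import numpy as np

# tr(B H) = sum_{i,j} B_{ij} H_{ji} = vec(B^T) . vec(H)
# with row-major ("row-stacked") flattening.

rng = np.random.default_rng(2)
n = 3

B = rng.standard_normal((n, n))
H = rng.standard_normal((n, n))

lhs = np.trace(B @ H)
rhs = B.T.flatten() @ H.flatten()  # row-stacked vectors

print(np.isclose(lhs, rhs))
```

So every continuous linear functional on $M_{n\times n}(\mathbb{R})$ is an ordinary dot product on $\mathbb{R}^{n^2}$ in disguise, which is exactly why the distinction feels cosmetic.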