
If I have $f: \mathbb{R}^{n \times m} \times \mathbb{R}^{n} \to \mathbb{R}^{m}$

And $f(K,t) = Kt + h$ where $h \in \mathbb{R}$

How would I find $\frac{\partial f}{\partial K}$ and $\frac{\partial f}{\partial t}$?

Shisui
    It depends on your conventions, and in particular, what kind of object you mean when you write $\frac{\partial f}{\partial K}$. One thing which you can easily calculate (and for which there is no notational ambiguity) is the derivative with respect to the coordinates, $\frac{\partial f}{\partial K_{ij}}$. – user7530 Jan 09 '21 at 07:40
  • What does $x^2$ mean when it's a matrix? Did you mean to make this a quadratic form instead? – Ninad Munshi Jan 09 '21 at 07:58
  • If you change the question after someone answers you might add a comment informing that their answer is no longer applicable. – copper.hat Jan 09 '21 at 09:02

2 Answers


$F(K+H,t) - F(K,t) = tH $, so the derivative with respect to $K$ is the map $H \mapsto t H$. This is often written as ${\partial F(K,t) \over \partial K} H = tH$. One needs to be careful in the sense that a linear map over matrices cannot always be written as matrix multiplication (as would be the case in $\mathbb{R}^n$, for example).

$f(K,t+h) - f(K,t) = hK$, so the derivative with respect to $t$ is the map $h \mapsto hK$. We have ${\partial F(K,t) \over \partial t} h = hK$. Since the relevant parameter is in the reals, this is usually written as ${\partial F(K,t) \over \partial t} = K$.
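A quick numerical sanity check of both derivative maps (a sketch using NumPy, with $t$ taken to be a scalar as in this answer, and illustrative choices for the dimensions and the constant $h$):

```python
import numpy as np

# Finite-difference check of the two derivative maps, assuming
# (as in this answer) that t is a scalar, so f(K, t) = K*t + h.
rng = np.random.default_rng(0)
n, m = 3, 2
K = rng.standard_normal((n, m))
t = rng.standard_normal()
h = 0.5  # illustrative value for the constant offset

def f(K, t):
    return K * t + h

eps = 1e-6

# Derivative w.r.t. K is the linear map H -> t*H:
H = rng.standard_normal((n, m))
fd_K = (f(K + eps * H, t) - f(K, t)) / eps
assert np.allclose(fd_K, t * H, atol=1e-5)

# Derivative w.r.t. t is the map h -> h*K, usually written as "= K":
fd_t = (f(K, t + eps) - f(K, t)) / eps
assert np.allclose(fd_t, K, atol=1e-5)
```

Since $f$ is bilinear in $(K, t)$, the finite differences here are exact up to floating-point error, which is why both checks pass with a tight tolerance.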

copper.hat
  • This doesn't make much sense? The derivative with respect to $t$ (which is a vector) should be an $n$-dimensional vector with the property that $f(K, t+h) - f(K, t) = h\frac{\partial f}{\partial t}(K, t) + o(|h|)$ as $h\to 0$. The derivative with respect to a matrix doesn't seem like it can be defined in the same way because $(\mathbb{R}^{n\times m}, \det(\cdot - \cdot))$ doesn't define a metric space ($\det$ isn't a norm) – Stefan Octavian Jan 09 '21 at 08:35
  • @StefanOctavian When I answered the question there was a term $x^2K$ in the expression in which case it made sense. It is a little unreasonable to expect me to check all of my 6,000+ answers on a regular basis because the OP decides to edit the question. I am fairly familiar with the derivative thank you. – copper.hat Jan 09 '21 at 09:00
  • I'm not saying that your answer is not applicable because of the change in the question, I'm saying it doesn't make much sense regardless of the $x$ or $x^2$ term. Really the derivative wrt $x$ here has to be a function $\mathbb{R}^{n\times m}\times \mathbb{R}\to\mathbb{R^n}$, I have no idea what your derivative is because $h$ shouldn't even be a term in it. – Stefan Octavian Jan 09 '21 at 12:24
  • @StefanOctavian The derivative is a linear map and has been for some time. Maybe look at https://en.wikipedia.org/wiki/Fr%C3%A9chet_derivative, https://math.stackexchange.com/q/621949/27978, https://math.stackexchange.com/q/945141/27978, https://math.stackexchange.com/q/1260050/27978 ? – copper.hat Jan 09 '21 at 17:41
  • Ok, now that you edited your answer I get what you meant, but the way you phrased it was confusing, because the term "the derivative wrt $x$" can refer to the map $x \mapsto$ the linear map defined as the derivative at $x$. Here, this map was constant. Moreover, in simple contexts, the idea of the derivative as a linear map is replaced with easier ideas whose duals are the linear maps. That's why I said the derivative wrt $x$ should be a vector, but actually I meant to say a matrix. The dual of this matrix (in the vector space of matrices) is the map $h \mapsto hK$. This matrix is $K$. – Stefan Octavian Jan 10 '21 at 11:21
  • Also, the first part of your answer still doesn't make sense because the vector space of real $n\times m$ matrices is not a normed vector space, so the definition of the Fréchet derivative cannot be used, as the limit with $H\to O_{n\times m}$ doesn't mean anything. – Stefan Octavian Jan 11 '21 at 17:47
  • @StefanOctavian Please tell me you are joking. Of course you can treat the space of matrices as a Banach space (take the Frobenius norm for example). If something doesn't make sense to you please ask, but don't declare that it does not make sense. – copper.hat Jan 11 '21 at 18:41
  • Ok, sorry, I realise I've been mistaken. I didn't know about the Frobenius norm, but thank you for clearing it up for me. I admit, I should have just asked about it. Your answer didn't make it clear what this derivative was and it seems like the OP doesn't know either. But now I understand. My bad. – Stefan Octavian Jan 11 '21 at 21:47

$\def\p#1#2{\frac{\partial #1}{\partial #2}}\def\e{\varepsilon}\def\R#1{\in{\mathbb R}^{#1}}$To expand on user7530's comment, coordinate-wise derivatives are a useful approach which avoids higher-order tensors or transformations (i.e. vectorization) which flatten those tensors into matrices.

Given the following variables $$A\R{m\times n} \qquad b\R{n} \qquad c\R{m}$$ their coordinate-wise derivatives are $$\p{A}{A_{ij}} = e_i\,\e_j^T \qquad \p{b}{b_{i}} = e_i \qquad \p{c}{c_{j}} = \e_j$$ where $\{e_i,\,\e_j\}$ are the Cartesian basis vectors for their respective dimensions.

Applying this to the current problem yields $$\eqalign{ f &= Kt + h \\ \p{f}{K_{ij}} &= \left(e_i\e_j^T\right)t \;=\; e_i\,t_j \\ \p{f}{t_{i}} &= K\,e_i \;=\; \big(i^{\rm th}\ {\rm column\ of\ }K\big) }$$
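These coordinate-wise formulas can also be verified numerically (a sketch using NumPy; here $t$ is taken as a vector in $\mathbb{R}^m$ so that $Kt$ is well-defined, and the dimensions, indices, and constant $h$ are illustrative choices):

```python
import numpy as np

# Finite-difference check of the coordinate-wise derivatives,
# assuming t is a vector so that f(K, t) = K @ t + h is vector-valued.
rng = np.random.default_rng(0)
n, m = 4, 3
K = rng.standard_normal((n, m))
t = rng.standard_normal(m)
h = rng.standard_normal(n)

def f(K, t):
    return K @ t + h

eps = 1e-6
i, j = 1, 2  # an arbitrary coordinate to test

# df/dK_ij = (e_i eps_j^T) t = e_i * t_j, i.e. t_j in slot i, zeros elsewhere
E = np.zeros((n, m))
E[i, j] = 1.0
fd_K = (f(K + eps * E, t) - f(K, t)) / eps
expected = np.zeros(n)
expected[i] = t[j]
assert np.allclose(fd_K, expected, atol=1e-5)

# df/dt_j = K e_j = j-th column of K
e = np.zeros(m)
e[j] = 1.0
fd_t = (f(K, t + eps * e) - f(K, t)) / eps
assert np.allclose(fd_t, K[:, j], atol=1e-5)
```

Because $f$ is affine, each finite difference recovers the coordinate-wise derivative exactly up to floating-point error.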

greg