Suppose I take $\frac{d}{dx}[x^T x]$ for some vector $x=[x_1,x_2,\dots]^T$. I can use the product rule to get $\frac{d}{dx}[x^T]\,x+x^T \frac{d}{dx}[x]$, and then, by taking the transpose out of the derivative, $\left(\frac{d}{dx}[x]\right)^T x+x^T \frac{d}{dx}[x]=I^Tx+x^TI=x+x^T$.

Clearly I am making a mistake, because $x$ and $x^T$ have different dimensions and cannot be added. What am I doing wrong?

  • What does your operator $\frac{\mathrm{d}}{\mathrm{d}x}$ stand for? Don't answer that it's the derivative with respect to $x$, because that doesn't make sense here. – gniourf_gniourf Feb 28 '16 at 13:38
  • "I can prove with other methods that this is false" Unless you show these "other methods"... – Did Feb 28 '16 at 13:46
  • @gniourf_gniourf It makes sense if you consider the Fréchet derivative. https://en.wikipedia.org/wiki/Fr%C3%A9chet_derivative. – mathcounterexamples.net Feb 28 '16 at 13:56
  • @gniourf_gniourf The problem came from trying to maximize $x^Tx$ over $x \in \mathbb{R}^n$, so we take a derivative and set it to 0. I don't know the name of this type of derivative, but this Wikipedia page covers it: https://en.wikipedia.org/wiki/Matrix_calculus Does that clear it up? – user3490171 Feb 28 '16 at 14:17
  • @Did You can use $\frac{d}{dx}[x^Ta]=\frac{d}{dx}[a^Tx]=a^T$ after the product rule step, or simply note that the answer is nonsense because you can't add matrices of different dimensions and $x$ is a vector. – user3490171 Feb 28 '16 at 14:20
  • @user3490171 The point is not what I can use or not use, rather it is to make your question complete. http://math.stackexchange.com/a/482762/6179 – Did Feb 28 '16 at 14:23
  • The mistake might be to forget that differentials are linear functions. Here, considering the differential $L_x$ of the function $F:u\mapsto u$ at some point $x$ in $\mathbb R^n$, one gets that the differential $G_x$ of the function $G:u\mapsto F(u)^TF(u)$ at $x$ is the linear function such that, for every $h$, $$G_x(h)=L_x(h)^TF(x)+F(x)^TL_x(h).$$ Of course, here, $F(x)=x$ and $L_x(h)=h$ for every $x$ and $h$, hence $$G_x(h)=h^Tx+x^Th=h^T(2x),$$ thus the differential of $G$ at $x$ can be represented by the vector $2x$. – Did Feb 28 '16 at 14:34
  • To sum up, the expression $I^Tx+x^TI$ in your post is correct and is shorthand for the linear functional $$h\mapsto I(h)^Tx+x^TI(h)=h^Tx+x^Th,$$ and $h^Tx+x^Th$ is not the nonexistent quantity $(x+x^T)h$ but $2h^Tx=2x^Th$, hence the differential is indeed (identifiable with the vector) $2x$ or (with the vector) $2x^T$, depending on the convention you use. – Did Feb 28 '16 at 14:38
  • @Did Awesome, thank you! I have never heard of anything like that. Where can I learn more about derivatives as linear functionals so I no longer make similar mistakes? – user3490171 Feb 28 '16 at 14:45
  • https://en.wikipedia.org/wiki/Total_derivative#The_total_derivative_as_a_linear_map – Did Feb 28 '16 at 15:31
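
For anyone who wants to sanity-check the conclusion from the comments numerically, here is a minimal pure-Python sketch. It compares a central-difference estimate of the differential of $G(x)=x^Tx$ applied to a direction $h$ with the product-rule expression $h^Tx+x^Th$ and with $(2x)^Th$. The helper names (`dot`, `G`, `directional_derivative`) and the sample vectors are made up for illustration and do not come from the thread.

```python
# Numerical check that the differential of G(x) = x^T x acts as h -> (2x)^T h.
# All names below are illustrative assumptions, not from the original thread.

def dot(u, v):
    """Inner product u^T v of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def G(x):
    """G(x) = x^T x."""
    return dot(x, x)

def directional_derivative(f, x, h, eps=1e-6):
    """Central-difference estimate of the differential of f at x applied to h."""
    x_plus = [xi + eps * hi for xi, hi in zip(x, h)]
    x_minus = [xi - eps * hi for xi, hi in zip(x, h)]
    return (f(x_plus) - f(x_minus)) / (2 * eps)

# Arbitrary sample point and direction (assumed values for the demo).
x = [1.0, -2.0, 3.0]
h = [0.5, 4.0, -1.0]

numeric = directional_derivative(G, x, h)      # finite-difference estimate
product_rule = dot(h, x) + dot(x, h)           # h^T x + x^T h (a scalar)
closed_form = dot([2 * xi for xi in x], h)     # (2x)^T h

print(numeric, product_rule, closed_form)
```

The three printed values should agree up to floating-point rounding. Note that every quantity here is a scalar: the product rule gives a linear functional of $h$, so there is never a point where a column vector is added to a row vector.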

0 Answers