0

Let $\theta_i$ and $\theta_j$ are vectors with same length.(and there are $n$ vectors)

I define the cost function like below :

$$ C(\theta)=\frac12 \sum_{i=1}^n \sum_{\begin{smallmatrix} j=1 \\ j\neq i \end{smallmatrix}}^n \cos^2(\theta_i,\theta_j) = \frac12 \sum_{i=1}^n \sum_{\begin{smallmatrix} j=1 \\ j\neq i \end{smallmatrix}}^n \Big(\frac{\left<\theta_i,\theta_j\right>}{\|\theta_i\|\|\theta_j\|}\Big)^2 $$ So, to use gradient decent I want to get the derivatives of the $C(\theta)$ with respect to the parameters $\theta$.

How do I get this?

1 Answers1

1

First, collect the vectors into the columns of a single matrix $$ X = \big[\,\theta_1\;\theta_2\;\ldots\;\theta_n\,\big] \in{\mathbb R}^{m\times n} $$ Then form the Gram matrix and some auxiliary matrices $$\eqalign{ G &= X^TX \implies dG = (X^TdX + dX^TX) \\ B &= {\rm Diag}(G) = I\odot G \\ L &= B^{-1/2} \implies B = L^{-2},\quad dB = -2L^{-3}dL \\ Y &= LGL \\ M &= GLY + YLG,\quad P=I\odot M \\ }$$ all of which are $n\times n,\,$ symmetric. Several of them $(B,L,P)$ are diagonal and these will be particularly nice to work with since they commute and their functions (e.g. square roots) are easy to calculate.

The symbol $\odot$ is the elementwise/Hadamard product and $I$ is the $(n\times n)$ identity matrix.

Write the cost function in terms of $Y$, then find its differential and gradient. $$\eqalign{ C &= \tfrac{1}{2}(Y:Y - n) \\ dC &= Y:dY \\ &= Y:(L\,dG\,L + LG\,dL + dL\,GL) \\ &= LYL:dG + M:dL \\ &= LYL:dG - \tfrac{1}{2}M:L^3dB \\ &= LYL:dG - \tfrac{1}{2}L^3M:(I\odot dG) \\ &= \big(LYL - \tfrac{1}{2}(L^3M)\odot I\big):dG \\ &= L\big(Y - \tfrac{1}{2}LP\big)L:(X^TdX + dX^TX) \\ &= L\big(2Y - LP\big)L:X^TdX \\ &= XL\big(2Y - LP\big)L:dX \\ \frac{\partial C}{\partial X} &= XL\big(2Y - LP\big)L \\ }$$ To find the gradient with respect to the $k^{th}$ column of $X$ (wrt $\theta_k$), multiply the above matrix-valued gradient by $e_k,\,$ the $k^{th}$ standard basis vector, to obtain $$\eqalign{ \frac{\partial C}{\partial \theta_k} &= \bigg(\frac{\partial C}{\partial X}\bigg)e_k \;=\; XL\big(2Y - LP\big)Le_k \\ }$$ In several steps above, a colon is used to represent the trace/Frobenius product $\;A:B = {\rm Tr}(A^TB)$

The properties of the trace allow the terms in such products to be rearranged in lots of different ways, e.g. $$\eqalign{ A:B &= B:A \\ A:B &= A^T:B^T \\ A:BC &= B^TA:C = AC^T:B = etc \\ }$$ One last property is that Hadamard and Frobenius products commute $$\eqalign{ A:(B\odot C) &= (A\odot B):C \\ &= (B\odot A):C \\ &= C:(B\odot A) \\ }$$

greg
  • 35,825