
I asked a question about the derivative of cosine similarity.

But no one has answered it, so I tried to work it out myself, as below.

$$
\operatorname{cossim}(a,b)=\frac{a\cdot b}{\sqrt{a^2\cdot b^2}}=\frac{a\cdot b}{|a|\,|b|}
$$
$$
\begin{aligned}
\frac{\partial}{\partial a_1}\operatorname{cossim}(a,b)
&=\frac{\partial}{\partial a_1}\,\frac{a_1 b_1+\dots+a_n b_n}{|a|\,|b|} \\
&=\frac{\partial}{\partial a_1}\,a_1 b_1\,(a_1^2+a_2^2+\dots+a_n^2)^{-1/2}\,|b|^{-1} \\
&=b_1\,(a_1^2+a_2^2+\dots+a_n^2)^{-1/2}\,|b|^{-1}-a_1^2 b_1\,(a_1^2+a_2^2+\dots+a_n^2)^{-3/2}\,|b|^{-1} \\
&=\frac{b_1}{|a|\,|b|}-\frac{a_1|a|^{-2}\,a_1 b_1}{|a|\,|b|} \\
&=\frac{b_1}{|a|\,|b|}-\frac{a_1 b_1}{|a|\,|b|}\cdot\frac{a_1}{|a|^2}
\end{aligned}
$$
$$
\therefore\quad \frac{\partial}{\partial a_1}\operatorname{cossim}(a,b)=\frac{b_1}{|a|\,|b|}-\operatorname{cossim}(a,b)\cdot\frac{a_1}{|a|^2}
$$

Is this correct?
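One way to test the claimed formula (independently of the algebra) is a finite-difference check. The sketch below, using only the standard library, compares the closed form $\frac{b_1}{|a||b|}-\operatorname{cossim}(a,b)\frac{a_1}{|a|^2}$ against a central difference; the test vectors are arbitrary.

```python
# Numerical sanity check of the claimed partial derivative
#   d/da_i cossim(a,b) = b_i/(|a||b|) - cossim(a,b) * a_i/|a|^2
# via a central finite difference. Test vectors are arbitrary.
import math

def cossim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def analytic_partial(a, b, i):
    # the closed form from the derivation above
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return b[i] / (na * nb) - cossim(a, b) * a[i] / na**2

def numeric_partial(a, b, i, h=1e-6):
    # central difference approximation of d cossim / d a_i
    ap = list(a); ap[i] += h
    am = list(a); am[i] -= h
    return (cossim(ap, b) - cossim(am, b)) / (2 * h)

a = [0.3, -1.2, 2.0]
b = [1.5, 0.4, -0.7]
for i in range(3):
    assert abs(analytic_partial(a, b, i) - numeric_partial(a, b, i)) < 1e-6
```
The loop raising no assertion error means the final formula agrees with the numerical derivative in every coordinate, even though one intermediate step above drops terms.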

ycyoon

2 Answers


Putting
$$
\cos(\mathbf{v},\mathbf{w})=\frac{\mathbf{v}\cdot\mathbf{w}}{\left|\mathbf{v}\right|\,\left|\mathbf{w}\right|}
$$
I would develop the required derivative as follows:
$$
\cos(\mathbf{v}+d\mathbf{v},\mathbf{w})=\frac{\mathbf{v}\cdot\mathbf{w}+d\mathbf{v}\cdot\mathbf{w}}{\left|\mathbf{v}+d\mathbf{v}\right|\,\left|\mathbf{w}\right|}
$$
Now $\left|\mathbf{v}+d\mathbf{v}\right|$ can be rewritten as:
$$
\begin{aligned}
\left|\mathbf{v}+d\mathbf{v}\right|
&=\sqrt{(\mathbf{v}+d\mathbf{v})\cdot(\mathbf{v}+d\mathbf{v})}
=\sqrt{\left|\mathbf{v}\right|^2+\left|d\mathbf{v}\right|^2+2\,\mathbf{v}\cdot d\mathbf{v}} \\
&=\left|\mathbf{v}\right|\sqrt{1+2\frac{\mathbf{v}}{\left|\mathbf{v}\right|^2}\cdot d\mathbf{v}+\frac{\left|d\mathbf{v}\right|^2}{\left|\mathbf{v}\right|^2}}
\approx\left|\mathbf{v}\right|\left(1+\frac{\mathbf{v}}{\left|\mathbf{v}\right|^2}\cdot d\mathbf{v}\right)
\end{aligned}
$$
hence:
$$
\begin{aligned}
\cos(\mathbf{v}+d\mathbf{v},\mathbf{w})
&=\frac{\mathbf{v}\cdot\mathbf{w}+d\mathbf{v}\cdot\mathbf{w}}{\left|\mathbf{v}+d\mathbf{v}\right|\,\left|\mathbf{w}\right|}
\approx\frac{\mathbf{v}\cdot\mathbf{w}+d\mathbf{v}\cdot\mathbf{w}}{\left|\mathbf{v}\right|\left(1+\frac{\mathbf{v}}{\left|\mathbf{v}\right|^2}\cdot d\mathbf{v}\right)\left|\mathbf{w}\right|} \\
&\approx\frac{\mathbf{v}\cdot\mathbf{w}+\mathbf{w}\cdot d\mathbf{v}}{\left|\mathbf{v}\right|\,\left|\mathbf{w}\right|}\left(1-\frac{\mathbf{v}}{\left|\mathbf{v}\right|^2}\cdot d\mathbf{v}\right) \\
&\approx\frac{\mathbf{v}\cdot\mathbf{w}}{\left|\mathbf{v}\right|\,\left|\mathbf{w}\right|}+\left(\frac{\mathbf{w}}{\left|\mathbf{v}\right|\,\left|\mathbf{w}\right|}-\frac{\mathbf{v}\cdot\mathbf{w}}{\left|\mathbf{v}\right|\,\left|\mathbf{w}\right|}\,\frac{\mathbf{v}}{\left|\mathbf{v}\right|^2}\right)\cdot d\mathbf{v} \\
&=\cos(\mathbf{v},\mathbf{w})+\left(\frac{\mathbf{w}}{\left|\mathbf{v}\right|\,\left|\mathbf{w}\right|}-\cos(\mathbf{v},\mathbf{w})\,\frac{\mathbf{v}}{\left|\mathbf{v}\right|^2}\right)\cdot d\mathbf{v}
\end{aligned}
$$
Therefore, apart from a typo, your derivation
$$
\frac{\partial}{\partial a_1}\operatorname{cossim}(a,b)=\frac{b_1}{|a|\,|b|}-\operatorname{cossim}(a,b)\cdot\frac{a_1}{|a|^2}
$$
looks correct.
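The expansion ends with a gradient vector, $\frac{\mathbf{w}}{|\mathbf{v}||\mathbf{w}|}-\cos(\mathbf{v},\mathbf{w})\frac{\mathbf{v}}{|\mathbf{v}|^2}$, multiplying $d\mathbf{v}$. A quick check of that first-order expansion, assuming NumPy and arbitrary random test vectors:

```python
# Check the first-order expansion
#   cos(v + dv, w) ≈ cos(v, w) + grad · dv
# with grad = w/(|v||w|) - cos(v, w) * v/|v|^2, for a small random dv.
import numpy as np

rng = np.random.default_rng(0)
v = rng.normal(size=5)
w = rng.normal(size=5)
dv = 1e-6 * rng.normal(size=5)   # small perturbation

def cos(v, w):
    return v @ w / (np.linalg.norm(v) * np.linalg.norm(w))

grad = w / (np.linalg.norm(v) * np.linalg.norm(w)) - cos(v, w) * v / (v @ v)

lhs = cos(v + dv, w)             # exact perturbed value
rhs = cos(v, w) + grad @ dv      # first-order prediction
assert abs(lhs - rhs) < 1e-9     # agree up to O(|dv|^2)
```
Note also that `grad @ v` vanishes: the gradient is orthogonal to $\mathbf{v}$, as expected since scaling $\mathbf{v}$ leaves the cosine unchanged.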

G Cab
    @G Cab: I am wondering if this derivation still holds for complex numbers? Say $w$ and $v$ are complex vectors. Because really $\frac{v}{|v|^2}= \frac{1}{v^T}$ ($T$: conjugate transpose); Then do we still have the same conclusion? – Nick X Tsui Mar 30 '17 at 15:15
  • @NickXTsui: interesting question: thanks for posing it. In principle it should be valid also for complex vectors, since the dot product and the modulus are defined for them too. The difference is that the complex dot product is defined with one vector conjugated. I will analyze the subject and add to my answer, or can you do that? That would be preferable since the idea is yours; let me know. – G Cab Mar 30 '17 at 19:18
  • I know this is old, but I just want to reach out with a question I have, since I'm having a similar issue. The "1" in $a_1/b_1$ can just be replaced with "$i$" to get the generic result for any given row, right? Then, if we were doing this across an entire matrix, you could update each row with the calculated $a_i$ derivative, correct? @GCab I'm specifically trying to do this exact problem (the partial derivative of CosSim) when doing cosine_similarity of a matrix. So cossim(X) gives you an $N\times N$ symmetric matrix with the similarity between any two rows. – Jibril May 15 '18 at 02:46
  • @Jibril: a) yes, of course the component of $\bf a$ can be any index; b) it is interesting to perform this on the rows (or columns) of a matrix: so $\bf a$ would be $\bf A_n$ (row $n$) and $\bf b= \bf A_m$. And then, yes, you get a symmetric matrix that tells you how far the matrix is from being orthogonal. – G Cab May 15 '18 at 07:45
  • Thanks for the quick answer @GCab. My question is: for the derivative, there are many combinations. As you say, $A_n$ is row $n$ of matrix $A$ and $b$ is row $m$ of matrix $A$. But if you have 5 rows you have (N,N), (N,M), (N,O), (N,P), (N,Q), and then similar combinations for M, O, P, and Q. This obviously gives more values than the expected derivative size (N×N). Do you sum these? E.g. $A_n$ and $A_m$ are two rows: you get their contribution to the first value ($a_1$). Then you do $A_n$ and $A_o$ (the "$a_1$" or "$a_i$" above). You get the "$a_i$" for each and sum? – Jibril May 15 '18 at 16:04
  • @Jibril well, the overall derivative will be in fact a 3D matrix. – G Cab May 15 '18 at 18:37
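The matrix case discussed in these comments can be sketched as follows, assuming NumPy. `cossim_matrix` builds the $N\times N$ symmetric similarity matrix of the rows of $A$, and `dS_drow` applies the answer's gradient formula to one entry $S_{nm}$ with $a = A_n$, $b = A_m$; stacking these over all $(n, m)$ pairs is what makes the full derivative a 3-D array. The test matrix is arbitrary.

```python
# Pairwise cosine similarity of the rows of A, and the gradient of one
# entry S[n, m] with respect to row n (illustrative sketch).
import numpy as np

def cossim_matrix(A):
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    U = A / norms                  # rows scaled to unit length
    return U @ U.T                 # S[n, m] = cos(A[n], A[m]), symmetric

def dS_drow(A, n, m):
    # gradient of S[n, m] w.r.t. row n, using the formula from the answer
    a, b = A[n], A[m]
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return b / (na * nb) - (a @ b) / (na * nb) * a / na**2

A = np.array([[1.0, 2.0, 0.5],
              [0.0, 1.0, -1.0],
              [2.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
S = cossim_matrix(A)
assert np.allclose(S, S.T)             # symmetric
assert np.allclose(np.diag(S), 1.0)    # each row has similarity 1 with itself
```
Consistently, `dS_drow(A, n, n)` is the zero vector: the diagonal entries are constant at 1, so their derivative vanishes.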

For typing convenience, define the variables
$$
x=\|b\|^{-1}\,b,\qquad \lambda=\|a\|,\quad \lambda^2=a^Ta,\quad \lambda\,d\lambda=a^Tda
$$
Denote the cosine similarity by $\phi$ and find its differential and gradient:
$$
\begin{aligned}
\phi &= x:\Big(\frac{a}{\lambda}\Big) \\
d\phi &= x:\bigg(\frac{\lambda\,da-a\,d\lambda}{\lambda^2}\bigg) \\
&=\lambda^{-3}\,x:(\lambda^2\,da-a\lambda\,d\lambda) \\
&=\lambda^{-3}\,x:(\lambda^2 I-aa^T)\,da \\
&=\lambda^{-3}(\lambda^2 I-aa^T)\,x:da \\
\frac{\partial\phi}{\partial a} &=\lambda^{-3}(\lambda^2 I-aa^T)\,x \\
&=\frac{(a^Ta)\,b-(a^Tb)\,a}{\|a\|^3\,\|b\|}
\end{aligned}
$$
Note that above, a colon denotes the trace/Frobenius product, i.e.
$$
A:B={\rm Tr}(A^TB)
$$
which is equivalent to the dot product when both arguments are vectors.

But when mixing matrix-vector and vector-vector products in the same expression, e.g. $$ Ax:b \quad{\longleftrightarrow}\quad (A\cdot x) \cdot b \quad{\longleftrightarrow}\quad b\cdot A\cdot x $$ the Frobenius product is both clearer and easier to type.

Even purely vector expressions are quicker to type this way $$ a:a \quad{\longleftrightarrow}\quad a^Ta \quad{\longleftrightarrow}\quad a\cdot a $$
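Both the closed-form gradient and the Frobenius-product identity can be verified numerically; the sketch below assumes NumPy, with arbitrary test data.

```python
# Check the closed form
#   d phi / d a = ((a.a) b - (a.b) a) / (|a|^3 |b|)
# against a finite-difference gradient, plus the identity
#   A : B = Tr(A^T B) = elementwise sum of A * B.
import numpy as np

a = np.array([0.5, -1.0, 2.0])
b = np.array([1.0, 0.3, -0.4])

def phi(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

grad = ((a @ a) * b - (a @ b) * a) / (np.linalg.norm(a)**3 * np.linalg.norm(b))

h = 1e-6
numeric = np.array([(phi(a + h * e, b) - phi(a - h * e, b)) / (2 * h)
                    for e in np.eye(3)])
assert np.allclose(grad, numeric, atol=1e-8)

A = np.arange(6.0).reshape(2, 3)
B = np.arange(6.0, 12.0).reshape(2, 3)
assert np.isclose(np.trace(A.T @ B), np.sum(A * B))  # A : B
```
As a cross-check, `grad @ a` is zero, matching the $(\lambda^2 I - aa^T)$ projector in the derivation, which annihilates the component along $a$.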

greg