I am writing a bachelor's thesis on a machine learning topic involving Generalized Learning Vector Quantization (GLVQ). Most papers I read give only a brief explanation of the mathematics behind it. I want to understand the mathematics deeply and also present it in full. I am almost through the process, but one step is unclear to me.
Let $\textbf{x}_i$ be an input vector and $\textbf{m}^\pm$ the prototypes associated with the distances $d^\pm$. Let $$\mu(\textbf{x}) = \frac{d^+ - d^-}{d^+ + d^-}$$ be a function measuring the relative proximity of an input vector to the prototypes, where $d^+$ is the distance to the nearest prototype $\textbf{m}^+$ of the same class and $d^-$ the distance to the nearest prototype $\textbf{m}^-$ of a different class.
Let the cost function be defined as $$S = \sum^{N}_{i=1} f(\mu(\textbf{x}_i)).$$
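For concreteness, here is a minimal numerical sketch of these two definitions (the code and all names in it are my own illustration, and it already assumes the squared Euclidean distance from the papers, which I introduce further below):

```python
import numpy as np

def mu(x, prototypes, labels, x_label):
    """Relative distance mu(x) = (d+ - d-) / (d+ + d-)."""
    d = np.sum((prototypes - x) ** 2, axis=1)      # squared Euclidean distances
    d_plus = np.min(d[labels == x_label])          # nearest prototype, same class
    d_minus = np.min(d[labels != x_label])         # nearest prototype, other class
    return (d_plus - d_minus) / (d_plus + d_minus)

def cost(X, y, prototypes, labels, f=lambda t: 1 / (1 + np.exp(-t))):
    """Cost S = sum_i f(mu(x_i)), here with a sigmoid as the monotone f."""
    return sum(f(mu(x, prototypes, labels, yi)) for x, yi in zip(X, y))
```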
We now want to take the partial derivative of $S$ with respect to $\textbf{m}^\pm$, which by the chain rule is $$\frac{\partial S}{\partial \textbf{m}^{\pm}} = \frac{\partial S}{\partial f} \frac{\partial f}{\partial \mu} \frac{\partial \mu}{\partial d^{\pm}} \frac{\partial d^{\pm}}{\partial \textbf{m}^{\pm}}$$
Let $f$ be any monotonically increasing function (usually a sigmoid, but it remains unspecified here). Let $\textbf{x}_m$ be the vector that has $\textbf{m}^+$ as its closest prototype of the same class.
Now we get to the derivation.
$$S = f(\mu(\textbf{x}_1)) + \cdots + f(\mu(\textbf{x}_m)) + \cdots + f(\mu(\textbf{x}_N))$$ Since we are only interested in the local stochastic gradient, we continue working with the single summand $f(\mu(\textbf{x}_m))$.
$$ \begin{align} \frac{\partial S}{\partial \textbf{m}^+} &= \frac{\partial f(\mu(\textbf{x}_m))}{\partial\textbf{m}^+}\\ &= \frac{\partial f}{\partial \mu} \frac{\partial \mu}{\partial \textbf{m}^+} \end{align}$$
Since $d^-$ does not depend on $\textbf{m}^+$, its derivative with respect to $\textbf{m}^+$ vanishes; writing $(\,\cdot\,)'$ for $\partial(\,\cdot\,)/\partial \textbf{m}^+$, the quotient rule gives
$$\require{cancel}\begin{align} \frac{\partial \mu}{\partial \textbf{m}^+} &= \cfrac{ \partial\cfrac{d^+(\textbf{x}_m, \textbf{m}^+) - d^-(\textbf{x}_m, \textbf{m}^-)} {d^+(\textbf{x}_m, \textbf{m}^+) + d^-(\textbf{x}_m, \textbf{m}^-)}} {\partial \textbf{m}^+} \\ &= \frac{(d^+ - d^-)' \cdot (d^+ + d^-) - (d^+ - d^-) \cdot (d^+ + d^-)'}{(d^+ + d^-)^2} \nonumber \\ &= \frac{(d^{+\prime} - \cancel{d^{-\prime}}) \cdot (d^+ + d^-) - (d^+ - d^-) \cdot (d^{+\prime} + \cancel{d^{-\prime}})}{(d^+ + d^-)^2} \nonumber \\ &= \frac{d^{+\prime} \cdot \Big((\cancel{d^+} + d^-) - (\cancel{d^+} - d^-)\Big)}{(d^+ + d^-)^2} \nonumber \\ &= \frac{d^{+\prime} \cdot 2d^-}{(d^+ + d^-)^2} \nonumber \\ &= \cfrac{\cfrac{\partial d^+}{\partial \textbf{m}^+} \cdot 2d^-}{(d^+ + d^-)^2} \end{align}$$
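This intermediate factor can be sanity-checked numerically (my own check, not from the papers): treat $\mu$ as a function of the two scalar distances and compare a central finite difference against $2d^-/(d^+ + d^-)^2$.

```python
import numpy as np

d_plus, d_minus, h = 0.7, 1.3, 1e-6

mu = lambda dp, dm: (dp - dm) / (dp + dm)   # mu as a function of the distances

numeric = (mu(d_plus + h, d_minus) - mu(d_plus - h, d_minus)) / (2 * h)
analytic = 2 * d_minus / (d_plus + d_minus) ** 2

print(numeric, analytic)   # both come out as ~0.65
```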
Now, as a last step, I need to take the partial derivative of the distance function. Papers use the squared Euclidean distance:
$$d^\pm = |\textbf{x} - \textbf{m}^\pm|^2$$
So I am not sure how to proceed from here:
$$ \cfrac{\partial d^+}{\partial \textbf{m}^+} = \frac{\partial |\textbf{x}_m - \textbf{m}^{+}|^2}{\partial \textbf{m}^+} $$
I know the solution has to be
$$\frac{\partial S}{\partial \textbf{m}^+} = \frac{\partial S}{\partial f} \frac{\partial f}{\partial \mu} \frac{\partial \mu}{\partial d^{+}} \frac{\partial d^{+}}{\partial \textbf{m}^{+}} = -\frac{\partial f}{\partial \mu} \frac{4d^-}{(d^+ + d^-)^2}(\textbf{x}_m - \textbf{m}^+)$$
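To guard against sign errors, this target formula can at least be verified numerically (again my own sketch: I pick the logistic sigmoid for $f$ and the squared Euclidean distance from above, and compare the closed form with a finite-difference gradient of $f(\mu(\textbf{x}_m))$ with respect to $\textbf{m}^+$):

```python
import numpy as np

x = np.array([0.3, -0.2])        # the sample x_m
m_plus = np.array([0.5, 0.1])    # nearest prototype of the same class
m_minus = np.array([-0.4, 0.8])  # nearest prototype of a different class

f = lambda t: 1 / (1 + np.exp(-t))   # logistic sigmoid
df = lambda t: f(t) * (1 - f(t))     # its derivative

def loss(mp):
    """f(mu(x_m)) as a function of the positive prototype only."""
    dp = np.sum((x - mp) ** 2)
    dm = np.sum((x - m_minus) ** 2)
    return f((dp - dm) / (dp + dm))

# closed form from the target: -f'(mu) * 4 d^- / (d^+ + d^-)^2 * (x - m^+)
dp = np.sum((x - m_plus) ** 2)
dm = np.sum((x - m_minus) ** 2)
mu = (dp - dm) / (dp + dm)
analytic = -df(mu) * 4 * dm / (dp + dm) ** 2 * (x - m_plus)

# central finite differences, one coordinate of m^+ at a time
h, numeric = 1e-6, np.zeros_like(m_plus)
for k in range(len(m_plus)):
    e = np.zeros_like(m_plus)
    e[k] = h
    numeric[k] = (loss(m_plus + e) - loss(m_plus - e)) / (2 * h)

print(analytic)  # agrees with numeric up to finite-difference error
print(numeric)
```

Comparing the target with the quotient-rule result above, the missing factor $\frac{\partial d^+}{\partial \textbf{m}^+}$ would have to come out as $-2(\textbf{x}_m - \textbf{m}^+)$, and that is exactly the step I cannot justify yet.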