Suppose you have the cost function $C = \frac{1}{2}(y - a)^2$, where $y$ is the desired output and $a$ is the activation. There is only one training example, $x = 1$, with desired output $y = -5$. Moreover, $a = wx$, where $w$ is the weight applied to the input.
This is meant to be a really simple example. We can immediately see that the optimal weight is $w = -5$, since then $a = -5 \cdot 1 = -5$, which matches the desired output $y$.
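Written as a function of the weight, the cost for this single training example is
$$C(w) = \frac{1}{2}(y - wx)^2 = \frac{1}{2}(-5 - w)^2,$$
which is zero at $w = -5$ and positive everywhere else.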
I learned that the 'standard version' of gradient descent is:
$$w \leftarrow w - \eta\frac{\partial C}{\partial w}$$
where $\eta$ is a small learning rate such as 0.01, and where I had worked out that $\frac{\partial C}{\partial w} = y - a$.
Simplifying yields:
$$w \leftarrow w - \eta(y - a)$$
We subtract the gradient because we want to minimize the error.
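To make this concrete, take one step of this update with $\eta = 0.1$, starting from $w = 1$ (so $a = wx = 1$):
$$y - a = -5 - 1 = -6, \qquad w \leftarrow 1 - 0.1 \cdot (-6) = 1.6.$$
The weight moves further away from the optimum $-5$; this is exactly the behaviour described below, and (as the edit at the end explains) the expression used here for $\frac{\partial C}{\partial w}$ is where the mistake lies.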
There is one issue when I do this. In my algorithm I'm not normalizing the weights, so they can be negative or positive. If $w$ is initialized greater than 0, I get values that tend to maximize the error rather than minimize it.
My question: why does $w$ tend to move upward when it is initialized greater than 0 (maximizing the error, since $y = -5$), and downward when it is initialized less than 0 (minimizing the error, since $y = -5$, which is what we want)?
Edit: by making a minimized code example I realized what I did wrong. I'm letting the question stand as is, though I'm not sure it still belongs here, since it only marginally has to do with negative weights. The mistake was thinking that
$$\frac{\partial C}{\partial w} = \frac{\partial C}{\partial a}\frac{\partial a}{\partial w} = (y - a)\,w.$$
It went wrong for positive versus negative weights because I wasn't multiplying by $\frac{\partial a}{\partial w} = x$ (I multiplied by $w$ instead, and also had the sign of $\frac{\partial C}{\partial a}$ flipped). The correct derivative, which the code below uses, is
$$\frac{\partial C}{\partial w} = \frac{\partial C}{\partial a}\frac{\partial a}{\partial w} = (a - y)\,x.$$
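With the correct derivative, a single step from $w = 1$ ($x = 1$, $\eta = 0.1$) goes the right way:
$$\frac{\partial C}{\partial w} = (a - y)x = (1 - (-5)) \cdot 1 = 6, \qquad w \leftarrow 1 - 0.1 \cdot 6 = 0.4,$$
and likewise from $w = -10$: the derivative is $(-10 + 5) \cdot 1 = -5$, so $w \leftarrow -10 + 0.5 = -9.5$. In both cases $w$ moves toward the optimum $-5$.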
Here is the correct code, which can be run with node:
let cost = (activation, desiredValue) => 0.5 * (activation - desiredValue) ** 2

function startCalcGradientDescent(w, x) {
    let eta = 0.1
    let y = -5
    console.log(`start`)
    return function calcGradientDescent() {
        let a = x * w                  // forward pass: activation a = w * x
        let c = cost(a, y)             // current cost for this single example
        let error = (a - y) * x        // ∂C/∂w = ∂C/∂a * ∂a/∂w = (a − y) * x
        w = w - eta * error            // gradient descent step
        if (c <= 0.001) return         // stop once the cost is small enough
        console.log(`c, w, a, error`)
        console.log(c, w, a, error)
        setTimeout(() => {
            calcGradientDescent()
        }, 100)
    }
}

// startCalcGradientDescent(1, 1)()
// startCalcGradientDescent(-4, 1)()
// startCalcGradientDescent(1, -1)()
// startCalcGradientDescent(-4, -1)()
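For checking convergence without the setTimeout recursion, here is a minimal synchronous sketch of the same idea (not the original code above; the helper name fitWeight and the defaults y = -5, eta = 0.1, maxSteps = 1000 are just illustrative). With the corrected derivative the weight converges toward $y/x$ whether it starts positive or negative:

function fitWeight(w, x, y = -5, eta = 0.1, maxSteps = 1000) {
    for (let step = 0; step < maxSteps; step++) {
        let a = w * x                              // forward pass: a = w * x
        if (0.5 * (a - y) ** 2 <= 0.001) break     // same stopping rule as above
        let gradient = (a - y) * x                 // ∂C/∂w = (a − y) * x
        w = w - eta * gradient                     // gradient descent step
    }
    return w
}

console.log(fitWeight(1, 1))    // ≈ -5
console.log(fitWeight(-4, 1))   // ≈ -5
console.log(fitWeight(1, -1))   // ≈  5
console.log(fitWeight(-4, -1))  // ≈  5

The runs with $x = -1$ end up near $+5$ rather than $-5$ because the optimum is wherever $a = wx$ matches $y$, i.e. $w = y/x$.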