Questions tagged [gradient-descent]

"Gradient descent is a first-order optimization algorithm. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point."

Gradient descent is based on the observation that if the multi-variable function $F(x)$ is defined and differentiable in a neighborhood of a point $a$, then $F(x)$ decreases fastest if one goes from $a$ in the direction of the negative gradient of $F$ at $a$, $-\nabla F(a)$. It follows that, if

$$a_{n+1}=a_n-\gamma \nabla F(a_n)$$

for positive $\gamma$ that is small enough, then $F(a_n) \ge F(a_{n+1})$.
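As a minimal, runnable illustration of the update rule above (the function names and the quadratic example are my own):

```python
import numpy as np

def gradient_descent(grad_F, a0, gamma=0.1, n_steps=100):
    """Iterate a_{n+1} = a_n - gamma * grad_F(a_n)."""
    a = np.asarray(a0, dtype=float)
    for _ in range(n_steps):
        a = a - gamma * grad_F(a)
    return a

# Example: F(x) = ||x - c||^2 has gradient 2(x - c) and its minimum at c.
c = np.array([3.0, -1.0])
minimum = gradient_descent(lambda x: 2 * (x - c), a0=[0.0, 0.0])
```

For this convex quadratic, any $\gamma < 1$ makes each step shrink the distance to $c$ by a constant factor, which is the $F(a_n) \ge F(a_{n+1})$ guarantee in action.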

1029 questions
5
votes
1 answer

Why is the optimizing algorithm called "momentum"?

The Gradient Descent with Momentum method for optimizing parameters is widely used. It has many variations, including the famous Adam algorithm. Now, I do understand them, but where does the word "momentum" come from? I don't see any relation to the…
Neo
  • 251
  • 1
  • 7
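For context on this question: a minimal sketch of the classical ("heavy-ball") momentum update, in which the accumulated term $v$ behaves like a velocity that decays and is pushed by the negative gradient, hence the physical name (the function names and hyperparameters here are illustrative):

```python
import numpy as np

def momentum_descent(grad_F, a0, gamma=0.1, beta=0.9, n_steps=300):
    """Classical momentum: keep a velocity v that accumulates past
    gradients, like a ball rolling downhill with friction factor beta."""
    a = np.asarray(a0, dtype=float)
    v = np.zeros_like(a)
    for _ in range(n_steps):
        v = beta * v - gamma * grad_F(a)  # velocity decays, pushed by -grad
        a = a + v                         # position moves with the velocity
    return a

# Minimizing F(x) = x^2 (gradient 2x) starting from x = 5.
x_min = momentum_descent(lambda x: 2 * x, a0=[5.0])
```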
4
votes
1 answer

In gradient descent, how do we know we found the smallest minimum?

Let's say we start on the wrong "hill" (the highest one): gradient descent will find the minimum in that area, but not of the whole domain. To me it seems, at the moment, that it depends a lot on what m and b we choose at the start.
hey
  • 209
4
votes
2 answers

The direction of the gradient vector.

The direction of the gradient is along the direction of maximum ascent, i.e. along the directional derivative with the most positive value, but when talking about level surfaces, it is argued that the gradient is always perpendicular to the…
3
votes
1 answer

Is the gradient both the path of steepest descent and perpendicular to the potential field?

Imagine a function that maps (x,y) -> z. Imagine we draw a bunch of curves where the value of z is the same, so that z is a "potential field". The gradient is the steepest way to reach the bottom. I found such pictures. Now, I am a bit confused with the…
user4951
  • 1,714
3
votes
1 answer

Why does the gradient describe the steepest ascent direction and not the steepest descent?

I have checked all over the internet and I cannot find why the gradient shows you the steepest ascent and not the steepest descent. How can we prove that?
3
votes
1 answer

Gradient descent to solve nonlinear systems

I was reading the Wikipedia page for gradient descent, but I don't understand how the objective function can be used to solve for $x_1, x_2, x_3$, as the objective function seems a bit arbitrary and I don't see how minimizing it will give the…
2
votes
1 answer

Looking for two separate functions with intersecting point and equal gradient

I want to explain a disadvantage of Gradient Descent where the gradient itself doesn't give information about how far we are away from the local/global minimum. Say we have two functions with an intersecting point that has the same gradient for both…
oezguensi
  • 143
2
votes
1 answer

Gradient Descent: Cost Function

I'm trying to implement the gradient descent method for the problem of minimising the following function: $$f(x) = \frac{1}{2}(x-m)^{T}A(x-m)-\sum\limits_{i=1}^n\log\left(x_i^{2}\right),$$ where $x \in R^n$ is a vector; $m \in R^n$ is a fixed…
Barton
  • 67
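For a function of the form in the question above, assuming $A$ is symmetric, the gradient works out to $A(x-m) - 2/x$ (the division taken elementwise); a quick finite-difference check of that expression, with made-up data:

```python
import numpy as np

def f(x, m, A):
    """f(x) = 1/2 (x - m)^T A (x - m) - sum_i log(x_i^2)."""
    return 0.5 * (x - m) @ A @ (x - m) - np.sum(np.log(x**2))

def grad_f(x, m, A):
    """Gradient of f, assuming A symmetric: A(x - m) - 2/x."""
    return A @ (x - m) - 2.0 / x

# Hypothetical data: a symmetric positive-definite A, and x away from zero
# so the log term is well defined.
n = 4
B = np.random.default_rng(0).standard_normal((n, n))
A = B @ B.T + n * np.eye(n)
m = np.array([0.5, -1.0, 2.0, 0.0])
x = np.array([2.5, 3.5, 1.5, 4.0])

# Central finite differences along each coordinate direction.
eps = 1e-6
num = np.array([(f(x + eps * e, m, A) - f(x - eps * e, m, A)) / (2 * eps)
                for e in np.eye(n)])
```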
2
votes
1 answer

Understanding Subgradient?

I have seen in some literature that the derivative of the $l_1$-norm is represented by the $\operatorname{sgn}(\cdot)$ function. I know, in general, the $l_1$-norm is not differentiable, and therefore talking about a gradient doesn't make sense. In fact, what we are…
fery
  • 94
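For reference, $\operatorname{sgn}$ does give a valid subgradient of the $l_1$-norm: a sketch verifying the subgradient inequality $\|y\|_1 \ge \|x\|_1 + g^{T}(y-x)$ at one (arbitrarily chosen) pair of points:

```python
import numpy as np

def l1_subgradient(x):
    """np.sign(x) is a valid subgradient of ||x||_1: for x_i != 0 it is the
    exact derivative, and at x_i == 0 it returns 0, which lies in the
    subdifferential interval [-1, 1]."""
    return np.sign(x)

# Subgradient inequality: ||y||_1 >= ||x||_1 + g^T (y - x) for g at x.
x = np.array([1.5, -0.5, 0.0])
g = l1_subgradient(x)
y = np.array([-2.0, 1.0, 3.0])
holds = np.sum(np.abs(y)) >= np.sum(np.abs(x)) + g @ (y - x)
```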
1
vote
1 answer

Projected gradient descent, why project at every iteration?

The way I have been presented gradient descent, at least from Levitin and Polyak, is that you do the gradient step $\theta_{t+1} = \theta_t - \eta_t \nabla f(\theta_t)$, and then afterwards you project onto your convex set $C$:…
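A minimal sketch of the scheme the question describes, projecting after every step so each iterate stays feasible and the gradient is always evaluated inside the constraint set (the constraint set, step size, and names are illustrative):

```python
import numpy as np

def project_unit_ball(x):
    """Euclidean projection onto the closed unit ball {x : ||x|| <= 1}."""
    norm = np.linalg.norm(x)
    return x if norm <= 1.0 else x / norm

def projected_gradient_descent(grad, project, x0, eta=0.1, n_steps=200):
    """Projecting after *every* step keeps each iterate in C, so the
    gradient is never evaluated at an infeasible point."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = project(x - eta * grad(x))
    return x

# Minimize ||x - p||^2 over the unit ball; p lies outside the ball, so the
# constrained minimizer is p scaled back to the boundary, p / ||p||.
p = np.array([3.0, 4.0])
x_star = projected_gradient_descent(lambda x: 2 * (x - p),
                                    project_unit_ball, x0=[0.0, 0.0])
```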
1
vote
0 answers

How to prove the convergence of the SGD algorithm?

As we all know, the iterative process of the SGD algorithm is $x^{k}=x^{k-1}-\alpha_{k}\nabla f_{i_k}(x^{k-1})$, and we let $f(x)=\frac{1}{N}\sum_{i=1}^{N} f_{i}(x)$, where each $f_{i}(x)$ is a differentiable function and $f(x)$ is the gradient…
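A runnable sketch of that iteration on a toy problem (the decaying step size $\alpha_k = \alpha_0/k$ and the example objective are my own choices; a standard sufficient condition for convergence is $\sum_k \alpha_k = \infty$, $\sum_k \alpha_k^2 < \infty$):

```python
import numpy as np

def sgd(grad_fi, N, x0, n_epochs=200, alpha0=0.5, seed=0):
    """SGD: at step k, pick a random index i_k and move along
    -grad f_{i_k} with a decaying step size alpha_k = alpha0 / k."""
    rng = np.random.default_rng(seed)
    x = float(x0)
    k = 0
    for _ in range(n_epochs):
        for i in rng.permutation(N):  # one random pass over the data
            k += 1
            x -= (alpha0 / k) * grad_fi(x, i)
    return x

# f_i(x) = (x - b_i)^2, so f(x) = (1/N) sum_i (x - b_i)^2 is minimized
# at the mean of the b_i.
b = np.array([1.0, 2.0, 3.0, 4.0])
x_min = sgd(lambda x, i: 2 * (x - b[i]), N=len(b), x0=0.0)
```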
1
vote
1 answer

Some questions about gradient descent method.

I want to make sure that I understand the gradient descent method correctly. Let's say there is an optimization problem $f = x^2+y^2 \rightarrow \min$. I randomly choose the estimate of the minimum - $(0;0)$. Then I differentiate the function $df = <2x,…
user
  • 1,412
1
vote
0 answers

Which event causes such a kink in gradient descent?

I'm running gradient descent on a continuous function and I observe this pattern: What can cause such a sudden kink? Why does the Loss keep increasing after it? I understand issues related to a learning rate that is too large, but this does not…
Ziofil
  • 1,590
1
vote
1 answer

Does gradient descent really take you down the steepest path?

So this is more of a yes/no question. I'm not sure where/how to ask simple questions on stack. Anyways, the definition of gradient descent says gradient descent takes you down the steepest path. Assuming we always move towards the minimum, is…
confused
  • 407
1
vote
0 answers

Additive gradient descent with negative weights: error tends to be maximized (MSE) -- solved

Suppose you have a cost function $C(x) = \frac{1}{2}(y - a)^2$ where $y$ is the desired output and $a$ is an activation. There is only one training example of $x = 1$ where the desired output $y = -5$. Moreover, $a = wx$ where $w$ denotes the…