Questions tagged [gradient-descent]

"Gradient descent is a first-order optimization algorithm. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point."

Gradient descent is based on the observation that if the multi-variable function $F(x)$ is defined and differentiable in a neighborhood of a point $a$, then $F(x)$ decreases fastest if one goes from $a$ in the direction of the negative gradient of $F$ at $a$, $-\nabla F(a)$. It follows that, if

$$a_{n+1}=a_n-\gamma \nabla F(a_n)$$

for positive $\gamma$ that is small enough, then $F(a_n) \ge F(a_{n+1})$.
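As a minimal, runnable illustration of the update rule above (the function names and the quadratic example are my own):

```python
import numpy as np

def gradient_descent(grad_F, a0, gamma=0.1, n_steps=100):
    """Iterate a_{n+1} = a_n - gamma * grad_F(a_n)."""
    a = np.asarray(a0, dtype=float)
    for _ in range(n_steps):
        a = a - gamma * grad_F(a)
    return a

# Example: F(x) = ||x - c||^2 has gradient 2(x - c) and its minimum at c.
c = np.array([3.0, -1.0])
minimum = gradient_descent(lambda x: 2 * (x - c), a0=[0.0, 0.0])
```

For this convex quadratic, any $\gamma < 1$ makes each step shrink the distance to $c$ by a constant factor, which is the $F(a_n) \ge F(a_{n+1})$ guarantee in action.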

1029 questions
5
votes
1 answer

Why is the optimizing algorithm called "momentum"?

The Gradient Descent with Momentum method for optimizing parameters is widely used. It has many variations, including the famous Adam algorithm. Now, I do understand them, but where does the word "momentum" come from? I don't see any relation to the…
Neo
  • 251
  • 1
  • 7
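For context on this question: a minimal sketch of the classical ("heavy-ball") momentum update, in which the accumulated term $v$ behaves like a velocity that decays and is pushed by the negative gradient, hence the physical name (the function names and hyperparameters here are illustrative):

```python
import numpy as np

def momentum_descent(grad_F, a0, gamma=0.1, beta=0.9, n_steps=300):
    """Classical momentum: keep a velocity v that accumulates past
    gradients, like a ball rolling downhill with friction factor beta."""
    a = np.asarray(a0, dtype=float)
    v = np.zeros_like(a)
    for _ in range(n_steps):
        v = beta * v - gamma * grad_F(a)  # velocity decays, pushed by -grad
        a = a + v                         # position moves with the velocity
    return a

# Minimizing F(x) = x^2 (gradient 2x) starting from x = 5.
x_min = momentum_descent(lambda x: 2 * x, a0=[5.0])
```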
4
votes
1 answer

In gradient descent, how do we know we found the smallest minimum?

Let's say we start on the wrong "hill" (the highest one): gradient descent will find the minimum in that area, but not of the whole domain. To me it seems, at the moment, that it depends a lot on what m and b we choose at the start.
hey
  • 209
4
votes
2 answers

The direction of the gradient vector.

The direction of the gradient is along the direction of maximum ascent, i.e. along the directional derivative with the most positive value, but when talking about level surfaces, it is argued that the gradient is always perpendicular to the…
3
votes
1 answer

Is the gradient both the path of steepest descent and perpendicular to the potential field?

Imagine a function that maps (x,y) -> z. Imagine we draw a bunch of curves where the value of z is the same, so that z is a "potential field". The gradient is the steepest way to reach the bottom. I found such pictures. Now, I am a bit confused with the…
user4951
  • 1,714
3
votes
1 answer

Why does the gradient describe the steepest ascent direction and not the steepest descent?

I have checked all over the internet and I cannot find why the gradient shows you the steepest ascent and not the steepest descent. How can we prove that?
3
votes
1 answer

Gradient descent to solve nonlinear systems

I was reading the Wikipedia page for gradient descent, but I don't understand how the objective function can be used to solve for $x_1, x_2, x_3$, as the objective function seems a bit arbitrary and I don't see how minimizing it will give the…
2
votes
1 answer

Looking for two separate functions with intersecting point and equal gradient

I want to explain a disadvantage of Gradient Descent where the gradient itself doesn't give information about how far we are away from the local/global minimum. Say we have two functions with an intersecting point that has the same gradient for both…
oezguensi
  • 143
2
votes
1 answer

Gradient Descent: Cost Function

I'm trying to implement the gradient descent method for the problem of minimising the following function: $$f(x) = \frac{1}{2}(x-m)^{T}A(x-m)-\sum\limits_{i=1}^n\log\left(x_i^{2}\right),$$ where $x \in R^n$ is a vector; $m \in R^n$ is a fixed…
Barton
  • 67
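For a function of the form in the question above, assuming $A$ is symmetric, the gradient works out to $A(x-m) - 2/x$ (the division taken elementwise); a quick finite-difference check of that expression, with made-up data:

```python
import numpy as np

def f(x, m, A):
    """f(x) = 1/2 (x - m)^T A (x - m) - sum_i log(x_i^2)."""
    return 0.5 * (x - m) @ A @ (x - m) - np.sum(np.log(x**2))

def grad_f(x, m, A):
    """Gradient of f, assuming A symmetric: A(x - m) - 2/x."""
    return A @ (x - m) - 2.0 / x

# Hypothetical data: a symmetric positive-definite A, and x away from zero
# so the log term is well defined.
n = 4
B = np.random.default_rng(0).standard_normal((n, n))
A = B @ B.T + n * np.eye(n)
m = np.array([0.5, -1.0, 2.0, 0.0])
x = np.array([2.5, 3.5, 1.5, 4.0])

# Central finite differences along each coordinate direction.
eps = 1e-6
num = np.array([(f(x + eps * e, m, A) - f(x - eps * e, m, A)) / (2 * eps)
                for e in np.eye(n)])
```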
2
votes
1 answer

Understanding Subgradient?

I have seen in some literature that the derivative of the $l_1$-norm is represented by the $\operatorname{sgn}(\cdot)$ function. I know, in general, the $l_1$-norm is not differentiable, and therefore talking about a gradient doesn't make sense. In fact, what we are…
fery
  • 94
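For reference, $\operatorname{sgn}$ does give a valid subgradient of the $l_1$-norm: a sketch verifying the subgradient inequality $\|y\|_1 \ge \|x\|_1 + g^{T}(y-x)$ at one (arbitrarily chosen) pair of points:

```python
import numpy as np

def l1_subgradient(x):
    """np.sign(x) is a valid subgradient of ||x||_1: for x_i != 0 it is the
    exact derivative, and at x_i == 0 it returns 0, which lies in the
    subdifferential interval [-1, 1]."""
    return np.sign(x)

# Subgradient inequality: ||y||_1 >= ||x||_1 + g^T (y - x) for g at x.
x = np.array([1.5, -0.5, 0.0])
g = l1_subgradient(x)
y = np.array([-2.0, 1.0, 3.0])
holds = np.sum(np.abs(y)) >= np.sum(np.abs(x)) + g @ (y - x)
```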
1
vote
1 answer

Projected gradient descent, why project at every iteration?

The way I have been presented gradient descent, at least from Levitin and Polyak, is that you do the gradient step $\theta_{t+1} = \theta_t - \eta_t \nabla f(\theta_t)$, and then afterwards you project onto your convex set $C$:…
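A minimal sketch of the scheme the question describes, projecting after every step so each iterate stays feasible and the gradient is always evaluated inside the constraint set (the constraint set, step size, and names are illustrative):

```python
import numpy as np

def project_unit_ball(x):
    """Euclidean projection onto the closed unit ball {x : ||x|| <= 1}."""
    norm = np.linalg.norm(x)
    return x if norm <= 1.0 else x / norm

def projected_gradient_descent(grad, project, x0, eta=0.1, n_steps=200):
    """Projecting after *every* step keeps each iterate in C, so the
    gradient is never evaluated at an infeasible point."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = project(x - eta * grad(x))
    return x

# Minimize ||x - p||^2 over the unit ball; p lies outside the ball, so the
# constrained minimizer is p scaled back to the boundary, p / ||p||.
p = np.array([3.0, 4.0])
x_star = projected_gradient_descent(lambda x: 2 * (x - p),
                                    project_unit_ball, x0=[0.0, 0.0])
```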
1
vote
0 answers

How to prove the convergence of the SGD algorithm?

As we all know, the iterative process of the SGD algorithm is $x^{k}=x^{k-1}-\alpha_{k}\nabla f_{i_k}(x^{k-1})$, and we let $f(x)=\frac{1}{N}\sum_{i=1}^{N} f_{i}(x)$, where each $f_{i}(x)$ is a differentiable function and $f(x)$ is the gradient…
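A runnable sketch of that iteration on a toy problem (the decaying step size $\alpha_k = \alpha_0/k$ and the example objective are my own choices; a standard sufficient condition for convergence is $\sum_k \alpha_k = \infty$, $\sum_k \alpha_k^2 < \infty$):

```python
import numpy as np

def sgd(grad_fi, N, x0, n_epochs=200, alpha0=0.5, seed=0):
    """SGD: at step k, pick a random index i_k and move along
    -grad f_{i_k} with a decaying step size alpha_k = alpha0 / k."""
    rng = np.random.default_rng(seed)
    x = float(x0)
    k = 0
    for _ in range(n_epochs):
        for i in rng.permutation(N):  # one random pass over the data
            k += 1
            x -= (alpha0 / k) * grad_fi(x, i)
    return x

# f_i(x) = (x - b_i)^2, so f(x) = (1/N) sum_i (x - b_i)^2 is minimized
# at the mean of the b_i.
b = np.array([1.0, 2.0, 3.0, 4.0])
x_min = sgd(lambda x, i: 2 * (x - b[i]), N=len(b), x0=0.0)
```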
1
vote
1 answer

Some questions about gradient descent method.

I want to make sure that I understand the gradient descent method correctly. Let's say there is an optimization problem $f = x^2+y^2 \rightarrow \min$. I randomly choose the estimate of the minimum - $(0;0)$. Then I differentiate the function $df = <2x,…
user
  • 1,412
1
vote
0 answers

Which event causes such a kink in gradient descent?

I'm running gradient descent on a continuous function and I observe this pattern: What can cause such a sudden kink? Why does the Loss keep increasing after it? I understand issues related to a learning rate that is too large, but this does not…
Ziofil
  • 1,590
1
vote
1 answer

Does gradient descent really take you down the steepest path?

So this is more of a yes/no question. I'm not sure where/how to ask simple questions on stack. Anyways, the definition of gradient descent says gradient descent takes you down the steepest path. Assuming we always move towards the minimum, is…
confused
  • 407
1
vote
0 answers

Additive gradient descent with negative weights: error tends to be maximized (MSE) -- solved

Suppose you have a cost function $C(x) = \frac{1}{2}(y - a)^2$ where $y$ is the desired output and $a$ is an activation. There is only one training example of $x = 1$ where the desired output $y = -5$. Moreover, $a = wx$ where $w$ denotes the…