
The Problem

The gradient descent algorithm finds a minimum of a convex function, but it does not guarantee that the minimum it finds is the global one. I don't know whether it is possible to find a minimum of a function directly from its derivative (I assume not; otherwise, what would be the reason for using the gradient descent algorithm?). Just for clarification, the function throughout this question is assumed to be multidimensional.
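For concreteness, here is a minimal sketch of gradient descent on a convex function; the function, step size, and stopping rule are illustrative assumptions, not part of the question:

```python
import numpy as np

def gradient_descent(grad, x0, lr=0.1, tol=1e-8, max_iter=10_000):
    """Plain gradient descent: follow the negative gradient until it
    (almost) vanishes. It converges to a stationary point; whether that
    point is the global minimum is exactly the concern raised above."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - lr * g
    return x

# Convex example: f(x) = ||x - c||^2 is minimized at c.
c = np.array([1.0, -2.0, 3.0])
x_min = gradient_descent(lambda x: 2 * (x - c), np.zeros(3))
```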

An Alternate Idea

An alternate idea came to mind when trying to solve this problem (please don't judge strictly, as I am not a professional in this sphere). Take a horizontal plane relative to which the given function is convex (I am almost sure this is the wrong formulation, but I hope you understand what I mean). If the plane crosses the function, move it down toward the bottom in steps (the step size can initially be very big, unlike the gradient descent step) until it no longer crosses. Then move the plane up, halve the step size, and keep moving the plane up and down, halving the step on each iteration, until reaching some epsilon distance. Finally, from that point (one of the crossing points of the plane with the function, which should be among the closest points to the global minimum), we can find the global minimum using the gradient descent algorithm.

The Actual Problem

The problem with this algorithm is that I am not sure it is always possible to determine whether a plane and a function cross. I mean, imagine a function with 50 unknowns. Would it be possible to determine whether it crosses the plane $y=10$? If so, how expensive (from a time perspective) would it be to use this algorithm to find the global minimum, compared to gradient descent?

1 Answer


If you're applying the gradient descent algorithm to a convex function, then the solution it returns is the global solution. See this answer, which proves this notion.

Page $7$ of this paper formally proves that a local minimum of a convex function is the global minimum. Here's a transcription of the linked proof:

Theorem 1 (Local Minimum is also a Global Minimum). Let $f:\mathbb{R}^d\rightarrow\mathbb{R}$ be convex. If $x^*$ is a local minimum of $f$ over a convex set $D$, then $x^*$ is also a global minimum of $f$ over $D$.

Proof: Since $D$ is a convex set, for any $y \in D$, $y - x^*$ is a feasible direction. Since $x^*$ is a local minimum, for any $y \in D$ we can choose a small enough $\alpha > 0$ such that $$f(x^*) \le f(x^* + \alpha(y - x^*))$$

Furthermore, since $f$ is convex, we have

$$f(x^* + \alpha(y - x^*)) = f(\alpha y + (1 - \alpha)x^*) \le \alpha f(y) + (1 - \alpha)f(x^*)$$

Combining these, we have

$$f(x^*) \le \alpha f(y) + (1 - \alpha)f(x^*),$$ which, after subtracting $(1 - \alpha)f(x^*)$ from both sides and dividing by $\alpha > 0$, implies that $f(x^*) \le f(y)$. Since $y$ is an arbitrary point in $D$, this immediately proves that $x^*$ is a global minimum.
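The chain of inequalities can be spot-checked numerically on a concrete convex function; a minimal sketch with the illustrative choices $f(x) = x^2$, $x^* = 0$, $y = 3$, $\alpha = 0.25$:

```python
# Spot-check of the convexity inequality used in the proof,
# f(a*y + (1-a)*x*) <= a*f(y) + (1-a)*f(x*), for f(x) = x**2.
f = lambda x: x * x
x_star, y, alpha = 0.0, 3.0, 0.25

lhs = f(x_star + alpha * (y - x_star))        # f(alpha*y + (1-alpha)*x_star)
rhs = alpha * f(y) + (1 - alpha) * f(x_star)  # convex combination of values
```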

Crossing Plane

There are two ways to look at this. One can minimize the function with $y=10$ as a constraint, and then use penalty functions to turn the model into an unconstrained one suitable for the gradient descent method.

Otherwise, one could substitute $y=10$ into the function and minimize the result, which would be easier and faster than the penalty method. Either way, if the model solves with a feasible solution, the function and the plane intersect; if it returns infeasible, they do not.

Miss Mae
  • thanks for the answer, and sorry for the confusion. I meant we can find a local extremum, but am not sure that it is the minimum of the function. Please focus on the last part of the question: Is it always possible to determine if a plane and a function cross? – Artyom Vancyan Mar 27 '24 at 07:54
  • in general, you just set them equal to each other and you select points until they match (intersection). For modeling, this would just be a constraint – Miss Mae Mar 27 '24 at 08:02
  • I know this is possible for 1d, 2d, and maybe 3d functions, but what if the function has many unknowns? Is it always possible to find if they intersect, and if so, how expensive is it compared to the gradient descent algorithm from a time efficiency perspective? – Artyom Vancyan Mar 27 '24 at 08:22
  • I just edited my answer to account for this, but it shouldn't matter @ the many unknowns. If the model returns a feasible answer, they intersect; otherwise, they don't. Time efficiency is tougher to answer because that becomes more implementation, but substituting $y=10$ into the function and solving that function directly with gradient descent would be the fastest possible way – Miss Mae Mar 27 '24 at 08:26
  • If the function has $N$ unknowns, and substituting the $y$ value yields another function that does not cause errors like division by zero, etc., then it intersects the plane. Did I get it right? – Artyom Vancyan Mar 27 '24 at 08:29
  • That is correct for most cases, but you're still going to need to find a solution and double check. Suppose we have the plane $y=0$ and a function $y=-x^2-1$. They do not intersect at all but substitution is still possible. Thus, the aforementioned feasible vs infeasible aspect of checking is still important – Miss Mae Mar 27 '24 at 08:39