I shall attempt to write this answer in a way that can be generalized to other similar situations, in the hopes that others trying to solve a similar problem can use this as a rough outline for finding the/a solution.
The problem is to find the global minimum distance between $\left( x , y , z(x, y) \right)$ and a fixed point $( x_2 , y_2 , z_2 )$, where
$$z(x, y) = k_z e^{-2 \sqrt{ \left( \frac{x}{k_x} \right)^2 + \left( \frac{y}{k_y} \right)^2 } } \cos \left( \sqrt{ x_w^2 \left( \frac{x}{k_x} \right)^2 + y_w^2 \left( \frac{y}{k_y} \right)^2 } - 2 \pi f t \right ) \tag{1}\label{1}$$
Because the distance is nonnegative, we can actually minimize the squared distance instead, $g(x,y)$:
$$g(x,y) = (x - x_2)^2 + (y - y_2)^2 + \left[ k_z e^{-2 \sqrt{ \left( \frac{x}{k_x} \right)^2 + \left( \frac{y}{k_y} \right)^2 } } \cos \left( \sqrt{ x_w^2 \left( \frac{x}{k_x} \right)^2 + y_w^2 \left( \frac{y}{k_y} \right)^2 } - 2 \pi f t \right ) - z_2 \right ]^2 \tag{2}\label{2}$$
Equation $\eqref{2}$ is very complicated, so at this point, we probably should see if we can simplify it. One easy way is to temporarily switch to coordinates scaled by $k_x$ and $k_y$, i.e.
$$\begin{cases}u = \frac{x}{k_x} \\ v = \frac{y}{k_y} \end{cases} \tag{3}\label{3} \qquad \iff \qquad \begin{cases}
x = k_x u \\ y = k_y v \end{cases}$$
so that equation $\eqref{2}$ simplifies to
$$g(u, v) = ( k_x u - x_2 )^2 + ( k_y v - y_2 )^2 + \left ( k_z e^{-2\sqrt{u^2 + v^2}} \cos\left(\sqrt{x_w^2 u^2 + y_w^2 v^2} - 2 \pi f t \right) - z_2 \right )^2 \tag{4}\label{4}$$
Such coordinate changes are often necessary to get the function to be minimized into a (more) tractable form.
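For readers who want to follow along numerically, here is a minimal Python sketch of equation $\eqref{4}$. The parameter values at the top are placeholders of my own choosing; substitute whatever values your problem actually uses.

from math import sqrt, cos, exp, pi

# Placeholder parameter values (assumptions for illustration only).
k_x, k_y, k_z = 1.0, 1.0, 1.0
x_w, y_w      = 4.0, 4.0
freq, t       = 1.0, 0.0     # the f and t in the formulas
x_2, y_2, z_2 = 0.5, 0.3, 0.2

def g(u, v):
    # Squared distance of equation (4), in the scaled (u, v) coordinates.
    r = sqrt(u*u + v*v)                    # scaled distance from the origin
    w = sqrt(x_w*x_w*u*u + y_w*y_w*v*v)    # radial part of the cosine argument
    z = k_z * exp(-2.0*r) * cos(w - 2.0*pi*freq*t)
    return (k_x*u - x_2)**2 + (k_y*v - y_2)**2 + (z - z_2)**2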
Yet, there is no formula we could use that would directly yield the $(u, v)$ that minimizes equation $\eqref{4}$.
We do know that for a real-valued differentiable function, like equation $\eqref{4}$ above, the global minimum is one of the stationary points, where the gradient is zero.
The gradient of $g(u,v)$ is
$$\nabla g(u, v) = \left ( \frac{\partial g(u, v)}{\partial u} ,\, \frac{\partial g(u, v)}{\partial v} \right ) = \left ( g_u (u,v) ,\, g_v (u,v) \right ) \tag{5}\label{5}$$
The rightmost form in equation $\eqref{5}$ is just alternate, shorter notation for the partial derivatives. In this case, Maple tells us that
$$\begin{array}{rl}
g_u (u,v) = & 2 k_x ( k_x u - x_2 ) \; - \\
\, & 2 k_z u e^{-2\sqrt{u^2 + v^2}} \; \times \\
\, & \left ( k_z e^{-2\sqrt{u^2 + v^2}} \cos\left(\sqrt{x_w^2 u^2 + y_w^2 v^2} - 2 \pi f t\right) - z_2 \right ) \; \times \\
\, & \left ( \frac{x_w^2}{\sqrt{x_w^2 u^2 + y_w^2 v^2}} \sin\left(\sqrt{x_w^2 u^2 + y_w^2 v^2} - 2 \pi f t\right) \right. \; + \\
\, & \; \left. \; \frac{2}{\sqrt{u^2 + v^2}} \cos\left(\sqrt{x_w^2 u^2 + y_w^2 v^2} - 2 \pi f t\right) \right)
\end{array} \tag{6}\label{6}$$
and $$\begin{array}{rl}
g_v (u,v) = & 2 k_y ( k_y v - y_2 ) \; - \\
\, & 2 k_z v e^{-2\sqrt{u^2 + v^2}} \; \times \\
\, & \left ( k_z e^{-2\sqrt{u^2 + v^2}} \cos\left(\sqrt{x_w^2 u^2 + y_w^2 v^2} - 2 \pi f t\right) - z_2 \right ) \; \times \\
\, & \left ( \frac{y_w^2}{\sqrt{x_w^2 u^2 + y_w^2 v^2}} \sin\left(\sqrt{x_w^2 u^2 + y_w^2 v^2} - 2 \pi f t\right) \right. \; + \\
\, & \; \left. \; \frac{2}{\sqrt{u^2 + v^2}} \cos\left(\sqrt{x_w^2 u^2 + y_w^2 v^2} - 2 \pi f t\right) \right)
\end{array} \tag{7}\label{7}$$
Note that equations $\eqref{6}$ and $\eqref{7}$ have many common terms, so in numerical computations, you probably want to evaluate both in the same function for efficiency. (If you normalize the gradient to unit length, you can omit the common factor $2$, above.)
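As a sketch of that idea in Python (continuing the placeholder parameters from the earlier snippet), the value and both partial derivatives can share the common subexpressions. Note that the formulas divide by $\sqrt{u^2 + v^2}$ and $\sqrt{x_w^2 u^2 + y_w^2 v^2}$, so they are undefined exactly at the origin; the sketch simply nudges such a point a tiny bit.

from math import sin   # sqrt, cos, exp, pi were imported earlier

def g_and_gradient(u, v):
    # Return (g, g_u, g_v) per equations (4), (6) and (7), sharing common terms.
    eps = 1e-12
    r = sqrt(u*u + v*v)
    w = sqrt(x_w*x_w*u*u + y_w*y_w*v*v)
    if r < eps or w < eps:
        # The gradient formulas are singular at the origin; nudge away from it.
        u += eps
        v += eps
        r = sqrt(u*u + v*v)
        w = sqrt(x_w*x_w*u*u + y_w*y_w*v*v)
    decay = k_z * exp(-2.0*r)              # k_z e^{-2 sqrt(u^2 + v^2)}
    cos_w = cos(w - 2.0*pi*freq*t)
    sin_w = sin(w - 2.0*pi*freq*t)
    dz    = decay*cos_w - z_2              # the common (z - z_2) factor
    value = (k_x*u - x_2)**2 + (k_y*v - y_2)**2 + dz*dz
    common = 2.0 * decay * dz              # shared by the last factors of (6) and (7)
    g_u = 2.0*k_x*(k_x*u - x_2) - common * u * (x_w*x_w/w * sin_w + 2.0/r * cos_w)
    g_v = 2.0*k_y*(k_y*v - y_2) - common * v * (y_w*y_w/w * sin_w + 2.0/r * cos_w)
    return value, g_u, g_v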
Again, there is no formula to find $(u, v)$ that yield $g_u (u, v) = 0$ and $g_v (u, v) = 0$; we must find the stationary points numerically.
Fortunately, there is a very well known method, called gradient descent, that we can use to find a stationary point near some starting point. We do need to know $g(u, v)$ (equation $\eqref{4}$), $g_u (u,v)$ (equation $\eqref{6}$), and $g_v (u,v)$ (equation $\eqref{7}$) to use gradient descent, though.
The basic idea underlying the gradient descent method is simple. We start at some point, with an initial step length. If we can, we take a step opposite to the gradient, i.e. "downwards". If not, we try with shorter steps. We repeat this until either the gradient is sufficiently close to zero, or the step size becomes too short.
If we have pseudocode functions
Function f(x, y):
    Return the value of the function at (x, y)

Function dfdx(x, y):
    Return the x component of the gradient of f() at (x, y)

Function dfdy(x, y):
    Return the y component of the gradient of f() at (x, y)
then we can write the gradient descent as
Function find_minimum(x, y, s, smin, dzero):
    # (x, y) is the initial starting point
    # s      is the initial step length
    # smin   is the minimum step length
    # dzero  is the maximum gradient vector length
    #        that is still considered zero

    # We start at x, y, with no preceding step.
    dx = 0
    dy = 0

    While (s >= smin):

        # Accept/take the previously found step.
        x = x + s * dx
        y = y + s * dy

        # Find out the value and gradient at this point.
        f0 = f(x, y)
        dx = dfdx(x, y)
        dy = dfdy(x, y)

        # Is the gradient vector close enough to zero
        # to consider this point a stationary point?
        d = sqrt( dx*dx + dy*dy )
        If (d <= dzero):
            Break out of the while loop

        # Normalize gradient to unit length.
        # We negate the direction, because we walk
        # down-gradient, not up-gradient.
        dx = -dx / d
        dy = -dy / d

        # Find an acceptable step length to take.
        # Note: we do not accept a step that does
        #       not get us to a lower f(), because
        #       if we did, we could end up bouncing
        #       in an orbit around the local minimum!
        While (s >= smin) AND
              (f(x+s*dx, y+s*dy) >= f0):
            # Try a shorter step.
            s = 0.75 * s
        End While

    End While

    Return (x, y)
End Function
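For reference, a rough Python translation of the pseudocode above might look as follows, reusing the g() and g_and_gradient() sketches from earlier. It is meant as a sketch, not production code; the default tolerances are arbitrary placeholders.

def find_minimum(u, v, s, smin=1e-9, dzero=1e-9):
    # (u, v): starting point, s: initial step length,
    # smin: minimum step length, dzero: gradient length treated as zero.
    du, dv = 0.0, 0.0                      # no preceding step yet
    while s >= smin:
        u += s * du                        # accept/take the previously found step
        v += s * dv
        f0, gu, gv = g_and_gradient(u, v)  # value and gradient at this point
        d = sqrt(gu*gu + gv*gv)
        if d <= dzero:                     # close enough to a stationary point?
            break
        du, dv = -gu / d, -gv / d          # unit-length, down-gradient direction
        # Shrink the step until it actually lowers g().
        while s >= smin and g(u + s*du, v + s*dv) >= f0:
            s *= 0.75
    return u, v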
There are variations, of course. In particular, the innermost While loop could be replaced with more complex code that tries longer steps rather than shorter ones whenever the initial step is already acceptable. In that case, the caller should also provide a maximum step length (instead of just the initial step length), so that the steps do not accidentally become too long and jump well over stationary points.
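One way to sketch that variation in Python is a small helper that replaces the inner loop of the find_minimum() sketch above with s = adjust_step(u, v, du, dv, f0, s, smin, smax); both the helper name and the extra smax parameter are my inventions, not part of the pseudocode.

def adjust_step(u, v, du, dv, f0, s, smin, smax):
    # First try growing the step while a longer step still lowers g(),
    # never beyond smax; then shrink it as before until it lowers g()
    # or becomes shorter than smin.
    while s < smax and g(u + min(smax, s/0.75)*du,
                         v + min(smax, s/0.75)*dv) < f0:
        s = min(smax, s / 0.75)
    while s >= smin and g(u + s*du, v + s*dv) >= f0:
        s *= 0.75
    return s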
So, the gradient descent function will find a local minimum. However, we need to find all of them. We need some kind of a strategy in selecting the starting points (and step lengths).
Any strategy where we start at least one gradient descent walk within each region containing exactly one minimum, with an initial step size shorter than the minimum size of that region, will work.
Because every local minimum, maximum, and saddle point is a stationary point, the regions surrounding distinct local minima are separated by other stationary points. (If you were to draw a map of the function $g(u,v)$, marking the curves where $g_u (u,v) = 0$ and where $g_v (u,v) = 0$, you would have an explicit map of the regions: the stationary points are exactly where the two sets of curves intersect, as a point, a curve, or an area.)
To determine how we should distribute our gradient descent starting points and initial step sizes, let's examine equation $\eqref{4}$ to see how it behaves; and equations $\eqref{6}$ and $\eqref{7}$ to see where they change signs (i.e. cross zero).
The first two terms in $\eqref{4}$ define an elliptic paraboloid centered at $u = \frac{x_2}{k_x}$, $v = \frac{y_2}{k_y}$, which is where they both vanish. Thus, that point makes for an interesting candidate starting point.
The third term in $\eqref{4}$ is more complicated: it is the squared difference between $z_2$ and equation $\eqref{1}$. The exponential factor defines a radially symmetric, exponentially decaying envelope as a function of the distance from the origin $(0, 0)$, and the cosine factor superimposes "ripples" on it; the argument of the cosine is the distance to the origin in coordinates scaled by $x_w$ and $y_w$, shifted by the constant $- 2 \pi f t$.
Note that because all three summands in equation $\eqref{4}$ are nonnegative (zero or positive), as soon as we have found our first local minimum, say with value $g_0$ (perhaps found by starting at $(0,0)$, since that is always an interesting point to examine), we can limit the search to an elliptical region of $(u, v)$ around $( x_2 / k_x ,\, y_2 / k_y )$, given by
$$( k_x u - x_2 )^2 + ( k_y v - y_2 )^2 \le g_0$$
since all points $(u, v)$ outside that region have $g(u, v) \gt g_0$.
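Explicitly, because the third summand in equation $\eqref{4}$ is nonnegative, every point $(u, v)$ outside that ellipse satisfies
$$g(u, v) \ge ( k_x u - x_2 )^2 + ( k_y v - y_2 )^2 \gt g_0$$
so no such point can improve on the minimum we already have.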
Personally, I would restrict the search to a rectangular region that encloses the ellipse:
$$\frac{x_2}{k_x} - \sqrt{\frac{g_0}{k_x^2}} \le u \le \frac{x_2}{k_x} + \sqrt{\frac{g_0}{k_x^2}}$$
$$\frac{y_2}{k_y} - \sqrt{\frac{g_0}{k_y^2}} \le v \le \frac{y_2}{k_y} + \sqrt{\frac{g_0}{k_y^2}}$$
since it is easy to use a regular rectangular grid for the gradient descent starting points, although it does cover about 27% larger area than the ellipse.
The cosine term in equation $\eqref{4}$ tells us how we should space the starting points of the gradient descent to find all local minima. The distance along $u$ between cosine term extrema is $\pi/x_w$, and the distance along $v$ between cosine term extrema is $\pi/y_w$; these give us the grid spacing for the gradient descent starting points (although I personally would use somewhat smaller spacing, i.e. more grid points, to ensure I do find all minima). A suitable initial step size is perhaps $0.25 \pi / \sqrt{x_w^2 + y_w^2}$, which is shorter than a quarter of the extrema spacing in either direction.
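To tie the pieces together, a driver along the following lines would implement the strategy; it is again only a sketch built on the earlier snippets, and the function name find_global_minimum, the starting point, and the exact grid density are my own choices.

def find_global_minimum():
    # Conservative initial step: shorter than a quarter of the spacing
    # between cosine extrema in either direction.
    step0 = 0.25 * pi / sqrt(x_w*x_w + y_w*y_w)

    # First candidate: the centre of the elliptic paraboloid.
    best_u, best_v = find_minimum(x_2 / k_x, y_2 / k_y, step0)
    g0 = g(best_u, best_v)

    # Half-widths of the rectangle enclosing the ellipse
    # (k_x u - x_2)^2 + (k_y v - y_2)^2 <= g0.
    hu, hv = sqrt(g0) / abs(k_x), sqrt(g0) / abs(k_y)

    # Grid sizes, using spacing somewhat denser than pi/x_w by pi/y_w.
    nu = max(1, int(2.0 * hu / (0.5 * pi / x_w)) + 1)
    nv = max(1, int(2.0 * hv / (0.5 * pi / y_w)) + 1)

    # One gradient descent from every grid point; keep the best result.
    for i in range(nu + 1):
        for j in range(nv + 1):
            u = (x_2 / k_x) - hu + 2.0 * hu * i / nu
            v = (y_2 / k_y) - hv + 2.0 * hv * j / nv
            cu, cv = find_minimum(u, v, step0)
            cg = g(cu, cv)
            if cg < g0:
                g0, best_u, best_v = cg, cu, cv

    # Map back to the original coordinates; the minimum distance is sqrt(g0).
    return k_x * best_u, k_y * best_v, sqrt(g0)

The rectangle could of course be shrunk further whenever $g_0$ improves during the scan; the sketch keeps it fixed for simplicity.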
Note that my example code to the original programming question does not implement the above exactly. If I had worked out the math first, as I did in this answer, I would have written the code basically as I outline in this answer. Because I wrote the example code first, it is not nearly as illustrative as this answer is, and may contain mathematical bugs. This is a perfect example of why one should examine the problem at the mathematical and algorithmic level first -- something I attempted in this answer -- before writing any code; that way you know much better what you are doing, and whether it will/may/can solve the original problem.