I shall attempt to write this answer in a way that can be generalized to other similar situations, in the hopes that others trying to solve a similar problem can use this as a rough outline for finding the/a solution.
The problem is to find the global minimum distance between $\left( x , y , z(x, y) \right)$ and a fixed point $( x_2 , y_2 , z_2 )$, where
$$z(x, y) = k_z e^{-2 \sqrt{ \left( \frac{x}{k_x} \right)^2 + \left( \frac{y}{k_y} \right)^2 } } \cos \left( \sqrt{ x_w^2 \left( \frac{x}{k_x} \right)^2 + y_w^2 \left( \frac{y}{k_y} \right)^2 } - 2 \pi f t \right ) \tag{1}\label{1}$$
Because the distance is nonnegative, we can actually minimize the squared distance instead, $g(x,y)$:
$$g(x,y) = (x - x_2)^2 + (y - y_2)^2 + \left[ k_z e^{-2 \sqrt{ \left( \frac{x}{k_x} \right)^2 + \left( \frac{y}{k_y} \right)^2 } } \cos \left( \sqrt{ x_w^2 \left( \frac{x}{k_x} \right)^2 + y_w^2 \left( \frac{y}{k_y} \right)^2 } - 2 \pi f t \right ) - z_2 \right ]^2 \tag{2}\label{2}$$
Equation $\eqref{2}$ is very complicated, so at this point, we probably should see if we can simplify it. One easy way is to temporarily switch to coordinates scaled by $k_x$ and $k_y$, i.e.
$$\begin{cases}u = \frac{x}{k_x} \\ v = \frac{y}{k_y} \end{cases} \tag{3}\label{3} \qquad \iff \qquad \begin{cases}
x = k_x u \\ y = k_y v \end{cases}$$
so that equation $\eqref{2}$ simplifies to
$$g(u, v) = ( k_x u - x_2 )^2 + ( k_y v - y_2 )^2 + \left ( k_z e^{-2\sqrt{u^2 + v^2}} \cos\left(\sqrt{x_w^2 u^2 + y_w^2 v^2} - 2 \pi f t \right) - z_2 \right )^2 \tag{4}\label{4}$$
Such coordinate changes are often necessary to get the function to be minimized into a (more) tractable form.
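For readers who want to follow along numerically, here is a minimal Python sketch of equation $\eqref{4}$. The parameter values at the top are placeholders of my own choosing; substitute whatever values your problem actually uses.

from math import sqrt, cos, exp, pi

# Placeholder parameter values (assumptions for illustration only).
k_x, k_y, k_z = 1.0, 1.0, 1.0
x_w, y_w      = 4.0, 4.0
freq, t       = 1.0, 0.0     # the f and t in the formulas
x_2, y_2, z_2 = 0.5, 0.3, 0.2

def g(u, v):
    # Squared distance of equation (4), in the scaled (u, v) coordinates.
    r = sqrt(u*u + v*v)                    # scaled distance from the origin
    w = sqrt(x_w*x_w*u*u + y_w*y_w*v*v)    # radial part of the cosine argument
    z = k_z * exp(-2.0*r) * cos(w - 2.0*pi*freq*t)
    return (k_x*u - x_2)**2 + (k_y*v - y_2)**2 + (z - z_2)**2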
Yet, there is no formula we could use that would directly yield the $(u, v)$ that minimizes equation $\eqref{4}$.
We do know that for a real-valued differentiable function, like equation $\eqref{4}$ above, the global minimum is one of the stationary points, where the gradient is zero.
The gradient of $g(u,v)$ is
$$\nabla g(u, v) = \left ( \frac{\partial g(u, v)}{\partial u} ,\, \frac{\partial g(u, v)}{\partial v} \right ) = \left ( g_u (u,v) ,\, g_v (u,v) \right ) \tag{5}\label{5}$$
The rightmost form in equation $\eqref{5}$ is just alternate, shorter notation for the partial derivatives. In this case, Maple tells us that
$$\begin{array}{rl}
g_u (u,v) = & 2 k_x ( k_x u - x_2 ) \; - \\
\, & 2 k_z u e^{-2\sqrt{u^2 + v^2}} \; \times \\
\, & \left ( k_z e^{-2\sqrt{u^2 + v^2}} \cos\left(\sqrt{x_w^2 u^2 + y_w^2 v^2} - 2 \pi f t\right) - z_2 \right ) \; \times \\
\, & \left ( \frac{x_w^2}{\sqrt{x_w^2 u^2 + y_w^2 v^2}} \sin\left(\sqrt{x_w^2 u^2 + y_w^2 v^2} - 2 \pi f t\right) \right. \; + \\
\, & \; \left. \; \frac{2}{\sqrt{u^2 + v^2}} \cos\left(\sqrt{x_w^2 u^2 + y_w^2 v^2} - 2 \pi f t\right) \right)
\end{array} \tag{6}\label{6}$$
and $$\begin{array}{rl}
g_v (u,v) = & 2 k_y ( k_y v - y_2 ) \; - \\
\, & 2 k_z v e^{-2\sqrt{u^2 + v^2}} \; \times \\
\, & \left ( k_z e^{-2\sqrt{u^2 + v^2}} \cos\left(\sqrt{x_w^2 u^2 + y_w^2 v^2} - 2 \pi f t\right) - z_2 \right ) \; \times \\
\, & \left ( \frac{y_w^2}{\sqrt{x_w^2 u^2 + y_w^2 v^2}} \sin\left(\sqrt{x_w^2 u^2 + y_w^2 v^2} - 2 \pi f t\right) \right. \; + \\
\, & \; \left. \; \frac{2}{\sqrt{u^2 + v^2}} \cos\left(\sqrt{x_w^2 u^2 + y_w^2 v^2} - 2 \pi f t\right) \right)
\end{array} \tag{7}\label{7}$$
Note that equations $\eqref{6}$ and $\eqref{7}$ have many common terms, so in numerical computations, you probably want to evaluate both in the same function for efficiency. (If you normalize the gradient to unit length, you can omit the common factor $2$, above.)
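As a sketch of that idea in Python (continuing the placeholder parameters from the earlier snippet), the value and both partial derivatives can share the common subexpressions. Note that the formulas divide by $\sqrt{u^2 + v^2}$ and $\sqrt{x_w^2 u^2 + y_w^2 v^2}$, so they are undefined exactly at the origin; the sketch simply nudges such a point a tiny bit.

from math import sin   # sqrt, cos, exp, pi were imported earlier

def g_and_gradient(u, v):
    # Return (g, g_u, g_v) per equations (4), (6) and (7), sharing common terms.
    eps = 1e-12
    r = sqrt(u*u + v*v)
    w = sqrt(x_w*x_w*u*u + y_w*y_w*v*v)
    if r < eps or w < eps:
        # The gradient formulas are singular at the origin; nudge away from it.
        u += eps
        v += eps
        r = sqrt(u*u + v*v)
        w = sqrt(x_w*x_w*u*u + y_w*y_w*v*v)
    decay = k_z * exp(-2.0*r)              # k_z e^{-2 sqrt(u^2 + v^2)}
    cos_w = cos(w - 2.0*pi*freq*t)
    sin_w = sin(w - 2.0*pi*freq*t)
    dz    = decay*cos_w - z_2              # the common (z - z_2) factor
    value = (k_x*u - x_2)**2 + (k_y*v - y_2)**2 + dz*dz
    common = 2.0 * decay * dz              # shared by the last factors of (6) and (7)
    g_u = 2.0*k_x*(k_x*u - x_2) - common * u * (x_w*x_w/w * sin_w + 2.0/r * cos_w)
    g_v = 2.0*k_y*(k_y*v - y_2) - common * v * (y_w*y_w/w * sin_w + 2.0/r * cos_w)
    return value, g_u, g_v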
Again, there is no formula to find $(u, v)$ that yield $g_u (u, v) = 0$ and $g_v (u, v) = 0$; we must find the stationary points numerically.
Fortunately, there is a very well known method, called gradient descent, that we can use to find a stationary point near some starting point. We do need to know $g(u, v)$ (equation $\eqref{4}$), $g_u (u,v)$ (equation $\eqref{6}$), and $g_v (u,v)$ (equation $\eqref{7}$) to use gradient descent, though.
The basic idea underlying the gradient descent method is simple. We start at some point, with an initial step length. If we can, we take a step opposite to the gradient, i.e. "downwards". If not, we try with shorter steps. We repeat this until either the gradient is sufficiently close to zero, or the step size becomes too short.
If we have pseudocode functions
Function f(x, y):
    Return the value of the function at (x, y)

Function dfdx(x, y):
    Return the x component of the gradient of f() at (x, y)

Function dfdy(x, y):
    Return the y component of the gradient of f() at (x, y)
then we can write the gradient descent as
Function find_minimum(x, y, s, smin, dzero):
    # (x, y) is the initial starting point
    # s      is the initial step length
    # smin   is the minimum step length
    # dzero  is the maximum gradient vector length
    #        that is still considered zero

    # We start at x, y, with no preceding step.
    dx = 0
    dy = 0

    While (s >= smin):

        # Accept/take the previously found step.
        x = x + s * dx
        y = y + s * dy

        # Find out the value and gradient at this point.
        f0 = f(x, y)
        dx = dfdx(x, y)
        dy = dfdy(x, y)

        # Is the gradient vector close enough to zero
        # to consider this point a stationary point?
        d = sqrt( dx*dx + dy*dy )
        If (d <= dzero):
            Break out of the while loop

        # Normalize gradient to unit length.
        # We negate the direction, because we walk
        # down-gradient, not up-gradient.
        dx = -dx / d
        dy = -dy / d

        # Find an acceptable step length to take.
        # Note: we do not accept a step that does
        #       not get us to a lower f(), because
        #       if we did, we could end up bouncing
        #       in an orbit around the local minimum!
        While (s >= smin) AND
              (f(x+s*dx, y+s*dy) >= f0):
            # Try a shorter step.
            s = 0.75 * s
        End While

    End While

    Return (x, y)
End Function
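For reference, a rough Python translation of the pseudocode above might look as follows, reusing the g() and g_and_gradient() sketches from earlier. It is meant as a sketch, not production code; the default tolerances are arbitrary placeholders.

def find_minimum(u, v, s, smin=1e-9, dzero=1e-9):
    # (u, v): starting point, s: initial step length,
    # smin: minimum step length, dzero: gradient length treated as zero.
    du, dv = 0.0, 0.0                      # no preceding step yet
    while s >= smin:
        u += s * du                        # accept/take the previously found step
        v += s * dv
        f0, gu, gv = g_and_gradient(u, v)  # value and gradient at this point
        d = sqrt(gu*gu + gv*gv)
        if d <= dzero:                     # close enough to a stationary point?
            break
        du, dv = -gu / d, -gv / d          # unit-length, down-gradient direction
        # Shrink the step until it actually lowers g().
        while s >= smin and g(u + s*du, v + s*dv) >= f0:
            s *= 0.75
    return u, v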
There are variations, of course. In particular, the innermost While loop could be replaced with more complex code that tries longer steps rather than shorter ones whenever the initial step is already acceptable. In that case, the caller should also provide a maximum step length (instead of just the initial step length), so that the steps do not accidentally become too long and jump well over stationary points.
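One way to sketch that variation in Python is a small helper that replaces the inner loop of the find_minimum() sketch above with s = adjust_step(u, v, du, dv, f0, s, smin, smax); both the helper name and the extra smax parameter are my inventions, not part of the pseudocode.

def adjust_step(u, v, du, dv, f0, s, smin, smax):
    # First try growing the step while a longer step still lowers g(),
    # never beyond smax; then shrink it as before until it lowers g()
    # or becomes shorter than smin.
    while s < smax and g(u + min(smax, s/0.75)*du,
                         v + min(smax, s/0.75)*dv) < f0:
        s = min(smax, s / 0.75)
    while s >= smin and g(u + s*du, v + s*dv) >= f0:
        s *= 0.75
    return s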
So, the gradient descent function will find a local minimum. However, we need to find all of them. We need some kind of a strategy in selecting the starting points (and step lengths).
Any strategy where we start at least one gradient descent walk within each region containing exactly one minimum, with an initial step size shorter than the minimum size of that region, will work.
Because every local minimum, maximum, and saddle point is a stationary point, the regions surrounding distinct local minima are separated by other stationary points. (If you were to draw a map of the function $g(u,v)$, marking the curves where $g_u (u,v) = 0$ and where $g_v (u,v) = 0$, you would have an explicit map of the regions: the stationary points are exactly where the two sets of curves intersect, as a point, a curve, or an area.)
To determine how we should distribute our gradient descent starting points and initial step sizes, let's examine equation $\eqref{4}$ to see how it behaves; and equations $\eqref{6}$ and $\eqref{7}$ to see where they change signs (i.e. cross zero).
The first two terms in $\eqref{4}$ define an elliptic paraboloid centered at $u = \frac{x_2}{k_x}$, $v = \frac{y_2}{k_y}$, which is where they both vanish. Thus, that point makes for an interesting candidate starting point.
The third term in $\eqref{4}$ is more complicated: it is the squared difference between $z_2$ and equation $\eqref{1}$. The exponential factor defines a radially symmetric, exponentially decaying envelope as a function of the distance from the origin $(0, 0)$, and the cosine factor superimposes "ripples" on it; the argument of the cosine is the distance to the origin in coordinates scaled by $x_w$ and $y_w$, shifted by the constant $- 2 \pi f t$.
Note that because all three summands in equation $\eqref{4}$ are nonnegative (zero or positive), as soon as we have found our first local minimum, say with value $g_0$ (perhaps found by starting at $(0,0)$, since that is always an interesting point to examine), we can limit the search to an elliptical region of $(u, v)$ around $( x_2 / k_x ,\, y_2 / k_y )$, given by
$$( k_x u - x_2 )^2 + ( k_y v - y_2 )^2 \le g_0$$
since all points $(u, v)$ outside that region have $g(u, v) \gt g_0$.
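Explicitly, because the third summand in equation $\eqref{4}$ is nonnegative, every point $(u, v)$ outside that ellipse satisfies
$$g(u, v) \ge ( k_x u - x_2 )^2 + ( k_y v - y_2 )^2 \gt g_0$$
so no such point can improve on the minimum we already have.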
Personally, I would restrict the search to a rectangular region that encloses the ellipse:
$$\frac{x_2}{k_x} - \sqrt{\frac{g_0}{k_x^2}} \le u \le \frac{x_2}{k_x} + \sqrt{\frac{g_0}{k_x^2}}$$
$$\frac{y_2}{k_y} - \sqrt{\frac{g_0}{k_y^2}} \le v \le \frac{y_2}{k_y} + \sqrt{\frac{g_0}{k_y^2}}$$
since it is easy to use a regular rectangular grid for the gradient descent starting points, although it does cover about 27% larger area than the ellipse.
The cosine term in equation $\eqref{4}$ tells us how we should space the starting points of the gradient descent to find all local minima. The distance along $u$ between cosine term extrema is $\pi/x_w$, and the distance along $v$ between cosine term extrema is $\pi/y_w$; these give us the grid spacing for the gradient descent starting points (although I personally would use somewhat smaller spacing, i.e. more grid points, to ensure I do find all minima). A suitable initial step size is perhaps $0.25 \pi / \sqrt{x_w^2 + y_w^2}$, which is shorter than a quarter of the extrema spacing in either direction.
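To tie the pieces together, a driver along the following lines would implement the strategy; it is again only a sketch built on the earlier snippets, and the function name find_global_minimum, the starting point, and the exact grid density are my own choices.

def find_global_minimum():
    # Conservative initial step: shorter than a quarter of the spacing
    # between cosine extrema in either direction.
    step0 = 0.25 * pi / sqrt(x_w*x_w + y_w*y_w)

    # First candidate: the centre of the elliptic paraboloid.
    best_u, best_v = find_minimum(x_2 / k_x, y_2 / k_y, step0)
    g0 = g(best_u, best_v)

    # Half-widths of the rectangle enclosing the ellipse
    # (k_x u - x_2)^2 + (k_y v - y_2)^2 <= g0.
    hu, hv = sqrt(g0) / abs(k_x), sqrt(g0) / abs(k_y)

    # Grid sizes, using spacing somewhat denser than pi/x_w by pi/y_w.
    nu = max(1, int(2.0 * hu / (0.5 * pi / x_w)) + 1)
    nv = max(1, int(2.0 * hv / (0.5 * pi / y_w)) + 1)

    # One gradient descent from every grid point; keep the best result.
    for i in range(nu + 1):
        for j in range(nv + 1):
            u = (x_2 / k_x) - hu + 2.0 * hu * i / nu
            v = (y_2 / k_y) - hv + 2.0 * hv * j / nv
            cu, cv = find_minimum(u, v, step0)
            cg = g(cu, cv)
            if cg < g0:
                g0, best_u, best_v = cg, cu, cv

    # Map back to the original coordinates; the minimum distance is sqrt(g0).
    return k_x * best_u, k_y * best_v, sqrt(g0)

The rectangle could of course be shrunk further whenever $g_0$ improves during the scan; the sketch keeps it fixed for simplicity.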
Note that my example code to the original programming question does not implement the above exactly. If I had worked out the math first, as I did in this answer, I would have written the code basically as I outline in this answer. Because I wrote the example code first, it is not nearly as illustrative as this answer is, and may contain mathematical bugs. This is a perfect example of why one should examine the problem at the mathematical and algorithmic level first -- something I attempted in this answer -- before writing any code; that way you know much better what you are doing, and whether it will/may/can solve the original problem.