I want to explain a disadvantage of gradient descent: the gradient itself gives no information about how far we are from the local/global minimum.
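For concreteness, the step I am referring to is the standard gradient descent update with a fixed step size $\eta$:

$$x_{k+1} = x_k - \eta \, \nabla f(x_k)$$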
Say we have two functions that intersect at a point where both have the same gradient, but assume that this point is closer to an optimum of one function than to an optimum of the other. After one gradient descent step with the same step size, one function would approach its minimum much more closely than the other. An example would be some functions like here.
For that, I am looking for two rather simple 1) 3D functions $f: \mathbb{R}^2 \rightarrow \mathbb{R}$ and $g: \mathbb{R}^2 \rightarrow \mathbb{R}$ that 2) intersect at a point, $f(a,b) = g(a,b)$, 3) have the same gradient at that point, $\nabla f(a,b) = \nabla g(a,b)$, but 4) whose optima lie at different distances from $(a,b)$.
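In case it helps, here is a minimal Python sketch of how I would check conditions 2) and 3) numerically for a candidate pair. The quadratics `f` and `g` below are arbitrary placeholders (they do *not* satisfy the conditions); the point of the sketch is only the check itself:

```python
import numpy as np

def numerical_gradient(func, point, eps=1e-6):
    """Central-difference approximation of the gradient of func at point."""
    point = np.asarray(point, dtype=float)
    grad = np.zeros_like(point)
    for i in range(len(point)):
        step = np.zeros_like(point)
        step[i] = eps
        grad[i] = (func(point + step) - func(point - step)) / (2 * eps)
    return grad

# Placeholder candidates -- any pair under consideration would go here.
def f(p):
    x, y = p
    return x**2 + y**2

def g(p):
    x, y = p
    return 2 * (x**2 + y**2)

point = np.array([1.0, 1.0])  # candidate intersection point (a, b)

# Condition 2): same function value at (a, b)?
print("f(a,b) =", f(point), " g(a,b) =", g(point))

# Condition 3): same gradient at (a, b)?
print("grad f(a,b) =", numerical_gradient(f, point))
print("grad g(a,b) =", numerical_gradient(g, point))
```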
I couldn't think of a way to create such functions. Can somebody give me a hint on how to approach this problem?