So you know that $y=Ae^{bx}$, and that's all you know. You have a whole bunch of data $(x_1,y_1), (x_2,y_2), \ldots, (x_N,y_N)$ and you can plot these points on a graph, and then you see that they lie on an exponential curve, but how do you get the values of $A$ and $b$?
What if we take the log of both sides? We get $\ln(y) = \ln(A) + bx$. But wait! This is our familiar equation of a line: output equals slope times input plus a constant. In this case, $x$ is the input, $\ln(y)$ is the output, $b$ is the slope, and $\ln(A)$ is the constant. So now all we have to do is fit a line to points, and we know how to do that.
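To see that the transformed data really do lie on a line, here's a minimal sketch, assuming NumPy and made-up values $A=2$, $b=0.5$ chosen only for illustration:

```python
import numpy as np

# Hypothetical ground truth, chosen only for illustration.
A, b = 2.0, 0.5
x = np.linspace(0, 4, 9)
y = A * np.exp(b * x)

z = np.log(y)                     # z = ln(y) = ln(A) + b*x
slope = np.diff(z) / np.diff(x)   # constant slope => the transformed points are collinear
print(slope)                      # every entry is 0.5, i.e. b
```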
How do we find the relevant quantities? We could plot $\ln(y_i)$ against $x_i$ by hand, see that the points almost fall on a line, and then take a ruler and try to draw a line that goes through most of them. This would do fairly well, but it probably wouldn't be the "best" fit in any sense we actually care about.
Alternatively, we could solve the following problem:
\begin{align}
\text{minimize}\hspace{8pt}\sum_i \left(\ln(y_i) - (\ln(A)+bx_i)\right)^2.
\end{align}
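In practice this is just an ordinary line fit on the points $(x_i, \ln(y_i))$. Here's a sketch of how you might do it with NumPy; the data are synthetic, generated only to illustrate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic noisy exponential data (A=2, b=0.5 are made up for this example).
x = np.linspace(0, 4, 50)
y = 2.0 * np.exp(0.5 * x) + rng.normal(scale=0.2, size=x.size)

# Fit a line to (x, ln y): the slope is b, the intercept is ln(A).
b_hat, lnA_hat = np.polyfit(x, np.log(y), 1)
A_hat = np.exp(lnA_hat)
print(A_hat, b_hat)   # roughly 2 and 0.5
```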
This is called the "least squares problem" because we are minimizing the squared differences between the points we know and our model's predictions. If we think of each difference as an error, then we're minimizing the sum of the squared errors:
\begin{align}
\text{minimize}\hspace{8pt}\sum_i \text{error}_i^2
\end{align}
where $\text{error}_i = \ln(y_i) - (\ln(A)+bx_i)$.
But what has happened? Take any one data point and think about its error in the original exponential plot. The distance between the point and the curve there, the error, is not the same as the distance between the point and the line in the new plot of $\ln(y)$ versus $x$. And, as the referenced article points out, the smaller values of $y$ will matter more than the larger values of $y$.
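To see how lopsided this can get, here is a tiny numerical illustration (the values are arbitrary): the same absolute error of $0.05$ produces a log-space residual hundreds of times larger at $y=0.1$ than at $y=100$.

```python
import numpy as np

# The same absolute error on y...
for y_true in (0.1, 100.0):
    y_obs = y_true + 0.05
    # ...looks very different once we take logs.
    print(y_true, np.log(y_obs) - np.log(y_true))
# y = 0.1   -> log residual ~0.405
# y = 100.0 -> log residual ~0.0005
```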
This may be OK. For example, if you know that you care more about the smaller values of $y$ than the larger ones, then this might be exactly what you want. But what if it's not?
The key is to realize that the function we're minimizing was chosen arbitrarily. If you know that you care about the larger values of $y$ as much as the smaller values, then you can alter the minimization function as follows:
\begin{align}
\text{minimize}\hspace{8pt}\sum_i w_i \left( \ln(y_i)-(\ln(A)+bx_i)\right)^2.
\end{align}
What did you just do? You created a set of weights $w_i$ that control how much each term in the sum matters. If you want to give less weight to the data points with small values of $y$, you can let $w_i=y_i$.
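Here's one way you might implement this choice, a sketch with the same kind of synthetic data as before (again assuming NumPy). The trick is that minimizing $\sum_i w_i\,\text{error}_i^2$ is the same as ordinary least squares on a system whose rows are scaled by $\sqrt{w_i}$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 4, 50)
y = 2.0 * np.exp(0.5 * x) + rng.normal(scale=0.2, size=x.size)

z = np.log(y)
w = y                                 # w_i = y_i: down-weight the small-y points

# Design matrix for the line ln(A) + b*x; scale each row by sqrt(w_i) so that
# ordinary least squares on the scaled system minimizes sum_i w_i * error_i^2.
X = np.column_stack([np.ones_like(x), x])
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)
lnA_hat, b_hat = coef
print(np.exp(lnA_hat), b_hat)
```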
This type of problem is called "weighted least squares". Note that the objective is a differentiable function of $\ln(A)$ and $b$, so you can solve it by taking derivatives and setting them equal to zero.
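Concretely, writing $z_i = \ln(y_i)$ and $c = \ln(A)$, setting the two partial derivatives of the objective to zero gives
\begin{align}
\frac{\partial}{\partial c}\sum_i w_i \left(z_i - c - bx_i\right)^2 &= -2\sum_i w_i \left(z_i - c - bx_i\right) = 0,\\
\frac{\partial}{\partial b}\sum_i w_i \left(z_i - c - bx_i\right)^2 &= -2\sum_i w_i x_i\left(z_i - c - bx_i\right) = 0,
\end{align}
which rearrange into the weighted normal equations
\begin{align}
c\sum_i w_i + b\sum_i w_i x_i &= \sum_i w_i z_i,\\
c\sum_i w_i x_i + b\sum_i w_i x_i^2 &= \sum_i w_i x_i z_i,
\end{align}
a $2\times 2$ linear system you can solve directly for $c=\ln(A)$ and $b$.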
For many engineering problems, choosing the weights well can be the difference between a solution that works and one that doesn't.