The number of functions $f:A \to B$, where $|A| = m$ and $|B| = n$, is $n^m$, since each of the $m$ elements of the domain can be mapped to any of the $n$ elements of the codomain.
There are $\binom{n}{k}$ ways to select $k$ elements in the codomain so that they will not be in the range, leaving at most $n - k$ elements in the range. Since there are $(n - k)^m$ functions from a set with $m$ elements to a set with $n - k$ elements, applying the Inclusion-Exclusion Principle yields
\begin{align*}
n^m - & \binom{n}{1}(n - 1)^m + \binom{n}{2}(n - 2)^m - \cdots + (-1)^{n - 1}\binom{n}{n - 1}[n - (n - 1)]^m + (-1)^n\binom{n}{n}(n - n)^m\\
& = \sum_{k = 0}^{n} (-1)^{k}\binom{n}{k}(n - k)^m
\end{align*}
for the number of surjective functions $f:A \to B$, where $|A| = m$, $|B| = n$, and $m \geq n$.
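The formula can be checked numerically for small cases. The following is a minimal sketch, not part of the derivation; the function names `surjections_formula` and `surjections_brute_force` are made up for this illustration. It compares the alternating sum with a direct enumeration of all $n^m$ functions.

```python
from itertools import product
from math import comb


def surjections_formula(m, n):
    """Inclusion-exclusion count: sum over k of (-1)^k * C(n, k) * (n - k)^m."""
    return sum((-1) ** k * comb(n, k) * (n - k) ** m for k in range(n + 1))


def surjections_brute_force(m, n):
    """Enumerate every function A -> B as a tuple of images and keep the surjective ones."""
    return sum(
        1
        for images in product(range(n), repeat=m)
        if len(set(images)) == n  # every element of B is hit
    )


# Small test cases (chosen arbitrarily for illustration).
for m, n in [(3, 2), (4, 3), (5, 3), (5, 5)]:
    assert surjections_formula(m, n) == surjections_brute_force(m, n)
    print(m, n, surjections_formula(m, n))
```

The brute-force check takes $n^m$ steps, so it is only practical for small $m$ and $n$.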
Note: You asked why subtracting $\binom{n}{1}(n - 1)^m$ from the total number of functions has the effect of eliminating those functions with at most $n - 2$ elements in the range twice.
Suppose that $B = \{b_1, b_2, \ldots, b_n\}$.
Remember that $(n - 1)^m$ is the number of functions from $A$ to a fixed subset of $B$ with $n - 1$ elements, so the range of such a function has at most $n - 1$ elements. We subtract $(n - 1)^m$ to eliminate those functions for which $b_1$ is not in the range, another $(n - 1)^m$ to eliminate those functions for which $b_2$ is not in the range, and so forth. Since there are $n$ ways to exclude one element from the range, we subtract $\binom{n}{1}(n - 1)^m = n(n - 1)^m$ in total to eliminate the functions with at most $n - 1$ elements in the range.
Since we subtract once to eliminate those functions in which $b_1$ is not in the range and once to eliminate those functions in which $b_2$ is not in the range, if a function has neither $b_1$ nor $b_2$ in the range, we have subtracted it twice. The same holds for every pair of excluded elements. This is why we must add back $\binom{n}{2}(n - 2)^m$, the count of functions with at most $n - 2$ elements in the range taken over all pairs, so that each such function is eliminated only once.
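As a small illustration of this double subtraction, take $m = 3$ and $n = 3$ (values chosen here only as an example). There are $3^3 = 27$ functions in total. For each $b_i$, there are $2^3 = 8$ functions whose range omits $b_i$, so we subtract $3 \cdot 8 = 24$. The constant function that sends everything to $b_3$ omits both $b_1$ and $b_2$, so it was subtracted twice; the same is true of the other two constant functions. Adding back one function for each of the $\binom{3}{2} = 3$ pairs corrects this:
\begin{align*}
3^3 - \binom{3}{1}2^3 + \binom{3}{2}1^3 - \binom{3}{3}0^3 & = 27 - 24 + 3 - 0\\
& = 6,
\end{align*}
which agrees with the $3! = 6$ bijections from a three-element set to a three-element set.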
Edit: In answer to your additional question: When we subtract $\binom{n}{1}(n - 1)^m$ to remove functions with at most $n - 1$ elements in the range, we subtract functions with at most $n - 3$ elements in the range three times. For instance, we remove a function for which none of $b_1, b_2, b_3$ is in the range when we exclude $b_1$ from the range, when we exclude $b_2$ from the range, and when we exclude $b_3$ from the range. However, when we add $\binom{n}{2}(n - 2)^m$ to account for the functions with at most $n - 2$ elements in the range that we have subtracted twice, we add those functions back three times. For instance, we count a function whose range contains none of $b_1, b_2, b_3$ when we add functions whose range contains neither $b_1$ nor $b_2$, neither $b_1$ nor $b_3$, and neither $b_2$ nor $b_3$. Each such function was counted once in $n^m$, and we have since both subtracted and added it three times, so it is still counted once. Therefore, we must subtract $\binom{n}{3}(n - 3)^m$ to eliminate those functions with at most $n - 3$ elements in the range.
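The bookkeeping above can be tabulated. A function whose range misses exactly $j$ elements of $B$ is counted $\binom{j}{k}$ times by the $k$th term, once for each way of choosing the $k$ excluded elements from the $j$ it misses. The sketch below (the helper name `net_count_after_terms` is made up for this illustration) tracks how many times such a function has been counted after the terms $k = 0, 1, \ldots, t$.

```python
from math import comb


def net_count_after_terms(j, t):
    """Net count of a function missing exactly j elements of B
    after applying the terms k = 0, 1, ..., t of the alternating sum."""
    return sum((-1) ** k * comb(j, k) for k in range(t + 1))


# j = 3: counted once, then subtracted three times, added three times,
# and finally subtracted once more.
print([net_count_after_terms(3, t) for t in range(4)])  # [1, -2, 1, 0]

# j = 0 (a surjective function) is only ever counted by the k = 0 term.
print([net_count_after_terms(0, t) for t in range(4)])  # [1, 1, 1, 1]
```

For $j = 3$ the running count is $1, -2, 1, 0$: after the first three terms the function is still counted once, and the $\binom{n}{3}(n - 3)^m$ term is what finally removes it.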