I’ll sketch the argument for counting the functions from $X$ onto $Y$. Try to use it as a model to count the functions that map $X$ onto all of $Y$ except one point, and then combine the two partial results.
There are altogether $n^m$ functions from $X$ to $Y$. For each $y\in Y$ we want to throw out the functions that don’t have $y$ in their ranges, i.e., the functions that map $X$ to $Y\setminus\{y\}$. For each $y\in Y$ there are $(n-1)^m$ such functions, and there are $n$ points in $Y$, so we’re throwing out $n(n-1)^m$ functions, leaving us with $n^m-n(n-1)^m$ functions. Unfortunately, if $y_1$ and $y_2$ are distinct points of $Y$, any function from $X$ to $Y$ whose range omits $y_1$ and $y_2$ but no other point of $Y$ has been thrown out twice, once for $y_1$ and once for $y_2$. Thus, we’ve counted each of those functions $-1$ times: once in $n^m$, and $-2$ times in the correction term. We don’t want to count those functions at all, since they’re not onto $Y$, so we want to add them back in. There are $\binom{n}2$ pairs of elements of $Y$, and for each pair there are $(n-2)^m$ functions from $X$ to $Y$ whose ranges contain neither element of the pair, so our third approximation is
$$n^m-n(n-1)^m+\binom{n}2(n-2)^m=\binom{n}0(n-0)^m-\binom{n}1(n-1)^m+\binom{n}2(n-2)^m\;.$$
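To see numerically that the third approximation is not yet the final answer, here is a minimal Python sketch. It assumes $X=\{x_1,\ldots,x_m\}$ and $Y=\{0,\ldots,n-1\}$, encodes a function as the tuple $(f(x_1),\ldots,f(x_m))$, and uses function names of my own choosing for illustration. It counts onto functions by brute force and compares that count with the successive truncations of the alternating sum.

```python
from itertools import product
from math import comb


def count_onto_brute_force(m: int, n: int) -> int:
    """Count functions from an m-set onto an n-set by listing all n**m of them."""
    # A function X -> Y is encoded as a tuple of length m with entries in range(n);
    # it is onto exactly when all n values occur in the tuple.
    return sum(1 for f in product(range(n), repeat=m) if len(set(f)) == n)


def partial_approximation(m: int, n: int, terms: int) -> int:
    """Sum of the first `terms` terms of n^m - C(n,1)(n-1)^m + C(n,2)(n-2)^m - ..."""
    return sum((-1) ** k * comb(n, k) * (n - k) ** m for k in range(terms))


if __name__ == "__main__":
    m, n = 4, 4
    print("true count:", count_onto_brute_force(m, n))
    for terms in range(1, n + 2):
        print(terms, "term(s):", partial_approximation(m, n, terms))
```

For $m=n=4$ the true count is $24$, while the successive approximations are $256$, $-68$, $28$, $24$, $24$: the third approximation still overcounts, and adding the fourth correction term fixes it (the fifth term is $0$ because $0^4=0$).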
At this point you can probably guess that the final count is going to be
$$\sum_{k=0}^n\binom{n}k(-1)^k(n-k)^m\;.$$
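For instance, with $m=4$ and $n=3$ the guessed formula gives
$$\binom30 3^4-\binom31 2^4+\binom32 1^4-\binom33 0^4=81-48+3-0=36\;,$$
which agrees with a direct count: a function from a $4$-element set onto a $3$-element set must send exactly two points of $X$ to the same point of $Y$, and there are $\binom42\cdot 3!=36$ ways to choose that pair and then match the three resulting blocks of $X$ with the three points of $Y$.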
To put this in more formal terms, for each $y\in Y$ let $\mathscr{A}_y$ be the set of functions from $X$ to $Y\setminus\{y\}$. Then $\bigcup_{y\in Y}\mathscr{A}_y$ is the set of functions from $X$ to $Y$ that are not onto $Y$, and the number of functions from $X$ onto $Y$ is therefore
$$n^m-\left|\bigcup_{y\in Y}\mathscr{A}_y\right|\;.$$
By the inclusion-exclusion principle this is equal to
$$n^m-\sum_{\varnothing\ne F\subseteq Y}(-1)^{|F|-1}\left|\bigcap_{y\in F}\mathscr{A}_y\right|\;.$$
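The inclusion-exclusion step can be checked on small cases by building the sets $\mathscr{A}_y$ explicitly. This is a minimal sketch, assuming $Y=\{0,\ldots,n-1\}$ and encoding each function as the tuple of its values; the function name `check_inclusion_exclusion` is hypothetical. It returns $\left|\bigcup_{y\in Y}\mathscr{A}_y\right|$ computed directly and via the alternating sum over nonempty $F\subseteq Y$.

```python
from itertools import combinations, product


def check_inclusion_exclusion(m: int, n: int) -> tuple[int, int]:
    """Return |union of A_y| computed directly and via inclusion-exclusion."""
    Y = range(n)
    # Each tuple f encodes the values f(x_1), ..., f(x_m).
    all_functions = list(product(Y, repeat=m))
    # A[y] is the set of functions whose range omits y, i.e. functions X -> Y \ {y}.
    A = {y: {f for f in all_functions if y not in f} for y in Y}

    direct = len(set().union(*A.values()))

    via_ie = 0
    for size in range(1, n + 1):
        for F in combinations(Y, size):  # every nonempty subset F of Y
            common = set.intersection(*(A[y] for y in F))
            via_ie += (-1) ** (size - 1) * len(common)
    return direct, via_ie
```

For $m=4$ and $n=3$ both values are $45$, and $3^4-45=36$ is the number of onto functions.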
Now $\bigcap_{y\in F}\mathscr{A}_y$ is just the set of functions from $X$ to $Y\setminus F$, so its cardinality is
$$|Y\setminus F|^m=(n-|F|)^m\;.$$
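A quick brute-force check of this cardinality, as a minimal sketch: it assumes $Y=\{0,\ldots,n-1\}$, encodes functions as tuples of values, and uses hypothetical helper names. For every nonempty $F\subseteq Y$ it counts the functions whose range avoids $F$ and compares the result with $(n-|F|)^m$.

```python
from itertools import combinations, product


def intersection_size(m: int, n: int, F: frozenset[int]) -> int:
    """Count functions X -> Y (Y = {0, ..., n-1}) whose range avoids every y in F."""
    return sum(1 for f in product(range(n), repeat=m) if F.isdisjoint(f))


def check_all_subsets(m: int, n: int) -> bool:
    """Compare the brute-force count with (n - |F|)**m for every nonempty F in Y."""
    return all(
        intersection_size(m, n, frozenset(F)) == (n - len(F)) ** m
        for size in range(1, n + 1)
        for F in combinations(range(n), size)
    )
```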
Moreover, each term of the sum depends on $F$ only through $k=|F|$, and for $k=1,\ldots,n$ there are $\binom{n}k$ subsets of $Y$ of cardinality $k$, so
$$\begin{align*}
n^m-\sum_{\varnothing\ne F\subseteq Y}(-1)^{|F|-1}\left|\bigcap_{y\in F}\mathscr{A}_y\right|&=n^m-\sum_{k=1}^n(-1)^{k-1}\binom{n}k(n-k)^m\\
&=\binom{n}0(n-0)^m+\sum_{k=1}^n(-1)^k\binom{n}k(n-k)^m\\
&=\sum_{k=0}^n(-1)^k\binom{n}k(n-k)^m\;.
\end{align*}$$
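Finally, a minimal Python sketch of the closed formula, checked against brute-force enumeration for small $m$ and $n$. As before, it assumes $Y=\{0,\ldots,n-1\}$ and encodes a function as the tuple of its values; the function names are my own.

```python
from itertools import product
from math import comb


def count_onto(m: int, n: int) -> int:
    """Number of functions from an m-set onto an n-set, by the alternating sum."""
    return sum((-1) ** k * comb(n, k) * (n - k) ** m for k in range(n + 1))


def count_onto_brute_force(m: int, n: int) -> int:
    """Same count by enumerating all n**m functions (only feasible for small m, n)."""
    return sum(1 for f in product(range(n), repeat=m) if len(set(f)) == n)


if __name__ == "__main__":
    for m in range(1, 7):
        for n in range(1, 5):
            assert count_onto(m, n) == count_onto_brute_force(m, n), (m, n)
    print([count_onto(m, 3) for m in range(1, 7)])
```

The printed list is `[0, 0, 6, 36, 150, 540]`. The first two zeros reflect the fact that there are no functions from a set with fewer than $3$ points onto a $3$-element set, and the formula produces those zeros without any special case.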