I'll try to give you an equivalent definition, which I think is a bit clearer; for now I will leave proving the equivalence to you. I might add more explanation later.
First, let's consider the case $n=2$ and $m=1$. (What I will say actually works equally well for arbitrary $n$, but $n=2$ is the simplest case that is new to us.)
To be concrete, suppose we have a function $f: \mathbb{R}^{2} \rightarrow \mathbb{R}$ and we want to compute the derivative of $f$ at, say, $0 \in \mathbb{R}^{2}$. That is, we want to figure out how the output of $f$ changes when we change the input. The idea is to use the fact that we already know how to differentiate functions from $\mathbb{R}$ to $\mathbb{R}$. Given any $v \in \mathbb{R}^{2}$, we get a function from $\mathbb{R}$ to $\mathbb{R}$ as follows:
\begin{align}
f_{v}: \mathbb{R} &\rightarrow \mathbb{R}, \\
t &\mapsto f(tv).
\end{align}
This function is (a reparametrization of) the restriction of $f$ to the line spanned by $v$.
We know how to differentiate such a function with respect to $t$. (It might fail to be differentiable at $t=0$, but in that case $f$ is not differentiable at $0$ either.) We thus get a number
\begin{equation}
D_{v} f := \frac{\text{d}}{\text{d}t}\bigg|_{t=0} f(tv),
\end{equation}
which tells us how quickly the output of the function $f$ changes if we vary the input along the line spanned by $v \in \mathbb{R}^{2}$.
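As a numerical illustration (not part of the argument itself), here is a minimal Python sketch of $f_v$ and $D_v f$. The sample function $f(x, y) = e^{x}(1 + 2y) + y^{2}$, the helper names `restrict_to_line` and `directional_derivative`, and the step size `1e-5` are all hypothetical choices; the derivative at $t=0$ is approximated by a central difference rather than computed exactly.

```python
import math


def f(x: float, y: float) -> float:
    # Hypothetical sample function f: R^2 -> R (not from the text).
    return math.exp(x) * (1 + 2 * y) + y ** 2


def restrict_to_line(f, v):
    """Return f_v: t -> f(t v), i.e. f restricted to the line spanned by v."""
    v1, v2 = v
    return lambda t: f(t * v1, t * v2)


def directional_derivative(f, v, step=1e-5):
    """Approximate D_v f = d/dt f(t v) at t = 0 by a central difference."""
    f_v = restrict_to_line(f, v)
    return (f_v(step) - f_v(-step)) / (2 * step)


print(directional_derivative(f, (1.0, 0.0)))  # approximately 1
print(directional_derivative(f, (0.0, 1.0)))  # approximately 2
```

For this $f$ one can check by hand that $f(t, 0) = e^{t}$ and $f(0, t) = (1+t)^{2}$, so the exact values are $1$ and $2$.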
What was described above makes sense for any vector $v \in \mathbb{R}^{2}$, so we have a map from $\mathbb{R}^{2}$ to $\mathbb{R}$,
\begin{align}
\mu: \mathbb{R}^{2} &\rightarrow \mathbb{R}, \\
v &\mapsto D_{v}f = \frac{\text{d}}{\text{d}t}\bigg|_{t=0} f(tv).
\end{align}
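The map $\mu$ itself can be explored numerically in the same way. The sketch below (hypothetical helper names, central differences with an assumed step of `1e-5`) evaluates $\mu$ for the smooth sample function $f(x, y) = e^{x}(1 + 2y) + y^{2}$ and for the classical example $g(x, y) = x^{2}y/(x^{2} + y^{2})$, $g(0,0) = 0$, and measures how far each is from being additive. A check like this only hints at linearity; it proves nothing.

```python
import math


def f(x, y):
    # Hypothetical smooth sample function (not from the text).
    return math.exp(x) * (1 + 2 * y) + y ** 2


def g(x, y):
    # Classical example: every D_v g exists at 0, but v -> D_v g is not linear.
    if x == 0 and y == 0:
        return 0.0
    return x * x * y / (x * x + y * y)


def mu(func, v, step=1e-5):
    """mu(v) = D_v func = d/dt func(t v) at t = 0, via a central difference."""
    v1, v2 = v
    return (func(step * v1, step * v2) - func(-step * v1, -step * v2)) / (2 * step)


def additivity_gap(func, v, w):
    """|mu(v + w) - mu(v) - mu(w)|, which is 0 whenever mu is linear."""
    v_plus_w = (v[0] + w[0], v[1] + w[1])
    return abs(mu(func, v_plus_w) - mu(func, v) - mu(func, w))


e1, e2 = (1.0, 0.0), (0.0, 1.0)
print(additivity_gap(f, e1, e2))  # tiny: consistent with mu being linear for f
print(additivity_gap(g, e1, e2))  # about 0.5: mu is not linear for g
```

For $g$ one has $g(tv) = t\,v_1^{2}v_2/(v_1^{2}+v_2^{2})$, so every $D_v g$ exists, yet $\mu(1,1) = 1/2$ while $\mu(1,0) = \mu(0,1) = 0$. This is why the claim that follows only holds when $f$ is differentiable at $0$.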
Now I claim that, provided $f$ is differentiable at $0$ (the mere existence of every $D_v f$ is not enough), the map $\mu$ is linear and furthermore satisfies the equation that you wrote down (with $a=0$):
\begin{equation}
\lim_{h \rightarrow 0} \frac{\|f(h) - f(0) - \mu(h)\|}{\|h\|} = 0.
\end{equation}
Note that $h \in \mathbb{R}^{2}$. It's up to you to show that $\mu$ satisfies this equation (and that any linear map satisfying this equation is the derivative). I might add some steps to show this later.
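To see the limit in action, the following Python sketch uses the hypothetical sample $f(x, y) = e^{x}(1 + 2y) + y^{2}$, whose partial derivatives at $0$ are $1$ and $2$, so that $\mu(h) = h_1 + 2h_2$. It evaluates the quotient $\|f(h) - f(0) - \mu(h)\|/\|h\|$ on circles of shrinking radius $r$ and reports the largest value over a finite set of sampled directions. Sampling finitely many directions is an assumption of the sketch: it illustrates the statement but does not prove it.

```python
import math


def f(x, y):
    # Hypothetical sample function (not from the text).
    return math.exp(x) * (1 + 2 * y) + y ** 2


def mu(h1, h2):
    # For this f the partial derivatives at 0 are 1 (in x) and 2 (in y),
    # so mu(h) = D_h f = h1 + 2 * h2.
    return h1 + 2 * h2


def quotient(h1, h2):
    """|f(h) - f(0) - mu(h)| / ||h||, the quantity whose limit should be 0."""
    return abs(f(h1, h2) - f(0.0, 0.0) - mu(h1, h2)) / math.hypot(h1, h2)


def worst_quotient(r, n_dirs=360):
    """Largest quotient over n_dirs evenly spaced points on the circle ||h|| = r."""
    values = []
    for k in range(n_dirs):
        theta = 2 * math.pi * k / n_dirs
        values.append(quotient(r * math.cos(theta), r * math.sin(theta)))
    return max(values)


for r in [1e-1, 1e-2, 1e-3]:
    print(r, worst_quotient(r))
```

The printed maxima shrink roughly in proportion to $r$, since here $f(h) - f(0) - \mu(h) = \tfrac12 h_1^{2} + 2h_1h_2 + h_2^{2} + \dots$ is quadratic to leading order.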
Now that I have claimed that $\mu: \mathbb{R}^{n} \rightarrow \mathbb{R}^{m}$ is the derivative of $f$ at $0$ (or at $a$), I should tell you a bit about what it looks like in the case $m=n=1$. In this case we see (exercise!) that for $s \in \mathbb{R}$
\begin{equation}
\mu(s) = s \frac{\text{d}}{\text{d}t}\bigg|_{t=0} f(t).
\end{equation}
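A quick numerical check of this formula, assuming the hypothetical sample function $f(t) = e^{2t}$ (so $f'(0) = 2$) and approximating the $t$-derivative by a central difference with an assumed step of `1e-5`. It compares $\mu(s)$ with $s\,f'(0)$ for a few values of $s$.

```python
import math


def f(t):
    # Hypothetical sample function R -> R (not from the text), with f'(0) = 2.
    return math.exp(2 * t)


def mu(s, step=1e-5):
    """mu(s) = d/dt f(t s) at t = 0, approximated by a central difference."""
    return (f(step * s) - f(-step * s)) / (2 * step)


f_prime_0 = mu(1.0)  # mu(1) is the ordinary derivative f'(0)
for s in [-3.0, 0.5, 2.0]:
    print(s, mu(s), s * f_prime_0)  # the last two columns agree up to rounding
```

So for $m=n=1$ the derivative in this sense is the linear map "multiply by $f'(0)$", which is how the ordinary derivative is recovered.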
As remarked above, I have not really used the fact that $n=2$, and the entire story holds equally well for arbitrary $n$. For larger $m$, we should view a function $f: \mathbb{R}^{n} \rightarrow \mathbb{R}^{m}$ as $m$ functions $f_{i}: \mathbb{R}^{n} \rightarrow \mathbb{R}$, one for each component, and apply the construction above to each of them, so that $\mu(v) = (D_{v} f_{1}, \dots, D_{v} f_{m})$.
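For $m > 1$ the same idea can be sketched component by component. The Python below uses a hypothetical map $F(x, y) = (x + 2y + xy,\ \sin(3x) - y^{2})$, computes $\mu(v) = (D_v F_1, D_v F_2)$ with central differences (step `1e-5`, an assumption), and reads off the matrix of $\mu$ from its values on the standard basis vectors; that matrix is the Jacobian matrix of $F$ at $0$.

```python
import math


# Hypothetical sample map F: R^2 -> R^2 (not from the text), given by its
# component functions F_1, F_2: R^2 -> R.
def F1(x, y):
    return x + 2 * y + x * y


def F2(x, y):
    return math.sin(3 * x) - y ** 2


COMPONENTS = [F1, F2]


def directional_derivative(fi, v, step=1e-5):
    """D_v f_i = d/dt f_i(t v) at t = 0, via a central difference."""
    v1, v2 = v
    return (fi(step * v1, step * v2) - fi(-step * v1, -step * v2)) / (2 * step)


def mu(v):
    """Apply the m = 1 construction to each component: (D_v F_1, ..., D_v F_m)."""
    return [directional_derivative(fi, v) for fi in COMPONENTS]


# The j-th column of the matrix of mu is mu(e_j).
basis = [(1.0, 0.0), (0.0, 1.0)]
columns = [mu(e) for e in basis]
jacobian = [[columns[j][i] for j in range(len(basis))] for i in range(len(COMPONENTS))]
print(jacobian)  # approximately [[1, 2], [3, 0]]
```

By hand, the partial derivatives of $F$ at $0$ give the matrix $\begin{pmatrix}1 & 2\\ 3 & 0\end{pmatrix}$, and $\mu(v)$ agrees with this matrix applied to $v$ (for example, $\mu(1,1) = (3, 3)$).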