My course on manifolds defines an embedding as follows:
'A smooth map $f:\mathcal{M}\rightarrow\mathcal{N}$ between manifolds $\mathcal{M}$ of dimension $m$ and $\mathcal{N}$ of dimension $n$ is an embedding if it is a diffeomorphism onto its range. We refer to the range, $f(\mathcal{M})$, of such a map, as a submanifold of $\mathcal{N}$.'
Below the definition, it states 'clearly, any embedding is an injective immersion', but I am struggling to see why this is. Injectivity follows from the fact that any diffeomorphism is bijective, but why does the immersion part follow? Why does $f$ being a diffeomorphism onto its range imply that the derivative $d_xf$ is injective for all $x\in\mathcal{M}$? Is it to do with $d_xf$ having a left inverse?
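To make the left-inverse idea concrete, here is a sketch of how I imagine the argument going. It assumes that smoothness of $f^{-1}:f(\mathcal{M})\rightarrow\mathcal{M}$ means that, near each point of $f(\mathcal{M})$, it extends to a smooth map defined on an open subset of $\mathcal{N}$ (this may not be the convention my course intends). The names $U$ and $G$ are my own notation. Fix $x\in\mathcal{M}$, and suppose $U\subseteq\mathcal{N}$ is an open neighbourhood of $f(x)$ and $G:U\rightarrow\mathcal{M}$ is a smooth map agreeing with $f^{-1}$ on $U\cap f(\mathcal{M})$. Then, on the open set $f^{-1}(U)\subseteq\mathcal{M}$, the chain rule would give

$$G\circ f=\mathrm{id}_{f^{-1}(U)}\quad\Longrightarrow\quad d_{f(x)}G\circ d_xf=\mathrm{id}_{T_x\mathcal{M}}.$$

If this is valid, $d_xf$ has a left inverse and is therefore injective.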
I've tried a dimension argument, but this has left me even more confused, so I have a second question. We view $d_xf$ as a linear map from $T_x\mathcal{M}$ to $T_{f(x)}\mathcal{N}$, which can be represented as an $n\times m$ matrix, since the dimension of a tangent space equals the dimension of its manifold. But couldn't we equally view $d_xf$ as a linear map from $T_x\mathcal{M}$ to $T_{f(x)}f(\mathcal{M})$? This could then be represented as an $m\times m$ matrix, since the dimension of $f(\mathcal{M})$ equals the dimension of $\mathcal{M}$ (diffeomorphisms preserve dimension?). Which is the correct way to look at it?
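One tentative way I can see to reconcile the two views is sketched below. It assumes that $f(\mathcal{M})$ carries a manifold structure for which the inclusion into $\mathcal{N}$ is smooth. The symbols $\tilde f$ (the map $f$ with codomain restricted to $f(\mathcal{M})$) and $\iota$ (the inclusion $f(\mathcal{M})\hookrightarrow\mathcal{N}$) are my own notation. The chain rule would then give

$$f=\iota\circ\tilde f,\qquad \underbrace{d_xf}_{n\times m}=\underbrace{d_{f(x)}\iota}_{n\times m}\circ\underbrace{d_x\tilde f}_{m\times m}.$$

Under these assumptions $d_x\tilde f$ is invertible, because $\tilde f$ is a diffeomorphism, so the $m\times m$ and $n\times m$ matrices would describe the same map before and after composing with $d_{f(x)}\iota$, and $d_xf$ would be injective exactly when $d_{f(x)}\iota$ is.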