The simplest way to proceed is rejection sampling. For this we first need $M \geq N$. If that isn't the case (maybe you're playing D&D with a d6, so $N=20,M=6$), then you should roll in batches of $k=\lceil \log_M(N) \rceil$ and label the $k$-tuples of rolls, which turns the $M$-sided die into an $M^k$-sided die with $M^k \geq N$. For the discussion of the rejection approach, let us assume you have already done this, renaming $M^k$ as $M$ if need be.
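For concreteness, here is a minimal Python sketch of that relabeling, reading the $k$ rolls as base-$M$ digits. The function names are just illustrative, and `random.randint` stands in for a physical die roll:

```python
import random

def roll_m_sided(M):
    """Stand-in for one roll of a fair M-sided die, labeled 1..M."""
    return random.randint(1, M)

def roll_M_to_the_k(M, N):
    """Treat k rolls of an M-sided die as one roll of an M^k-sided die with M^k >= N."""
    k, Mk = 1, M
    while Mk < N:          # smallest k with M^k >= N, i.e. k = ceil(log_M(N))
        k += 1
        Mk *= M
    value = 0
    for _ in range(k):     # read the k rolls as the base-M digits of a number in 0..M^k - 1
        value = value * M + (roll_m_sided(M) - 1)
    return value + 1       # relabel to 1..M^k
```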
Rejection sampling then amounts to assigning each of the numbers $1,2,\dots,N$ to $q(M,N)$ of the possible rolls, where $q(M,N)$ is the quotient when $M$ is divided by $N$. If you get one of the assigned rolls, you terminate; otherwise you start over.
This rejection method takes constant space. Each attempt fails with probability $\frac{r(M,N)}{M}$ (where $r(M,N)$ is the remainder when $M$ is divided by $N$), so the average number of attempts is $\frac{M}{M-r(M,N)}$. In theory this can run forever, but the probability of long runtimes decays exponentially fast.
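A hedged sketch of the full rejection scheme in Python (again with `random.randint` standing in for the physical die; the names are mine, not a standard API):

```python
import random

def roll_n_by_rejection(M, N):
    """Simulate a fair N-sided die using a fair M-sided die by rejection sampling."""
    # Group rolls so that one "super-roll" is uniform on 0..M^k - 1 with M^k >= N.
    k, Mk = 1, M
    while Mk < N:
        k += 1
        Mk *= M
    q = Mk // N              # each face 1..N is assigned q = quotient(M^k, N) super-rolls
    limit = q * N            # the remaining r = M^k - q*N super-rolls are rejected
    while True:
        value = 0
        for _ in range(k):   # one super-roll, read as base-M digits
            value = value * M + (random.randint(1, M) - 1)
        if value < limit:
            return value % N + 1   # accepted: each residue class contains exactly q super-rolls
        # rejected with probability r / M^k: start over
```

For $N=20$, $M=6$ this accepts $20$ of the $36$ possible pairs of rolls, so each attempt fails with probability $16/36$ and the average number of attempts is $36/20=1.8$, i.e. $3.6$ rolls of the d6 on average.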
A different way to proceed is to solve a more general problem and then apply it to this one. Note that a statistically equivalent procedure to rolling an $N$-sided die is to generate a random variable $U$ which is uniform on $(0,1)$ and then return $\lceil NU \rceil$. Such a variable can be obtained from infinitely many rolls of an $M$-sided die: $U=\sum_{n=1}^\infty (X_n-1) M^{-n}$, where the $X_n$ are independent and uniform on $\{ 1,2,\dots,M \}$ (the $X_n-1$ are just the base-$M$ digits of $U$).
Now as it stands, that seems like a bad thing, because you can't do infinitely many rolls. But you don't need full resolution: you only need to resolve which of the intervals $(k/N,(k+1)/N]$ the final value of $U$ falls in. Given $R$ rolls, with current partial sum $U_R=\sum_{n=1}^R (X_n-1)M^{-n}$, you know that the final value of $U$ will be somewhere between $U_R$ and $U_R+\sum_{n=R+1}^\infty (M-1) M^{-n}=U_R+M^{-R}$. If these two endpoints fall in the same interval of the form above, then you are done computing.
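A rough Python sketch of this refinement procedure, under the same stand-in assumptions as before; exact rational arithmetic via `fractions.Fraction` makes the bookkeeping explicit (and is also where the unbounded memory mentioned next comes from):

```python
import math
import random
from fractions import Fraction

def roll_n_by_refinement(M, N):
    """Simulate a fair N-sided die by refining U = sum_n (X_n - 1) M^{-n}
    until the interval [U_R, U_R + M^{-R}] determines ceil(N*U)."""
    low = Fraction(0)        # current partial sum U_R
    width = Fraction(1)      # current interval width M^{-R}, with R = number of rolls so far
    while True:
        x = random.randint(1, M)                # roll X_{R+1}
        low += (x - 1) * width / M              # U_{R+1} = U_R + (X_{R+1} - 1) M^{-(R+1)}
        width /= M
        lo_face = math.ceil(N * low)            # ceiling at the lower endpoint
        hi_face = math.ceil(N * (low + width))  # ceiling at the upper endpoint
        if lo_face == hi_face:                  # both endpoints in the same interval (k/N, (k+1)/N]
            return hi_face                      # terminates with probability 1, but not in bounded time
```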
Again the runtime of this alternative method is random and unbounded, and so is its memory footprint. Its advantage is that it does not throw away entropy, so the probability that you will finish on the next roll improves as you go on. Also, if $M<N$, although you still need at least $k=\lceil \log_M(N) \rceil$ rolls, you do not have to roll in batches of $k$, which could be good if for some reason $k$ were large.
An example of the latter method: $N=20,M=6$. I roll a 4, putting me in $[3/6,4/6]$, but $\lceil 3 \cdot 20/6 \rceil = 10 \neq 14 = \lceil 4 \cdot 20/6 \rceil$. Then I roll a 6, putting me in $[23/36,24/36]$; I'm still not done, because $\lceil 23 \cdot 20/36 \rceil = 13 \neq 14 = \lceil 24 \cdot 20/36 \rceil$. I roll again and get a $1$, putting me in $[138/216,139/216]$; now $\lceil 138 \cdot 20/216 \rceil = 13 = \lceil 139 \cdot 20/216 \rceil$, so my roll is understood as a 13.
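To check the arithmetic, one can replay this example through the refinement sketch above with the fixed roll sequence $4,6,1$ in place of fresh randomness (a hypothetical helper, not part of the method itself):

```python
import math
from fractions import Fraction

def roll_n_from_rolls(M, N, rolls):
    """Same refinement as above, but consuming a given sequence of rolls."""
    low, width = Fraction(0), Fraction(1)
    for x in rolls:
        low += (x - 1) * width / M
        width /= M
        if math.ceil(N * low) == math.ceil(N * (low + width)):
            return math.ceil(N * low)
    return None   # sequence ran out before the face was determined

print(roll_n_from_rolls(6, 20, [4, 6, 1]))   # prints 13, matching the example
```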