We assume a standard secure RSA public key $(n,e)$, e.g. 3072-bit $n$ and $e=65537$.
The Meet-in-the-Middle attack (not to be confused with a Man-in-the-Middle attack) on textbook RSA recovers $m$ from $c=m^e\bmod n$ for a sizable fraction of small $m$, using $\mathcal O(\sqrt m)$ modular operations modulo $n$. It does not apply in practice to well-designed uses of RSA, in which $m$ is essentially random in $[0,n)$.
More precisely, ignoring memory and auxiliary costs, the attack works with about $u_\max$ modular inversions and multiplications modulo $n$, and $u_\max+v$ modular exponentiations to the $e^\text{th}$ power modulo $n$, when $m$ can be written as $m=u\,v$ with $u<u_\max$ and $u\le v$.
The question asks about time-memory tradeoffs in this attack, and secondarily about other considerations to make it practical.
The attack requires modest means for a large fraction of 64-bit $m$ (see Dan Boneh, Antoine Joux & Phong Q. Nguyen, Why Textbook ElGamal and RSA Encryption Are Insecure, in the proceedings of Asiacrypt 2000). Others have stated that this attack "does not scale to 128-bit symmetric session keys", but without considering the possibility of time-memory tradeoffs.
To establish notation, the basic attack without time-memory tradeoff can be as follows (a Python sketch is given after the list):
- Decide bound $u_\max$ for $u$, say somewhat below the square root of the expected $m$.
- Decide bound $v_\max$ for $v$, say about the expected $m/u_\max$, with $v_\max\ge u_\max$.
- Allocate storage for a memory data structure large enough for $u_\max$ values in $[0,n)$ with index $i\in[1,u_\max)$, searchable by value to detect if a given value exists, and if so recover its index $i$.
- Input $c$ (with $0\le c<n$)
- Store phase: For $i\in[1,u_\max)$
  - Compute $x_i=(i^{-1}\bmod n)^e\,c\bmod n$
  - Store $x_i$ in the data structure at index $i$
- Search phase: For $j\in[1,v_\max)$
  - Compute $y_j=j^e\bmod n$
  - Search $y_j$ in the data structure. If found at index $i$,
    - Output the product $i\,j$, which is $m$, and stop.
- Stop with no output. This happens only when $m$ can't be written as $m=u\,v$ with $u<u_\max$ and $v<v_\max$.
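To make these steps concrete, here is a minimal Python sketch of the above (the function name and the use of a plain `dict` are my choices, and `pow(i, -1, n)` needs Python 3.8+; the compact hash table discussed below would replace the `dict` in a serious implementation):

```python
def mitm_textbook_rsa(c, n, e, u_max, v_max):
    """Attempt to recover m from c = m^e mod n, assuming m = u*v
    with 1 <= u < u_max and 1 <= v < v_max. Returns m, or None."""
    # Store phase: map x_i = (i^-1 mod n)^e * c mod n to its index i.
    table = {}
    for i in range(1, u_max):
        x_i = pow(pow(i, -1, n), e, n) * c % n
        table[x_i] = i
    # Search phase: look for y_j = j^e mod n among the stored x_i.
    for j in range(1, v_max):
        y_j = pow(j, e, n)
        i = table.get(y_j)
        if i is not None:
            # x_i == y_j means i^-e * c == j^e, hence c == (i*j)^e (mod n)
            return i * j
    return None  # no factorization m = u*v within the bounds


# Toy demo with insecure parameters: n = 61*53, e = 17, m = 6*7
n, e = 3233, 17
m = 42
assert mitm_textbook_rsa(pow(m, e, n), n, e, u_max=8, v_max=8) == m
```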
Note: computing $x_i$ as $(i^{-1}\bmod n)^e\,c\bmod n$ rather than ${(i^e\bmod n)}^{-1}\,c\bmod n$ in the store phase lets the modular inversion operate on the small integer $i$ rather than on a full-width residue. With $e=65537$, I think this saves effort even compared to sharing the computation of $i^e\bmod n$ with the search phase, which would complicate the exposition.
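For what it's worth, the equivalence of the two forms is easy to check numerically; a toy-sized check in Python (the parameters are illustrative only, far below secure sizes):

```python
n, e = 3233, 17    # toy RSA modulus 61*53, illustration only
c, i = 2790, 5
x_a = pow(pow(i, -1, n), e, n) * c % n  # invert small i, then exponentiate
x_b = pow(pow(i, e, n), -1, n) * c % n  # exponentiate, then invert big residue
assert x_a == x_b  # since (i^-1)^e == (i^e)^-1 (mod n)
```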
The algorithm is sound because the store phase saved $x_i=i^{-e}\,c\bmod n$ at index $i$, and the search phase looks for $y_j=j^e\bmod n$. When $x_i=y_j$, it holds that $i^{-e}\,c\equiv j^e\pmod n$, thus $c\equiv(i\,j)^e\pmod n$, thus $i\,j$ is the desired $m$.
Asymptotically, the memory requirement can be brought down to little more than $u_\max\log_2 u_\max$ bits. In practice, one option is a hash table with open addressing. We can take advantage of the fact that the $x_i$ to store are random-like, and can be recomputed from $i$ at any time to resolve a false match, provided such matches are kept rare. So we can use say $\left\lceil\frac 6 5\,u_\max\,\right\rceil$ entries in the table (20% extra space, for about 5 probes on average in the search phase), each storing index $i$ plus say 12 bits of $x_i$ to reduce false matches, with the primary index of $x_i$ in the hash table determined from other bits of $x_i$. This uses $\left\lceil\frac 6 5\,u_\max\,\right\rceil\times\left\lceil\frac{12+\log_2u_\max}8\right\rceil$ bytes.
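To illustrate that layout, here is a rough Python sketch, assuming linear probing and taking the 12-bit tag from the low bits of $x_i$ (the class name and the probing strategy are my choices; Python's per-object overhead means this sketch does not reach the stated byte count, which a packed `bytearray` implementation would):

```python
class CompactTable:
    """Open-addressing hash table with 20% slack. Each slot holds
    (index i, 12-bit tag of x_i); the primary slot is derived from
    other bits of x_i. Rare tag collisions are resolved by the caller
    recomputing x_i from i."""

    def __init__(self, u_max):
        self.size = (6 * u_max + 4) // 5  # ceil(6/5 * u_max) slots
        self.slots = [None] * self.size

    def _slot_and_tag(self, x):
        # primary index from the bits above the tag, tag from the low 12 bits
        return (x >> 12) % self.size, x & 0xFFF

    def store(self, x_i, i):
        pos, tag = self._slot_and_tag(x_i)
        while self.slots[pos] is not None:  # linear probing to a free slot
            pos = (pos + 1) % self.size
        self.slots[pos] = (i, tag)

    def lookup(self, y, recompute_x):
        pos, tag = self._slot_and_tag(y)
        while self.slots[pos] is not None:
            i, t = self.slots[pos]
            if t == tag and recompute_x(i) == y:  # confirm a tag match
                return i
            pos = (pos + 1) % self.size
        return None  # hit an empty slot: y was never stored
```

In the attack, `recompute_x` would be `lambda i: pow(pow(i, -1, n), e, n) * c % n`; a packed implementation would replace the tuple slots with $\left\lceil\frac{12+\log_2u_\max}8\right\rceil$ bytes per entry to reach the count above.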