In brief: If you know $(a,N)$, you can speed the computation up by precomputing some of the powers of $a$.
Let $x=x_n\dots x_1x_0=\sum_{i=0}^n x_i 2^i$ be the binary expansion of $x$, and let $a_j=a^{2^j}\pmod N$.
Very naively:
$$
a^x \pmod N = \overbrace{a*(a*(a*\dots*(a*a)\dots))}^{x\text{ factors}}
$$
This requires $\Theta(x)$ multiplications (exactly $x-1$).
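As a concrete baseline, the naive method can be sketched as follows (a minimal Python sketch; the function name is our own):

```python
def naive_pow_mod(a, x, N):
    # Multiply a into the running product x times: Theta(x) multiplications.
    y = 1
    for _ in range(x):
        y = (y * a) % N
    return y
```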
Traditional Square and Multiply:
$$
a^x \pmod N = a^{x_n \dots x_0} = a^{2^n x_n+ \dots+ x_0}
=(a^{2^n})^{x_n} (a^{2^{n-1}})^{x_{n-1}} \dots (a^{2^0})^{x_0}
=\prod_{i=0}^n a_i^{x_i}
$$
So, to use this for efficient exponentiation, we maintain a product variable $y$ and a squaring variable $e$, initialised as $(y,e)=(1,a)$. We then repeatedly square $e$, so that after $j$ squarings $e=a_j$, and multiply $y$ by $e$ whenever the corresponding bit $x_j=1$. How much work does this require? We need $n=\lfloor\log_2(x)\rfloor$ squarings to run through the $a_j$, then on average $n/2$ further multiplications to assemble $a^x$ (assuming an "average" value of $x$ has half its bits set), and at most $n$. Total: $2n$ multiplications in the worst case, $\frac{3}{2}n$ on average.
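The loop just described might be sketched as below (a Python sketch; scanning from the low bit upwards means we never need to know $n$ in advance):

```python
def square_and_multiply(a, x, N):
    # Maintain (y, e) = (product so far, a^(2^j) mod N), as in the text.
    y, e = 1, a % N
    while x > 0:
        if x & 1:           # bit x_j is set: multiply a_j into the product
            y = (y * e) % N
        e = (e * e) % N     # square: a_j -> a_{j+1}
        x >>= 1
    return y
```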
Precomputational Optimisations of Square and Multiply
By calculating each $a_j=a^{2^j}\pmod N$ in advance, we only need to perform the (at most) $n$ multiplications when we come to calculate $a^x$; that is, the online phase involves no squarings at all. However, if $x$ may be very large, this involves storing a large amount of data; by storing only some subset of the $a_j$, we can trade space against the few online squarings needed to reach the remaining values. Moreover, if one so wished, one could store products such as $b=a_3*a_1$, which would reduce the online cost of calculating $a^{1010_2}$ to the cost of looking up $b$.
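The offline/online split might be sketched as follows (function names are illustrative; the offline table holds $a_j$ for $j=0,\dots,n-1$, and the online phase uses only multiplications):

```python
def precompute_powers(a, N, n):
    # Offline: a_j = a^(2^j) mod N for j = 0, ..., n-1.
    table = [a % N]
    for _ in range(n - 1):
        table.append((table[-1] * table[-1]) % N)
    return table

def pow_with_table(table, x, N):
    # Online: multiply in a_j for each set bit x_j; no squarings needed.
    y, j = 1, 0
    while x > 0:
        if x & 1:
            y = (y * table[j]) % N
        x >>= 1
        j += 1
    return y
```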
Deciding on a balance for this trade-off provides an interesting question, since at some point storing too many powers becomes unreasonable. For example, it would be possible to precompute and store $a^x\pmod N$ for every $x\in\{0,\dots,2^t-1\}$. This would reduce calculating $a^x$ to a single look-up, but such a table has size $\Theta(2^t)$, which may well be impractically large.
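The extreme end of the trade-off, a full table over all $t$-bit exponents, can be sketched as (parameter names assumed; each entry costs one multiplication to build):

```python
def precompute_full_table(a, N, t):
    # Theta(2^t) space: table[x] = a^x mod N for x in {0, ..., 2^t - 1}.
    table = [1]
    for _ in range(2**t - 1):
        table.append((table[-1] * a) % N)
    return table
```

After this offline step, computing $a^x \pmod N$ is simply `table[x]`, at the cost of $2^t$ stored residues.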