These prime numbers are called Solinas primes (because they were described by Jerome Solinas). The article details how they are found and how optimization works for them.
As a brief summary, consider a prime:
$$p = \sum_{i=0}^{k} b_i 2^{iw} $$
where each $b_i$ is either $0$, $1$ or $-1$, and $w$ is your "word size" (typically $w = 32$ or $64$ will yield the best results on common computers). You also want $b_0 \neq 0$ ($p$ must be prime, so in particular it must be odd) and $b_k = 1$, so that the prime is close to $2^{wk}$. To make things even easier to implement, make sure that $p$ is slightly below $2^{wk}$ rather than slightly above it, meaning that the highest non-zero $b_i$ (for $i < k$) has value $-1$.
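As a concrete illustration (my own choice, not taken from the article), the NIST P-192 prime $2^{192} - 2^{64} - 1$ fits this shape with $w = 64$, $k = 3$ and $(b_0, b_1, b_2, b_3) = (-1, -1, 0, 1)$. The short Python sketch below rebuilds $p$ from its coefficients and checks the constraints listed above. The function name `solinas_from_coeffs` is made up for this example, and the code does not test primality.

```python
def solinas_from_coeffs(b, w):
    """Return p = sum(b[i] * 2**(i*w)) after checking the shape constraints.

    b is the list [b_0, ..., b_k]; w is the word size in bits.
    Primality is NOT checked here.
    """
    k = len(b) - 1
    if any(c not in (-1, 0, 1) for c in b):
        raise ValueError("each b_i must be -1, 0 or 1")
    if b[0] == 0:
        raise ValueError("b_0 must be non-zero, otherwise p is even")
    if b[k] != 1:
        raise ValueError("b_k must be 1")
    p = sum(c * (1 << (i * w)) for i, c in enumerate(b))
    # p < 2^(wk) holds exactly when the highest non-zero b_i (i < k) is -1.
    if p >= 1 << (w * k):
        raise ValueError("p should be slightly below 2^(wk)")
    return p


# Illustration: coefficients of the NIST P-192 prime with 64-bit words.
p192 = solinas_from_coeffs([-1, -1, 0, 1], 64)
assert p192 == 2**192 - 2**64 - 1
```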
With such a prime, modular reduction is easy because, modulo $p$, you have:
$$ 2^{wk} = \sum_{i=0}^{k-1} -b_i 2^{iw} \pmod p $$
from which you can infer:
$$ -x 2^{w(k+j)} = \sum_{i=0}^{k-1} b_i x 2^{w(i+j)} \pmod p $$
for all integers $x$ and all integers $j \geq 0$. Therefore, if you have an $n$-word value $X$ (with $n > k$):
$$ X = \sum_{i=0}^{n-1} x_i 2^{wi} \qquad \text{where } 0 \leq x_i < 2^w $$
then you can compute the value $X'$:
$$ X' = X - x_{n-1} 2^{w(n-1)} - \sum_{i=0}^{k-1} b_i x_{n-1} 2^{w(n-1-k+i)} $$
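This sum is easily computed because all the $b_i$ are $0$, $1$ or $-1$, so it amounts to adding or subtracting the word $x_{n-1}$ to or from some other, lower words. With the relation above (taking $x = x_{n-1}$ and $j = n-1-k$), you have $X' = X \pmod p$. But $X'$ now consists of $n-1$ words, up to a small carry or borrow out of the top word, which the same relation absorbs. So, with a few additions and subtractions, you have removed one word from your problem. Iterate until only $k$ words remain; a final conditional addition or subtraction of $p$ then brings the result into the range $0$ to $p-1$.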
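The word-by-word folding can be sketched in Python as follows. This is only an illustration under some assumptions of mine: it uses Python's big integers instead of fixed-size word arrays, it assumes $p < 2^{wk}$ (the "slightly below" case), and the name `solinas_reduce` and the inner carry-handling loop are my own additions. It relies on $x\,2^{w(k+j)} = -\sum_{i=0}^{k-1} b_i x\, 2^{w(i+j)} \pmod p$, which follows directly from the congruence for $2^{wk}$. A real implementation would work on limbs and would normally be written to run in constant time.

```python
def solinas_reduce(X, b, w):
    """Reduce X >= 0 modulo p = sum(b[i] * 2**(i*w)).

    Sketch only: assumes every b[i] is -1, 0 or 1, b[k] == 1 and p < 2**(w*k).
    """
    k = len(b) - 1
    p = sum(c * (1 << (i * w)) for i, c in enumerate(b))
    if (X < 0 or b[k] != 1 or any(c not in (-1, 0, 1) for c in b)
            or not 0 < p < (1 << (w * k))):
        raise ValueError("unsupported input for this sketch")

    n = max(k, (X.bit_length() + w - 1) // w)  # length of X in w-bit words
    while n > k:
        shift = w * (n - 1)
        j = n - 1 - k
        # Usually one pass is enough; a carry out of the lower words
        # can make a second pass necessary (assumption-level handling).
        while X >> shift:
            top = X >> shift            # x_{n-1}, plus any carry from a prior pass
            X -= top << shift           # remove the top word ...
            for i in range(k):          # ... and fold it into lower words
                if b[i] == 1:
                    X -= top << (w * (i + j))
                elif b[i] == -1:
                    X += top << (w * (i + j))
        n -= 1

    while X >= p:                       # final conditional subtraction(s)
        X -= p
    return X


# Usage with the NIST P-192 prime (w = 64, k = 3), checked against Python's %.
b192 = [-1, -1, 0, 1]
p192 = 2**192 - 2**64 - 1
product = (p192 - 1) * (p192 - 2)
assert solinas_reduce(product, b192, 64) == product % p192
```

Each pass of the inner loop subtracts a multiple of $p$ from `X`, so the value stays non-negative and congruent to the input throughout; only shifts, additions and subtractions are used until the final comparison with $p$.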
Whether Solinas primes really give that much of an advantage over a random prime (with Montgomery multiplication) is disputed. It depends on the implementation architecture. It has also been argued that choosing $p = 2^m - z$ for a small $z$ yields more efficient computations. As with all performance questions, there is no absolute answer; it must be tried and measured, and any new language, compiler, processor version or architecture can change the answers.