These hashes are sometimes called "password hashes", because they are designed to protect against exactly the threat model you mention: someone getting hold of a copy of your password database and brute-forcing it. A subset of them is also known as "password-based key-derivation functions" (PBKDFs).
Scrypt is a relatively new but widely known and widely used one. It was created to address a weakness of the more established PBKDF2 and bcrypt algorithms: both can be drastically sped up by custom ASICs or FPGAs, and easily parallelized on a GPU.
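If you just want to use scrypt rather than study it, Python's standard library exposes it directly. Here is a minimal sketch; the cost parameters (`n=2**14`, `r=8`, `p=1`) and the 16-byte salt are illustrative choices of mine, not a recommendation from the scrypt authors, so tune them for your own hardware and latency budget:

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Derive a password hash with scrypt via Python's standard library.

    The cost parameters below are illustrative, not a recommendation.
    """
    if salt is None:
        salt = os.urandom(16)          # fresh random salt per password
    key = hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=2**14,   # CPU/memory cost: the size of the large vector
        r=8,       # block size
        p=1,       # parallelization factor
        dklen=32,  # length of the derived key/hash
    )
    return salt, key

# Verify by re-deriving with the stored salt and comparing in constant time.
salt, stored = hash_password("correct horse battery staple")
_, candidate = hash_password("correct horse battery staple", salt)
print(hmac.compare_digest(stored, candidate))  # True
```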
The central innovation in scrypt is a very large pseudo-random bitvector, which in turn is accessed very often in a pseudo-random fashion. What this means is that the standard "trick" for changing the performance characteristics of code, the space-time trade-off, is expensive in both directions. In particular, the very large bitvector makes the algorithm hard to parallelize, since you will either have lots of computing elements thrashing the memory bus (limiting the parallel speedup) or lots of copies of the very large bitvector in the individual computing elements (making parallelism expensive). The pseudo-random access pattern also ensures that branch prediction, memory prefetching and similar cache-miss-reducing optimizations are useless, and the size of the bitvector ensures that you will blow each and every cache you throw at it. (A toy sketch of this structure follows below.)
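To make the two phases concrete, here is a drastically simplified sketch of the idea (scrypt calls this part ROMix). It is *not* real scrypt: SHA-256 stands in for scrypt's actual BlockMix/Salsa20/8 core, and all sizes are toy values. The point is only to show the shape: first build a large vector, then read it at data-dependent, effectively unpredictable indices.

```python
import hashlib

def toy_romix(seed, n=2**14):
    """Toy sketch of scrypt's memory-hard core (not real scrypt)."""
    x = hashlib.sha256(seed).digest()

    # Phase 1: fill a large vector V with a pseudo-random hash chain.
    v = []
    for _ in range(n):
        v.append(x)
        x = hashlib.sha256(x).digest()

    # Phase 2: n more rounds, each reading V at an index derived from the
    # current state -- which element is needed next cannot be known ahead
    # of time, so the whole vector has to stay available.
    for _ in range(n):
        j = int.from_bytes(x[:8], "little") % n   # data-dependent index
        x = hashlib.sha256(bytes(a ^ b for a, b in zip(x, v[j]))).digest()
    return x
```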
Theoretically, since both the bitvector and the access pattern are "only" pseudo-random, they are fully determined by the algorithm and its inputs.
Ergo, you could reduce the memory requirement by computing everything on the fly and not keeping the bitvector in memory at all. However, the algorithm is designed so that this recomputation is itself very slow, and the same elements are accessed over and over again (in an order you cannot easily predict), so you would end up re-computing the same elements again and again. OTOH, you could reduce the time requirement by pre-computing all possible values, but then the memory requirements explode.
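Continuing the toy sketch above (again, an illustration of the principle, not actual scrypt): the low-memory variant below computes the same function as `toy_romix`, but recomputes each vector element from the seed on demand instead of storing it. Each of the n pseudo-random reads then costs up to n hash evaluations, so the total work grows from roughly 2n to roughly n²/2, which is exactly the kind of trade-off the design wants to make prohibitive.

```python
import hashlib

def toy_romix_low_memory(seed, n=2**14):
    """Same toy construction as above, but without storing the vector V.

    Every lookup of V[j] re-walks the hash chain from the seed, so the
    memory saving is paid for with quadratic recomputation.
    """
    def v_at(j):
        y = hashlib.sha256(seed).digest()
        for _ in range(j):                 # re-derive element j from scratch
            y = hashlib.sha256(y).digest()
        return y

    x = v_at(n)                            # same starting state as before
    for _ in range(n):
        j = int.from_bytes(x[:8], "little") % n
        x = hashlib.sha256(bytes(a ^ b for a, b in zip(x, v_at(j)))).digest()
    return x
```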
Either way, the trade-off is prohibitively expensive: you have no choice but to use both large memory and lots of CPU cycles.
Basically, you can think of the two pseudo-random sequences as a single, very complex pseudo-random sequence, and of the bitvector as a cache for it. But the "cache" is designed in such a way that you can neither remove it and make up for that with increased processing speed or parallelism, nor expand it to save processing time at the cost of increased memory usage.