While studying the various implementations of the Fast Fourier Transform available online, I've come to a question about the way the DFT works in theory.
Suppose you have a sequence of $N$ points $x_0, ..., x_{N-1}$. For $ k = 0, ..., N-1 $, let $ X_k = \sum_{n=0}^{N-1} x_n e^{-2ik\pi \frac{n}{N}} $.
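For concreteness, here is how I read that definition as code (a naive direct sum, not an FFT; the use of numpy is just my choice for the sanity check):

```python
import numpy as np

def naive_dft(x):
    """Direct evaluation of X_k = sum_{n=0}^{N-1} x_n * exp(-2i*pi*k*n/N)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    k = n.reshape((N, 1))                  # column of frequency indices k
    return np.exp(-2j * np.pi * k * n / N) @ x

# Sanity check against an FFT library
x = np.random.rand(8)
print(np.allclose(naive_dft(x), np.fft.fft(x)))   # True
```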
I've noticed that many algorithms are easier to implement, or faster, when the size of the input is a power of 2. To extend a signal of arbitrary length $N$ to the next power of two $2^p \geq N$, I've seen two approaches (both sketched in code below the list):
- Pad the signal with $0$s, setting $x_N, ..., x_{2^p-1} = 0$, and computing $X_k = \sum_{n=0}^{N-1} x_n e^{-2ik\pi \frac{n}{2^p}}$.
- Resample the original values, taking $\tau=N/2^p$ as the new spacing between consecutive points and estimating the values at $0, \tau, 2\tau, ..., (2^p-1)\tau$ through linear interpolation.
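To make sure we are talking about the same two operations, here is a minimal sketch of both, assuming `np.interp` is a fair stand-in for the linear interpolation in the second approach (that part is my guess at what is meant in practice):

```python
import numpy as np

def next_pow2(N):
    """Smallest power of two that is >= N."""
    p = 1
    while p < N:
        p *= 2
    return p

def fft_zero_padded(x):
    """Approach 1: append zeros up to the next power of two, then take the DFT."""
    x = np.asarray(x, dtype=float)
    return np.fft.fft(x, n=next_pow2(len(x)))    # np.fft.fft zero-pads when n > len(x)

def fft_linear_resampled(x):
    """Approach 2: resample onto 2^p points with spacing tau = N / 2^p
    by linear interpolation, then take the DFT of the resampled signal."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    M = next_pow2(N)
    tau = N / M                                  # new spacing between points
    new_grid = np.arange(M) * tau                # 0, tau, 2*tau, ..., (M-1)*tau
    # np.interp holds the edge value for grid points beyond the last original
    # sample (index > N-1); I'm not sure how real libraries handle that detail.
    x_resampled = np.interp(new_grid, np.arange(N), x)
    return np.fft.fft(x_resampled)

x = np.random.rand(6)                            # N = 6, next power of two is 8
X_pad = fft_zero_padded(x)                       # length-8 spectrum from zero padding
X_interp = fft_linear_resampled(x)               # length-8 spectrum from resampling
```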
I've heard people saying different things:
Some people oppose the first approach very strongly (I recently had a discussion about this with a physics teacher). They say that padding the signal with extra zeros gives you the Fourier coefficients of a different function, which bear no relation to those of the original signal, whereas interpolation works great.
On the other hand, most if not all of the libraries that I have reviewed use the second solution.
All the references I could find online were pretty vague on this topic. Some say that the best band-limited interpolation you can do in the frequency domain is obtained through time-domain zero-padding, but I couldn't find a proof of that statement.
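To state the claim I'm asking about precisely (this is just my reading of it): let $X(\omega) = \sum_{n=0}^{N-1} x_n e^{-i\omega n}$ be the discrete-time Fourier transform of the original $N$ samples. The zero-padded DFT from the first bullet is then simply a finer sampling of that same function,

$$
X_k^{\text{padded}} = \sum_{n=0}^{N-1} x_n e^{-2ik\pi \frac{n}{2^p}} = X\!\left(\frac{2\pi k}{2^p}\right),
\qquad k = 0, \dots, 2^p - 1,
$$

while the unpadded DFT samples $X(\omega)$ only at $\omega = 2\pi k / N$. As I understand it, the "best band-limited interpolation" claim is that these denser samples coincide with what ideal (Dirichlet-kernel) interpolation of the original $N$ coefficients would produce, and that is the step I'd like to see justified.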
Could you please help me figure out what advantages and drawbacks each approach has? Ideally, I'm looking for something with a mathematical justification, not only visual examples =)
Thanks!