While studying the various implementations of the Fast Fourier Transform available online, I've come to a question about the way the DFT works in theory.
Suppose you have a sequence of $N$ points $x_0, ..., x_{N-1}$. For $ k = 0, ..., N-1 $, let $ X_k = \sum_{n=0}^{N-1} x_n e^{-2ik\pi \frac{n}{N}} $.
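For concreteness, here is how I read that definition as code (a naive direct sum, not an FFT; the use of numpy is just my choice for the sanity check):

```python
import numpy as np

def naive_dft(x):
    """Direct evaluation of X_k = sum_{n=0}^{N-1} x_n * exp(-2i*pi*k*n/N)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    k = n.reshape((N, 1))                  # column of frequency indices k
    return np.exp(-2j * np.pi * k * n / N) @ x

# Sanity check against an FFT library
x = np.random.rand(8)
print(np.allclose(naive_dft(x), np.fft.fft(x)))   # True
```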
I've noticed that many algorithms are easier to implement, or faster, when the size of the input is a power of 2. To extend a signal of arbitrary length $N$ to the next power of two $2^p \geq N$, I've seen two approaches (both sketched in code below the list):
- Pad the signal with $0$s, setting $x_N, ..., x_{2^p-1} = 0$, and computing $X_k = \sum_{n=0}^{N-1} x_n e^{-2ik\pi \frac{n}{2^p}}$.
- Resample the original values, taking $\tau=N/2^p$ as the new spacing between consecutive points and estimating the values at $0, \tau, 2\tau, ..., (2^p-1)\tau$ through linear interpolation.
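To make sure we are talking about the same two operations, here is a minimal sketch of both, assuming `np.interp` is a fair stand-in for the linear interpolation in the second approach (that part is my guess at what is meant in practice):

```python
import numpy as np

def next_pow2(N):
    """Smallest power of two that is >= N."""
    p = 1
    while p < N:
        p *= 2
    return p

def fft_zero_padded(x):
    """Approach 1: append zeros up to the next power of two, then take the DFT."""
    x = np.asarray(x, dtype=float)
    return np.fft.fft(x, n=next_pow2(len(x)))    # np.fft.fft zero-pads when n > len(x)

def fft_linear_resampled(x):
    """Approach 2: resample onto 2^p points with spacing tau = N / 2^p
    by linear interpolation, then take the DFT of the resampled signal."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    M = next_pow2(N)
    tau = N / M                                  # new spacing between points
    new_grid = np.arange(M) * tau                # 0, tau, 2*tau, ..., (M-1)*tau
    # np.interp holds the edge value for grid points beyond the last original
    # sample (index > N-1); I'm not sure how real libraries handle that detail.
    x_resampled = np.interp(new_grid, np.arange(N), x)
    return np.fft.fft(x_resampled)

x = np.random.rand(6)                            # N = 6, next power of two is 8
X_pad = fft_zero_padded(x)                       # length-8 spectrum from zero padding
X_interp = fft_linear_resampled(x)               # length-8 spectrum from resampling
```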
I've heard people saying different things:
Some people oppose the first approach very strongly (I recently had a discussion about this with a physics teacher). They say that padding the signal with extra zeros gives you the Fourier coefficients of a different function, which bear no relation to those of the original signal, whereas interpolation works great.
On the other hand, most if not all of the libraries that I have reviewed use the second solution.
All the references I could find online were pretty vague on this topic. Some say that the best band-limited interpolation you can do in the frequency domain is obtained through time-domain zero-padding, but I couldn't find a proof of that statement.
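To state the claim I'm asking about precisely (this is just my reading of it): let $X(\omega) = \sum_{n=0}^{N-1} x_n e^{-i\omega n}$ be the discrete-time Fourier transform of the original $N$ samples. The zero-padded DFT from the first bullet is then simply a finer sampling of that same function,

$$
X_k^{\text{padded}} = \sum_{n=0}^{N-1} x_n e^{-2ik\pi \frac{n}{2^p}} = X\!\left(\frac{2\pi k}{2^p}\right),
\qquad k = 0, \dots, 2^p - 1,
$$

while the unpadded DFT samples $X(\omega)$ only at $\omega = 2\pi k / N$. As I understand it, the "best band-limited interpolation" claim is that these denser samples coincide with what ideal (Dirichlet-kernel) interpolation of the original $N$ coefficients would produce, and that is the step I'd like to see justified.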
Could you please help me figure out what advantages and drawbacks each approach has? Ideally, I'm looking for something with a mathematical justification, not only visual examples =)
Thanks!