4

For a ridge regression problem $$ \arg \min_x {\|Ax-b\|^2+\lambda\|x\|^2} $$ where $A$ is a symmetric matrix, the solution can be obtained by gradient-based iterative methods such as gradient descent, Newton's method, and the conjugate gradient method. The analytical solution is $$\hat{x} = (A^TA+\lambda I)^{-1}A^Tb. $$

Some books state a similar conclusion: the iterative procedure can be accelerated using the FFT, and the solution is $$ \hat{x}=F^{-1}\left(\frac{\overline{F(A)}\,F(b)}{\overline{F(A)}\,F(A)+\lambda}\right).$$

Where does this come from? Could anyone give me a derivation or references?

Shannon
    This is when $A$ is a convolution matrix – reuns Apr 25 '21 at 02:04
  • The issue is to make sense, for example, of the DFT (discrete Fourier transform) of a matrix multiplied by the DFT of a vector... As has been said, it is clear only if $A$ is circulant. – Jean Marie Apr 25 '21 at 04:26

1 Answer

3

An $n \times n$ circulant matrix $M$ is diagonalized by the discrete Fourier basis. This gives us a way to solve $Mx = b$ very efficiently, as follows. Let $F$ be an $n \times n$ matrix whose columns are the discrete Fourier basis vectors. Then $MF = F \Lambda $, where $\Lambda$ is a diagonal matrix whose diagonal entries are the eigenvalues of $M$. So $F^{-1} M F = \Lambda$ is a diagonal matrix. Now, notice that \begin{align} & Mx = b \\ \iff &F^{-1} M F F^{-1} x = F^{-1} b \\ \iff& \Lambda \hat x = \hat b \end{align} where $\hat x = F^{-1} x$ and $\hat b = F^{-1} b$. It is now effortless to compute $\hat x = \Lambda^{-1} \hat b$, because we are merely inverting a diagonal matrix. We then obtain $$ x = F \hat x $$ and so we have solved $Mx = b$.
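A quick numerical check of this diagonalization argument (a minimal NumPy sketch; the variable names are my own, and the circulant matrix is built explicitly just for comparison):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
c = rng.standard_normal(n)  # first column of the circulant matrix M
# M[i, j] = c[(i - j) mod n], so M x is the circular convolution c * x
M = np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])
b = rng.standard_normal(n)

# The eigenvalues of M in the Fourier basis are the DFT of its first column.
lam = np.fft.fft(c)
# Solve M x = b: divide componentwise in the Fourier domain, transform back.
x_fft = np.fft.ifft(np.fft.fft(b) / lam).real

x_direct = np.linalg.solve(M, b)
print(np.allclose(x_fft, x_direct))  # True
```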

Normally multiplying a matrix by a vector takes $O(n^2)$ floating point operations. So already, we are able to solve $Mx = b$ using $O(n^2)$ rather than the usual $O(n^3)$ floating point operations. But, there is something even more tremendous that happens here: multiplying a vector by $F$ or $F^{-1}$ can in fact be accomplished with just $O(n \log(n))$ floating point operations, thanks to the wonderful Fast Fourier Transform algorithm. So we can compute $\hat b = F^{-1} b$ and $x = F \hat x$ blazingly fast, and hence solve $Mx = b$ with just $O(n \log(n))$ floating point operations. This is a radical improvement over the usual $O(n^3)$ cost of solving a linear system of equations.

Finally, in your problem, if $A$ represents a convolution operator on $\mathbb R^n$ or on $\mathbb C^n$ (using periodic boundary conditions) then $A$ is circulant, so $M = A^T A + \lambda I$ is also circulant, so the above technique can be applied.
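This recovers exactly the formula in the question: for real circulant $A$ with first column $a$, the eigenvalues of $A$ are $F(a)$, those of $A^T$ are $\overline{F(a)}$, and those of $A^TA + \lambda I$ are $|F(a)|^2 + \lambda$. A small sketch verifying this against the direct normal-equations solve (my own setup, with a random kernel):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
a = rng.standard_normal(n)  # convolution kernel (first column of circulant A)
A = np.array([[a[(i - j) % n] for j in range(n)] for i in range(n)])
b = rng.standard_normal(n)
lam = 0.1

# Direct ridge solution: (A^T A + lam I)^{-1} A^T b
x_direct = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

# FFT solution: A^T acts as complex conjugation in the Fourier domain
Fa = np.fft.fft(a)
x_fft = np.fft.ifft(np.conj(Fa) * np.fft.fft(b) / (np.abs(Fa) ** 2 + lam)).real

print(np.allclose(x_fft, x_direct))  # True
```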

If $A$ represents a convolution operator on images (that is, on $p \times q$ rectangular arrays of real or complex numbers), then the above argument still works, but now the "discrete Fourier basis" is a basis for $\mathbb C^{p \times q}$.

littleO
  • [+1] Interesting answer. – Jean Marie Apr 25 '21 at 03:04
  • Non-circular convolution has an FFT solution as well – reuns Apr 25 '21 at 03:07
  • @reuns It depends on what boundary conditions are used, right? Depending on the type of boundary conditions that are used when performing the convolution, there may or may not be a way to use the FFT to solve the ridge regression problem efficiently. So I assumed periodic boundary conditions just for simplicity. – littleO Apr 25 '21 at 03:13
  • @littleO You know everything I want to explore…… – Shannon Apr 26 '21 at 07:53
  • @Shannon Here's another answer I once wrote providing some more insight into the "discrete Fourier basis" -- what it is and where it comes from. https://math.stackexchange.com/a/1944778/40119 – littleO Apr 26 '21 at 08:03