
Suppose we have an $M \times N$ matrix $A$ with elements $a_{i,j}$. What is a step-by-step algorithm that returns the Moore-Penrose inverse $A^+$ for a given $A$ (at the level of manipulations/operations on the elements $a_{i,j}$, not on vectors)?

Kabumbus

3 Answers

  1. Perform a singular value decomposition $\mathbf A=\mathbf U\mathbf \Sigma\mathbf V^\top$

  2. Check for "tiny" singular values in $\mathbf \Sigma$. A common criterion is that anything less than $\|\mathbf A\|\epsilon$ is "tiny". Set those singular values to zero.

  3. Form $\mathbf \Sigma^+=\mathrm{diag}(1/\sigma_1,\dots,1/\sigma_r,0,\dots,0)$. That is, reciprocate the nonzero singular values and leave the zeros untouched. (For rectangular $\mathbf A$, $\mathbf \Sigma^+$ takes the transposed shape $N \times M$ so that the product in the next step is conformable.)

  4. $\mathbf A^+=\mathbf V\mathbf \Sigma^+\mathbf U^\top$
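A minimal NumPy sketch of these four steps, assuming real $\mathbf A$ (the function name and the exact cutoff constant are illustrative choices, not a fixed convention):

```python
import numpy as np

def pinv_svd(A):
    """Moore-Penrose pseudoinverse via the four steps above (real A)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # 1. thin SVD: A = U diag(s) Vt
    tol = s[0] * max(A.shape) * np.finfo(float).eps   # 2. "tiny" cutoff, ~ eps * ||A||
    s_plus = np.zeros_like(s)
    s_plus[s > tol] = 1.0 / s[s > tol]                # 3. reciprocate nonzero sigmas only
    return Vt.T @ np.diag(s_plus) @ U.T               # 4. A+ = V Sigma+ U^T
```

For complex input, replace the transposes with conjugate transposes; `np.linalg.pinv` implements essentially this same recipe with a relative cutoff.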


As far as I know, and from what I have read, there is no direct formula (or algorithm) for the (left) pseudoinverse of $A$ other than the "famous formula" $$(A^\top A)^{-1}A^\top,$$ which is computationally expensive. I will describe here an algorithm that does indeed calculate it efficiently. As I said, I believe this may be previously unpublished, so cite me appropriately, or if you know of a reference, please state it. Here is my algorithm.

Set $B_0 = A$ and, at each step, take a column of the current $B$ and orthogonalize it against the columns of $A$. Here is the algorithm ($A$ has $n$ independent columns):

1. Initialize $B_0=A$.
2. For $j = 1,\dots,n$ do \begin{align} \mathbf{t} &= B_{j-1}^\top A_{\star j} \\ R\mathbf{t} &= e_j & \text{find such an $R$, an elementary row-operation matrix}\\ B_{j}^\top &= R B_{j-1}^\top \\ \end{align}
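A minimal NumPy sketch of this loop (`left_pinv` is my name for it; it assumes $A$ has full column rank so that every pivot $t_j$ is nonzero, and a robust version would pivot):

```python
import numpy as np

def left_pinv(A):
    """Left pseudoinverse by making B's columns biorthogonal to A's columns."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Bt = A.T.copy()                    # B_0 = A
    for j in range(n):
        t = Bt @ A[:, j]               # t = B^T A_{*j}
        # elementary row operations R with R t = e_j:
        Bt[j] /= t[j]                  # scale the pivot row
        for k in range(n):
            if k != j:
                Bt[k] -= t[k] * Bt[j]  # zero out the other entries of t
    return Bt                          # B^T, satisfying B^T A = I
```

Because each row operation subtracts multiples of row $j$, and the already-processed columns of $B^\top A$ have a zero in position $j$, the invariant $B^\top A_{\star k}=e_k$ for $k<j$ is preserved; on full-column-rank inputs the result should match `np.linalg.pinv(A)` up to roundoff.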

Notice that each step does one vector/matrix multiplication and one elementary matrix row operation. This requires much less computation than the direct evaluation of $(A^\top A)^{-1}A^\top$. Notice also that there is a nice opportunity here for parallel computation. (One more thing to notice: for square $A$ this may also calculate a regular inverse, starting either with $B_0=A$ or with $B_0=I$.)

The step that calculates $R$ may be partially orthogonal rather than elementary (orthonormal/unitary, and thus better in the sense of error propagation) if desired, but it should not make use of the previously orthonormalized rows of $B$, since they must remain orthogonalized in the process. If this is done, the process becomes a "cousin" of the QR algorithm, whereas before it would be considered a "cousin" of the LU factorization: "cousin" because the matrix operations of those are self-referential, while this algorithm references the matrix $A$. The following is a hopefully enlightening description.


See the first edits for a wordier description; I think the following is more concise and to the point.

Consider the conformable matrix $B^\top$ such that $B$ has the same dimension as $A$ and \begin{align} B^\top A &= I \tag{1}\\ B &= A B_A \tag{2}\\ \end{align}

Here $B_A$ is arbitrary and exists only to ensure that the matrix $B$ shares the same column space as $A$. (1) states that $B^\top$ is a (not necessarily unique) left inverse for $A$; (2) states that $B$ shares the same column space as $A$.

Claim: $B^\top$ is the Moore-Penrose pseudoinverse for $A$.

Proof:

For full-column-rank $A$, the Moore-Penrose pseudoinverse is $$A^\dagger = \left(A^\top A\right)^{-1}A^\top$$ Given (1) and substituting (2) we have \begin{align} \left(AB_A\right)^\top A &= I \\ B_A^\top A^\top A &= I \\ B_A^\top &= \left( A^\top A\right)^{-1} \\ B_A &= \left( A^\top A\right)^{-1} \\ \end{align} where the last step uses the fact that $A^\top A$, and hence its inverse, is symmetric. Now solving for $B$ from (2): \begin{align} B &= A\left(B_A\right) \\ B &= A\left( A^\top A\right)^{-1} \\ \Rightarrow B^\top &=\left( A^\top A\right)^{-1}A^\top \\ \end{align}

Q.E.D.
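A quick numerical check of this claim, as a sketch assuming NumPy (the test matrix is arbitrary, and `np.linalg.pinv` serves as the reference):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))              # full column rank with probability 1
B_T = np.linalg.solve(A.T @ A, A.T)          # (A^T A)^{-1} A^T, solved, not inverted
print(np.allclose(B_T, np.linalg.pinv(A)))   # True: this left inverse is A^+
```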

adam W
  • But the question is about the Moore-Penrose pseudoinverse, not the left inverse. If left inverses exist, the Moore-Penrose pseudoinverse is one of them, isn't it? And it can be computed using the algorithm in J.M.'s answer. So I'm not sure what you mean by "there is no direct formula (or algorithm)". – Rahul Feb 28 '13 at 20:17
  • @Rahul The algorithm in J.M.'s answer is iterative. What I am saying is that this algorithm computes the particular left inverse which is indeed the Moore-Penrose pseudoinverse, and does so in exact arithmetic, not by convergence. – adam W Feb 28 '13 at 20:29
  • I liked your idea very much, so I tried to implement it to see whether I've got it correctly. Here is a webpage where I just put my procedure together: http://go.helms-net.de/math/divers/ACheapimplementationforthePenrose.htm . Does that capture your proposal correctly? – Gottfried Helms Feb 28 '13 at 21:45
  • @Gottfried Well, I do get the same numbers up to accuracy. For instance, $-0.012390122873089741$ is what I get for the lower right element. The way you describe "putting A into upper triangular shape" seems like you are doing regular LU or QR though...so not too certain. – adam W Feb 28 '13 at 23:02
  • Is this faster than SVD? It's $\Omega(mn^2)$, right? – user7530 Mar 01 '13 at 02:05
  • @user7530 yes, it is, and correct. I calculate it as $2mn^2$ multiplies and $2mn^2 - n^2 - nm$ additions, where $n$ is the smaller dimension. If parallel operations are used, the pipeline takes $2n$ operations in series, the rest in parallel. – adam W Mar 01 '13 at 02:40
  • This looks awfully like one of the older methods for the Moore-Penrose inverse; I remember seeing this in one of the SIAM journals. I'll try digging up a reference... – J. M. ain't a mathematician May 30 '13 at 19:23
  • @J.M. Sounds good. I feel like it must not be anything new, just a bit uncommon since it really is best for exact arithmetic. I am aware that the condition number that affects it is square of that of the matrix. However, I have come up with a different way that uses only unitary operations. It takes about as many ops as two or three QR factorizations, but still avoids any convergence issues, such as with the SVD. – adam W May 31 '13 at 17:26

How do we use the elements of $\mathbf{A}$ to construct the pseudoinverse $\mathbf{A}^{\dagger}$?

Anatomy of the SVD

The target matrix has $m$ rows, $n$ columns, and rank $\rho$: $$ \mathbf{A} \in \mathbb{C}^{m\times n}_{\rho} $$

Define the singular value decomposition: $$ \begin{align} \mathbf{A} &= \mathbf{U} \, \Sigma \, \mathbf{V}^{*} \\ &= \left[ \begin{array}{cc} \color{blue}{\mathbf{U}_{\mathcal{R}}} & \color{red}{\mathbf{U}_{\mathcal{N}}} \end{array} \right] \left[ \begin{array}{cccc|ccc} \sigma_{1} & 0 & \dots & & & \dots & 0 \\ 0 & \sigma_{2} \\ \vdots && \ddots \\ & & & \sigma_{\rho} \\ \hline & & & & 0 \\ \vdots &&&&& \ddots \\ 0 & & & & & & 0 \end{array} \right] \left[ \begin{array}{c} \color{blue}{\mathbf{V}_{\mathcal{R}}^{*}} \\ \color{red}{\mathbf{V}_{\mathcal{N}}^{*}} \end{array} \right] \\ &= \left[ \begin{array}{cccccc} \color{blue}{u_{1}} & \dots & \color{blue}{u_{\rho}} & \color{red}{u_{\rho+1}} & \dots & \color{red}{u_{m}} \end{array} \right] \left[ \begin{array}{cc} \mathbf{S}_{\rho\times \rho} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{array} \right] \left[ \begin{array}{c} \color{blue}{v_{1}^{*}} \\ \vdots \\ \color{blue}{v_{\rho}^{*}} \\ \color{red}{v_{\rho+1}^{*}} \\ \vdots \\ \color{red}{v_{n}^{*}} \end{array} \right] \end{align} $$ Coloring separates $\color{blue}{range}$ space entities from $\color{red}{null}$ space entities.

Using the thin, or economical, or nullspace-free variant, we have $$ \mathbf{A} = \color{blue}{\mathbf{U}_{\mathcal{R}}} \, \mathbf{S} \, \color{blue}{\mathbf{V}^{*}_{\mathcal{R}}} \qquad \Rightarrow \qquad \mathbf{A}^{\dagger} = \color{blue}{ \mathbf{V}_{\mathcal{R}} } \, \mathbf{S}^{-1} \, \color{blue}{\mathbf{U}^{*}_{\mathcal{R}}}. $$
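A short NumPy sketch of this anatomy, splitting the factors into the blue (range) and red (null space) blocks above; the example matrix and the rank tolerance are my choices for illustration:

```python
import numpy as np

A = np.array([[1., 2.],
              [2., 4.],
              [3., 6.]])                              # m = 3, n = 2, rank rho = 1
U, s, Vt = np.linalg.svd(A)                           # full SVD: A = U Sigma V*
rho = int(np.sum(s > s[0] * max(A.shape) * np.finfo(float).eps))
U_R, U_N = U[:, :rho], U[:, rho:]                     # range of A | null space of A*
V = Vt.conj().T
V_R, V_N = V[:, :rho], V[:, rho:]                     # range of A* | null space of A
A_pinv = V_R @ np.diag(1.0 / s[:rho]) @ U_R.conj().T  # nullspace-free pseudoinverse
```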

Construction of the SVD

  1. Form the product matrix $\mathbf{A}^{*}\mathbf{A}$.

  2. Solve for the eigenvalues $\lambda \left( \mathbf{A}^{*}\mathbf{A}\right)$. This is a messy affair because it demands finding the roots of the characteristic polynomial, which carries the numerical challenge of solving for the zeros of an arbitrary polynomial.

  3. Order the nonzero values to assemble the matrix of singular values $\mathbf{S}=\text{diagonal}\left( \sigma \right)$, where the singular value spectrum is $$ \sigma = \sqrt{\lambda \left( \mathbf{A}^{*}\mathbf{A} \right)} $$ with $$ \sigma_{1} \ge \sigma_{2} \ge \dots \ge \sigma_{\rho} > 0. $$ A problem immediately arises over the value of $\rho$: are small numbers numerical artifacts? The issue is vital because the contribution to the pseudoinverse is weighted by the reciprocal of the singular value; very small numbers make very large contributions.

  4. Compute the eigenvectors $\hat{u}_{k}$, $k=1,\dots,\rho$, for each nonzero eigenvalue: $$ \mathbf{A}\,\mathbf{A}^{*}\, \hat{u}_{k} = \lambda_{k}\,\hat{u}_{k} $$ (the nonzero eigenvalues of $\mathbf{A}\mathbf{A}^{*}$ coincide with those of $\mathbf{A}^{*}\mathbf{A}$). This is the origin of the sign ambiguity in the SVD.

  5. Normalize these vectors and make them the column vectors of the codomain matrix $\color{blue}{\mathbf{U}_{\mathcal{R}}}$: $$ \color{blue}{\mathbf{U}_{\mathcal{R}}} = \left[ \begin{array}{cccc} \color{blue}{\frac{ \hat{u}_{1} } {\lVert \hat{u}_{1} \rVert}} & \color{blue}{\frac{ \hat{u}_{2} } {\lVert \hat{u}_{2} \rVert}} & \dots & \color{blue}{\frac{ \hat{u}_{\rho} } {\lVert \hat{u}_{\rho} \rVert}} \end{array} \right] $$

  6. Optional: Want $\color{red}{\mathbf{U}_{\mathcal{N}}}$? Apply Gram-Schmidt orthogonalization to complete the columns of $\color{blue}{\mathbf{U}_{\mathcal{R}}}$ to a full orthonormal basis. The orientation is a free choice, and a reason why different programs create different nullspace spans. If you choose this option, you need to pack $\mathbf{S}$ into a sabot matrix $\Sigma$ and compute the companion nullspace basis for $\mathbf{V}$.

  7. With $\mathbf{S}$ and $\color{blue}{\mathbf{U}_{\mathcal{R}}}$ in hand, construct the column vectors of $\color{blue}{\mathbf{V}_{\mathcal{R}}}$ using $$ \left[ \color{blue}{\mathbf{V}_{\mathcal{R}}} \right]_{k} = \sigma_{k}^{-1} \mathbf{A}^{*} \left[ \color{blue}{\mathbf{U}_{\mathcal{R}}} \right]_{k}, $$ that is, $v_{k} = \sigma_{k}^{-1}\mathbf{A}^{*}u_{k}$. Notice the reciprocal of the singular value.

  8. Optional: Want $\color{red}{\mathbf{V}_{\mathcal{N}}}$? Apply Gram-Schmidt orthogonalization to complete the columns of $\color{blue}{\mathbf{V}_{\mathcal{R}}}$ to a full orthonormal basis.

  9. Invert the diagonal matrix $\mathbf{S}$: $$ \mathbf{S} = \left[ \begin{array}{cccc} \sigma_{1} \\ & \sigma_{2} \\ && \ddots \\ &&& \sigma_{\rho} \end{array} \right] % \qquad \Rightarrow \qquad % \mathbf{S}^{-1} = \left[ \begin{array}{cccc} \frac{1}{\sigma_{1}} \\ & \frac{1}{\sigma_{2}} \\ && \ddots \\ &&& \frac{1}{\sigma_{\rho}} \end{array} \right] $$ Again, the reciprocal of the singular values.

  10. Form the Hermitian conjugate by taking the transpose of the conjugate or the conjugate of the transpose: $$ \color{blue}{\mathbf{V}^{*}_{\mathcal{R}}} = \color{blue}{\overline{\mathbf{V}}_{\mathcal{R}}^{\mathrm{T}}} = \color{blue}{\overline{\mathbf{V}^{\mathrm{T}}_{\mathcal{R}}}} $$

Construction of the pseudoinverse

  1. Assemble $$ \mathbf{A}^{\dagger} = \color{blue}{\mathbf{V}_{\mathcal{R}}} \, \mathbf{S}^{-1} \, \color{blue}{\mathbf{U}^{*}_{\mathcal{R}}} = \left[ \begin{array}{cc} \color{blue}{\mathbf{V}_{\mathcal{R}}} & \color{red}{\mathbf{V}_{\mathcal{N}}} \end{array} \right] \left[ \begin{array}{cc} \mathbf{S}^{-1} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{array} \right] \left[ \begin{array}{c} \color{blue}{\mathbf{U}_{\mathcal{R}}}^{*} \\ \color{red}{\mathbf{U}_{\mathcal{N}}}^{*} \end{array} \right] $$
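A NumPy sketch of the whole construction, steps 1 through 10 plus the assembly. It is a sketch under stated assumptions: the Hermitian eigensolver `np.linalg.eigh` stands in for the eigenvalue and eigenvector steps, the tolerance `rtol` encodes the rank judgment call of step 3, and it works from $\mathbf{A}\mathbf{A}^{*}$, whose nonzero eigenvalues match those of $\mathbf{A}^{*}\mathbf{A}$:

```python
import numpy as np

def pinv_via_eig(A, rtol=1e-12):
    """Assemble A^dagger from the eigendecomposition of A A* (steps 1-10)."""
    A = np.asarray(A, dtype=complex)
    lam, U_hat = np.linalg.eigh(A @ A.conj().T)   # steps 1-2, 4: Hermitian eigenproblem
    order = np.argsort(lam)[::-1]                 # step 3: sort eigenvalues descending
    lam, U_hat = lam[order], U_hat[:, order]
    rho = int(np.sum(lam > rtol * lam[0]))        # step 3: numerical rank decision
    sigma = np.sqrt(lam[:rho])                    # singular values sigma_k
    U_R = U_hat[:, :rho]                          # step 5: eigh returns unit vectors
    V_R = (A.conj().T @ U_R) / sigma              # step 7: v_k = A* u_k / sigma_k
    return V_R @ np.diag(1.0 / sigma) @ U_R.conj().T  # steps 9-10 and assembly
```

On well-conditioned inputs this agrees with `np.linalg.pinv` to roundoff, but forming $\mathbf{A}\mathbf{A}^{*}$ squares the condition number, which is one reason production libraries compute the SVD directly instead.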
dantopa