
An $n$-dimensional (column) vector $y$ is defined as follows:

$Ay=x+v$,

where $A$ is an $m \times n$ matrix with $m<n$ (and full row rank), $x$ is an $m$-dimensional column vector of constants, and $v$ is an $m$-dimensional column vector with mean-zero normally distributed elements and a diagonal covariance matrix.

If I have understood correctly, if the equation were just $Ay=x$, i.e. without the random vector, it could be solved for $y$ with the pseudoinverse $A^+$ (which here is also a right inverse), giving $y=A^+x$ (in fact exactly, since $AA^+=I$ when $A$ has full row rank).

This works as well for $Ay=x+v$, giving $y=A^+(x+v)$.
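A quick numerical check of this (a minimal NumPy sketch; the dimensions $m=3$, $n=5$ and the random matrices are arbitrary choices made only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 5                       # m < n: underdetermined system
A = rng.standard_normal((m, n))   # a generic A of this shape has full row rank
x = rng.standard_normal(m)        # constant vector
v = rng.standard_normal(m)        # one realization of the noise

A_pinv = np.linalg.pinv(A)        # Moore-Penrose pseudoinverse, here a right inverse
y_hat = A_pinv @ (x + v)          # y = A^+(x + v)

# Because A has full row rank, A A^+ = I, so A y_hat reproduces x + v exactly
print(np.allclose(A @ y_hat, x + v))   # True
```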

In the case without the random vector, I understand that $A^+Ay$ results in a vector $y'$ which is an approximation of $y$ with the property that the Euclidean norm $|| Ay'-x ||$ cannot be made smaller by using any other vector instead of $y'$.

(see e.g. the introductory part of http://arxiv.org/pdf/1110.6882.pdf)

I.e., introducing a variable "inverse" $M$ and writing the estimate for $y$ (in parentheses) as a function of $Ay$, the Euclidean norm $|| A(MAy)-x ||$ is minimized if $M$ is set to $M=A^+$.

For the interpretation of the pseudoinverse in the case with the random vector, the Euclidean norm from above can be rewritten with $x+v$ in place of $x$:

$|| A(MAy)-(x+v) ||$

As noted by Ian in the comments below, $A^+$ depends entirely on $A$.

($A$ has full row rank, so the pseudoinverse here can be computed as $A^*(AA^*)^{-1}$, see http://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_pseudoinverse#Definition, which, since the matrix contains only real numbers as assumed here, becomes $A^T(AA^T)^{-1}$, see http://planetmath.org/conjugatetranspose.)
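For real matrices this formula is easy to verify numerically (again a hedged sketch with an arbitrary random test matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 5
A = rng.standard_normal((m, n))   # real-valued, full row rank (generically)

# Full-row-rank formula for the pseudoinverse: A^+ = A^T (A A^T)^{-1}
A_pinv_formula = A.T @ np.linalg.inv(A @ A.T)

print(np.allclose(A_pinv_formula, np.linalg.pinv(A)))  # True
print(np.allclose(A @ A_pinv_formula, np.eye(m)))      # True: A^+ is a right inverse
```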

So, with $A^+$ depending only on $A$, setting $M=A^+$ minimizes the Euclidean norm $|| A(MAy)-(x+v) ||$ for every realization of $v$.

The pseudoinverse can therefore, in general, be interpreted as an inverse that provides an approximation $x'$ of the vector $x$ in a matrix-vector equation $Ax=y$, namely the one that minimizes the Euclidean norm of the difference between $y$ and its estimate $y'=Ax'$; and if $y$ is random, this holds for every realization of $y$.
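A small numerical experiment consistent with this interpretation (illustrative only; the competing matrices $M$ are random perturbations of $A^+$, an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 5
A = rng.standard_normal((m, n))
x = rng.standard_normal(m)
A_pinv = np.linalg.pinv(A)        # fixed: depends on A only, not on v

for _ in range(5):                # several realizations of v
    v = rng.standard_normal(m)
    b = x + v                     # right-hand side of A y = x + v
    res_pinv = np.linalg.norm(A @ (A_pinv @ b) - b)
    # residuals obtained with randomly perturbed "inverses" M
    res_other = [np.linalg.norm(A @ ((A_pinv + 0.1 * rng.standard_normal((n, m))) @ b) - b)
                 for _ in range(100)]
    print(res_pinv <= min(res_other) + 1e-12)   # True for every realization of v
```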

Is this correct?

user70160
  • @Ian - I am thinking about your answer. As an intermediate step: am I understanding your answer correctly, that the pseudoinverse will contain random variables? In other words, the pseudoinverse is not independent of the realization of $v$ - or $w$ in your notation (I assume $w$ represents the realization of a standard normal variable in your notation?) – user70160 Apr 22 '15 at 13:48
  • Here $\omega$ denotes the underlying sampling variable of the probability space. So $v$, being a random variable, is a function of $\omega$, which I make explicit by writing $v(\omega)$. Here the pseudoinverse $A^+$ does not depend on any randomization (it is determined entirely by $A$), but $A^+(x+v)$ certainly does. So what you get is a minimizer for each $\omega$ separately. – Ian Apr 22 '15 at 15:03
  • @Ian Thank you. I introduce a new variable $M$ which is a variable $m \times n$ matrix, label the Euclidean norm as $||P||$ and write it as a function of $M$: $||P||=||My-x-v(w)||$. I understand your answer as follows: no matter which value $w$ and hence $v(w)$ is going to take, $A^+$ is always the same, and this $A^+$ ensures that $||P||=||My-x-v(w)||$ is minimized if $A^+$ is chosen for $M$? I.e. no other matrix $M$ will lead to a lower $||P||$ than $A^+$ - is this a correct understanding of your answer? – user70160 Apr 22 '15 at 16:03
  • What you've written is just slightly incorrect. You have a variable "inverse" $M$, and the relevant norm is $\| AM(x+v(\omega))-x-v(\omega) \|$. To minimize for each $\omega$ separately, the optimal $M$ is $A^+$, which does not depend on $\omega$. – Ian Apr 22 '15 at 16:09
  • @ Ian thanks again - frankly, I believe I have confused myself now... will have to think about this for a bit.. I have the feeling that your answer before my last two comments is probably sufficient. If you copy your comment and paste it as "answer", I can accept it, so you get the reputation points. – user70160 Apr 22 '15 at 16:17
  • @ Ian - I think I have learned from your answer, and edited the notation in the question. Unfortunately I still don't get it completely - see the final question above. Will keep thinking about it. – user70160 Apr 22 '15 at 21:06
  • @Ian - Think I got it - as you noted, the pseudoinverse depends entirely on $A$ (only). So its "minimizing property" must hold for all realizations of $v$. Hence the interpretation initially asked for is exactly as in your first comment: the pseudoinverse here leads to minimization of the Euclidean norm of the distance between estimated and true $y$ - for every realization of $v$ (i.e. no matter the realization of $v$). Many thanks, again! For convenience, I'll summarize this in the original question, and ask for confirmation - for the points, answer with a simple "yes" and I'll accept. – user70160 Apr 22 '15 at 21:43

1 Answer


Given the matrix $ \mathbf{A} \in \mathbb{C}^{m\times n} $ and the data vector $b\in \mathbb{C}^{m}$, which is not in the null space $\color{red}{\mathcal{N}(\mathbf{A}^{*})}$, find the solution vectors $x\in \mathbb{C}^{n}$ which minimize the sum of the squares of the residual errors with respect to the $2-$norm: $$ x_{LS} = \arg \min_{x\in \mathbb{C}^{n}} \lVert \mathbf{A} x - b \rVert_{2}^{2}. $$ This set of minimizers is, in general, an affine set and is depicted by the dashed red line in the figure below. The general solution to the least squares problem is $$ x_{LS} = \color{blue}{\mathbf{A}^{\dagger}b} + \color{red}{\left( \mathbf{I}_{n} - \mathbf{A}^{\dagger}\mathbf{A} \right)y}, \quad y \in \mathbb{C}^{n}. $$ Vectors are colored to show whether they live in a $\color{blue}{range}$ space or a $\color{red}{null}$ space.
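A quick check that every element of this affine set attains the same (minimal) residual (a sketch with an arbitrarily chosen rank-deficient test matrix, so that both the residual and the null-space term are nontrivial):

```python
import numpy as np

rng = np.random.default_rng(3)
# Illustrative rank-2, 4x3 matrix: residual and null-space term are both nontrivial
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))
b = rng.standard_normal(4)

A_pinv = np.linalg.pinv(A)
P_null = np.eye(3) - A_pinv @ A          # projector onto the null space of A

residuals = []
for _ in range(5):
    y = rng.standard_normal(3)
    x_ls = A_pinv @ b + P_null @ y       # general least-squares solution
    residuals.append(np.linalg.norm(A @ x_ls - b))

print(np.allclose(residuals, residuals[0]))   # True: every minimizer has the same residual
```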

Every point on the dashed red line is a minimizer. What are the lengths of these vectors? The $\color{blue}{range}$-space and $\color{red}{null}$-space components are orthogonal, so by the Pythagorean theorem $$ \lVert x_{LS} \rVert_{2}^{2} = \lVert \color{blue}{\mathbf{A}^{\dagger}b} + \color{red}{\left( \mathbf{I}_{n} - \mathbf{A}^{\dagger}\mathbf{A} \right)y} \rVert_{2}^{2} = \lVert \color{blue}{\mathbf{A}^{\dagger}b}\rVert_{2}^{2} + \lVert \color{red}{\left( \mathbf{I}_{n} - \mathbf{A}^{\dagger}\mathbf{A} \right)y} \rVert_{2}^{2}. $$ What is the solution vector of minimum length? Choose $y$ so that the null-space term vanishes (e.g. $y=0$) and we are left with $$ \color{blue}{x_{LS}} = \color{blue}{\mathbf{A}^{\dagger}b}, $$ the point where the locus of minimizers punctures $\color{blue}{\mathcal{R}(\mathbf{A}^{*})}$.
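Under the same illustrative setup as above, one can also confirm numerically that $\mathbf{A}^{\dagger}b$ is the shortest of these minimizers:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))   # same rank-2 example
b = rng.standard_normal(4)

A_pinv = np.linalg.pinv(A)
P_null = np.eye(3) - A_pinv @ A
x_min = A_pinv @ b                        # candidate minimum-norm least-squares solution

# Any other minimizer adds an orthogonal null-space component, hence is at least as long
for _ in range(100):
    y = rng.standard_normal(3)
    x_ls = x_min + P_null @ y
    assert np.linalg.norm(x_ls) >= np.linalg.norm(x_min) - 1e-12
print("A^+ b has minimum norm among the sampled minimizers")
```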

[Figure: the dashed red line of least-squares minimizers punctures $\color{blue}{\mathcal{R}(\mathbf{A}^{*})}$ at the minimum-norm solution $\color{blue}{\mathbf{A}^{\dagger}b}$]

dantopa