Matrix Calculus in Least-Square method

Question

In the proof of the matrix solution of the least-squares method, I see some matrix calculus that I don't understand. Can anyone explain it to me, or recommend a good resource for studying this sort of matrix calculus?

In the least-squares method, we want to find a vector $x$ such that $\|Ax-b\|$ is minimized.

Assume $r=Ax-b$

$\Rightarrow\|r\|^2=x^TA^TAx-2b^TAx+b^Tb$

$\Rightarrow \nabla_x \|r\|^2=2A^TAx-2A^Tb$

In the end we set the gradient to zero and solve for the minimizer. I understand the overall idea, but I don't know how the matrix calculus is carried out here. For example, can anyone tell me how we got the transposes in $\|r\|^2$ (by what rule?) and how we got the gradient (how exactly do we take a gradient in matrix form)?

I'd really appreciate it if you could help me out. Thanks!

Thomas · Answer 1 · 2013-04-22T20:44:23.977

11

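The first step is just the definition of $\|r\|^2$:
\begin{align} \|r\|^2 & = \langle r,r \rangle = r^T r \\ &= (Ax-b)^T(Ax-b) = (x^TA^T-b^T)(Ax-b) \\ &= x^TA^TAx -x^TA^Tb-b^TAx +b^Tb \\ &= x^TA^TAx -(b^TAx)^T -b^TAx +b^Tb \end{align}
The transposes come from the rule $(MN)^T = N^TM^T$, which gives $(Ax)^T = x^TA^T$ and $x^TA^Tb = (b^TAx)^T$. Since $b^TAx$ is a scalar (if $A$ is $m \times n$, it is a $1 \times m$ row times an $m \times n$ matrix times an $n \times 1$ column, hence $1 \times 1$), it holds that $b^TAx = (b^TAx)^T$. Thus
\begin{align} \|r\|^2 & = x^TA^TAx -2b^TAx +b^Tb \end{align}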
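The expansion above can be sanity-checked numerically. The following is a minimal sketch in plain Python; the values of `A`, `b`, and `x` and the helper names `matvec`, `dot`, and `transpose` are illustrative assumptions, not part of the original answer.

```python
# Check that ||Ax - b||^2 equals x^T A^T A x - 2 b^T A x + b^T b
# for one arbitrary (assumed) choice of A, b, and x.

def matvec(M, v):
    """Matrix-vector product M v, with M given as a list of rows."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def dot(u, v):
    """Inner product u^T v of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def transpose(M):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3x2 example matrix
b = [1.0, 0.0, -1.0]
x = [0.5, -2.0]

Ax = matvec(A, x)
r = [ax_i - b_i for ax_i, b_i in zip(Ax, b)]

lhs = dot(r, r)                                   # ||r||^2 directly
quad = dot(x, matvec(transpose(A), Ax))           # x^T A^T A x
rhs = quad - 2.0 * dot(b, Ax) + dot(b, b)         # expanded form

# The two cross terms are the same scalar: x^T A^T b = b^T A x.
cross1 = dot(x, matvec(transpose(A), b))
cross2 = dot(b, Ax)

print(lhs, rhs, cross1, cross2)
```

Both cross terms evaluate to the same number, which is why they merge into the single term $-2b^TAx$.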

For the derivatives, you could take a look here. Another approach is to write out the matrix-vector expressions in summation form and differentiate term by term; then no matrix calculus is involved.

edited Apr 22 '13 at 20:44

answered Apr 22 '13 at 20:37

Jo, thanks! I got it. Both of you are great :D – Cancan Apr 22 '13 at 21:11
It really looks like the combination of our two answers cover both points ;) – Thomas Apr 23 '13 at 05:05
This is the only explanation I can find that actually shows the full working, I would upvote this twice if I could. thanks !!! – nixon Feb 27 '19 at 08:44
What is this expression: $(b^TAx)⁼(b^TAx)^T$? Is that a dot or circle in the middle? And how do you know that $(b^TAx)$ is a scalar? – heretoinfinity Aug 27 '20 at 10:58

score 9 · Answer 2 · answered Apr 22 '13 at 20:38

Below are the matrix/vector derivative rules you will need.

$$\dfrac{d(x^TBx)}{d x_i} = \dfrac{d}{dx_i}\left(\sum_{j,k} x_j B_{jk}x_k\right) = \sum_{j} x_j B_{ji} + \sum_{k}B_{ik} x_k = \sum_{k}\left(B^T + B\right)_{ik}x_k$$
The middle step is the product rule: $\dfrac{\partial (x_j B_{jk} x_k)}{\partial x_i} = \delta_{ij} B_{jk} x_k + x_j B_{jk} \delta_{ik}$, so the double sum collapses into the two single sums. Hence, we have
$$\dfrac{d(x^TBx)}{d x} = (B^T+B)x$$
Similarly, we have
$$\dfrac{d(c^Tx)}{d x_i} = \dfrac{d}{d x_i}\left(\sum_k c_k x_k\right) = c_i$$
Hence, we have
$$\dfrac{d(c^Tx)}{dx} = c$$
Now you should be able to get what you want.
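As a sketch of how these two rules apply to the expression in the question, take $B = A^TA$ (which is symmetric, so $B^T + B = 2A^TA$) and $c = A^Tb$ (so that $c^Tx = b^TAx$); the constant term $b^Tb$ does not depend on $x$:

$$\begin{aligned} \nabla_x \|r\|^2 &= \nabla_x\left(x^TA^TAx\right) - 2\,\nabla_x\left(b^TAx\right) + \nabla_x\left(b^Tb\right) \\ &= \left((A^TA)^T + A^TA\right)x - 2A^Tb + 0 \\ &= 2A^TAx - 2A^Tb \end{aligned}$$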

Thanks, I got your point! Really helpful! – Cancan Apr 22 '13 at 21:07
I didn't get the point for the first derivative. @Cancan can you help ? – Taylor Aug 12 '17 at 09:03

score 3 · Answer 3 · answered May 22 '19 at 02:09

All we need here is multivariable calculus, not matrix calculus. Let $f(x) = (1/2) \| Ax - b \|^2$. Notice that $f(x) = g(h(x))$, where $h(x) = Ax - b$ and $g(u) = (1/2) \|u \|^2$. It can easily be seen that the derivatives of $g$ and $h$ are $$ h'(x) = A, \qquad g'(u) = u^T. $$ From the multivariable chain rule, we have \begin{align*} f'(x) &= g'(h(x)) h'(x) \\ &= (Ax - b)^T A. \end{align*} If we use the convention that $\nabla f(x)$ is a column vector, then \begin{align} \nabla f(x) &= f'(x)^T \\ &= A^T (Ax - b). \end{align}
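One way to gain confidence in the formula $\nabla f(x) = A^T(Ax - b)$ is to compare it against central finite differences. The sketch below uses plain Python; the data `A`, `b`, `x`, the step size `h`, and all helper names are assumptions made for illustration, and $f$ carries the factor $1/2$ used in this answer.

```python
# Compare the analytic gradient A^T (Ax - b) of f(x) = 0.5 * ||Ax - b||^2
# with a central finite-difference approximation.

def matvec(M, v):
    """Matrix-vector product M v, with M given as a list of rows."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def dot(u, v):
    """Inner product u^T v of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def transpose(M):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

def residual(A, b, x):
    """r = Ax - b."""
    return [ax_i - b_i for ax_i, b_i in zip(matvec(A, x), b)]

def f(A, b, x):
    """Objective 0.5 * ||Ax - b||^2."""
    r = residual(A, b, x)
    return 0.5 * dot(r, r)

def grad_f(A, b, x):
    """Analytic gradient A^T (Ax - b), as a column vector (list)."""
    return matvec(transpose(A), residual(A, b, x))

def numerical_grad(A, b, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    g = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        g.append((f(A, b, x_plus) - f(A, b, x_minus)) / (2.0 * h))
    return g

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3x2 example matrix
b = [1.0, 0.0, -1.0]
x = [0.5, -2.0]

analytic = grad_f(A, b, x)
numeric = numerical_grad(A, b, x)
max_err = max(abs(a - n) for a, n in zip(analytic, numeric))
print(analytic, numeric, max_err)
```

Because $f$ is quadratic, central differences are exact up to rounding error, so the two gradients should agree to many digits.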

score 1 · Answer 4 · answered Apr 06 '19 at 16:37

The problem is to calculate $\nabla_xL$ given the following $$\eqalign{ r &= A\cdot x-b \cr L &= r\cdot r \cr }$$ First, calculate the differential. Then change the independent variable from $r\to x$ $$\eqalign{ dL &= 2r\cdot dr \cr &= 2r\cdot (A\cdot dx) \cr &= (2A^T\cdot r)\cdot dx \cr \nabla_xL &= (2A^T\cdot r) \cr&= 2A^T\cdot(A\cdot x-b) \cr }$$
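The step from $2r\cdot(A\cdot dx)$ to $(2A^T\cdot r)\cdot dx$ moves $A$ across the dot product. A short sketch in transpose notation, assuming real vectors and the standard inner product $u\cdot v = u^Tv$:

$$ r\cdot(A\,dx) = r^T A\,dx = \left(A^Tr\right)^T dx = \left(A^Tr\right)\cdot dx $$

Comparing the result with $dL = \nabla_xL\cdot dx$ then identifies the gradient as the coefficient of $dx$.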
