
I'm trying to use gradient descent to approximate the point that minimizes the error for a system of $N+k$ equations in $N$ variables, for large values of $N$ and $k$. We're trying to minimize $||Ax - B||^2$, where $A$ is the coefficient matrix with dimensions $(N + k) \times N$, $x$ is the coordinate vector with $N$ entries, and $B$ is the solution vector with $N + k$ entries. Geometrically, for the case $N = 3$, this would be the point that minimizes the total squared distance from some given $3 + k$ planes. I've looked into the conjugate gradient method, but can't figure out how to adapt the algorithm to the additional $k$ equations: the residual vector has $N + k$ entries, while the guesses and search directions should only have $N$ entries, and conjugate gradient assumes they are the same size. I've thought of finding all $\binom{N + k}{N}$ intersection points and averaging them, but I'm unsure whether that would work or whether there's a simpler way.

KReiser
  • 65,137
MRoads
  • 3
  • How big are $N$ and $k$, and does this need to be done quickly live, or can it take a few seconds or minutes to do one run? Also what language does this need to be implemented in? – Ian Aug 09 '20 at 21:38
  • They're both around 100. Time isn't a huge factor, minutes would be fine. Just need a fairly accurate result. – MRoads Aug 09 '20 at 21:41
  • Conjugate gradient doesn't make sense here as the matrix isn't square, cannot be symmetric, etc. Gradient descent is different. Here you would want to do gradient descent with line search. – Jürgen Sukumaran Aug 09 '20 at 21:42
  • At that scale you can do any number of standard things. In Matlab or Octave just A\b would be fine. Alternatively, if you write $A=QR$ as a QR decomposition (obtained by Gram-Schmidt or, ideally, Householder reflection) then you can solve $R^T y = A^T b$ and then $Rx=y$. There's a similar way to do it with the SVD. Any such thing will be faster and more accurate than a from-scratch gradient descent setup. – Ian Aug 09 '20 at 21:47
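For reference, here is a minimal sketch of the direct $A\backslash b$ / QR route from the comment above, assuming NumPy is available; the sizes and the random $A$, $B$ are placeholders, not data from the question.

```python
import numpy as np

# Placeholder data; the question says N and k are both around 100.
N, k = 100, 100
rng = np.random.default_rng(0)
A = rng.standard_normal((N + k, N))   # coefficient matrix, (N+k) x N
B = rng.standard_normal(N + k)        # right-hand side, N + k entries

# Direct least-squares solve (the NumPy analogue of Matlab/Octave's A\b).
x_lstsq, *_ = np.linalg.lstsq(A, B, rcond=None)

# Same minimizer via a QR factorization A = QR: since A^T A = R^T R,
# the normal equations become R^T y = A^T B followed by R x = y.
Q, R = np.linalg.qr(A)                # R is N x N and upper triangular
y = np.linalg.solve(R.T, A.T @ B)
x_qr = np.linalg.solve(R, y)

print(np.allclose(x_lstsq, x_qr))     # both routes agree
```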

1 Answer


The domain of your problem is $\mathbb{R}^N$, since you want to find $x$, and the range is only $\mathbb{R}$, since $||Ax-B||^2$ is a scalar. Therefore you want to use gradient descent to minimize $$ L(x) = ||Ax-B||^2. $$

We want to iterate $x_{n+1} = x_n - \gamma_n \nabla L(x_n)$, where $x_{n+1}, x_n, \nabla L(x_n) \in \mathbb{R}^N$ and the step size $\gamma_n\in\mathbb{R}$. The value of $k$ does not matter: every vector in the update lives in $\mathbb{R}^N$.

To evaluate $\nabla L(x)$, expand $L(x) = (Ax-B)^T(Ax-B)$ and differentiate: $\nabla L(x) = 2A^T(Ax-B)$. The residual $Ax-B$ has $N+k$ entries, but multiplying by $A^T$ maps it back to $\mathbb{R}^N$, so the gradient has the same size as $x$.
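Here is a minimal sketch of this iteration in NumPy (not code from the answer; the zero initial guess, the iteration count, and the default step size are placeholder choices, with the conservative step $\gamma = 1/(2||A||_2^2)$ used when none is given):

```python
import numpy as np

def gradient_descent(A, B, gamma=None, iters=10_000):
    """Minimize ||Ax - B||^2 via x_{n+1} = x_n - gamma * grad L(x_n)."""
    if gamma is None:
        # grad L is Lipschitz with constant 2 * ||A||_2^2, so this step converges.
        gamma = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)
    x = np.zeros(A.shape[1])                 # initial guess in R^N
    for _ in range(iters):
        grad = 2.0 * A.T @ (A @ x - B)       # 2 A^T (Ax - B), always N entries
        x = x - gamma * grad
    return x
```

A quick sanity check is to compare `gradient_descent(A, B)` against a direct solver such as `np.linalg.lstsq(A, B, rcond=None)` on the same data.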

Lucas Resende
  • 1,286
  • 9
  • 21
  • That is a million dollar question! There is no general good choice for it. The step size has a fancy name: learning rate. There are a lot of articles arguing how to choose $\gamma_n$: https://scholar.google.com.br/scholar?hl=pt-BR&as_sdt=0%2C5&q=learning+rate+gradient+descent&btnG=&oq=learning+rate+gradi. – Lucas Resende Aug 09 '20 at 22:28
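For this particular quadratic objective there is also a closed-form step, which is the "line search" mentioned in Jürgen Sukumaran's comment specialized to $L(x) = ||Ax-B||^2$ (a standard fact, not something stated in the thread): with $g = \nabla L(x_n) = 2A^T(Ax_n - B)$, the $\gamma_n$ minimizing $L(x_n - \gamma g)$ is $\gamma_n = ||g||^2 / (2||Ag||^2)$, so no learning rate needs to be tuned. A minimal sketch, with the initial guess and iteration count as placeholders:

```python
import numpy as np

def gradient_descent_exact_step(A, B, iters=500):
    """Steepest descent on L(x) = ||Ax - B||^2 with the exact line-search step."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = 2.0 * A.T @ (A @ x - B)     # gradient of L at x
        Ag = A @ g
        denom = 2.0 * (Ag @ Ag)
        if denom == 0.0:                # zero gradient: x already minimizes L
            break
        x = x - (g @ g) / denom * g     # gamma_n = ||g||^2 / (2 ||Ag||^2)
    return x
```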