
This question is a follow-up to one of my previous questions: Optimizing a vector equation

Let $x$ and $b$ be two vectors of real numbers in $k$-dimensional space.

Let $W$ be a $k \times k$ matrix of real numbers representing a transformation.

Let $\alpha$ be a scalar value.

We are looking for the optimal $\alpha$, which minimizes the squared Mahalanobis distance between $x$ and the multivariate distribution $N(b, \Sigma)$ scaled by $\alpha$:

$$ D = (x - \alpha b)^T (\alpha W)^{-1} (x - \alpha b) $$

The squared Mahalanobis distance:

$$ D = (x - \mu) \Sigma^{-1} (x - \mu) $$

As can be seen, in our case $\mu = \alpha b$ and $\Sigma = \alpha W$.

I took the derivative of the Mahalanobis distance, following Differentiating mahalanobis distance, and applied the chain rule to get to $\alpha$:

$$ \eqalign { \frac {dD} {d \alpha} &= \frac {dD} {d \mu} \frac {d \mu} {d \alpha} + \frac {dD} {d \Sigma} \frac {d \Sigma} {d \alpha} \cr &= -2 \Sigma^{-1}(x - \mu) b - \Sigma^{-1} (x - \mu) (x - \mu) \Sigma^{-1} W \cr &= 0 \cr 2 \Sigma^{-1}(x - \mu) b &= -\Sigma^{-1} (x - \mu) (x - \mu) \Sigma^{-1} W}$$

This expression is reduced by multiplying by $\Sigma$ from the left and by expanding the $\Sigma^{-1}$ term on the rightmost side, getting

$$ 2 (x - \mu) b = - (x - \mu) (x - \mu) \alpha^{-1} $$

Substituting $\mu = \alpha b$ on both sides and expanding the parentheses yields

$$ \eqalign { 2xb - 2 \alpha bb &= -(xx - 2 \alpha xb + \alpha^2 bb) \alpha^{-1} \cr 2xb - 2 \alpha bb &= -\alpha^{-1}xx + 2xb - \alpha bb}$$

This can be simplified by subtracting $2xb - \alpha bb$ from both sides and multiplying by $-1$, getting

$$ \eqalign { \alpha bb &= \alpha^{-1} xx \cr \alpha &= \sqrt { \frac {||x||^2} {||b||^2} } \cr \alpha &= \frac {||x||} {||b||}}$$

I have the strong intuition that I went wrong somewhere. Could someone please check my calculations?
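
A quick numerical check already supports this suspicion (a minimal sketch, assuming numpy; the vectors and $W$ below are random stand-ins): if $\alpha = \|x\|/\|b\|$ were optimal, the derivative of $D$ with respect to $\alpha$ would vanish there, but for a generic $W$ it does not.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4
x = rng.standard_normal(k)
b = rng.standard_normal(k)
A = rng.standard_normal((k, k))
W = A @ A.T + k * np.eye(k)        # a generic symmetric positive definite W

def D(alpha):
    """Squared Mahalanobis distance with mu = alpha*b and Sigma = alpha*W."""
    z = x - alpha * b
    return z @ np.linalg.inv(alpha * W) @ z

alpha0 = np.linalg.norm(x) / np.linalg.norm(b)
h = 1e-6
slope = (D(alpha0 + h) - D(alpha0 - h)) / (2 * h)   # central finite difference
print(slope)                       # clearly nonzero: alpha0 is not stationary
```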

  • I think your intuition that something went wrong is correct for the simple reason that your result does not depend on $W^{-1}$. Instead of doing such complicated calculations, you should try to develop the first expression that you have for $D$ and then work from there. As a side note you forgot a transpose in your second expression for $D$ (and in the subsequent equations). – M. P. Nov 24 '17 at 14:10
  • For the chain rule you need to use $$\frac{dD}{d\alpha} = \frac{\partial D}{\partial\mu}:\frac{d\mu}{d\alpha} + \frac{\partial D}{\partial\Sigma}:\frac{d\Sigma}{d\alpha}$$ where the colons denote the trace product, i.e. $$A:B={\rm tr}(A^TB)$$ – greg Nov 24 '17 at 17:06
  • Thanks for the answer, one question though: is this trace product equivalent to the Hadamard (or entrywise) product? Maybe followed by a summation? – user2729400 Nov 25 '17 at 06:09
  • Yes, you can think of the trace product as a Hadamard product followed by a sum over all the elements (a short numerical check of this identity appears after these comments). Written with explicit summations it looks like this $$A:B=\sum_i\sum_k A_{ik}B_{ik}$$ – greg Nov 25 '17 at 15:04
  • Why $\alpha W$ in $D$ instead of a constant $W$? Consider the all-scalar, 1-d case with $W = 1$ and $\alpha \to 0$. – denis Jan 24 '18 at 10:50
  • Because we are morphing a normal distribution along the line defined by the origin and $b$. Not only does its center move, but the shape given by $W$ also changes. – user2729400 Jan 25 '18 at 10:57
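
A short numerical check of the trace-product identity mentioned in the comments above (a minimal sketch, assuming numpy; the matrices are arbitrary examples): the trace product agrees with the Hadamard product followed by a total sum.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

trace_product = np.trace(A.T @ B)   # A : B = tr(A^T B)
hadamard_sum = np.sum(A * B)        # Hadamard product, then sum over all entries

assert np.isclose(trace_product, hadamard_sum)
```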

1 Answer


Let's use the product notation $(:)$ for the trace, i.e.
$$A:B={\rm tr}(A^TB)$$
For convenience, let's define new scalar, vector and matrix variables
$$\eqalign{
\beta &= \alpha^{-1} &\implies d\beta=-\beta^2d\alpha \cr
z &= \alpha b-x &\implies dz=b\,d\alpha \cr
M &= W^{-1} &\implies \beta M=(\alpha W)^{-1} \cr
}$$
Now we can find the differential and gradient of the distance directly
$$\eqalign{
D &= \beta M:zz^T \cr
dD &= \beta M:d(zz^T) + d\beta\,M:zz^T \cr
&= 2\beta M:(dz\,z^T) - \beta^2M:zz^T\,d\alpha \cr
&= (2\beta Mz:b - \beta^2M:zz^T)\,d\alpha \cr
\frac{\partial D}{\partial\alpha} &= 2\beta Mz:b - \beta^2M:zz^T \cr
&= 2\beta b^TMz - \beta^2z^TMz \cr
}$$
Set the gradient to zero, and multiply by $\alpha^2$
$$\eqalign{
2\alpha b^TMz &= z^TMz \cr
2\alpha b^TM(\alpha b-x) &= (\alpha b-x)^TM(\alpha b-x) \cr
2\alpha^2 b^TMb - 2\alpha b^TMx &= \alpha^2b^TMb -2\alpha b^TMx + x^TMx \cr
\alpha^2 b^TMb &= x^TMx \cr
}$$
yielding this expression for the optimal parameter value
$$\alpha^2 = \frac{x^TW^{-1}x}{b^TW^{-1}b}$$
Although it was not stated, I assumed that $W$ is symmetric.
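
A numerical sanity check of this closed form (a minimal sketch, assuming numpy; the instance is random and $W$ is built symmetric positive definite, as assumed): the positive root should make $D$ smaller than at any nearby $\alpha$.

```python
import numpy as np

rng = np.random.default_rng(2)
k = 5
x = rng.standard_normal(k)
b = rng.standard_normal(k)
A = rng.standard_normal((k, k))
W = A @ A.T + k * np.eye(k)      # symmetric positive definite, as assumed
M = np.linalg.inv(W)             # M = W^{-1}

def D(alpha):
    z = x - alpha * b
    return z @ (M / alpha) @ z   # uses (alpha*W)^{-1} = M / alpha

# Closed-form optimum (positive root, since alpha scales a covariance)
alpha_star = np.sqrt((x @ M @ x) / (b @ M @ b))

# D should increase when alpha is perturbed away from alpha_star
for eps in (1e-3, 1e-2, 1e-1):
    assert D(alpha_star) < D(alpha_star * (1 + eps))
    assert D(alpha_star) < D(alpha_star * (1 - eps))
print(alpha_star, D(alpha_star))
```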

greg
  • OK, the real problem has a bit more complication in it, and to solve it I'm trying to internalize your calculation, but I can't seem to be able to look up one sorcery you used. Where do I obtain the powers to do such magic: $dD = \beta M : d(zz^T) + d\beta\,M : zz^T$? This is not simple differentiation; you are only obtaining differentials somehow. Does this technique have a name by which I can look it up and read about it? – user2729400 Nov 27 '17 at 05:49
  • Well, it's not magic. The product rule for differentials $$d(A\star B) = dA\star B + A\star dB$$ holds for any type of product, whether it's the Kronecker or Hadamard or Schur or Trace or Frobenius or Matrix or Dyadic or Dot or Double-Dot or Triple-Dot product. Further, the operands $(A, B)$ can be scalars, vectors, matrices, or tensors -- whatever makes sense for the product under consideration. – greg Nov 27 '17 at 12:59
  • Yes, but you are doing arithmetic on differentials; I thought differentiation, e.g. $\frac{df}{dx}$, should not be treated as a quotient, but as a symbol as a whole. – user2729400 Nov 29 '17 at 14:49
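
One way to convince yourself that this calculus of differentials is sound is to compare the answer's gradient expression, $\frac{\partial D}{\partial\alpha} = 2\beta\,b^TMz - \beta^2 z^TMz$, against a central finite difference (a minimal sketch, assuming numpy; the problem instance and the test point $\alpha = 0.7$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
k = 4
x = rng.standard_normal(k)
b = rng.standard_normal(k)
A = rng.standard_normal((k, k))
W = A @ A.T + k * np.eye(k)      # symmetric positive definite
M = np.linalg.inv(W)             # M = W^{-1}

def D(alpha):
    z = alpha * b - x
    return (z @ M @ z) / alpha   # beta * z^T M z  with  beta = 1/alpha

alpha = 0.7
beta = 1.0 / alpha
z = alpha * b - x

grad = 2 * beta * (b @ M @ z) - beta**2 * (z @ M @ z)   # answer's expression
h = 1e-6
fd = (D(alpha + h) - D(alpha - h)) / (2 * h)            # central difference

assert np.isclose(grad, fd, rtol=1e-4, atol=1e-8)
print(grad, fd)
```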