I'm trying to derive the optimization objective for an SVM (namely $1/\|w\|$), but I'm running into a little trouble. I've already read this question, which has certainly offered a lot of insight into the problem, but I would like to know why my own derivation doesn't quite work.

So essentially, let's say our support vectors on either side of the decision boundary are $\vec x_+$ and $\vec x_-$. Suppose the decision boundary itself is $\vec w\cdot\vec x + b = 0$. We have $\vec w\cdot\vec x_\pm + b = \pm 1$. Now, the margin $d$ can be calculated as the sum of the distances between the points $\vec x_\pm$ and the decision boundary. We can use the typical projection formula to obtain

$$ \begin{aligned} d & = \left\|\vec w\frac{\vec x_+\cdot\vec w}{\vec w\cdot\vec w}\right\| + \left\|\vec w\frac{\vec x_-\cdot\vec w}{\vec w\cdot\vec w}\right\| \\ & = \frac{\|\vec w\|}{\|\vec w\|^2}(|\vec x_+\cdot\vec w| + |\vec x_-\cdot\vec w|) \\ & = \frac1{\|\vec w\|}(|1-b|+|1+b|) \end{aligned} $$

This is almost the correct result, and in fact if we restrict $b\in[-1, 1]$, $d$ reduces to $2/\|\vec w\|$ as expected. However, in general this can be written: $$ d = \frac{2\max(1,|b|)}{\|\vec w\|} $$ So my question is essentially: why did I end up with the extra $b$ in the numerator? Does it change the optimization objective at all (I feel like it does)? Is there a reasonable explanation for why we could potentially restrict $b\in[-1,1]$?

user3002473

1 Answer

The first line of your equation for $d$ is incorrect. It should be

$$d = \left\|\vec w\frac{\vec x_+\cdot\vec w}{\vec w\cdot\vec w} - \vec w\frac{\vec x_-\cdot\vec w}{\vec w\cdot\vec w}\right\|.$$

(Why? The distance between two points $u,v$ is $\|u-v\|$, not $\|u\|+\|v\|$.)

If you continue the derivation from there, you will obtain the result you were expecting.
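
For reference, here is a sketch of the remaining steps. It uses only the constraints $\vec w\cdot\vec x_\pm + b = \pm 1$ from the question, which give $\vec w\cdot\vec x_+ = 1 - b$ and $\vec w\cdot\vec x_- = -1 - b$:

$$ \begin{aligned} d & = \left\|\vec w\,\frac{(\vec x_+ - \vec x_-)\cdot\vec w}{\vec w\cdot\vec w}\right\| \\ & = \frac{\|\vec w\|}{\|\vec w\|^2}\,\left|\vec w\cdot\vec x_+ - \vec w\cdot\vec x_-\right| \\ & = \frac1{\|\vec w\|}\,\left|(1-b) - (-1-b)\right| \\ & = \frac{2}{\|\vec w\|} \end{aligned} $$

The $b$ terms cancel inside the absolute value, so no restriction on $b$ is needed.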

D.W.
  • Yeah, I don't really know what I was thinking. The formula I was trying to use was the perpendicular distance from a point to the plane $\vec w\cdot\vec x + b = 0$, but all I was doing was projecting $\vec x_\pm$ onto $\vec w$. If I had calculated $$ d = \frac{|\vec w\cdot\vec x_+ + b|}{\|\vec w\|} + \frac{|\vec w\cdot\vec x_- + b|}{\|\vec w\|} $$ I think it would've also been correct, but I'll have to check that on my own time. Thanks for the help! – user3002473 Sep 17 '18 at 10:57
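
The check mentioned in the comment can be sketched numerically. The snippet below is only an illustration: the helper names `point_on_level` and `margin_formulas`, and the values of `w`, `b`, and the shifts, are made up for the example and restricted to 2-D. It compares three quantities: the sum of the projection norms (the formula from the question), the norm of the difference of the projections (the formula from the answer), and the sum of the standard point-to-plane distances $|\vec w\cdot\vec x + b|/\|\vec w\|$.

```python
import numpy as np


def point_on_level(w, b, level, shift):
    """Return a 2-D point x with w.x + b == level (hypothetical helper).

    `shift` slides the point along the hyperplane, so the point is not
    simply a multiple of w.
    """
    w = np.asarray(w, dtype=float)
    along_w = (level - b) * w / (w @ w)   # component parallel to w
    tangent = np.array([-w[1], w[0]])     # direction orthogonal to w
    return along_w + shift * tangent


def margin_formulas(w, b, x_plus, x_minus):
    """Evaluate the three candidate expressions for the margin d."""
    w = np.asarray(w, dtype=float)
    x_plus = np.asarray(x_plus, dtype=float)
    x_minus = np.asarray(x_minus, dtype=float)

    # Projections of the two support vectors onto w.
    proj_plus = w * (x_plus @ w) / (w @ w)
    proj_minus = w * (x_minus @ w) / (w @ w)

    # Formula from the question: ||u|| + ||v||.
    sum_of_norms = np.linalg.norm(proj_plus) + np.linalg.norm(proj_minus)
    # Formula from the answer: ||u - v||.
    norm_of_difference = np.linalg.norm(proj_plus - proj_minus)
    # Sum of perpendicular distances to the plane w.x + b = 0.
    norm_w = np.linalg.norm(w)
    point_to_plane = (abs(w @ x_plus + b) + abs(w @ x_minus + b)) / norm_w

    return sum_of_norms, norm_of_difference, point_to_plane


# Example values chosen with |b| > 1 so the discrepancy is visible.
w, b = [3.0, 4.0], 2.0
x_plus = point_on_level(w, b, level=1.0, shift=2.0)
x_minus = point_on_level(w, b, level=-1.0, shift=-1.0)
print(margin_formulas(w, b, x_plus, x_minus))
```

With these values $\|\vec w\| = 5$, so the first number should come out near $2\max(1,|b|)/\|\vec w\| = 0.8$, while the other two should agree near $2/\|\vec w\| = 0.4$. Changing `b` should move only the first number.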