I'm learning about Support Vector Machines (SVMs). I understand that their objective is to maximize the margin, i.e. the distance between the separating hyperplane and the points closest to it. For simplicity, assume we have a linear SVM and linearly separable data points $X_i$. We would like to find the maximum distance $D$, together with a weight vector $W$ and bias $W_0$, such that
$\dfrac{W^T \cdot X_i+W_0}{\|W\|}\geq D$ for all $X_i$ in the first class
$\dfrac{W^T \cdot X_i+W_0}{\|W\|}\leq-D$ for all $X_i$ in the second class.
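(For context, I am using the standard fact that the signed distance from a point $x$ to the hyperplane $W^T\cdot x+W_0=0$ is $\dfrac{W^T \cdot x+W_0}{\|W\|}$, so the left-hand sides above are exactly the signed distances of the $X_i$ from the decision boundary.)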
Why is this problem equivalent to maximizing $\frac{1}{\|W\|}$ subject to the following constraints:
$W^T\cdot X_i+W_0\geq1$ for all $X_i$ in the first class
$W^T\cdot X_i+W_0\leq-1$ for all $X_i$ in the second class.
I'm more interested in a formal proof than in the intuition.
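Not as a substitute for a proof, but to make the claim concrete, here is a small numerical check I would run (this assumes scikit-learn's `SVC` with a linear kernel and a very large `C` as a stand-in for the hard-margin SVM; the toy data set is my own construction):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two linearly separable clusters in R^2.
X_pos = rng.normal(loc=[+2.0, +2.0], scale=0.3, size=(20, 2))
X_neg = rng.normal(loc=[-2.0, -2.0], scale=0.3, size=(20, 2))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 20 + [-1] * 20)

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C ~ hard-margin SVM
w = clf.coef_.ravel()
w0 = clf.intercept_[0]

# Functional margins y_i * (W^T X_i + W_0): should all be >= 1 (up to solver
# tolerance), with approximate equality at the support vectors.
functional_margins = y * (X @ w + w0)
print("min functional margin:", functional_margins.min())

# Geometric margin: the smallest distance of any point to the hyperplane,
# which should coincide with 1 / ||W||, the quantity being maximized.
distances = np.abs(X @ w + w0) / np.linalg.norm(w)
print("min distance to hyperplane:", distances.min())
print("1 / ||W||:", 1.0 / np.linalg.norm(w))
```

If the reformulation is correct, the last two printed numbers should coincide (up to solver tolerance) and the minimum functional margin should be about 1. That is consistent with the equivalence, but it is exactly the step from the first formulation to the second that I would like to see justified formally.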