4

I have the following cubic equation

$$x^3-6x^2+11x-6=(x-1)(x-2)(x-3)=0$$

whose solutions are $x=1,2,3$.

What if the above equation were represented as a quadratic form? Let $\mathbf x = \begin{bmatrix} x^2 & x & 1\end{bmatrix}^T$. Can the cubic equation above be represented as follows?

$$x^3-6x^2+11x-6=\begin{bmatrix}x^2&x&1\end{bmatrix}\begin{bmatrix}0&b&c\\d&e&f\\g&h&-6\end{bmatrix}\begin{bmatrix}x^2\\x\\1\end{bmatrix} = \textbf{x}^T \textbf{A} \textbf{x}$$

There are many ways to form the matrix $\textbf{A}$, for instance the two matrices below:

$$\textbf{A}_1 = \begin{bmatrix}0&0.5&0\\0.5&-6&5.5\\0&5.5&-6\end{bmatrix}\quad\quad\quad\textbf{A}_2 = \begin{bmatrix}0&0&0\\1&-3&6\\-3&5&-6\end{bmatrix}$$
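Both candidates can be checked numerically. The following is only a sketch in plain Python; the helper names `cubic` and `quadratic_form` are introduced here for illustration. It evaluates $\textbf{x}^T \textbf{A} \textbf{x}$ at a few integer points and compares the result with the cubic.

```python
def cubic(x):
    """The polynomial x^3 - 6x^2 + 11x - 6."""
    return x**3 - 6*x**2 + 11*x - 6

def quadratic_form(A, x):
    """Evaluate v^T A v for v = [x^2, x, 1]."""
    v = [x**2, x, 1]
    return sum(v[i] * A[i][j] * v[j] for i in range(3) for j in range(3))

A1 = [[0, 0.5, 0], [0.5, -6, 5.5], [0, 5.5, -6]]
A2 = [[0, 0, 0], [1, -3, 6], [-3, 5, -6]]

# A polynomial of degree at most 4 is pinned down by 5 points,
# so agreement on 7 integer points means both sides are identical.
points = range(-3, 4)
a1_matches = all(quadratic_form(A1, x) == cubic(x) for x in points)
a2_matches = all(quadratic_form(A2, x) == cubic(x) for x in points)
print(a1_matches, a2_matches)
```

The entries $0.5$ and $5.5$ are exactly representable as floats, so the equality test is exact at these integer points.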

Is there any study about finding the solutions of the homogeneous equation $\textbf{x}^T \textbf{A} \textbf{x}=0$? Thank you in advance.

actlee
  • 376

3 Answers

3

Since we have a cubic polynomial, we can use a $2 \times 3$ matrix (or a $3 \times 2$ matrix) instead of a $3 \times 3$ matrix. Hence,

$$x^3 - 6 x^2 + 11 x - 6 = \begin{bmatrix} 1\\ x\end{bmatrix}^\top \begin{bmatrix} -6 & t_1 & t_2\\ 11-t_1 & -6-t_2 & 1\end{bmatrix} \begin{bmatrix} 1\\ x\\ x^2\end{bmatrix}$$

We would like to find parameters $t_1, t_2 \in \mathbb R$ such that the $2 \times 3$ matrix above is rank-$1$. If the matrix is rank-$1$, then the 1st row is a multiple of the 2nd row, and comparing the 3rd column shows that the multiplier must be $t_2$, i.e.,

$$\begin{array}{rl} -6 &= t_2 (11 - t_1)\\ t_1 &= t_2 (-6-t_2)\end{array}$$

which yields the cubic equation

$$t_2^3 + 6 t_2^2 + 11 t_2 + 6 = 0$$

Once we have $t_2$, the value of $t_1$ can be found via

$$t_1 = - t_2 (6+t_2)$$
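To spell out the elimination step that leads to the cubic in $t_2$, substitute this expression for $t_1$ into the first rank-$1$ condition:

$$-6 = t_2 (11 - t_1) = t_2 \left( 11 + t_2 (6 + t_2) \right) = t_2^3 + 6 t_2^2 + 11 t_2$$

Moving $-6$ to the right-hand side gives $t_2^3 + 6 t_2^2 + 11 t_2 + 6 = 0$.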

However, what is the point? Initially, we wanted to find the roots of a cubic polynomial. Now we want to find the roots of another cubic polynomial. The problem is the same, only the signs of two coefficients have been flipped.

Given the "niceness" of the coefficients of the latter cubic polynomial, it is not hard to conclude that one of the roots of this cubic polynomial is $t_2 = -1$. Hence, $t_1 = 5$. One admissible $2 \times 3$ matrix is

$$\begin{bmatrix} -6 & 5 & -1\\ 6 & -5 & 1\end{bmatrix} = \begin{bmatrix} -1\\ 1\end{bmatrix} \begin{bmatrix} 6 & -5 & 1\end{bmatrix}$$
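The other roots of the cubic in $t_2$ work equally well. As a sketch in plain Python (the helpers `rank_one_matrix` and `minors` are names introduced here, and the root search assumes the roots are integers dividing the constant term, which happens to hold for this example):

```python
def rank_one_matrix(t2):
    """Build the 2x3 matrix for a given t2, with t1 = -t2 (6 + t2)."""
    t1 = -t2 * (6 + t2)
    return [[-6, t1, t2], [11 - t1, -6 - t2, 1]]

def minors(Q):
    """All 2x2 minors of a 2x3 matrix; they all vanish iff rank <= 1."""
    (a, b, c), (d, e, f) = Q
    return [a*e - b*d, a*f - c*d, b*f - c*e]

# Integer candidates: divisors of the constant term 6, with both signs.
candidates = [s * d for d in (1, 2, 3, 6) for s in (1, -1)]
roots = sorted(t for t in candidates if t**3 + 6*t**2 + 11*t + 6 == 0)

matrices = {t2: rank_one_matrix(t2) for t2 in roots}
for t2, Q in matrices.items():
    # Row 1 equals t2 times row 2, so [1, x] Q [1, x, x^2]^T has the factor (x + t2).
    print(t2, Q, minors(Q))
```

Each root $t_2$ yields a different rank-$1$ matrix and hence a different linear factor $x + t_2$ of the original cubic.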

Thus, $x-1$ is a factor of the original cubic polynomial. We are left with a quadratic polynomial

$$x^2 -5 x + 6$$

whose roots can be found using the quadratic formula. The roots are $2$ and $3$. Hence,

$$x^3 - 6 x^2 + 11 x - 6 = (x-1) (x-2) (x-3)$$

However, the original cubic polynomial's coefficients are even "nicer", as they sum to zero. Why use matrices at all? Why introduce parameters? Why tune the parameters until one has a rank-$1$ matrix?


Factoring via SDP

It would be very, very nice if we could find a $2 \times 3$ rank-$1$ matrix with the desired properties, but without solving a system of polynomial equations. One option would be numerical optimization.

Consider the following equality-constrained rank-minimization problem in $\mathrm Q \in \mathbb R^{2 \times 3}$

$$\begin{array}{ll} \text{minimize} & \mbox{rank} (\mathrm Q)\\ \text{subject to} & q_{11} = -6\\ & q_{21} + q_{12} = 11\\ & q_{22} + q_{13} = -6\\ & q_{23} = 1\end{array}$$

Unfortunately, minimizing the rank is computationally hard. Instead, let us minimize the nuclear norm of $\mathrm Q$, denoted by $\| \mathrm Q \|_*$, which is a convex proxy for rank and is therefore computationally tractable. Hence,

$$\begin{array}{ll} \text{minimize} & \| \mathrm Q \|_*\\ \text{subject to} & q_{11} = -6\\ & q_{21} + q_{12} = 11\\ & q_{22} + q_{13} = -6\\ & q_{23} = 1\end{array}$$

which can be cast as a semidefinite program (SDP). Using Python + NumPy + CVXPY:

import cvxpy as cp
import numpy as np

# selector matrices: trace(Mk.T @ Q) sums the entries of Q that multiply x^k
M0 = np.array([(1,0,0), (0,0,0)])
M1 = np.array([(0,1,0), (1,0,0)])
M2 = np.array([(0,0,1), (0,1,0)])
M3 = np.array([(0,0,0), (0,0,1)])

# matrix variable (the shape is passed as a tuple in CVXPY 1.x)
Q = cp.Variable((2,3))

# objective function: nuclear norm as a convex proxy for rank
objective = cp.Minimize( cp.norm(Q,"nuc") )

# constraints: match the coefficients of 1, x, x^2, x^3
constraints = [ cp.trace(M0.T @ Q) == -6,
                cp.trace(M1.T @ Q) == 11,
                cp.trace(M2.T @ Q) == -6,
                cp.trace(M3.T @ Q) ==  1 ]

# create optimization problem
prob = cp.Problem(objective, constraints)

# solve optimization problem
solution = prob.solve()

print(Q.value)

This script produced the following output (the exact digits depend on the solver and its version)

[[-6.00198563  5.14853065 -1.00979949]
 [ 5.85031083 -4.99445918  0.99856348]]

Rounding to the nearest integer, we again obtain the rank-$1$ matrix

$$\color{blue}{\begin{bmatrix} -6 & 5 & -1\\ 6 & -5 & 1\end{bmatrix}} = \begin{bmatrix} -1\\ 1\end{bmatrix} \begin{bmatrix} 6 & -5 & 1\end{bmatrix}$$

Again, we conclude that $x-1$ is a factor of the original cubic polynomial.
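The rounding step can be made explicit. The sketch below is plain Python applied to the numbers printed above; `Q_numeric` simply copies that output, and the test via $2 \times 2$ minors is my own choice of rank check, not part of the CVXPY script.

```python
Q_numeric = [[-6.00198563,  5.14853065, -1.00979949],
             [ 5.85031083, -4.99445918,  0.99856348]]

# Round every entry to the nearest integer.
Q_rounded = [[round(q) for q in row] for row in Q_numeric]

# A 2x3 matrix has rank <= 1 exactly when all of its 2x2 minors vanish.
(a, b, c), (d, e, f) = Q_rounded
minors = [a*e - b*d, a*f - c*d, b*f - c*e]

# The rounded matrix must still satisfy the coefficient constraints.
coefficients = [a, d + b, e + c, f]   # coefficients of 1, x, x^2, x^3

print(Q_rounded, minors, coefficients)
```

Checking the constraints after rounding matters: rounding alone does not guarantee that the entries still add up to the coefficients of the cubic.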


Addendum

Let us now factor the quadratic polynomial $x^2 - 5 x + 6$. Using Python + NumPy + CVXPY again:

import cvxpy as cp
import numpy as np

# selector matrices: trace(Mk.T @ Q) sums the entries of Q that multiply x^k
M0 = np.array([(1,0), (0,0)])
M1 = np.array([(0,1), (1,0)])
M2 = np.array([(0,0), (0,1)])

# matrix variable (the shape is passed as a tuple in CVXPY 1.x)
Q = cp.Variable((2,2))

# objective function: nuclear norm as a convex proxy for rank
objective = cp.Minimize( cp.norm(Q,"nuc") )

# constraints: match the coefficients of 1, x, x^2
constraints = [ cp.trace(M0.T @ Q) ==  6,
                cp.trace(M1.T @ Q) == -5,
                cp.trace(M2.T @ Q) ==  1 ]

# create optimization problem
prob = cp.Problem(objective, constraints)

# solve optimization problem
solution = prob.solve()

print(Q.value)

This script produced the following output (the exact digits depend on the solver and its version)

[[ 5.99839973 -2.5001549 ]
 [-2.5001549   0.99808956]]

Rounding to the nearest integer is now not so straightforward. Should $-2.5001549$ be rounded up or down? There are $4$ possibilities, $2$ of which are admissible, i.e., satisfy $q_{12} + q_{21} = -5$ and produce a rank-$1$ matrix, namely,

$$\color{blue}{\begin{bmatrix} 6 & -2\\ -3 & 1\end{bmatrix}} = \begin{bmatrix} -2\\ 1\end{bmatrix} \begin{bmatrix} -3 & 1\end{bmatrix}$$

$$\color{blue}{\begin{bmatrix} 6 & -3\\ -2 & 1\end{bmatrix}} = \begin{bmatrix} -3\\ 1\end{bmatrix} \begin{bmatrix} -2 & 1\end{bmatrix}$$

each of which allows us to conclude that $x^2 - 5 x + 6 = (x-2) (x-3)$. Lastly,

$$x^3 - 6 x^2 + 11 x - 6 = (x-1) (x-2) (x-3)$$
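The four rounding choices for the $2 \times 2$ case can be enumerated mechanically. A minimal sketch in plain Python, assuming the diagonal entries round to $6$ and $1$ as in the output above (the variable names are illustrative):

```python
import itertools
import math

q_offdiag = -2.5001549          # off-diagonal entry reported by the solver
choices = (math.floor(q_offdiag), math.ceil(q_offdiag))   # -3 and -2

admissible = []
for q12, q21 in itertools.product(choices, repeat=2):
    Q = [[6, q12], [q21, 1]]
    matches_x_coefficient = (q12 + q21 == -5)
    # A nonzero 2x2 matrix has rank 1 exactly when its determinant is zero.
    is_rank_one = (Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0] == 0)
    if matches_x_coefficient and is_rank_one:
        admissible.append(Q)

print(admissible)
```

Only the two mixed roundings survive; rounding both entries the same way breaks the constraint on the coefficient of $x$.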

1

The simplest solution is to use the 3 homogeneous coordinates $\pmatrix{ x^2 & x & 1}$ and a 3×3 symmetric matrix

$$A x^3 + B x^2 + C x + D = \pmatrix{x^2 \\x \\1}^\top \begin{bmatrix} 0 & \frac{A}{2} & 0 \\ \frac{A}{2} & B & \frac{C}{2} \\ 0 & \frac{C}{2} & D \end{bmatrix} \pmatrix{x^2 \\x \\1} $$

But the formal solution is actually to use 4 homogeneous coordinates $\pmatrix{ x^3 & x^2 & x & 1}$ and a 4×4 symmetric matrix

$$A x^3 + B x^2 + C x + D = \pmatrix{x^3 \\ x^2 \\x \\1}^\top \begin{bmatrix} 0 & 0 & 0 & 0\\ 0& 0 & \frac{A}{2} & 0 \\ 0 & \frac{A}{2} & B & \frac{C}{2} \\ 0 & 0 & \frac{C}{2} & D \end{bmatrix} \pmatrix{x^3 \\ x^2 \\x \\1} $$

The reason is that you have to treat $x^3$ as a separate variable from $x^2$, $x$ and $1$. The coefficients are found by matching and taking successive derivatives of both sides of the expression.

An additional constraint I have imposed (besides the matrix being symmetric) is to make it as close to diagonal as possible, by setting the entries farthest from the diagonal to zero.
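Both matrices can be built and checked from the coefficients. This is only a sketch in plain Python: the function names are made up for illustration, and `Fraction` is used on the assumption that the coefficients are integers, so that $A/2$ and $C/2$ stay exact.

```python
from fractions import Fraction

def symmetric_3x3(A, B, C, D):
    """Symmetric matrix of A x^3 + B x^2 + C x + D in the basis [x^2, x, 1]."""
    hA, hC = Fraction(A, 2), Fraction(C, 2)
    return [[0,  hA, 0],
            [hA, B,  hC],
            [0,  hC, D]]

def symmetric_4x4(A, B, C, D):
    """Same polynomial in the basis [x^3, x^2, x, 1]: pad with a zero row and column."""
    inner = symmetric_3x3(A, B, C, D)
    return [[0, 0, 0, 0]] + [[0] + row for row in inner]

def quadratic_form(M, v):
    """Evaluate v^T M v for a square matrix M and a vector v."""
    n = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))

A, B, C, D = 1, -6, 11, -6      # the cubic from the question
for x in range(-3, 4):
    p = A*x**3 + B*x**2 + C*x + D
    assert quadratic_form(symmetric_3x3(A, B, C, D), [x**2, x, 1]) == p
    assert quadratic_form(symmetric_4x4(A, B, C, D), [x**3, x**2, x, 1]) == p
```

In the 4×4 version the row and column belonging to $x^3$ are entirely zero, so both forms evaluate to the same polynomial.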

John Alexiou
  • 13,816
  • I am curious to know of any application where your rewrite might lead to a faster computation. I am thinking more in terms of polynomials of matrices, parallelism and the number of slow memory operations than of the original situation. – Carl Christian Jul 26 '17 at 21:55
  • Modern compilers (Fortran) do a good job at vectorization with SIMD even for what appears to be scalar expressions. But there is application to this formalism when it comes to transforming the coordinate with $x \rightarrow x+c$ and getting new $A$, $B$, $C$ and $D$ coefficients quickly. – John Alexiou Jul 26 '17 at 23:30
  • @ja72 Thank you for your answer. Even if $x^3$ is introduced as a separate variable, it looks like it makes no difference whether $x^3$ is taken into account or not. Can you give me a little more explanation? – actlee Jul 27 '17 at 09:24
0

With $\mathbb x=[x,y,1]^T$, the equation $\mathbb x^TA\mathbb x=0$ defines a conic, which corresponds to a cone in homogeneous coordinates $\mathbb x=[x,y,z]^T$.

Now when you consider the curve $\mathbb x=[x^2,x,1]^T$, you have the parabola $x=y^2$ in the plane of the conic.

The intersection of the parabola and the conic is made of up to four points and this is no surprise as $[x^2,x,1]A[x^2,x,1]^T=0$ defines a quartic equation.

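In the case the quartic degenerates to a cubic, one of the four intersection points has moved to infinity: the coefficient of $x^4$ is the top-left entry of $A$, and when it vanishes the conic passes through the parabola's point at infinity $[1,0,0]^T$.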
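The degree count can be checked by expanding the form coefficient by coefficient. A minimal sketch in plain Python (the function name `quartic_coefficients` and the `generic` matrix are illustrative assumptions, not taken from the question):

```python
def quartic_coefficients(A):
    """Coefficients [c4, c3, c2, c1, c0] of [x^2, x, 1] A [x^2, x, 1]^T."""
    degree_of = [2, 1, 0]               # degrees of the basis entries x^2, x, 1
    coeffs = [0] * 5                    # index k holds the coefficient of x^k
    for i in range(3):
        for j in range(3):
            coeffs[degree_of[i] + degree_of[j]] += A[i][j]
    return coeffs[::-1]                 # highest degree first

A1 = [[0, 0.5, 0], [0.5, -6, 5.5], [0, 5.5, -6]]   # from the question
generic = [[2, 0, 0], [0, 1, 0], [0, 0, -1]]        # hypothetical example with a nonzero top-left entry

print(quartic_coefficients(A1))       # leading coefficient is 0: a cubic
print(quartic_coefficients(generic))  # leading coefficient is 2: a genuine quartic
```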

  • Thank you for your answer. The geometric interpretation is very impressive to me. I think I need to have a look at the relationship between conic sections and quadratic forms. – actlee Jul 27 '17 at 09:41