I am trying to take the derivative

$$\dfrac{d}{d\mathbf{x}}[\lambda( \mathbf{x}^T \mathbf{x} - 1)]$$

By my reasoning, I get

$$\dfrac{d}{d\mathbf{x}}[\lambda( \mathbf{x}^T \mathbf{x} - 1)] = 2 \lambda \mathbf{x}'$$

However, according to my textbook, the derivative is apparently

$$\dfrac{d}{d\mathbf{x}}[\lambda( \mathbf{x}^T \mathbf{x} - 1)] = 2 \lambda \mathbf{x}$$

I don't understand why $\mathbf{x}$ is not differentiated in the textbook solution. If my calculation is incorrect, what is the correct reasoning?

I would appreciate it if people would please take the time to clarify this.

The Pointer

1 Answer

You're taking the derivative with respect to the vector $\mathbf{x}$ itself, not with respect to some other variable. It is just like the derivative of $y = x^2$ with respect to $x$, which is $2x$, not $2x'$. Either way, the derivative cannot involve $\mathbf{x}'$ (which I take to mean the derivative of $\mathbf{x}$ with respect to some other variable) on its own.

Recall the chain rule in one variable: $$\frac{d f(g(x))}{dx} = \frac{d f(g(x))}{d g(x)} \cdot \frac{d g(x)}{dx}$$ So if you wanted the derivative with respect to time of $y = x^2$, where $x$ is actually a function of time, you would have $y'(t) = 2x(t) \cdot x'(t)$.

I suspect you were trying to apply the chain rule. For example, if you wanted to find $$\frac{d}{dt} (\lambda(\mathbf{x}^T\mathbf{x} - 1))$$ then you would have to apply the chain rule (for multivariate functions) to find that $$\frac{d}{dt} (\lambda(\mathbf{x}^T\mathbf{x} - 1)) = 2\lambda (\mathbf{x} \cdot \mathbf{x}'(t))$$ But even in this case the derivative cannot be $2 \lambda \mathbf{x}'$. That would not make sense here: the derivative of a scalar with respect to the scalar $t$ must itself be a scalar, whereas $2 \lambda \mathbf{x}'$ is a vector.
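To make the first claim concrete, here is a component-wise sketch of the derivative with respect to $\mathbf{x}$. It assumes $\lambda$ is a constant that does not depend on $\mathbf{x}$, writes $\mathbf{x} = [x_1, \dots, x_n]^T$, and uses the convention that the derivative of a scalar with respect to a column vector is the column vector of partial derivatives: $$\frac{\partial}{\partial x_i}\left[\lambda\left(\sum_{j=1}^n x_j^2 - 1\right)\right] = 2\lambda x_i, \qquad i = 1, \dots, n$$ Stacking these $n$ partial derivatives gives $$\frac{d}{d\mathbf{x}}[\lambda(\mathbf{x}^T\mathbf{x} - 1)] = [2\lambda x_1, \dots, 2\lambda x_n]^T = 2\lambda\mathbf{x}$$ No $\mathbf{x}'$ appears, because each $x_i$ is being treated as an independent variable rather than as a function of something else.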

EDIT: The easiest way to see how the chain rule was used is to write out the function in question. Let $\mathbf{x}(t) = [x_1(t), x_2(t), \cdots, x_n(t)]^T$: $$f(\mathbf{x}) = \lambda(\mathbf{x}^T\mathbf{x} - 1) = \lambda(x_1(t)^2 + \cdots + x_n(t)^2 - 1)$$

Then $\frac{df}{dt}$ can be deduced from the chain rule for multivariate functions, which states that for functions $g:\mathbb{R}^m \to \mathbb{R}^l$, $f: \mathbb{R}^l \to \mathbb{R}^n$ and $\mathbf{x} \in \mathbb{R}^m$, $$D(f(g(\mathbf{x}))) = Df(g(\mathbf{x})) \cdot Dg(\mathbf{x})$$ Here, $D$ designates the Jacobian matrix (the multivariate generalization of the derivative) of a function, and the dot represents matrix multiplication.

In our case, this translates to $$\frac{df}{dt} = \frac{df}{d\mathbf{x}} \cdot \frac{d \mathbf{x}}{dt}$$ From before, we know $df/d\mathbf{x} = 2\lambda \mathbf{x}$, and of course $\mathbf{x}'(t) = d\mathbf{x}/dt$. As these are both vectors with $n$ components, we can interpret the matrix multiplication as the ordinary dot product, which gives $df/dt = 2\lambda(\mathbf{x} \cdot \mathbf{x}'(t))$.
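If it helps, both formulas can be sanity-checked numerically. The following is a minimal sketch in plain Python, not part of the original derivation: the function names, the value of $\lambda$, and the curve $\mathbf{x}(t) = [\cos t, \sin t, t]^T$ are all arbitrary choices made for illustration. It compares $2\lambda\mathbf{x}$ and $2\lambda(\mathbf{x} \cdot \mathbf{x}'(t))$ against central finite differences.

```python
import math

# All names below are hypothetical, chosen only for this illustration.

def f(x, lam):
    """f(x) = lam * (x^T x - 1), with x given as a list of floats."""
    return lam * (sum(xi * xi for xi in x) - 1.0)

def grad_f(x, lam):
    """Claimed gradient with respect to x: 2 * lam * x."""
    return [2.0 * lam * xi for xi in x]

def numerical_grad(x, lam, h=1e-6):
    """Central-difference estimate of each partial derivative of f."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus, lam) - f(x_minus, lam)) / (2.0 * h))
    return grad

def path(t):
    """An arbitrary smooth curve x(t) in R^3 (assumption for the demo)."""
    return [math.cos(t), math.sin(t), t]

def path_velocity(t):
    """Exact derivative x'(t) of the curve above."""
    return [-math.sin(t), math.cos(t), 1.0]

def dfdt_chain(t, lam):
    """Chain rule: df/dt = (df/dx) . x'(t) = 2 * lam * (x . x'(t))."""
    g = grad_f(path(t), lam)
    v = path_velocity(t)
    return sum(gi * vi for gi, vi in zip(g, v))

def dfdt_numerical(t, lam, h=1e-6):
    """Central-difference estimate of d/dt f(x(t))."""
    return (f(path(t + h), lam) - f(path(t - h), lam)) / (2.0 * h)

if __name__ == "__main__":
    lam, x, t = 3.0, [1.0, -2.0, 0.5], 0.7
    print(grad_f(x, lam), numerical_grad(x, lam))
    print(dfdt_chain(t, lam), dfdt_numerical(t, lam))
```

For this particular curve, $\mathbf{x}^T\mathbf{x} = 1 + t^2$, so $f = \lambda t^2$ and both time derivatives should agree with $2\lambda t$ up to rounding error. Note that the gradient is a list of $n$ numbers while $df/dt$ is a single number, which is the scalar-versus-vector distinction made above.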

paulinho