This is a follow-up question to: Gradients of marginal likelihood of Gaussian Process with squared exponential covariance, for learning hyper-parameters.
Given a covariance function:
$K(x,x') = \sigma^2\exp\big(\frac{-(x-x')^T(x-x')}{2l^2}\big)$
The gradient with respect to $l$ is:
$\frac{\partial K}{\partial l} = \sigma^2\exp\big(\frac{-(x-x')^T(x-x')}{2l^2}\big) \frac{(x-x')^T(x-x')}{l^3}$
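As a sanity check, the scalar-$l$ gradient above can be verified against a central finite difference. This is a minimal sketch with variable names of my own choosing, not part of any particular library:

```python
import numpy as np

def k(x, xp, sigma, l):
    """Squared exponential covariance with scalar length scale l."""
    d = x - xp
    return sigma**2 * np.exp(-(d @ d) / (2 * l**2))

def dk_dl(x, xp, sigma, l):
    """Analytic gradient of k with respect to the scalar l."""
    d = x - xp
    return k(x, xp, sigma, l) * (d @ d) / l**3

x = np.array([0.3, -1.2, 0.8])
xp = np.array([1.0, 0.5, -0.4])
sigma, l, eps = 1.5, 0.7, 1e-6

# Central finite difference in l should agree with the analytic gradient.
fd = (k(x, xp, sigma, l + eps) - k(x, xp, sigma, l - eps)) / (2 * eps)
print(np.isclose(dk_dl(x, xp, sigma, l), fd))
```

The same finite-difference check extends to a vector $l$ by perturbing one component $l_i$ at a time.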
Assuming $x$ and $x'$ are vectors of length $m$, $l$ can also be made a vector of length $m$, so that the relevance of each element of $x$ can be learned (automatic relevance determination). My question: if $l$ is a vector, how is the gradient with respect to each $l_i$ calculated? It may be obvious, but I am getting confused by the matrix calculus notation and would appreciate any help.