I want to determine how the inclusion of new data affects the hyperparameters of a Gaussian process kernel. For reference, assume the squared exponential kernel: $$K(x,x') = \sigma^2\exp\left(\frac{-(x-x')^T(x-x')}{2l^2}\right)$$ The derivative with respect to the lengthscale describes how the kernel changes when the lengthscale changes: $$\frac{\partial K}{\partial l} = \sigma^2\exp\left(\frac{-(x-x')^T(x-x')}{2l^2}\right) \frac{(x-x')^T(x-x')}{l^3}$$
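As a sanity check on the derivative above, here is a minimal NumPy sketch that compares it against a central finite difference. The function names (`se_kernel`, `se_kernel_dlength`) and the test values are my own choices for illustration, not part of any library.

```python
import numpy as np

def se_kernel(x, x_prime, sigma=1.0, length=1.0):
    """Squared exponential kernel K(x, x') for two input vectors."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    sq_dist = diff @ diff  # (x - x')^T (x - x')
    return sigma**2 * np.exp(-sq_dist / (2.0 * length**2))

def se_kernel_dlength(x, x_prime, sigma=1.0, length=1.0):
    """Analytic dK/dl = K(x, x') * (x - x')^T (x - x') / l^3."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    sq_dist = diff @ diff
    return se_kernel(x, x_prime, sigma, length) * sq_dist / length**3

# Arbitrary illustrative inputs and hyperparameters (assumptions).
x = np.array([0.3, -1.2])
x_prime = np.array([1.0, 0.5])
sigma, length, eps = 1.5, 0.8, 1e-6

analytic = se_kernel_dlength(x, x_prime, sigma, length)
# Central finite difference in the lengthscale.
numeric = (se_kernel(x, x_prime, sigma, length + eps)
           - se_kernel(x, x_prime, sigma, length - eps)) / (2.0 * eps)

print(analytic, numeric)
```

The two printed values should agree to several decimal places if the formula is right.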
However, I would like to determine the effect of a single new data point on the lengthscale. Which symbolic expression do I need to differentiate?
Is it $$\frac{\partial l}{\partial \mu}$$ of the GP, where $\mu$ is the predictive mean of the GP, given by:
$$\mu(x^*)=K(x^*,X)^\top[K(X,X)+\sigma_n^2\mathbf{I}]^{-1} \mathbf{y_n}$$ If so, how can the derivative expression be formulated? (The initial expression at least; I should be able to work out the derivative from there myself.)
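To make the predictive mean concrete, here is a small NumPy sketch of $\mu(x^*)$ for the squared exponential kernel. The helper names (`se_kernel_matrix`, `predictive_mean`), the toy data, and the hyperparameter values are assumptions made for illustration; inputs are assumed to be arrays of shape `(n, d)`.

```python
import numpy as np

def se_kernel_matrix(A, B, sigma=1.0, length=1.0):
    """Matrix of K(a_i, b_j) for rows a_i of A (n, d) and b_j of B (m, d)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    sq_dist = (np.sum(A**2, axis=1)[:, None]
               + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return sigma**2 * np.exp(-sq_dist / (2.0 * length**2))

def predictive_mean(x_star, X, y, sigma=1.0, length=1.0, noise=0.1):
    """mu(x*) = K(x*, X)^T [K(X, X) + noise^2 I]^{-1} y."""
    K = se_kernel_matrix(X, X, sigma, length) + noise**2 * np.eye(len(X))
    k_star = se_kernel_matrix(X, x_star, sigma, length)  # shape (n, m)
    alpha = np.linalg.solve(K, y)  # solve instead of forming the inverse
    return k_star.T @ alpha

# Toy 1-D data (an assumption, just to exercise the formula).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.sin(X).ravel()
x_star = np.array([[1.5]])

mu = predictive_mean(x_star, X, y, sigma=1.0, length=1.0, noise=1e-3)
print(mu)
```

Note that in this expression the lengthscale is an input and $\mu$ is the output, so the sketch can be used to see how $\mu$ responds to $l$, not the other way around.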
"Finally, if you care about the change of $\text{NLML}(x,\bar{x})$ with respect to the position of $\bar{x}$ we could analytically compute that derivative fairly easily (but I'll wait for feedback from you before I write it all out)."
Could I see that analytical derivative? It is the derivative with respect to some new, arbitrary data point, right? – GENIVI-LEARNER Feb 10 '20 at 09:27