Finally got to a solution, which I post here for the convenience of future readers.
We have to look more broadly at the likelihood of the whole sample, and then revert back to the original problem of maximizing each log marginal likelihood. Indeed, suppose that your ultimate objective is to maximize the sample log likelihood or, equivalently, to minimize its additive inverse (i.e. the negative log likelihood). Let's minimize the latter.
We know that, assuming the process is $y_{t}=A_{t}^{1/2}\epsilon_{t}$ with iid $\epsilon_{t}\sim N(0,I)$ and positive definite $A_{t}$ for every $t$, then $y_{t}y_{t}^{T}=A_{t}^{1/2}\epsilon_{t}\epsilon_{t}^{T}(A_{t}^{1/2})^{T}$, which implies $E_{t-1}(y_{t}y_{t}^{T})=A_{t}$. Therefore we can write $y_{t}y_{t}^{T}=A_{t}+U_{t}$, where $U_{t}$ is such that $E_{t-1}(U_{t})=0$ and hence, by the law of iterated expectations, $E(U_{t})=0$. The negative log likelihood to be minimized wrt $\Sigma_{t}$ can then be rewritten as
$$-\log p(y|X) = \sum_{t=1}^{N} \left[\frac{1}{2}y_{t}^{T}\Sigma_{t}^{-1}y_{t} + \frac{1}{2}\log|\Sigma_{t}| + \frac{n}{2}\log 2\pi\right] = \sum_{t=1}^{N} \left[\frac{1}{2}\log|\Sigma_{t}| + \frac{n}{2}\log 2\pi + \frac{1}{2}Tr(\Sigma_{t}^{-1}y_{t}y_{t}^{T})\right]$$
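(As a quick sanity check of the trace identity $y_{t}^{T}\Sigma_{t}^{-1}y_{t} = Tr(\Sigma_{t}^{-1}y_{t}y_{t}^{T})$ used above, here is a minimal numpy sketch; the pd matrix and the vector are arbitrary and just for illustration:)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Arbitrary pd Sigma and vector y, for illustration only
B = rng.standard_normal((n, n))
Sigma = B @ B.T + n * np.eye(n)   # diagonally loaded to ensure pd
y = rng.standard_normal(n)

quad_form = y @ np.linalg.solve(Sigma, y)                       # y' Sigma^{-1} y
trace_form = np.trace(np.linalg.solve(Sigma, np.outer(y, y)))   # Tr(Sigma^{-1} y y')

print(np.isclose(quad_form, trace_form))  # True
```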
Substituting $y_{t}y_{t}^{T}=A_{t}+U_{t}$:
$$-\log p(y|X) = \sum_{t=1}^{N} \left[\frac{1}{2}\log|\Sigma_{t}| + \frac{n}{2}\log 2\pi + \frac{1}{2}Tr(\Sigma_{t}^{-1}(A_{t}+U_{t}))\right]$$
Notice that $Tr(\Sigma_{t}^{-1}(A_{t}+U_{t})) = Tr(\Sigma_{t}^{-1}A_{t}) + Tr(\Sigma_{t}^{-1}U_{t})$. Notice also that, since $Tr$ is a linear operator and commutes with expectations (see for example this), and since $\Sigma_{t}$ is known given the information at $t-1$, the following holds: $E(Tr(\Sigma_{t}^{-1}U_{t})) = Tr(E(\Sigma_{t}^{-1}E_{t-1}(U_{t}))) = 0$. Assuming that this zero-mean property carries over to the sample (i.e. $\frac{1}{N}\sum_{t=1}^{N}Tr(\Sigma_{t}^{-1}U_{t}) \approx 0$, up to negligible sampling error in a large, well-behaved sample), we can write a proxy for the negative log likelihood as:
$$-\log p(y|X) \approx \sum_{t=1}^{N} \left[\frac{1}{2}\log|\Sigma_{t}| + \frac{n}{2}\log 2\pi + \frac{1}{2}Tr(\Sigma_{t}^{-1}A_{t})\right]$$
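To illustrate the rounding step, here is a small simulation sketch (the particular path for $A_{t}$ and the constant candidate $\Sigma_{t}$ are arbitrary choices, not part of the argument): for any $\Sigma_{t}$ known at $t-1$, the sample average of $Tr(\Sigma_{t}^{-1}U_{t})$ shrinks toward zero as $N$ grows, so the proxy and the exact objective coincide up to this vanishing term.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 3, 100_000

# A time-varying pd A_t: a fixed pd base matrix times a positive scale factor
B = rng.standard_normal((n, n))
base = B @ B.T + n * np.eye(n)
L = np.linalg.cholesky(base)                         # so A_t^{1/2} = sqrt(s_t) * L
scales = 0.5 + np.abs(np.sin(np.arange(N) / 50.0))   # arbitrary positive path s_t

# Any candidate Sigma_t known at t-1; constant here for simplicity
Sigma_inv = np.linalg.inv(base)

total = 0.0
for t in range(N):
    A_t = scales[t] * base
    y = np.sqrt(scales[t]) * (L @ rng.standard_normal(n))   # y_t = A_t^{1/2} eps_t
    U_t = np.outer(y, y) - A_t                               # E_{t-1}(U_t) = 0
    total += np.trace(Sigma_inv @ U_t)

print(total / N)  # sample average of Tr(Sigma_t^{-1} U_t): close to 0 for large N
```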
Taking the gradient with respect to a generic $\Sigma_{t}$ (using $\partial \log|\Sigma_{t}|/\partial \Sigma_{t} = \Sigma_{t}^{-1}$ and $\partial Tr(\Sigma_{t}^{-1}A_{t})/\partial \Sigma_{t} = -\Sigma_{t}^{-1}A_{t}\Sigma_{t}^{-1}$) and setting it equal to zero gives $\frac{1}{2}\Sigma_{t}^{-1} - \frac{1}{2}\Sigma_{t}^{-1}A_{t}\Sigma_{t}^{-1} = 0$, hence:
$$ \Sigma_{t}=A_{t}$$
where $A_{t}$ is pd by assumption. Notice also that, differentiating once more, the Hessian at $\Sigma_{t}=A_{t}$ is $\frac{1}{2}(A_{t}^{-1}\otimes A_{t}^{-1})$, which is pd because $A_{t}$ is pd, so we have found a minimum of the negative log marginal likelihood, after rounding the expression of the sample negative log likelihood. So we do not need to impose the constraint $\det(\Sigma_{t})>0$ to show that $A_{t}$ is the choice of $\Sigma_{t}$ that minimizes the negative log likelihood, subject to the requirement that $\Sigma_{t}$ is pd for every $t$.
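In fact the minimum is global over pd matrices: letting $\lambda_{i}$ denote the eigenvalues of $\Sigma_{t}^{-1}A_{t}$, the per-period objective exceeds its value at $\Sigma_{t}=A_{t}$ by $\frac{1}{2}\sum_{i}(\lambda_{i}-\log\lambda_{i}-1)\geq 0$. A minimal numerical check of this claim (the pd test matrices are randomly generated, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

def per_period_obj(Sigma, A):
    """0.5*log|Sigma| + 0.5*Tr(Sigma^{-1} A), dropping the constant (n/2)*log(2*pi)."""
    _, logdet = np.linalg.slogdet(Sigma)
    return 0.5 * logdet + 0.5 * np.trace(np.linalg.solve(Sigma, A))

B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)   # plays the role of a pd A_t

f_at_A = per_period_obj(A, A)
for _ in range(1000):
    C = rng.standard_normal((n, n))
    Sigma = C @ C.T + 1e-3 * np.eye(n)   # random pd candidate
    assert per_period_obj(Sigma, A) >= f_at_A - 1e-9

print("No pd candidate beat Sigma = A.")
```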