Finally got to a solution, which I post here for the convenience of future readers.
We have to look more broadly at the likelihood of the whole sample, and then revert back to the original problem of maximizing each log marginal likelihood. Indeed, suppose that your ultimate objective is to maximize the sample log likelihood or, equivalently, to minimize its additive inverse (i.e. the negative log likelihood). Let's minimize the latter.
We know that, assuming the process is $y_{t}=A_{t}^{1/2}\epsilon_{t}$ with iid $\epsilon_{t}\sim N(0,I)$ and positive definite $A_{t}$ for every $t$, then $y_{t}y_{t}^{T}=A_{t}^{1/2}\epsilon_{t}\epsilon_{t}^{T}(A_{t}^{1/2})^{T}$, which implies $E_{t-1}(y_{t}y_{t}^{T})=A_{t}$. Therefore we can write $y_{t}y_{t}^{T}=A_{t}+U_{t}$, where $U_{t}$ is such that $E_{t-1}(U_{t})=0$ and hence, by the law of iterated expectations, $E(U_{t})=0$. The negative log likelihood to be minimized wrt $\Sigma_{t}$ can then be rewritten as
$$-\log p(y|X) = \sum_{t=1}^{N} \left[\frac{1}{2}y_{t}^{T}\Sigma_{t}^{-1}y_{t} + \frac{1}{2}\log|\Sigma_{t}| + \frac{n}{2}\log 2\pi\right] = \sum_{t=1}^{N} \left[\frac{1}{2}\log|\Sigma_{t}| + \frac{n}{2}\log 2\pi + \frac{1}{2}Tr(\Sigma_{t}^{-1}y_{t}y_{t}^{T})\right]$$
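(As a quick sanity check of the trace identity $y_{t}^{T}\Sigma_{t}^{-1}y_{t} = Tr(\Sigma_{t}^{-1}y_{t}y_{t}^{T})$ used above, here is a minimal numpy sketch; the pd matrix and the vector are arbitrary and just for illustration:)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Arbitrary pd Sigma and vector y, for illustration only
B = rng.standard_normal((n, n))
Sigma = B @ B.T + n * np.eye(n)   # diagonally loaded to ensure pd
y = rng.standard_normal(n)

quad_form = y @ np.linalg.solve(Sigma, y)                       # y' Sigma^{-1} y
trace_form = np.trace(np.linalg.solve(Sigma, np.outer(y, y)))   # Tr(Sigma^{-1} y y')

print(np.isclose(quad_form, trace_form))  # True
```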
Substituting $y_{t}y_{t}^{T}=A_{t}+U_{t}$:
$$-\log p(y|X) = \sum_{t=1}^{N} \left[\frac{1}{2}\log|\Sigma_{t}| + \frac{n}{2}\log 2\pi + \frac{1}{2}Tr(\Sigma_{t}^{-1}(A_{t}+U_{t}))\right]$$
Notice that $Tr(\Sigma_{t}^{-1}(A_{t}+U_{t})) = Tr(\Sigma_{t}^{-1}A_{t}) + Tr(\Sigma_{t}^{-1}U_{t})$. Notice also that, since $Tr$ is a linear operator and commutes with expectations (see for example this), and since $\Sigma_{t}$ is known given the information at $t-1$, the following holds: $E(Tr(\Sigma_{t}^{-1}U_{t})) = Tr(E(\Sigma_{t}^{-1}E_{t-1}(U_{t}))) = 0$. Assuming that this zero-mean property carries over to the sample (i.e. $\frac{1}{N}\sum_{t=1}^{N}Tr(\Sigma_{t}^{-1}U_{t}) \approx 0$, up to negligible sampling error in a large, well-behaved sample), we can write a proxy for the negative log likelihood as:
$$-\log p(y|X) \approx \sum_{t=1}^{N} \left[\frac{1}{2}\log|\Sigma_{t}| + \frac{n}{2}\log 2\pi + \frac{1}{2}Tr(\Sigma_{t}^{-1}A_{t})\right]$$
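To illustrate the rounding step, here is a small simulation sketch (the particular path for $A_{t}$ and the constant candidate $\Sigma_{t}$ are arbitrary choices, not part of the argument): for any $\Sigma_{t}$ known at $t-1$, the sample average of $Tr(\Sigma_{t}^{-1}U_{t})$ shrinks toward zero as $N$ grows, so the proxy and the exact objective coincide up to this vanishing term.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 3, 100_000

# A time-varying pd A_t: a fixed pd base matrix times a positive scale factor
B = rng.standard_normal((n, n))
base = B @ B.T + n * np.eye(n)
L = np.linalg.cholesky(base)                         # so A_t^{1/2} = sqrt(s_t) * L
scales = 0.5 + np.abs(np.sin(np.arange(N) / 50.0))   # arbitrary positive path s_t

# Any candidate Sigma_t known at t-1; constant here for simplicity
Sigma_inv = np.linalg.inv(base)

total = 0.0
for t in range(N):
    A_t = scales[t] * base
    y = np.sqrt(scales[t]) * (L @ rng.standard_normal(n))   # y_t = A_t^{1/2} eps_t
    U_t = np.outer(y, y) - A_t                               # E_{t-1}(U_t) = 0
    total += np.trace(Sigma_inv @ U_t)

print(total / N)  # sample average of Tr(Sigma_t^{-1} U_t): close to 0 for large N
```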
Taking the gradient with respect to a generic $\Sigma_{t}$ (using $\partial \log|\Sigma_{t}|/\partial \Sigma_{t} = \Sigma_{t}^{-1}$ and $\partial Tr(\Sigma_{t}^{-1}A_{t})/\partial \Sigma_{t} = -\Sigma_{t}^{-1}A_{t}\Sigma_{t}^{-1}$) and setting it equal to zero gives $\frac{1}{2}\Sigma_{t}^{-1} - \frac{1}{2}\Sigma_{t}^{-1}A_{t}\Sigma_{t}^{-1} = 0$, hence:
$$ \Sigma_{t}=A_{t}$$
where $A_{t}$ is pd by assumption. Notice also that, differentiating once more, the Hessian at $\Sigma_{t}=A_{t}$ is $\frac{1}{2}(A_{t}^{-1}\otimes A_{t}^{-1})$, which is pd because $A_{t}$ is pd, so we have found a minimum of the negative log marginal likelihood, after rounding the expression of the sample negative log likelihood. So we do not need to impose the constraint $\det(\Sigma_{t})>0$ to show that $A_{t}$ is the choice of $\Sigma_{t}$ that minimizes the negative log likelihood, subject to the requirement that $\Sigma_{t}$ is pd for every $t$.
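In fact the minimum is global over pd matrices: letting $\lambda_{i}$ denote the eigenvalues of $\Sigma_{t}^{-1}A_{t}$, the per-period objective exceeds its value at $\Sigma_{t}=A_{t}$ by $\frac{1}{2}\sum_{i}(\lambda_{i}-\log\lambda_{i}-1)\geq 0$. A minimal numerical check of this claim (the pd test matrices are randomly generated, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

def per_period_obj(Sigma, A):
    """0.5*log|Sigma| + 0.5*Tr(Sigma^{-1} A), dropping the constant (n/2)*log(2*pi)."""
    _, logdet = np.linalg.slogdet(Sigma)
    return 0.5 * logdet + 0.5 * np.trace(np.linalg.solve(Sigma, A))

B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)   # plays the role of a pd A_t

f_at_A = per_period_obj(A, A)
for _ in range(1000):
    C = rng.standard_normal((n, n))
    Sigma = C @ C.T + 1e-3 * np.eye(n)   # random pd candidate
    assert per_period_obj(Sigma, A) >= f_at_A - 1e-9

print("No pd candidate beat Sigma = A.")
```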