Consider a simple Ornstein–Uhlenbeck process $X(t)$:
$$ \mathrm d X(t) = - X(t) \, \mathrm dt + \sqrt{2} \, \mathrm dW(t). \tag{1} $$
If we apply Itô's lemma in its common formulation to get an SDE for $X^2(t)$, we obtain
$$ \mathrm d X^2(t) = [- 2X^2(t) + 2] \, \mathrm d t + 2 X(t) \sqrt 2 \, \mathrm dW(t).\tag{2} $$
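For reference, here is a sketch of how I arrive at (2). I am assuming that the "common formulation" means the following for a general SDE $\mathrm dX = a(X)\,\mathrm dt + b(X)\,\mathrm dW$ and a twice-differentiable $f$ (the symbols $a$, $b$, $f$ are my own notation, introduced only for this illustration):
$$ \mathrm d f(X) = \left[ a(X) f'(X) + \tfrac{1}{2} b^2(X) f''(X) \right] \mathrm dt + b(X) f'(X) \, \mathrm dW. $$
With $a(x) = -x$, $b(x) = \sqrt 2$ and $f(x) = x^2$, so that $f'(x) = 2x$ and $f''(x) = 2$, this gives
$$ \mathrm d X^2 = \left[ (-X)(2X) + \tfrac{1}{2} \cdot 2 \cdot 2 \right] \mathrm dt + \sqrt 2 \cdot 2X \, \mathrm dW = [-2X^2 + 2] \, \mathrm dt + 2 \sqrt 2 \, X \, \mathrm dW. $$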
Note that both equations involve the same Wiener process $W(t)$. This seems natural, since $X(t)$ and $X^2(t)$ should be driven by the same source of noise. From these two SDEs we might incorrectly conclude that
$$ \mathrm d X^2(t) - 2 X(t) \, \mathrm d X(t) = 2 \, \mathrm dt. $$
Hence it seems that the quantity $\mathrm d X^2 - 2 X \, \mathrm d X$ is deterministic. However, let us use the Euler–Maruyama discretization scheme ($\xi$ is a normally distributed random number with mean $0$ and variance $1$):
$$ X(t + \Delta t) = X(t) - X(t) \, \Delta t + \sqrt{2} \sqrt{\Delta t} \, \xi(t). $$
From it, we can calculate $\Delta X^2 - 2 X \, \Delta X$ to leading order in $\Delta t$:
$$ [X^2(t + \Delta t) - X^2(t)] - 2 X(t) [X(t + \Delta t) - X(t)] = 2 \Delta t \, \xi^2(t) + \ldots, $$
which is a random variable.
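To spell out this step (a short check, writing $\Delta X = X(t + \Delta t) - X(t)$, which is my shorthand here): since $X^2(t + \Delta t) = [X(t) + \Delta X]^2$, the left-hand side equals $(\Delta X)^2$ exactly, and inserting the Euler–Maruyama increment gives
$$ (\Delta X)^2 = \left[ - X(t) \, \Delta t + \sqrt{2} \sqrt{\Delta t} \, \xi(t) \right]^2 = 2 \Delta t \, \xi^2(t) - 2 \sqrt 2 \, X(t) \, \Delta t^{3/2} \, \xi(t) + X^2(t) \, \Delta t^2. $$
The first term is of order $\Delta t$, and the remaining two are the higher-order terms abbreviated by the dots.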
Question: How do I correctly apply Itô's lemma (or something else) to calculate $\mathrm d X^2 - 2 X \, \mathrm d X$? Can it be expressed in terms of $\mathrm dW(t)$? To be clear, I am interested in realization-specific identities (in the strong sense, not the weak sense). How should Itô's lemma be formulated so that paradoxes like this are avoided? I used the OU process only as an illustration; I am interested in the case of a general SDE for $X(t)$.
Update: In case you wonder why I insist that $\mathrm dX^2 - 2X \, \mathrm dX \neq 2 \, \mathrm dt$: you can use any program of your choice to simulate $X(t)$ and then calculate $g(t)$ as
$$ g(t) = [X^2(t + \Delta t) - X^2(t)] - 2 X(t) [X(t + \Delta t) - X(t)] $$
for small values of $\Delta t$. You will see that $g(t)$ is random. I attach below a screenshot from Mathematica that does this (I also tried writing my own program in Julia, with the same result). If $g(t)$ is not the correct approximation of $\mathrm dX^2 - 2X \, \mathrm dX$, please tell me how to do it right.
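For anyone who wants to reproduce this without Mathematica or Julia, here is a minimal Python sketch of the same experiment. It is not my original program: the function name `simulate_g`, the step size, the number of steps, the initial value and the seed are all arbitrary choices made for illustration, and it assumes NumPy is available.

```python
import numpy as np


def simulate_g(dt=1e-3, n_steps=100_000, x0=0.0, seed=0):
    """Euler-Maruyama for dX = -X dt + sqrt(2) dW.

    Returns the simulated path x (length n_steps + 1) and the
    per-step quantity g = [X^2(t+dt) - X^2(t)] - 2 X(t) [X(t+dt) - X(t)].
    All parameter defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal(n_steps)  # xi ~ N(0, 1), one draw per step
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        # X(t + dt) = X(t) - X(t) dt + sqrt(2) sqrt(dt) xi(t)
        x[k + 1] = x[k] - x[k] * dt + np.sqrt(2.0 * dt) * xi[k]
    dx = np.diff(x)                        # X(t + dt) - X(t)
    g = np.diff(x**2) - 2.0 * x[:-1] * dx  # the quantity g(t) defined above
    return x, g


dt = 1e-3
x, g = simulate_g(dt=dt)

# Summary statistics of g / dt over all steps of a single realization.
print("sample mean of g / dt:", g.mean() / dt)
print("sample std  of g / dt:", g.std() / dt)
```

If $g(t)$ were equal to $2 \, \Delta t$ at every step, the sample standard deviation of $g / \Delta t$ would vanish; in this sketch it does not, which is the randomness I am referring to.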