Incorrect use of Ito's lemma

Question

Consider a simple Ornstein–Uhlenbeck process $X(t)$:

$$ \mathrm d X(t) = - X(t) \, \mathrm dt + \sqrt{2} \, \mathrm dW(t). \tag{1} $$

If we apply Itô's lemma in its common formulation to get the SDE for $X^2(t)$, we obtain

$$ \mathrm d X^2(t) = [- 2X^2(t) + 2] \, \mathrm d t + 2 X(t) \sqrt 2 \, \mathrm dW(t).\tag{2} $$

Note that both equations have the same Wiener process $W(t)$. The fact that they have the same Wiener process seems natural, since $X(t)$ and $X^2(t)$ should be driven by the same source of noise. From these two SDEs we might incorrectly conclude that

$$ \mathrm d X^2(t) - 2 X(t) \, \mathrm d X(t) = 2 \, \mathrm dt. $$

Hence it seems that the quantity $\mathrm d X^2 - 2 X \, \mathrm d X$ is deterministic. However, let us use the Euler–Maruyama discretization scheme ($\xi$ is a normally distributed random number with mean $0$ and variance $1$):

$$ X(t + \Delta t) = X(t) - X(t) \, \Delta t + \sqrt{2} \sqrt{\Delta t} \, \xi(t). $$

From it, we can calculate $\Delta X^2 - 2 X \, \Delta X$ to leading order in $\Delta t$:

$$ [X^2(t + \Delta t) - X^2(t)] - 2 X(t) [X(t + \Delta t) - X(t)] = 2 \Delta t \, \xi^2(t) + \ldots, $$

which is a random variable.

Question: How do I correctly apply Itô's lemma (or something else) to calculate $\mathrm d X^2 - 2 X \, \mathrm d X$? Can it be expressed in terms of $\mathrm dW(t)$? To make it even clearer, I am interested in realization-specific identities (in the strong, not the weak, sense). How should Itô's lemma be formulated to avoid paradoxes like this? I used the OU process just as an illustration; I am interested in the case of a general SDE for $X(t)$.

Update: In case you wonder why I insist that $\mathrm dX^2 - 2X \, \mathrm dX \neq 2 \, \mathrm dt $: you can use any program of your choice to simulate $X(t)$ and then calculate $g(t)$ as

$$ g(t) = [X^2(t + \Delta t) - X^2(t)] - 2 X(t) [X(t + \Delta t) - X(t)] $$

for small values of $\Delta t$. You will see that $g(t)$ is random. I attach below a screenshot from Mathematica that does that (I also tried writing my own program in Julia with the same result). I admit that $g(t)$ might not be the correct approximation of $\mathrm dX^2 - 2X \, \mathrm dX$; if so, please tell me how to do it right.
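For readers who want to reproduce the experiment, here is a minimal Python sketch of it (not the Mathematica or Julia code mentioned above). The function name `simulate_g`, the step size, the number of steps, and the seed are all illustrative assumptions. It records $g(t)$ at every Euler–Maruyama step and prints both the per-step ratios $g/(2\Delta t)$ and the sum of $g$ over the whole path.

```python
import math
import random


def simulate_g(dt=1e-3, n_steps=10_000, x0=0.0, seed=0):
    """Euler-Maruyama for dX = -X dt + sqrt(2) dW; returns g at every step.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    x = x0
    g = []
    for _ in range(n_steps):
        xi = rng.gauss(0.0, 1.0)  # standard normal draw
        x_next = x - x * dt + math.sqrt(2.0 * dt) * xi
        # g(t) = [X^2(t+dt) - X^2(t)] - 2 X(t) [X(t+dt) - X(t)]
        g.append((x_next**2 - x**2) - 2.0 * x * (x_next - x))
        x = x_next
    return g


dt = 1e-3
g = simulate_g(dt=dt)
ratios = [v / (2.0 * dt) for v in g]

# Individual values of g / (2 dt) scatter from step to step ...
print("first ratios g/(2 dt):", [round(r, 3) for r in ratios[:5]])
# ... while the sum of g over the path can be compared with 2 T.
print("sum of g:", sum(g), "  2T:", 2.0 * dt * len(g))
```

The first printout shows the step-to-step scatter described in the question; the second compares the accumulated quantity with $2T$.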

Just a little question: why do you think that $dX^2-2XdX$ must be a function of $dW$ (and hence random)? — Math-fun, Apr 25 '23 at 21:30
You might want to read up on quadratic variation and why it is simplistic but justified to use $(dW_t)^2=dt$, that the time average of the squares of the BM increments converges to the interval length if the discretization time step goes to zero. (This is different to the path average of the same square, both are based on the central limit theorem.) — Lutz Lehmann, Apr 26 '23 at 08:26
Please note that $dX^2-2XdX=2dt$ is correct and your "numeric manipulations" are not correct. To see this, let $\Delta W(t)$ replace your $\sqrt{\Delta t} \, \xi(t)$; now note that the term which you write as $\Delta t \, \xi(t)^2$ is indeed $(\Delta W(t))^2$ and this is $\Delta t$ as the quadratic variation of $W(t)$ is $t$. — Math-fun, Apr 26 '23 at 14:08
The reasons that in the simulation you don't see $dX^2-2X\,dX$ being the theoretical deterministic $2\,dt$ are exactly the same as the reasons you won't see the theoretical $(dW_t)^2=dt$ (pointed out by Lutz Lehmann). These theoretical deterministic $dt$s are limits and there is no discretization that gets you to those. Just like you will never find a numeric discretization to calculate an integral of a reasonable nontrivial function exactly. Apparently Math-fun said the same days ago. — Kurt G., Apr 28 '23 at 19:24
@KurtG. If you are so sure that $(\mathrm d W(t))^2 = \mathrm d t$ you could have at least cited your sources. Lutz Lehmann used $\mathrm d \langle W^2(t) \rangle = \mathrm d t$ (the same as $\langle \mathrm d W^2(t) \rangle = \mathrm d t$). — Yrogirg, Apr 29 '23 at 05:54
BTW: you quote Lutz Lehmann incorrectly. $(dW)^2=dt$ is notation popular in applied maths and physics for $\langle W\rangle_t=t$ (same as $d\langle W\rangle_t=dt\,$). The formula $d\langle W^\color{red}{2}\rangle_t=dt$ is *wrong*. Proof: by Ito, $d(W^2)=2W\,dW+\,dt$ which is the same as $W^2_t=2\int_0^tW_s\,dW_s+t\,.$ Therefore, $\langle W^2\rangle_t=4\int_0^tW^2_s\,ds\,.$ In other words: $d\langle W^2\rangle_t=\color{red}{4W^2_t}\,dt\,.$ — Kurt G., Apr 29 '23 at 07:46
Ok, that's just different notation, I used the common interpretation of angled brackets to be the expectation, the average; you used it to denote the variance (?). — Yrogirg, Apr 29 '23 at 19:23
Once again, to be clear. I am not talking about the calculation of the variance or other statistical properties of $\mathrm d X^2(t) - 2 X(t) \, \mathrm dX(t) = ?$. I am talking about the literal equality sign, realization specific. — Yrogirg, Apr 29 '23 at 19:29

score 3 · Answer 1 · answered Apr 30 '23 at 12:12

At the risk of repeating other users' comments:

To add to NN2's answer: $$\tag{A} X_t=X_0\,e^{-t}+\int_0^t\sqrt{2}\,e^{(s-t)}\,dW_s $$ is the correct solution to the SDE $$\tag{B} dX_t=-X_t\,dt+\sqrt{2}\,dW_t\,. $$ By the Ito rule $$\tag{C} d(X_t)^2=2X_t\,dX_t+2\,dt $$ (cf. Lutz Lehmann's answer).

You wrote "we might incorrectly conclude that" (C) holds. I disagree with this. (C) is correct and holds.
The Euler-Maruyama discretization does not refute the validity of (C), which holds only in the limit, as I will now show with arguments that are as elementary as possible:

Using (A), (C) can be written in integral form equivalently as \begin{align}\tag{D} X_t^2=\Big(\int_0^t \sqrt{2}e^{s-t}\,dW_s\Big)^2 \end{align} (to simplify notation I assume $X_0=0\,$.) Taking $0=t_0<...<t_n=t$ we get \begin{align}\tag{E} (X_{t_i})^2-(X_{t_{i-1}})^2&=X_{t_i}\Big(X_{t_i}-X_{t_{i-1}}\Big)+X_{t_{i-1}}\Big(X_{t_i}-X_{t_{i-1}}\Big)\\ &=\underbrace{\Big(X_{t_i}-X_{t_{i-1}}\Big)\Big(X_{t_i}-X_{t_{i-1}}\Big)}_{(*)}+\underbrace{2X_{t_{i-1}}\Big(X_{t_i}-X_{t_{i-1}}\Big)}_{(**)}\,. \end{align}

In (and only in) the limit $n\to\infty$ and $\max\limits_{i=1,...,n}|t_i-t_{i-1}|\to 0$ the sum over $i$ of the terms (**) converges to the Ito integral $2\int_0^tX_s\,dX_s\,.$
In the same limit the sum of the terms (*) converges to $$\tag{F} \int_0^t2\,ds=2t\,. $$ Proof. From (B) $$ X_t=-\int_0^t X_s\,ds+\sqrt{2}\,W_t=:A_t+M_t $$ where $A_t$ has finite variation (and therefore zero quadratic variation) and $M_t$ has infinite variation but finite quadratic variation. Let's be clear that $\langle M\rangle_t$ denotes quadratic variation and not some expected value $\mathbb E[M_t]\,.$ From $\langle M\rangle_t=2\langle W\rangle_t=2t$, $\langle A\rangle_t=0$ and $\langle A,M\rangle_t=0$ (see this answer) the proof of (F) follows.
Since the sum over $i$ of the LHS of (E) is a telescoping sum, we can put it all together to get $$ (X_t)^2=2t+2\int_0^tX_s\,dX_s\,. $$ Again: this holds in the limit and not in an Euler-Maruyama discretization.
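The convergence of the sum of squared increments towards $2t$ can be checked numerically without any Euler-Maruyama step. The following is a minimal Python sketch under stated assumptions: it samples $X$ on a grid from the exact Gaussian transition implied by (A), namely $X_{t+\Delta}=X_t e^{-\Delta}+\sqrt{1-e^{-2\Delta}}\,\xi$ with $\xi\sim\mathcal N(0,1)$, and sums the terms (*). The function name `sum_sq_increments`, the horizon $t=1$, the grid sizes, and the seed are illustrative choices only.

```python
import math
import random


def sum_sq_increments(t=1.0, n=1000, x0=0.0, seed=0):
    """Sum of (X_{t_i} - X_{t_{i-1}})^2 on a uniform grid of n steps.

    X is sampled from the exact OU transition, so the only approximation
    left is the finite grid itself. Parameter values are assumptions.
    """
    rng = random.Random(seed)
    dt = t / n
    decay = math.exp(-dt)
    # Var of int_0^dt sqrt(2) e^{s-dt} dW_s is 1 - e^{-2 dt}
    noise_sd = math.sqrt(1.0 - math.exp(-2.0 * dt))
    x = x0
    total = 0.0
    for _ in range(n):
        x_next = x * decay + noise_sd * rng.gauss(0.0, 1.0)
        total += (x_next - x) ** 2
        x = x_next
    return total


# Finer grids: the sum should settle near 2 t = 2.
for n in (100, 1_000, 10_000, 100_000):
    print(n, sum_sq_increments(t=1.0, n=n, seed=1))
```

On any fixed grid the sum is still a random number; what the printout lets you inspect is how its deviation from $2t$ behaves as the grid is refined.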

Thank you for the detailed answer. To add to it explicitly, if someone is to test Ito's lemma against data, one should check $\Delta X^2 - \int_0^{\Delta t} 2 X \, \mathrm d X = 2\Delta t$, not $\Delta X^2 - 2 X \Delta X = 2\Delta t$. — Yrogirg, May 04 '23 at 07:43

NN2 · Answer 2 · 2023-04-30T12:58:20.497

It's wrong to do $$ \mathrm d X = - X \, \mathrm dt + \sqrt{2} \, \mathrm dW(t)\Longrightarrow X(t + \Delta t) \approx X(t) - X(t) \, \Delta t + \sqrt{2} \sqrt{\Delta t} \, \xi(t) \tag{1} $$ The discretization method does not work on stochastic differential equations. So, $(1)$ is not correct.

If you compute $X_t$ $$X_t =X_0e^{-t}+\int_0^t\sqrt{2}e^{s-t}dW_s \tag{2}$$ you can then apply the Euler–Maruyama discretization scheme.

For information, the formula $dX^2_t-2X_tdX_t = 2dt$ must be correct. You can use $(2)$ to test it.

From $(2)$, we will prove $$dX_t^2-2X_tdX_t=2dt \tag{3}$$

For the sake of simplicity, we denote $Z_t :=\int_0^t\sqrt{2}e^{s}dW_s$, then $X_t=e^{-t}(X_0+Z_t)$.

We have some results: $$\begin{align} dZ_t &= \sqrt{2}e^tdW_t \tag{4}\\ d(Z_t^2) &= 2Z_tdZ_t + (dZ_t)^2 \stackrel{(4)}{=} 2\sqrt{2}Z_te^tdW_t+2e^{2t}dt\tag{5}\\ X_t^2 &= e^{-2t}(X_0^2+2X_0Z_t+Z_t^2)\tag{6}\\ d(X_t^2) &\stackrel{(6)}{=} e^{-2t}d(X_0^2+2X_0Z_t+Z_t^2)-2e^{-2t}(X_0^2+2X_0Z_t+Z_t^2)dt\\ &=e^{-2t} \left( 2X_0dZ_t+d(Z_t^2) - 2(X_0^2+2X_0Z_t+Z_t^2)dt \right)\\ &\stackrel{(4,5)}{=}e^{-2t} \left( \color{red}{2X_0\sqrt{2}e^tdW_t+2\sqrt{2}Z_te^tdW_t}+2e^{2t}dt - 2(X_0^2+2X_0Z_t+Z_t^2)dt \right)\\ &=e^{-2t} \left( \color{red}{2\sqrt{2}e^{2t}X_tdW_t}+2e^{2t}dt - 2(X_0^2+2X_0Z_t+Z_t^2)dt \right)\\ &= 2\sqrt{2}X_tdW_t+2dt - 2(X_0^2+2X_0Z_t+Z_t^2)e^{-2t}dt \\ &= 2\sqrt{2}X_tdW_t+2dt - 2X_t^2dt \tag{7}\\ X_tdX_t &= \sqrt{2}X_tdW_t -X_t^2 dt \tag{8} \end{align}$$

Finally, from $(7)(8)$, we can prove $(3)$ $$\begin{align} \color{red}{dX_t^2- 2 X_tdX_t} &= (2\sqrt{2}X_tdW_t+2dt - 2X_t^2dt) - 2(\sqrt{2}X_tdW_t -X_t^2 dt) = \color{red}{2dt} \end{align}$$

Remark: it's quite time-consuming! Luckily I can reach the end of the proof.

Again, we cannot apply the discretization scheme to SDEs because that is the source of errors. Take for example a well-known SDE $$\frac{dS_t}{S_t}=\sigma dW_t \tag{9}$$ where the solution is $$S_t = S_s\cdot \exp\left(-\frac{1}{2}\sigma^2 (t-s) + \sigma (W_t-W_s) \right) \tag{10}$$

With a discretization time step $t_n = \Delta t \cdot n$, from $(10)$, we have $$\begin{align} S_{t_n} &= S_{t_{n-1}}\exp\left(-\frac{1}{2}\sigma^2 \Delta t + \sigma \sqrt{\Delta t} \mathcal{N}(0,1) \right) \\ \text{or}\hspace{0.5cm} S_{t_n} &\approx S_{t_{n-1}} \left(1 \color{red}{-\frac{1}{2}\sigma^2 \Delta t} + \sigma \sqrt{\Delta t} \mathcal{N}(0,1) \right) \end{align}$$

If we use $(9)$, the red term is missing $$S_{t_n} \approx S_{t_{n-1}}\left(1 + \sigma \sqrt{\Delta t} \mathcal{N}(0,1) \right)$$

I think OP's first discretization is the first-order Euler scheme for $X_t=X_s-\int_s^tX_udu+\sqrt{2}(W_t-W_s)$, but I concur that their notation without indexes may imply incorrect understanding. — Snoop, Apr 25 '23 at 22:21
@NN2 If you can indeed use (2) to show that $\mathrm d X^2(t) - 2 X(t) \, \mathrm d X(t) = 2 \, \mathrm d t$, please do it. — Yrogirg, Apr 27 '23 at 13:03
@Yrogirg I added the proof. Again, we cannot apply discretization schemes to SDEs. — NN2, Apr 30 '23 at 12:44

score 0 · Answer 3 · answered Apr 26 '23 at 10:21

By the Ito formula in question, you get in general for the squared process $$ d(X^2)=2X\,dX+d\langle X\rangle_t $$ where with $dX=a\,dt+b\,dW_t$ one gets the quadratic variation as $d\langle X\rangle_t=b^2\,dt$, so here $=2\,dt$.

Thus the computed identity is completely normal and expected.
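For a general SDE $dX=a\,dt+b\,dW_t$ the bookkeeping can be spelled out as below. This is only a sketch: it relies on the heuristic multiplication rules $(dt)^2=0$, $dt\,dW_t=0$, $(dW_t)^2=dt$, which are shorthand for statements about integrals (limits of sums over ever finer partitions) and not identities between individual finite increments.

$$\begin{align} (dX)^2 &= a^2\,(dt)^2+2ab\,dt\,dW_t+b^2\,(dW_t)^2 = b^2\,dt = d\langle X\rangle_t\,,\\ d(X^2)-2X\,dX &= d\langle X\rangle_t = b^2\,dt\,. \end{align}$$

For the Ornstein–Uhlenbeck example, $a=-X$ and $b=\sqrt 2$, which gives $2\,dt$.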


Lutz Lehmann


Given a generated realization of $X(t)$ sampled at time intervals $\Delta t$, how do I show that $\mathrm d X^2(t) - 2 X(t) \, \mathrm dX(t) = 2 \, \mathrm dt$? – Yrogirg Apr 26 '23 at 11:32
In the numerical construction you get obviously $g(t)=(X(t+\Delta t)-X(t))^2$. The values $g(t+k\Delta t)$ are independent, so taking a moving average will give values that get ever closer to $2\Delta t$. Note that in general $\Delta W_t$ is much larger than $\Delta t$, so that the drift term is negligible here. – Lutz Lehmann Apr 28 '23 at 12:49
This is indeed a correct statement, $\mathrm d X^2 - 2X \, \mathrm dX = (\mathrm d X)^2$, no doubt. And I never said anything about taking an average. The question is where the application of Ito's lemma went wrong: why can't we just calculate $(2) - 2X (1)$ (equations are now numbered in the question)? – Yrogirg Apr 28 '23 at 14:16
