Intuition - Ito's formula

Question

Here is some intuition for Ito's formula.

The Taylor expansion for a function $f$ about a point $x$ is $$ f(y) = f(x) + f'(x)(y-x) + \frac12 f''(x)(y-x)^2 + \dots \,.$$

If you replace $y-x$ with $dx$ and $f(y) - f(x)$ with $df(x)$, then $$ df(x) = f'(x)dx + \frac12 f''(x) dx^2 + \dots \,.$$

If you keep only the first term, you have the formula for the differential, $df = f' dx$.

If you keep the first two terms, you have Ito's formula, $ df = f'dx + \frac12 f'' dx^2$.

Is there some explanation for why functions of stochastic processes need the second-derivative term when taking the differential of $f$? I know that we use the fact that "$dz^2 = dt$", where $z$ is a Brownian motion, but I don't fully understand that. I know $\mathbb{E}[B_t^2] = t$; is that related? Why is the second-order term in the Taylor expansion zero for the regular differential?
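To make "$dz^2 = dt$" concrete, here is a small Python sketch (the helper names are hypothetical, chosen only for this illustration) that sums the squared increments of one simulated Brownian path and of the smooth function $\sin$ over finer and finer grids on $[0,1]$. Under these assumptions the Brownian sum should hover near $t = 1$, which is also $\mathbb{E}[B_1^2]$, while the smooth sum shrinks roughly like $1/n$.

```python
import math
import random


def brownian_path(n, t=1.0, seed=0):
    """Sample B on a uniform grid of n steps over [0, t], starting at B_0 = 0."""
    rng = random.Random(seed)
    dt = t / n
    path = [0.0]
    for _ in range(n):
        # Independent Gaussian increment with variance dt, i.e. E[dB^2] = dt.
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


def smooth_path(n, t=1.0):
    """Sample the smooth (bounded-variation) function g(s) = sin(s) on the same grid."""
    return [math.sin(t * k / n) for k in range(n + 1)]


def sum_squared_increments(path):
    """Discrete quadratic variation: sum of (x_{i+1} - x_i)^2 along the path."""
    return sum((b - a) ** 2 for a, b in zip(path, path[1:]))


for n in (10, 100, 1000, 10000):
    qv_b = sum_squared_increments(brownian_path(n))
    qv_g = sum_squared_increments(smooth_path(n))
    print(f"n={n:6d}  Brownian: {qv_b:.4f}  sin: {qv_g:.2e}")
```

Each $dz^2$ is random, but the sum of many of them concentrates around the sum of their means, $\sum dt = t$; for the smooth path every square is of order $dt^2$, so the sum is of order $dt$.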

Edit: on slide 8 of these lecture notes, we have: if $dX_t = a\,dt + b\,dB_t$ is an Ito process, then \begin{align*} (dX_t)^2 &= (a\,dt + b\,dB_t)^2 \\ &= a^2\,dt^2 + 2(a\,dt)(b\,dB_t) + (b\,dB_t)^2 \\ &= b^2\,dB_t^2 \,. \end{align*}

Why are the first two terms zero? (I also don't understand why the term $dB_t dt =0$, on page 10.)
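One way to see which of the three terms matter is to add each of them up over a grid on $[0,T]$ and refine the grid. The Python sketch below assumes constant $a$ and $b$ (the slide's coefficients may be processes) and uses a hypothetical helper name; for the cross term it sums absolute values, so its smallness does not rely on any cancellation between positive and negative $dB_t$.

```python
import math
import random


def product_term_sums(a, b, T, n, seed=0):
    """For dX = a dt + b dB with constant a, b on n steps over [0, T], return
    sum (a dt)^2, sum |(a dt)(b dB)| and sum (b dB)^2 over the grid."""
    rng = random.Random(seed)
    dt = T / n
    s_dt2 = s_cross = s_db2 = 0.0
    for _ in range(n):
        db = rng.gauss(0.0, math.sqrt(dt))
        s_dt2 += (a * dt) ** 2           # total is a^2 T dt, so O(dt)
        s_cross += abs(a * dt * b * db)  # total is O(sqrt(dt)), even in absolute value
        s_db2 += (b * db) ** 2           # total tends to b^2 T
    return s_dt2, s_cross, s_db2


for n in (100, 1000, 10000):
    print(n, product_term_sums(a=0.5, b=2.0, T=1.0, n=n))
```

The first two totals go to zero as $n$ grows, while the third settles near $b^2 T$; this is the content of the rules $dt^2 = 0$, $dt\,dB_t = 0$ and $dB_t^2 = dt$.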

There is another issue you are leaving out: the quadratic variation of the process, which is what brings the second derivative into play. This video explains it well (even though some of the explanations are a bit mixed up) — Joako, Dec 09 '23 at 22:00
Concerning intuition about Itô's integral, see the 3rd example here. — Jean Marie, Dec 09 '23 at 22:47

score 2 · Answer 1 · answered Dec 10 '23 at 01:59

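As explained here (Intuition between Ito-Formula), after we Taylor expand using the mean-value form of the remainder, over a partition $0=t_0<t_1<\dots<t_n=t$ and with each $\theta_i$ lying between $B_{t_i}$ and $B_{t_{i+1}}$, we have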

$$f(B_t)=f(B_0)+\sum_{i=0}^{n-1}f'(B_{t_{i}})(B_{t_{i+1}}-B_{t_{i}})+\frac{1}{2}\sum_{i=0}^{n-1}f''(\theta_i)(B_{t_{i+1}}-B_{t_{i}})^2.$$

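Now, if Brownian motion were of bounded variation, the last term would be bounded by

$$\frac12\Big|\sum_{i=0}^{n-1}f''(\theta_i)(B_{t_{i+1}}-B_{t_{i}})^2\Big| \le \frac12\sup|f''|\,\max_{i}|B_{t_{i+1}}-B_{t_{i}}|\sum_{i=0}^{n-1}|B_{t_{i+1}}-B_{t_{i}}| \to 0,$$

where the supremum is taken over the (compact) range of the path; the maximal increment goes to zero by continuity of the path, while the sum stays below the total variation. So the term disappears in the limit. However, Brownian motion is not of bounded variation; it is of finite quadratic variation, i.e. $\sum_{i} (B_{t_{i+1}}-B_{t_{i}})^2 \to t$ in $L^2$ as the mesh of the partition goes to zero. The second-order term therefore does not vanish but converges to $\frac12\int_0^t f''(B_s)\,ds$, and so Itô instead worked to define his integral in terms of the quadratic variation.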
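A rough numerical check of this argument, as a Python sketch under simplifying assumptions: a single simulated path on $[0,1]$, $f = \exp$ (so $f = f' = f''$), and $(B_{t_{i+1}}-B_{t_i})^2$ replaced by $t_{i+1}-t_i$ with $f''$ evaluated at the left endpoint $B_{t_i}$ instead of at $\theta_i$. `ito_check` is a hypothetical helper name. The first-order sum alone should leave a clearly positive error of roughly $\frac12\int_0^1 e^{B_s}\,ds$, while adding the second-order term should bring the error close to zero.

```python
import math
import random


def ito_check(f, df, d2f, T=1.0, n=100_000, seed=0):
    """On one simulated Brownian path, return the error of f(B_T) - f(B_0)
    against the first-order sum alone, and against the sum plus the
    second-order term with (dB)^2 replaced by dt."""
    rng = random.Random(seed)
    dt = T / n
    b = 0.0
    first_order = 0.0  # sum of f'(B_{t_i}) (B_{t_{i+1}} - B_{t_i})
    correction = 0.0   # (1/2) sum of f''(B_{t_i}) dt
    for _ in range(n):
        db = rng.gauss(0.0, math.sqrt(dt))
        first_order += df(b) * db
        correction += 0.5 * d2f(b) * dt
        b += db
    exact = f(b) - f(0.0)
    return exact - first_order, exact - first_order - correction


err_plain, err_ito = ito_check(math.exp, math.exp, math.exp)
print(f"error without the 1/2 f'' term: {err_plain:.4f}")
print(f"error with the 1/2 f'' term:    {err_ito:.4f}")
```

For $f=\exp$ each first-order step undershoots, since $e^{y}-1-y \ge 0$, so the plain error is always positive; the remaining error after the correction shrinks as $n$ grows.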

A similar idea works for paths of finite $p$-variation; see rough paths, e.g. Multidimensional Stochastic Processes as Rough Paths.

See also the geometric intuition here: https://en.wikipedia.org/wiki/It%C3%B4%27s_lemma.

Abezhiko · Accepted Answer · 2024-03-28T17:41:35.187

Thomas Kojar has provided an answer and some references already, but here is an intuitive explanation, in order to stay in the same context/spirit as your question.

1) Why $\mathrm{d}x^2 \sim 0$ in the standard case?

Recall that, in the standard case of a function depending on a non-stochastic variable, the differential $\mathrm{d}f(x)$ is a somewhat "halfway unfinished computation" of the derivative $f'(x)$, so that $$ \frac{\mathrm{d}f}{\mathrm{d}x} = f'(x) + \frac{1}{2}f''(x)\,\mathrm{d}x + \ldots, $$ with every term except the first one vanishing when $\mathrm{d}x \rightarrow 0$. The confusing ambiguity comes from the fact that the limit is usually already contained implicitly in the "d" notation; here, the notation is slightly abused by using $\mathrm{d}f$ and $\mathrm{d}x$ before taking the limit.
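The "unfinished" part of the difference quotient can be made explicit with a small Python sketch (the helper name is hypothetical): for $f=\sin$ at $x=1$, the gap $\frac{f(x+h)-f(x)}{h} - f'(x)$ tracks the leading term $\frac12 f''(x)\,h$ and disappears as $h\to0$.

```python
import math


def difference_quotient_gap(f, df, x, h):
    """(f(x+h) - f(x))/h - f'(x): what the second- and higher-order Taylor
    terms still contribute before taking the limit h -> 0."""
    return (f(x + h) - f(x)) / h - df(x)


x = 1.0
for h in (1e-1, 1e-2, 1e-3):
    gap = difference_quotient_gap(math.sin, math.cos, x, h)
    leading = 0.5 * (-math.sin(x)) * h  # (1/2) f''(x) h, with f'' = -sin
    print(f"h={h:g}  gap={gap:.3e}  (1/2) f''(x) h={leading:.3e}")
```

Here the second-order term is simply proportional to $h$, so it has no effect on the derivative itself; this is the sense in which $\mathrm{d}x^2$ counts as zero in ordinary calculus.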

2) How to treat $\mathrm{d}X_t^2$ in the stochastic case, and why $\mathrm{d}B_t\mathrm{d}t \sim 0$?

In contrast, when the independent variable $x$ is random, namely $X_t$, (some) terms inside $\mathrm{d}f(X_t)$ coming from $\mathrm{d}X_t^2$ cannot be ignored, because $\mathbb{E}[\mathrm{d}B_t^2] = \mathrm{d}t$. Note that another abuse of notation is usually made here by dropping the expectation, i.e. writing $\mathrm{d}B_t^2 = \mathrm{d}t$ (in a way, stochastic calculus is full of formalized abuses of notation).

As before, higher-order terms are again considered negligible in the limit $\mathrm{d}t \rightarrow 0$, because they would vanish when computing a yet-to-be-formalized derivative $\frac{\mathrm{d}f(X_t)}{\mathrm{d}t}$ (with $\frac{\mathrm{d}B_t}{\mathrm{d}t}$ being interpreted as white noise in that case). Consequently, the $o(\mathrm{d}t)$ terms, i.e. all the terms that are superlinear with respect to $\mathrm{d}t$, are (implicitly) cut from the initial Taylor expansion. From that point of view, $\mathrm{d}B_t$ is kept because it has the law $\mathcal{N}(0,\mathrm{d}t)$, the same as $\sqrt{\mathrm{d}t}\,\mathcal{N}(0,1)$, so $\mathrm{d}B_t \sim \mathrm{d}t^{1/2}$; but $\mathrm{d}B_t\mathrm{d}t$ is ruled out because $\mathrm{d}B_t\mathrm{d}t \sim \mathrm{d}t^{3/2} = o(\mathrm{d}t)$. For the same reason $\mathrm{d}t^2 = o(\mathrm{d}t)$, so in $(\mathrm{d}X_t)^2 = (a_t\,\mathrm{d}t + b_t\,\mathrm{d}B_t)^2$ only $b_t^2\,\mathrm{d}B_t^2 = b_t^2\,\mathrm{d}t$ survives.
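The order counting above can be checked by plain Monte Carlo. The Python sketch below (hypothetical helper name, standard library only) samples $\mathrm{d}B_t \sim \mathcal{N}(0,\mathrm{d}t)$ and, for shrinking $\mathrm{d}t$, estimates $\mathbb{E}|\mathrm{d}B_t|/\mathrm{d}t^{1/2}$, $\mathbb{E}[\mathrm{d}B_t^2]/\mathrm{d}t$ and $\mathbb{E}|\mathrm{d}B_t\,\mathrm{d}t|/\mathrm{d}t$.

```python
import math
import random


def increment_orders(dt, samples=100_000, seed=0):
    """Monte Carlo estimates, for dB ~ N(0, dt), of
    E|dB| / dt^(1/2), E[dB^2] / dt and E|dB dt| / dt."""
    rng = random.Random(seed)
    sigma = math.sqrt(dt)
    s_abs = s_sq = 0.0
    for _ in range(samples):
        db = rng.gauss(0.0, sigma)
        s_abs += abs(db)
        s_sq += db * db
    mean_abs = s_abs / samples
    # 1st ratio stays near sqrt(2/pi) ~ 0.80: dB is of order dt^(1/2).
    # 2nd ratio stays near 1: dB^2 is of order dt and survives.
    # 3rd ratio is E|dB dt| / dt = E|dB|, which shrinks like dt^(1/2): dB dt = o(dt).
    return mean_abs / sigma, (s_sq / samples) / dt, mean_abs


for dt in (1e-2, 1e-4, 1e-6):
    r_half, r_sq, r_cross = increment_orders(dt)
    print(f"dt={dt:g}  E|dB|/dt^0.5={r_half:.3f}  E[dB^2]/dt={r_sq:.3f}  E|dB dt|/dt={r_cross:.2e}")
```

The first two ratios stay roughly constant while the third decreases by a factor of about 10 each time $\mathrm{d}t$ is divided by 100, matching $\mathrm{d}t^{1/2}$.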

Final remark

All the above developments are valid when the stochastic process $X_t$ is made of a deterministic component, represented by the drift term $a_t\,\mathrm{d}t$, and a random component modelled by a normal distribution, represented here by the Gaussian noise $b_t\,\mathrm{d}B_t$. When the random component is not Gaussian, for example in the case of a Poisson process, you will need to adapt and rederive Itô's lemma, because the relation $\mathrm{d}B_t^2 \sim \mathrm{d}t$ no longer holds.
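To see what changes in the Poisson case, here is a Python sketch assuming a homogeneous Poisson process $N_t$ with rate $\lambda$ and its compensated version $M_t = N_t - \lambda t$, which has mean zero like $B_t$ (the helper name is hypothetical). Summing $(\mathrm{d}M_t)^2$ over a fine grid gives approximately the random jump count $N_T$ rather than the deterministic $\lambda T$: here $(\mathrm{d}N_t)^2 = \mathrm{d}N_t$ instead of $\mathrm{d}t$, which is why jump terms appear in the Itô formula for such processes.

```python
import random


def compensated_poisson_qv(rate, T, n, seed=0):
    """Simulate a Poisson process N on [0, T] from exponential inter-arrival
    times, take M_t = N_t - rate * t on a grid of n steps, and return
    (N_T, sum of (dM)^2 over the grid, rate * T)."""
    rng = random.Random(seed)
    jumps = []
    t = rng.expovariate(rate)
    while t <= T:
        jumps.append(t)
        t += rng.expovariate(rate)
    dt = T / n
    counts = [0] * n  # dN on each grid cell
    for s in jumps:
        counts[min(int(s / dt), n - 1)] += 1
    qv = sum((c - rate * dt) ** 2 for c in counts)  # sum of (dM)^2
    return len(jumps), qv, rate * T


n_T, qv, mean = compensated_poisson_qv(rate=5.0, T=1.0, n=100_000)
print(f"N_T = {n_T}, sum of (dM)^2 = {qv:.4f}, rate*T = {mean}")
```

Rerunning with different seeds gives a different $N_T$ each time, and the sum of squares follows it; for Brownian motion the corresponding sum converges to the deterministic value $T$.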

The standard case itself is confusing to me. I know that the differential is related to some differential-geometry stuff, but it seems so arbitrary to cut off all second- and higher-order terms... — lady gaga, Dec 14 '23 at 19:50
Feels kind of inexact to cut off terms because they are "small". — lady gaga, Dec 14 '23 at 19:51
