Definitions of the Stratonovich integral and why the "average" definition is arguably correct

Question

Notation: Throughout,

$\mathcal{B} := \{B(t)\}_{t \ge 0}$ denotes a standard Brownian motion, with $B(0) = 0$.
$P := \{x_i\}_{i=0}^n$ denotes a partition $0 = x_0 < x_1 < \cdots < x_n = t$ of the interval $[0,t]$, with norm (mesh) $\|P\| := \max_i (x_i - x_{i-1})$, as in Riemann integration.
$\Delta B_i := B(x_i) - B(x_{i-1})$.
$\Delta x_i := x_i - x_{i-1}$.
$\mathcal{X} := \{X(t)\}_{t \ge 0}$ denotes a Stratonovich-integrable process, in whatever sense is needed at the time.
$\int_0^t X(s) \circ \mathrm{d} B(s)$ denotes the Stratonovich integral.

The Conflicting Definitions: As I understand it, there are two conflicting definitions of the Stratonovich integral:

$$\begin{align*} \int_0^t X(s) \circ \mathrm{d}B(s) &:= \lim_{\|P\| \to 0} \sum_{i=1}^n \frac{X(x_i) + X(x_{i-1})}{2} \Delta B_i \tag{1} \\ \int_0^t X(s) \circ \mathrm{d}B(s) &:= \lim_{\|P\| \to 0} \sum_{i=1}^n X \left( \frac{x_i + x_{i-1}}{2} \right) \Delta B_i \tag{2} \end{align*}$$

The First Definition: Definition $(1)$ seems to be motivated by averaging the values of $X$ at the two endpoints of each subinterval of $P$. In fact, one obtains a "more general" integral by considering, for $\lambda \in [0,1]$,

$$\lim_{\|P\| \to 0} \sum_{i=1}^n \Big( (1-\lambda) X(x_i) + \lambda X(x_{i-1}) \Big)\Delta B_i \tag{1'}$$

where, for example, the Itô integral arises from $\lambda = 1$ (evaluation at the left endpoints $x_{i-1}$) and the Stratonovich integral (in the sense of $(1)$) from $\lambda = 1/2$.

In my reading, I've seen this definition used by:

The Wikipedia article on Stratonovich integrals
Apparently, Ioannis Karatzas & Steven Shreve's Brownian Motion and Stochastic Calculus
The Encyclopedia of Math website
An article by Jonathan Mattingly on The Probability Workbook
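A minimal numerical sketch of the family $(1')$, assuming NumPy, $X = B$, and a uniform partition of $[0,t]$; the helper names `brownian_path` and `lambda_sum` are hypothetical. Writing $B(x_i) = B(x_{i-1}) + \Delta B_i$, the sum in $(1')$ equals $\sum_i B(x_{i-1})\Delta B_i + (1-\lambda)\sum_i (\Delta B_i)^2$, so it should approach $B(t)^2/2 - t/2 + (1-\lambda)t$. In particular, $\lambda = 1/2$ telescopes exactly to $B(t)^2/2$.

```python
import numpy as np


def brownian_path(t, n, rng):
    """Sample B(x_0), ..., B(x_n) on the uniform partition x_i = i*t/n, with B(0) = 0."""
    increments = rng.normal(0.0, np.sqrt(t / n), size=n)
    return np.concatenate(([0.0], np.cumsum(increments)))


def lambda_sum(B, lam):
    """Sum (1') for X = B: weight (1 - lam) on B(x_i) and lam on B(x_{i-1})."""
    left, right = B[:-1], B[1:]
    return np.sum(((1 - lam) * right + lam * left) * (right - left))


rng = np.random.default_rng(0)
t, n = 1.0, 100_000
B = brownian_path(t, n, rng)
for lam in (1.0, 0.5, 0.0):  # left-endpoint (Ito), endpoint average (1), right-endpoint sums
    predicted = B[-1] ** 2 / 2 - t / 2 + (1 - lam) * t
    print(f"lambda={lam}: sum={lambda_sum(B, lam):.4f}, predicted limit={predicted:.4f}")
```

The three printed sums differ by amounts of order $t$, which is the heart of the question: for Brownian motion, unlike a smooth integrator, the evaluation point changes the limit.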

The Second Definition: Definition $(2)$ seems to be inspired simply by the Riemann-Stieltjes formulation for deterministic functions:

$$\int_0^t f(x) \, \mathrm{d} \varphi(x) = \lim_{\|P\| \to 0} \sum_{i=1}^n f(\xi_i) \Delta \varphi_i \tag{2'}$$

where $\Delta \varphi_i := \varphi(x_i) - \varphi(x_{i-1})$, analogously to $\Delta B_i$, and the tags $\xi_i \in [x_{i-1},x_i]$ may be chosen arbitrarily. Definition $(2)$ mimics this: take each $\xi_i$ to be the midpoint of $[x_{i-1},x_i]$, replace $\varphi$ by the Brownian motion $B$, and replace $f$ by the process $X$.

In my reading, I've seen this definition used by:

Bernt Øksendal in Stochastic Differential Equations: An Introduction with Applications
Dr. Peyam on YouTube
Apparently, Steven Shreve's Stochastic Calculus for Finance
Lewis Smith on a webpage
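To compare $(1)$ and $(2)$ on the same path, here is a sketch assuming NumPy and a uniform partition with $n$ subintervals (helper names are hypothetical). The path is sampled on a grid twice as fine, so the even-indexed samples are $B(x_i)$ and the odd-indexed samples are $B$ at the midpoints $(x_{i-1}+x_i)/2$ required by $(2)$. For $X = B$ the averaged sum $(1)$ telescopes to exactly $B(t)^2/2$, so the only question is whether the midpoint sum lands close to that value.

```python
import numpy as np


def fine_path(t, n, rng):
    """Sample B on the 2n + 1 points k*t/(2n), k = 0, ..., 2n, with B(0) = 0."""
    increments = rng.normal(0.0, np.sqrt(t / (2 * n)), size=2 * n)
    return np.concatenate(([0.0], np.cumsum(increments)))


def averaged_sum(B_fine):
    """Definition (1): endpoint average times increment on the partition x_i = i*t/n."""
    B = B_fine[::2]                      # B(x_0), ..., B(x_n)
    return np.sum(0.5 * (B[1:] + B[:-1]) * np.diff(B))


def midpoint_sum(B_fine):
    """Definition (2): B at the midpoint of [x_{i-1}, x_i] times the increment."""
    B = B_fine[::2]                      # B(x_0), ..., B(x_n)
    B_mid = B_fine[1::2]                 # B((x_{i-1} + x_i) / 2), i = 1, ..., n
    return np.sum(B_mid * np.diff(B))


rng = np.random.default_rng(1)
t, n = 1.0, 100_000
B_fine = fine_path(t, n, rng)
print(averaged_sum(B_fine), midpoint_sum(B_fine), B_fine[-1] ** 2 / 2)
```

Agreement on one simulated path is evidence rather than proof; the accepted answer below gives the $L^2$ argument.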

My Question: It is not obvious to me that these definitions are equivalent. Moreover, I have several times seen the claim on Math Stack Exchange that $(1)$ is the "correct" definition, yet where $(2)$ is used elsewhere (e.g. in a Math Overflow post), no one objects to it, at least not openly.

Hence, I'm seeking a proper, definitive answer, because I am very confused:

Which is "correct" to call the Stratonovich integral? Is it simply a matter of preference?
Is there a particular reason to prefer one over the other if there is no definitive answer?
Do any results for one definition break under the other? (Such as: does the conversion to an Itô integral break? What about properties like the chain rule?)

...or am I just totally missing something here?

Well, if $(2)$ is incorrect, why is it so? Why do people still use it? What makes $(1)$ the right definition? — PrincessEev, Apr 07 '22 at 16:13
@KurtG. I don't know why you say the second definition is wrong; it can be shown (see, for instance, "Introduction to Stochastic Integration" by H.-H. Kuo) that both are equivalent. Furthermore, in the answer you linked above, you state that the term $S(\Pi)$ with $\epsilon=1$ does not converge to $t/2$, but it actually does. — Chaos, Apr 08 '22 at 06:39
See also the Corollary after Theorem V-5.30 of Protter's book — Chaos, Apr 08 '22 at 06:49
@KurtG. I don't quite understand your proof, mainly because I've never heard of the H-K integral. Honestly, I think that Protter is a sufficiently trustworthy reference — Chaos, Apr 08 '22 at 15:00
@KurtG. I posted an answer https://math.stackexchange.com/a/4423659/607487 with some calculation that may shed some light — Chaos, Apr 09 '22 at 09:32
@Chaos. Great! Thanks. I will read it line by line tomorrow and in the coming days. — Kurt G., Apr 09 '22 at 16:29
@EeveeTrainer. Chaos has finally convinced me that I was wrong here in my comments and originally in that post. Thanks for finding all those references. Apparently, the two seemingly different definitions of the Stratonovich integral are equivalent. As far as I know, only Kuo's book contains a proof. — Kurt G., Apr 13 '22 at 10:31

score 2 · Accepted Answer · answered Apr 13 '22 at 09:59

Indeed, both definitions are equivalent (see, for instance, Theorem V-5.30 of Protter's book).

For the sake of simplicity, I'll assume that $X(t)=W(t)$, where $W$ is a standard Brownian motion (the $B$ of the question). Let $\pi: 0=t_0<t_1<\cdots <t_N=T$ be an arbitrary partition of the interval $[0,T]$ with mesh $\|\pi\|:=\max_{i} |t_{i+1}-t_i|$.

Define $t^*_i:=\frac{t_{i+1}+t_i}{2}$ and consider

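\begin{align*} &\sum_{i=0}^{N-1}W\left(t_i^*\right)[W(t_{i+1})-W(t_i)]\\ &=\sum_{i=0}^{N-1}[W(t_i^*)-W(t_i)][W(t_{i+1})-W(t_i)]+\sum_{i=0}^{N-1} W(t_i)[W(t_{i+1})-W(t_i)]=\mathcal I_1+\mathcal I_2. \end{align*}

We know that, as $\|\pi\|\to 0$, the term $\mathcal I_2$ converges to $\int_0^T W(t)\,dW(t)$ in $L^2(\Omega)$. On the other hand, the averaged sums of definition $(1)$ satisfy
$$\sum_{i=0}^{N-1}\frac{W(t_i)+W(t_{i+1})}{2}[W(t_{i+1})-W(t_i)]=\mathcal I_2+\frac12\sum_{i=0}^{N-1}[W(t_{i+1})-W(t_i)]^2\to\int_0^T W(t)\,dW(t)+\frac T2,$$
since the quadratic variation sums converge to $T$ in $L^2(\Omega)$. Hence, to prove that the two definitions agree, it suffices to show that $\mathcal I_1$ converges in $L^2(\Omega)$ to $T/2$.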
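The splitting $\mathcal I_1 + \mathcal I_2$ can be checked on a single simulated path. This is a sketch assuming NumPy; the random non-uniform partition and the helper name `split_midpoint_sum` are illustrative choices, not part of the answer. It samples $W$ jointly at the $t_i$ and $t_i^*$, then compares $\mathcal I_1$ with $T/2$ and $\mathcal I_2$ with the Itô integral $\int_0^T W\,dW = (W(T)^2 - T)/2$.

```python
import numpy as np


def split_midpoint_sum(partition, rng):
    """Return (I1, I2, W(T)) for one Brownian path sampled at the t_i and t_i^*."""
    t = np.asarray(partition, dtype=float)   # assumes 0 = t_0 < t_1 < ... < t_N = T
    t_mid = 0.5 * (t[:-1] + t[1:])           # t_i^*
    times = np.empty(2 * len(t) - 1)         # t_0, t_0^*, t_1, t_1^*, ..., t_N
    times[0::2] = t
    times[1::2] = t_mid
    increments = rng.normal(0.0, np.sqrt(np.diff(times)))
    W = np.concatenate(([0.0], np.cumsum(increments)))  # W(t_0) = W(0) = 0
    W_t, W_mid = W[0::2], W[1::2]
    dW = np.diff(W_t)                        # W(t_{i+1}) - W(t_i)
    I1 = np.sum((W_mid - W_t[:-1]) * dW)
    I2 = np.sum(W_t[:-1] * dW)
    return I1, I2, W_t[-1]


rng = np.random.default_rng(2)
T, N = 1.0, 200_000
# an arbitrary (random, non-uniform) partition of [0, T]
partition = np.concatenate(([0.0], np.sort(rng.uniform(0.0, T, N - 1)), [T]))
I1, I2, W_T = split_midpoint_sum(partition, rng)
print(I1, T / 2)                  # I_1 should be near T/2
print(I2, (W_T ** 2 - T) / 2)     # I_2 should be near the Ito integral
```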

We start by noticing that

\begin{align*} \mathbb E\left[\mathcal I_1\right]=\sum_{i=0}^{N-1}\mathbb E\left([W(t_i^*)-W(t_i)][W(t_{i+1})-W(t_i)]\right)&=\sum_{i=0}^{N-1}\left( t_i^*\wedge t_{i+1}-t_i\wedge t_{i+1}-t_i^*\wedge t_{i}+t_i\right)\\ &=\sum_{i=0}^{N-1}(t_i^*-t_i)=\sum_{i=0}^{N-1}\frac{t_{i+1}-t_i}{2}=\frac T2, \end{align*}

where we used $\mathbb E[W(s)W(u)]=s\wedge u$. Hence
\begin{align*} \|\mathcal I_1-T/2\|_{L^2(\Omega)}^2= \mathbb V\left(\mathcal I_1\right)=\mathbb V\left(\sum_{i=0}^{N-1}[W(t_i^*)-W(t_i)][W(t_{i+1})-W(t_i)]\right). \end{align*}

The $i$-th summand depends only on increments of $W$ over $[t_i,t_{i+1}]$, and these intervals do not overlap, so the summands are independent and the variance of the sum is the sum of the variances:
\begin{align*} \|\mathcal I_1-T/2\|_{L^2(\Omega)}^2= \mathbb V\left(\mathcal I_1\right)&=\sum_{i=0}^{N-1}\mathbb V\left([W(t_i^*)-W(t_i)][W(t_{i+1})-W(t_i)]\right)\\ &=\sum_{i=0}^{N-1}\mathbb V\left([W(t_i^*)-W(t_i)][(W(t_{i+1})-W(t_i^*))+(W(t_i^*)-W(t_i))]\right). \end{align*}

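Let $\Delta_*(i):=W(t^*_i)-W(t_i)$ and $\Delta^*(i):=W(t_{i+1})-W(t^*_i)$. These are independent centered Gaussians with variances $t_i^*-t_i$ and $t_{i+1}-t_i^*$, and $\mathbb E\left(\Delta_*(i)[\Delta^*(i)+\Delta_*(i)]\right)=t_i^*-t_i$. Using $\mathbb E\left(\Delta^*(i)\Delta_*(i)^3\right)=\mathbb E\left(\Delta^*(i)\right)\mathbb E\left(\Delta_*(i)^3\right)=0$ and $\mathbb E\left(\Delta_*(i)^4\right)=3(t_i^*-t_i)^2$, we get
\begin{align*} &\sum_{i=0}^{N-1}\mathbb V\left(\Delta_*(i)[\Delta^*(i)+\Delta_*(i)]\right)\\ &=\sum_{i=0}^{N-1}\left[\mathbb E\left(\Delta_*(i)^2[\Delta^*(i)+\Delta_*(i)]^2\right)- (t_i^*-t_i)^2\right]\\ &=\sum_{i=0}^{N-1}\left[\mathbb E\left(\Delta_*(i)^2[\Delta^*(i)^2+2\Delta^*(i)\Delta_*(i)+\Delta_*(i)^2]\right)- (t_i^*-t_i)^2\right]\\ &=\sum_{i=0}^{N-1}\left[\mathbb E\left(\Delta_*(i)^2\Delta^*(i)^2\right)+2\mathbb E\left(\Delta^*(i)\Delta_*(i)^3\right)+\mathbb E\left(\Delta_*(i)^4\right)- (t_i^*-t_i)^2\right]\\ &=\sum_{i=0}^{N-1}\left[\mathbb E\left(\Delta_*(i)^2\Delta^*(i)^2\right)+\mathbb E\left(\Delta_*(i)^4\right)- (t_i^*-t_i)^2\right]\\ &=\sum_{i=0}^{N-1} \left[(t_i^*-t_i)(t_{i+1}-t_i^*)+3(t_i^*-t_i)^2- (t_i^*-t_i)^2\right]\\ &=\sum_{i=0}^{N-1} \left[(t_i^*-t_i)(t_{i+1}-t_i^*)+2(t_i^*-t_i)^2\right]. \end{align*}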

Now notice that \begin{align*} (t_{i+1}-t_i^*)(t_i^*-t_i)= \left(t_{i+1}-\frac{t_i+t_{i+1}}{2}\right)\left(\frac{t_i+t_{i+1}}{2}-t_i\right)=\frac{(t_{i+1}-t_i)^2}{4}, \end{align*} and \begin{align*} (t_i^*-t_i)^2=\frac{(t_{i+1}-t_i)^2}{4} \end{align*}

and thus the latter equals

\begin{align*} \frac{3}{4}\sum_{i=0}^{N-1} (t_{i+1}-t_i)^2\leq \frac{3}{4}\|\pi\|\sum_{i=0}^{N-1} (t_{i+1}-t_i)=\frac{3}{4}\|\pi\|T, \end{align*} and the right-hand side vanishes as $\|\pi\|\to 0$. Hence $\mathcal I_1\to T/2$ in $L^2(\Omega)$, which completes the proof.
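A Monte Carlo sanity check of $\mathbb E[\mathcal I_1] = T/2$ and $\mathbb V(\mathcal I_1) = \frac34\sum_i (t_{i+1}-t_i)^2$ on a small fixed partition. This is a sketch assuming NumPy, with a made-up partition of $[0,1]$ and a hypothetical helper `sample_I1`; as in the computation above, it draws $\Delta_*(i)$ and $\Delta^*(i)$ directly as independent $N(0, (t_{i+1}-t_i)/2)$ variables.

```python
import numpy as np


def sample_I1(partition, n_samples, rng):
    """Draw n_samples independent copies of I_1 = sum_i Delta_*(i) [Delta^*(i) + Delta_*(i)]."""
    t = np.asarray(partition, dtype=float)
    half = np.diff(t) / 2                    # t_i^* - t_i = t_{i+1} - t_i^*
    shape = (n_samples, len(half))
    lower = rng.normal(0.0, np.sqrt(half), size=shape)   # Delta_*(i)
    upper = rng.normal(0.0, np.sqrt(half), size=shape)   # Delta^*(i), independent of Delta_*(i)
    return np.sum(lower * (upper + lower), axis=1)


rng = np.random.default_rng(3)
partition = np.array([0.0, 0.1, 0.35, 0.4, 0.7, 1.0])   # illustrative partition, T = 1
samples = sample_I1(partition, 200_000, rng)
print(samples.mean(), 0.5)                                     # E[I_1] = T/2
print(samples.var(), 0.75 * np.sum(np.diff(partition) ** 2))   # V(I_1) = (3/4) sum (dt)^2
```

Bisecting every subinterval halves $\sum_i (t_{i+1}-t_i)^2$, and with it the variance, in line with the bound $\frac34\|\pi\|T$.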

An interesting property is that if we replace the standard product "$\times$" in $$\sum_{i=0}^{N-1}W\left(t_i^*\right)\times [W(t_{i+1})-W(t_i)]$$ with the so-called Wick product "$\diamond$", then the choice of the evaluation point becomes irrelevant. In fact, $$\sum_{i=0}^{N-1}W\left(t_i^{\alpha}\right)\diamond [W(t_{i+1})-W(t_i)]\to \int_0^T W(t)\,dW(t)$$ for any choice of $\alpha\in [0,1]$, where $t_i^{\alpha}:=(1-\alpha)t_{i}+\alpha t_{i+1}$.

This is because the Wick product is, in a sense, implicit in Itô integration, via the formula

$$\int_0^T f(W(t))dW(t)=\int_0^T f(W(t))\diamond \dot W(t)dt$$ where $\dot W(t)$ denotes the distributional derivative of the Brownian motion (i.e. a white noise process).
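A numerical sketch of the Wick-product claim, assuming NumPy, a uniform partition, and $\alpha \in \{0, 1/4, 1/2, 3/4, 1\}$ so that every $t_i^{\alpha}$ lies on a grid four times finer than the partition. It relies on the standard identity $X \diamond Y = XY - \mathbb E[XY]$ for centered jointly Gaussian $X, Y$. Since $\mathbb E\left[W(t_i^{\alpha})(W(t_{i+1})-W(t_i))\right] = t_i^{\alpha} - t_i = \alpha(t_{i+1}-t_i)$, the Wick sum is the ordinary sum minus $\alpha T$, and for every $\alpha$ it should land near the Itô integral $(W(T)^2 - T)/2$. The helper name `wick_sum` is hypothetical.

```python
import numpy as np


def wick_sum(W_fine, k, T):
    """Wick-product Riemann sum with alpha = k/4, for k in {0, 1, 2, 3, 4}.

    W_fine holds W on a uniform grid 4x finer than the partition t_i = i*T/N.
    Uses Wick(W(s), dW) = W(s) * dW - E[W(s) * dW], valid for centered jointly Gaussian pairs.
    """
    W = W_fine[::4]                       # W(t_0), ..., W(t_N)
    N = len(W) - 1
    W_alpha = W_fine[k::4][:N]            # W(t_i^alpha), i = 0, ..., N-1
    alpha = k / 4
    ordinary = np.sum(W_alpha * np.diff(W))
    return ordinary - alpha * T           # subtract sum_i E[W(t_i^alpha) dW_i] = alpha * T


rng = np.random.default_rng(4)
T, N = 1.0, 100_000
W_fine = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(T / (4 * N)), size=4 * N))))
ito = (W_fine[-1] ** 2 - T) / 2           # closed form of the Ito integral of W dW
for k in range(5):
    print(f"alpha={k / 4}: Wick sum={wick_sum(W_fine, k, T):.4f}, Ito integral={ito:.4f}")
```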

Can we generalize this result to continuous martingales $X$ and $Y$, with $Y$ square integrable? I can prove that, for $\lambda\in\left[0,1\right]$, $\sum_{i=0}^{N-1}\left(\lambda X_{t_{i}}+\left(1-\lambda\right)X_{t_{i+1}}\right)\left(Y_{t_{i+1}}-Y_{t_{i}}\right)\rightarrow\int_{0}^{T}X_{t}dY_{t}+\left(1-\lambda\right)\left[X,Y\right]$ ... — Kapes Mate, Dec 20 '23 at 22:42
... I have problems with the following: $\sum_{i=0}^{N-1}X_{t_{i}^{*}}\left(Y_{t_{i+1}}-Y_{t_{i}}\right)=\sum_{i=0}^{N-1}\left(X_{t_{i}^{*}}-X_{t_{i}}\right)\left(Y_{t_{i+1}}-Y_{t_{i}}\right)+\sum_{i=0}^{N-1}X_{t_{i}}\left(Y_{t_{i+1}}-Y_{t_{i}}\right)$, where $t_{i}^{*}:=\lambda t_{i}+\left(1-\lambda\right)t_{i+1}$. I know that the last term converges to $\int_{0}^{T}X_{t}dY_{t}$, but I am stuck on the first term. Does it converge to $\left(1-\lambda\right)\left[X,Y\right]$ by any chance? If so, how can we prove it? — Kapes Mate, Dec 20 '23 at 22:43
