90

I have read that the infinitesimal generator of Brownian motion is $\frac{1}{2}\small\triangle$. Unfortunately, I have no background in semigroup theory, and the expositions of semigroup theory I have found lack any motivation or intuition.

What is the infinitesimal generator of a process intuitively, and why is it interesting or useful to know that the generator of Brownian motion is $\frac{1}{2}\small\triangle$?

Davide Giraudo
  • 172,925
Potato
  • 40,171

3 Answers

116

For a Markov process $(X_t)_{t \geq 0}$ we define the generator $A$ by

$$Af(x) := \lim_{t \downarrow 0} \frac{\mathbb{E}^x(f(X_t))-f(x)}{t} = \lim_{t \downarrow 0} \frac{P_tf(x)-f(x)}{t}$$

whenever the limit exists in $(C_{\infty},\|\cdot\|_{\infty})$. Here $P_tf(x) := \mathbb{E}^xf(X_t)$ denotes the semigroup of $(X_t)_{t \geq 0}$.

By Taylor's formula this means that

$$\mathbb{E}^xf(X_t) \approx f(x)+t Af(x)$$

for small $t \geq 0$. So, basically, the generator describes the movement of the process in an infinitesimal time interval. One can show that

$$\frac{d}{dt} P_t f(x) = A P_tf(x), \tag{1}$$

i.e. the generator is the time derivative of the mapping $t \mapsto P_tf(x)=\mathbb{E}^x(f(X_t))$. Reading $(1)$ as a (partial) differential equation we see that $u(t,x) := P_t f(x)$ is a solution to the PDE

$$\frac{\partial}{\partial t} u(t,x) = Au(t,x) \qquad u(0,x)=f(x).$$

This is one important reason why generators are of interest. Another, more probabilistic, reason is that the process

$$M_t^f := f(X_t) - f(X_0)- \int_0^t Af(X_s) \, ds, \qquad t \geq 0 \tag{2}$$

is a martingale. This means that we can associate with $(X_t)_{t \geq 0}$ a whole bunch of martingales, and this martingale property comes in handy very often, for example whenever we deal with expectations of the form $\mathbb{E}^x(f(X_t))$. This leads to Dynkin's formula.
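To make this concrete, here is a quick Monte Carlo sketch of a Dynkin-type identity for Brownian motion (my own illustration; the choices $f(x) = x^2$ and the interval $(-1,1)$ are arbitrary): since $Af = \frac{1}{2}f'' = 1$, Dynkin's formula gives $\mathbb{E}^0(B_\tau^2) = \mathbb{E}^0(\tau)$, i.e. $\mathbb{E}^0(\tau) = 1$ for the first exit time $\tau$ of $(-1,1)$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Sanity check (illustration only): for Brownian motion started at 0 and
# f(x) = x^2 we have Af = 1, so Dynkin's formula gives
#     E[f(B_tau)] = f(0) + E[tau],  i.e.  E[tau] = E[B_tau^2] = 1
# for tau = first exit time of the interval (-1, 1).
n_paths, n_steps, dt = 5_000, 2_000, 0.005        # horizon T = 10 >> E[tau] = 1
steps = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
paths = np.cumsum(steps, axis=1)                  # B at times dt, 2*dt, ..., T
exit_idx = (np.abs(paths) >= 1.0).argmax(axis=1)  # first grid index with |B| >= 1
tau = (exit_idx + 1) * dt
print(tau.mean())  # ≈ 1, up to Monte Carlo noise and O(sqrt(dt)) discretization bias
```

The simulation slightly overestimates $\tau$ because the discretized path can cross the boundary between grid points; the bias vanishes as $dt \to 0$.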

Generators are also connected with the martingale problem, which in turn can be used to characterize (weak) solutions of stochastic differential equations. Furthermore, generators of stochastic processes are strongly related to Dirichlet forms and carré du champ operators; it turns out that they are extremely helpful for carrying over results from probability theory to analysis (and vice versa). One important application is heat-kernel estimates.

Example: Brownian motion. In the case of (one-dimensional) Brownian motion $(B_t)_{t \geq 0}$, we see that

$$\mathbb{E}^x(f(B_t)) \approx f(x)+ \frac{t}{2} f''(x)$$

for small $t$. This formula can be motivated by Taylor's formula: Indeed,

$$\mathbb{E}^x(f(B_t)) \approx \mathbb{E}^x \left[f(x)+f'(x)(B_t-x)+\frac{1}{2} f''(x)(B_t-x)^2 \right]= f(x)+0+\frac{t}{2} f''(x)$$

using that $\mathbb{E}^x(B_t-x)=0$ and $\mathbb{E}^x((B_t-x)^2)=t$.
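This small-$t$ behaviour is easy to check numerically. The sketch below (my illustration; the choice $f = \sin$, the point $x = 0.7$, and all parameters are arbitrary) estimates the difference quotient $\frac{\mathbb{E}^x f(B_t) - f(x)}{t}$ by Monte Carlo and compares it with $\frac{1}{2} f''(x)$:

```python
import numpy as np

rng = np.random.default_rng(0)

def generator_estimate(f, x, t, n=10**6):
    """Monte Carlo estimate of (E^x[f(B_t)] - f(x)) / t for Brownian motion."""
    # Under P^x, B_t is normally distributed with mean x and variance t.
    B_t = x + np.sqrt(t) * rng.standard_normal(n)
    return (np.mean(f(B_t)) - f(x)) / t

x, t = 0.7, 1e-3
approx = generator_estimate(np.sin, x, t)
exact = -0.5 * np.sin(x)   # (1/2) f''(x) for f = sin
print(approx, exact)       # the two agree up to Monte Carlo noise
```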

From $(1)$ we see that $u(t,x) := \mathbb{E}^x(f(B_t))$ is the (unique) solution of the heat equation

$$\partial_t u(t,x) = \frac{1}{2}\partial_x^2 u(t,x) \qquad u(0,x)=f(x).$$
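For a concrete $f$ this can be verified directly. Taking $f = \sin$ (my illustrative choice), $u(t,x) = \mathbb{E}^x(\sin(B_t)) = \sin(x)\, e^{-t/2}$, since $B_t \sim N(x,t)$ under $\mathbb{P}^x$ and $\mathbb{E}[\sin(N(\mu,\sigma^2))] = \sin(\mu)\, e^{-\sigma^2/2}$. The sketch below checks the heat equation at an arbitrary point via finite differences:

```python
import numpy as np

# For f = sin, u(t, x) = E^x[sin(B_t)] = sin(x) * exp(-t/2), because
# B_t ~ N(x, t) under P^x and E[sin(N(mu, s^2))] = sin(mu) * exp(-s^2 / 2).
def u(t, x):
    return np.sin(x) * np.exp(-t / 2)

t, x, h = 0.4, 1.1, 1e-4  # arbitrary test point and step size
du_dt = (u(t + h, x) - u(t - h, x)) / (2 * h)                # central difference in t
d2u_dx2 = (u(t, x + h) - 2 * u(t, x) + u(t, x - h)) / h**2   # central difference in x
print(du_dt, 0.5 * d2u_dx2)  # both ≈ -sin(1.1) * exp(-0.2) / 2
```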

Moreover, one can show that the solution of the Dirichlet problem is also related to Brownian motion. Furthermore, $(2)$ yields that

$$M_t^f := f(B_t)-f(B_0) - \frac{1}{2} \int_0^t f''(B_s) \, ds$$

is a martingale. Having Itô's formula in mind, this is not surprising since

$$f(B_t)-f(B_0) = \int_0^t f'(B_s) \, dB_s+ \frac{1}{2} \int_0^t f''(B_s) \,ds = M_t^f + \frac{1}{2} \int_0^t f''(B_s) \,ds.$$
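The martingale property is also visible in simulation. The following sketch (my illustration) takes $f(x) = x^2$ with $B_0 = 0$, so that $M_t^f = B_t^2 - t$, and checks that the sample mean of $M_T$ stays near $\mathbb{E}(M_0) = 0$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustration: for f(x) = x^2 (so f'' = 2) and B_0 = 0, formula (2) gives
#     M_t = B_t^2 - (1/2) * (2 * t) = B_t^2 - t,
# the classical martingale. A martingale has constant expectation, so
# E[M_T] = E[M_0] = 0 for every T.
T, n = 1.0, 500_000
B_T = np.sqrt(T) * rng.standard_normal(n)   # terminal values: B_T ~ N(0, T)
M_T = B_T**2 - T
print(M_T.mean())  # ≈ 0 up to Monte Carlo noise
```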

The above-mentioned results (and proofs thereof) can be found in the monograph Brownian Motion - An Introduction to Stochastic Processes by René L. Schilling & Lothar Partzsch.

saz
  • 120,083
  • 1
    Thank you. Just what I was looking for. – Potato Mar 03 '14 at 17:18
  • How do we know that not every martingale is of the form in (2)? – Thomas E Apr 12 '17 at 16:22
  • @ThomasE What exactly do you mean by "every martingale"? – saz Apr 12 '17 at 17:42
  • While the given process in (2) is derived from a Markov process, you just said the construction gives "a whole bunch of martingales", implying (as I read it) that there are martingales which are not constructed from a Markov process and a function $f$. I would be very interested in a clarification.

    I ask because many convergence theorems are essentially martingale convergence theorems, so I wonder if any martingale could be constructed from a Markov process, making the ergodic theorem the root of all probabilistic convergence theorems.

    – Thomas E Apr 12 '17 at 17:49
  • @ThomasE I'm not aware of any such result. (I didn't mean it this way; I'm just saying that it is possible to obtain a class of martingales associated with a given Markov process.) – saz Apr 12 '17 at 17:58
  • I did understand you. You made no such statement, yet it is a natural question to consider the inverse. What do you think, is there a chance this holds? :) – Thomas E Apr 12 '17 at 18:00
  • 2
    @ThomasE I agree that it is an interesting question, but I wouldn't expect that any martingale has such a representation, no. If you know that the filtration is generated by a Lévy process, then you can write the martingale as a stochastic integral (driven by the Lévy process)... that's a nice representation result, but a different story. – saz Apr 12 '17 at 18:26
  • Very nice post. I find the interpretation of the equation after you use Taylor's formula a little weird though. Isn't this more like the "average" movement for small $t$? Maybe this is implicit since it is probably the best one can do. I am not sure however and please correct me if I am wrong. – user123124 Oct 03 '17 at 17:34
  • @user1 Yes, you are right about that... because of the expectation value it's more like the "averaged small time movement" – saz Oct 03 '17 at 17:36
  • Maybe this is the same as the derivative being a plane in $\mathbb{R}^{n}$, is that some kind of average as well? – user123124 Oct 03 '17 at 17:37
  • @user1 No, it's not. I was refering to the expectation value (=average over all $\omega \in \Omega$ with respect to the probability measure $\mathbb{P}^x$). – saz Oct 03 '17 at 17:56
  • @user1 For any $y$...? Do you mean for any $t$...? The identity $$\frac{d}{dt} P_t f = A P_t f$$ holds indeed for any $t>0$ and not only at $t=0$ (... for $f$ sufficiently nice). – saz Oct 16 '17 at 14:31
  • @saz yes it should be $t$ not $y$, but that means we know the change of the expected value under any function for any $t$ and not just $t=0$. So it is more than just the derivative at $t=0$. Is that explained by the fact that we can just change the $x$ and think of the change of the process starting there? – user123124 Oct 16 '17 at 14:34
  • 1
    @user1 In some sense, yes. If you consider $t>0$, then you first have to apply the semigroup $P_t$ (in your words "change the $x$") and then take the derivative. The Markov property is indeed very crucial for all of this. – saz Oct 16 '17 at 15:07
  • @saz that's very weird, that the rate of change is described by the same object everywhere. It is a very complicated object though. – user123124 Oct 18 '17 at 05:28
  • @user1 Well, any "nice" Markov process (more precisely, any Feller process) is uniquely determined by its generator. This shows that the generator encodes a lot of information about the process. – saz Oct 18 '17 at 05:48
  • @itry Not sure whether I'm as good as you think... in any case, it took a lot of practicing (e.g. solving homework problems) over many years. – saz Aug 29 '18 at 20:08
  • 1
    Could you tell me why $(P_t)$ is a semigroup? You set $P_t f(x)=\mathbb E^x[f(X_t)]$, so why is $P_{t+s}=P_tP_s$? I don't get how to prove it. – user621345 Jan 11 '19 at 14:55
  • 1
    @NewMath Well, since I'm assuming that $(X_t)_{t \geq 0}$ is a Markov process, this is a direct consequence of the Markov property and the tower property of conditional expectation: \begin{align*} P_{t+s} f(x) &= \mathbb{E}^x \big[ \mathbb{E}^x(f(X_{t+s}) \mid \mathcal{F}_t) \big] \\ &= \mathbb{E}^x \big[ \mathbb{E}^{X_t} f(X_s) \big] = P_t(P_s f)(x). \end{align*} – saz Jan 11 '19 at 14:58
  • Yes, sorry, I didn't pay attention to the Markov property. In fact, in my case, I have an SDE $dX_t=b(X_t)dt+\sigma (X_t)dB_t$. The generator is $Af(x)=\lim_{t \downarrow 0}\frac{\mathbb E^x[f(X_t)]-f(x)}{t}.$ In this situation, is $P_tf(x)=\mathbb E^x[f(X_t)]$ a semigroup? Because $(X_t)$ is not Markovian a priori, is it? – user621345 Jan 11 '19 at 15:11
  • @NewMath If $b$ and $\sigma$ are nice enough (e.g. if they are Lipschitz), then the SDE gives rise to a Markov process $(X_t)_{t \geq 0}$; you can find the proof for instance in Brownian motion - An introduction to Stochastic Processes by Schilling & Partzsch. – saz Jan 11 '19 at 15:13
  • @saz: I'm sorry, I don't understand your calculation of $P_{t+s}f(x)=P_t(P_sf)(x)$. Why $$\mathbb E^x[\mathbb E^x[f(X_{t+s})\mid \mathcal F_t]]=\mathbb E^x[\mathbb E^{X_t}f(X_s)]=P_t(P_s f)(x)?$$ I don't understand either equality. For me, a Markov process means $\mathbb P(X_t\in A\mid \mathcal F_s)=\mathbb P(X_t\in A\mid X_s).$ – user657324 Mar 31 '19 at 07:48
  • @saz: (Q1 is my previous post.) Q2) Also, when you say "So, basically, the generator describes the movement of the process in an infinitesimal time interval", how does the fact that $\mathbb E[f(X_t)]\approx f(x)+tAf(x)$ tell us about the movement of the process? Maybe it gives information on how it moves around $0$ (but I don't even see how), but I don't get how it gives information on the movement of the process for $t$ not close to $0$. Q3) Last thing, in your example with Brownian motion, why do you develop $f(B_t)$ at order $2$ and not more or less? Thank you. – user657324 Mar 31 '19 at 07:55
  • @user657324 (Q1) Take a look at any book/lecture note/... which covers Markov processes. The property of $(P_t)_t$ is called semigroup property and it is a direct consequence of the Markov property $$\mathbb{E}(f(X_t) \mid \mathcal{F}_s) = \mathbb{E}(f(X_t) \mid X_s)$$ and the tower property of conditional expectation. (Q2) I'm saying that it describes the behaviour for an "infinitesimal time interval" and so, yes, $t$ needs to be (infinitesimal) small. I'm not saying that you get immediately information on the sample path behaviour; what we do get is some information on $E^xf(X_t)$ – saz Mar 31 '19 at 08:06
  • i.e. on the distribution of $X_t$. (Q3) In order to develop $f(B_t)$ at order $>2$ we would need to assume a higher regularity of $f$. As long as $f$ is sufficiently smooth, you can develop $f(B_t)$ at any order you want – saz Mar 31 '19 at 08:08
  • So we have information on the distribution of $X_t$, but only for $t$ near $0$, right? (Sorry, these concepts are quite new to me.) Could you enlighten me on how $\mathbb E^x[f(X_t)]$ gives us information on the distribution of $X_t$? I'm not sure I really understand how. One other thing: in your example with the Brownian motion, do you suppose that $\mathbb P(B_0=x)=1$? – user657324 Mar 31 '19 at 08:19
  • 2
    @user657324 Well, yes, the "approximation" for $E^x f(X_t)$ holds only for small $t$. Re your 2nd question: The distribution of any random variable $Y$ is uniquely characterized by expectations of the form $\mathbb{E}[f(Y)]$ for a class of functions $f$ which is "large enough" (e.g. $f \in C_b^2$) Thus, if we know $\mathbb{E}^x f(X_t)$ for a sufficiently large class of functions $f$, then we have information on the distribution of $X_t$. – saz Mar 31 '19 at 08:25
  • Do you know where I could find a result proving that $Y$ is uniquely characterized by expectations of the form $\mathbb E[f(Y)]$ for a large class of functions $f$? There is something I don't get. In a previous comment you say "any "nice" Markov process (more precisely, any Feller process) is uniquely determined by its generator", and here it looks like we only have information around $0$, so how can we have information about $X_t$ when $t$ is not very small? Maybe that's why the Markov property is important? – user657324 Mar 31 '19 at 08:31
  • @user657324 1) The class of functions $f$ needs to be sufficiently rich that we can approximate indicator functions $1_B$ for $B$ open (or closed) by functions from this class. Such classes are called "measure determining"; a quite detailed account is given e.g. in "Measures, integrals and martingales" by Schilling. Alternatively, you can look up "Urysohn's lemma", which allows e.g. to prove that $C_b^2$ is measure determining. Finally, you might want to recall that any distribution is uniquely characterized by its characteristic function, i.e. it actually suffices to know $E[f(Y)]$ for – saz Mar 31 '19 at 08:37
  • $f(x) = \exp(ix \xi)$ with $\xi \in \mathbb{R}^d$. 2) Yes, the Markov property is certainly important. The key is identity (1) (from my answer), which allows us to get information about $X_t$ for $t$ "not very small". – saz Mar 31 '19 at 08:39
  • Thanks for all your answer. I just found the book of Shilling you recommended to me, but I don't see where they prove that the distribution of a markov process is uniquely determinated by the generator. Do you know where I can find it precisely ? – user657324 Apr 01 '19 at 09:39
  • @user657324 It's Corollary 7.11(c) – saz Apr 01 '19 at 11:30
  • I may have an old version, but Corollary 7.11 says : Lebesgue measure is invariant under motion : $\lambda ^n=M(\lambda ^n)$ for all motion $M$ in $\mathbb R^n$. In particular, congruent set have same measure. Are you talking about this one ? – user657324 Apr 01 '19 at 12:32
  • @user657324 Sorry, we were talking about different books. I thought you were referring to his book about Brownian motion; in there he shows that a Feller semigroup/process is uniquely characterized by its infinitesimal generator (it's Corollary 7.11). In MIMS there is no material about Markov processes; the result on measure-determining sets can be found in the 2nd edition of MIMS, Chapter 17 (it's not part of the first edition). – saz Apr 01 '19 at 12:52
  • Oh, I see. I found it. Thank you :-) – user657324 Apr 01 '19 at 12:55
  • Wait, what am I missing? How do we remove the $x$ to get $\mathbb{E}^x(B_t-x)=0$ and $\mathbb{E}^x((B_t-x)^2)=t$? Should that not be $\mathbb{E}^x(B_t-x)=\mathbb{E}^x(B_t)-\mathbb{E}^x(x)=-x$? And $\mathbb{E}^x((B_t-x)^2)=\mathbb{E}^x(B_t)^2-2\mathbb{E}^x(xB_t)+\mathbb{E}^x(x^2)=t+x^2$? – dan mackinlay Jan 28 '20 at 02:31
  • Oh wait, $B_t$ is started at $x$. Got it. – dan mackinlay Jan 28 '20 at 02:39
  • Hi @saz, I have recently asked some questions about continuous-time Markov chain, but none of them receives any attention. Because I'm doing my thesis, I hope that you can have a look at them. They are 1, 2, and 3. – Akira Apr 30 '20 at 06:41
  • Sorry for this basic question! Can anybody explain equation (1) in a simpler form? Also I cannot understand how from (1) we can see that $u$ is the solution to the heat equation. – Denis Aug 15 '20 at 16:41
  • I am checking whether $\mathbb{E}^x(f(B_t)) \approx f(x)+ \frac{t}{2} f''(x)$ satisfies the heat equation. But it gives us an extra term $\frac{1}{2} \partial_x^2 (\frac{t}{2}f''(x))$ which does not vanish. Am I missing something? @saz – Denis Aug 16 '20 at 11:58
  • @saz in your answer, did you assume that the Markov process has independent increment? ie $X(t+\Delta t) - X(t) = X(\Delta t) - X(0)$ always holds? – athos Dec 29 '21 at 01:54
  • How $\mathbb{E}^xf(X_t) \approx f(x)+t Af(x)$ was derived from the definition? – Kevin Nov 19 '23 at 15:48
24

In fact, there is a deeper relation between the Laplacian and Brownian motion.

Let $(M, g=\langle\cdot, \cdot\rangle)$ be a smooth Riemannian manifold without boundary. The Laplace-Beltrami operator is defined as the contraction of the covariant derivative of the differential of any smooth function on $M$

$$\forall f \in C^\infty(M): \Delta_M f := \mathrm{tr} \nabla \mathbf df = \mathrm{div}\ \mathrm{grad} \ f \in C^\infty(M),$$

where the well-known definition can be recovered with suitable generalisations of the divergence and the gradient. This means, for any orthonormal basis $E_1,...E_n$ for $T_pM$ ($p \in M$),

$$\forall f \in C^\infty(M): \Delta_M f(p) = \sum_{i=1}^n \nabla\mathbf d f(E_i,E_i) = \left\langle \nabla_{E_i}\mathrm{grad} \ f, E^i \right\rangle,$$

where we used Einstein notation. Moreover, we can generalise the notion of a continuous semimartingale as follows: an adapted $M$-valued stochastic process $X$ is a semimartingale on $M$ if, for all $f \in C^\infty(M)$, the composition $f(X)$ is a real-valued semimartingale.

Then we can define Brownian motion on $M$ by the usual martingale problem (this is known as the extrinsic definition):

Let $X$ be an adapted $M$-valued process. Then $X$ is called a Brownian motion on $(M,g)$ if, for all $f \in C^\infty(M)$, the real-valued process

$$f(X) - \frac 12 \int \Delta_M f(X) \mathrm dt$$

is a local martingale.

In particular, we can prove Lévy's characterisation also for BM$(M,g)$. But this requires a reasonable definition of the quadratic variation.

The problem with this definition lies in the manifold itself: there is no Hörmander-type representation of the Laplace-Beltrami operator if $M$ is not parallelizable, i.e. if the tangent bundle $TM \overset\pi\longrightarrow M$ is not trivial. However, we do have the fundamental relation

$$\Delta_{\mathcal O(M)} \pi^* = \pi^* \Delta_M,$$

more precisely,

$$\Delta_{\mathcal O(M)}(f \circ \pi)(u) = \Delta_M f(x),$$

for all $u \in \mathcal O(M)$ with $x = \pi(u)$. Moreover, there exist $n$ unique well-defined horizontal vectors $L_i(u) \in H_u\mathcal O(M)$ with $\pi_* L_i(u) = ue_i$, where $(e_i)$ is the standard basis of $\mathbb R^n$; these are the so-called fundamental horizontal vector fields, and we define

$$\Delta_{\mathcal O(M)} := \sum_{i=1}^n L_i^2,$$

where $\mathcal O(M)$ denotes the orthonormal frame bundle, the prototypical example of a smooth principal fibre bundle whose structure group is given by the orthogonal group.

Using this relation, it is due to Malliavin, Eells and Elworthy that there always exists a lifted Brownian motion as solution of the globally defined SDE

$$\mathrm d U = L_i(U) \circ \mathrm d B^i,$$

on $\mathcal O(M)$, where $B$ is a real $n$-dimensional Brownian motion and we used Einstein notation. A solution is a diffusion generated by $\frac 12\Delta_{\mathcal O(M)}$. The idea is to solve the SDE in $\mathcal O(M)$ and $X = \pi(U)$ is the projection of the lifted Brownian motion $U$ on the manifold $M$ via $\mathcal O(M) \overset\pi\longrightarrow M$. It follows that $X$ is a Brownian motion on $M$ starting from $X_0 = \pi(U_0)$.

In geometrical terms, the idea is to "roll" our manifold $M$ by means of the (stochastic) parallel displacement along the paths of an $\mathbb R^n$-valued Brownian motion ("rolling without slipping"), known as stochastic development.

References:

  • Hsu, Elton P. Stochastic analysis on manifolds. Vol. 38. American Mathematical Soc., 2002.
  • (in german) Hackenbroch, Wolfgang, and Anton Thalmaier. Stochastische Analysis. Vieweg+ Teubner Verlag, 1994.
  • Elworthy, Kenneth David. Stochastic differential equations on manifolds. Vol. 70. Cambridge University Press, 1982.
  • Malliavin, Paul. Géométrie différentielle stochastique. Montreal, Presses de l’universite de Montreal, 1978.
wueb
  • 768
  • 1
    Beautiful answer. It might be useful to tell what $\mathcal O (M)$ and $\Delta _{\mathcal O (M)}$ are. – Alex M. Jun 23 '18 at 06:22
6

The generator is $A f (x) = \lim_{t \downarrow 0} \frac{\mathbf{E}^{x} [f(X_{t})] - f(x)}{t}$. If $X_{t}$ were a degenerate stochastic process, say one given by an ODE $\dot X_t = b(X_t)$, then the generator would simply be the first-order operator $Af(x) = b(x)f'(x)$ coming from the chain rule, and it would give you an ODE for $f(X_t)$.

You can use a generator, for example, to derive PDEs relevant to the stochastic process. As a simple example, say you want to find a PDE for the stationary distribution of $X$, with density $\pi(x)$. Integrate $Af(x)$ against $\pi(x)$: since $\pi$ is stationary, the expectation $\int Af(x)\,\pi(x)\,dx$ is $0$. Then integrate by parts to move the differential operator $A$ from $f$ onto $\pi$, thinking of $f(x)$ as a test function. You get that $A^* \pi(x) = 0$, where $A^*$ is the adjoint of $A$.

So in this example, for Brownian motion, where $A = \frac{1}{2}\Delta$ is self-adjoint, the steady state will solve $\Delta \pi = 0$.
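As a concrete (hypothetical) instance of this adjoint computation, take the Ornstein-Uhlenbeck process $dX_t = -X_t\,dt + \sqrt{2}\,dB_t$ (my example, not from the answer above): its generator is $Af(x) = -x f'(x) + f''(x)$, integration by parts gives $A^*\pi = (x\pi)' + \pi''$, and the standard Gaussian density solves $A^*\pi = 0$. A numerical sketch:

```python
import numpy as np

# Hypothetical example: Ornstein-Uhlenbeck process dX = -X dt + sqrt(2) dB.
# Generator:       A f   = -x f' + f''
# Formal adjoint:  A* pi = (x pi)' + pi''
# Candidate stationary density: standard Gaussian pi(x) = exp(-x^2/2)/sqrt(2 pi).
x = np.linspace(-4.0, 4.0, 2001)
h = x[1] - x[0]
pi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def d(g):
    return np.gradient(g, h)          # finite-difference derivative on the grid

residual = d(x * pi) + d(d(pi))       # A* pi; should vanish for the stationary density
print(np.abs(residual[100:-100]).max())  # ≈ 0 up to finite-difference error
```

(The boundary points are trimmed because `np.gradient` uses one-sided differences at the edges of the grid.)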

Kai Sikorski
  • 1,075