Understanding a basic ergodic theory physical analogy

Question

The following excerpt is from the Wikipedia article on ergodic theory:

Ergodic theory is often concerned with ergodic transformations. The intuition behind such transformations, which act on a given set, is that they do a thorough job "stirring" the elements of that set. E.g. if the set is a quantity of hot oatmeal in a bowl, and if a spoonful of syrup is dropped into the bowl, then iterations of the inverse of an ergodic transformation of the oatmeal will not allow the syrup to remain in a local subregion of the oatmeal, but will distribute the syrup evenly throughout. At the same time, these iterations will not compress or dilate any portion of the oatmeal: they preserve the measure that is density. The formal definition is as follows:

Let $T: X \rightarrow X$ be a measure-preserving transformation on a measure space $(X, \Sigma, \mu)$, with $\mu(X)=1$. Then $T$ is ergodic if for every $E$ in $\Sigma$ with $\mu(T^{-1}(E) \Delta E)=0$, either $\mu(E)=0$ or $\mu(E)=1$.

How could I formalize pouring the syrup into my oatmeal? Would I describe the oatmeal by a density function in $\mathbb{R}^n$? Say $\mu$ is Lebesgue measure on $\mathbb{R}^n$, then I have some density describing the location of the oatmeal, say $\rho\, d\mu$. Would the transformation be ergodic w.r.t. $$m(A) = \int_A f\,d\mu$$ or with respect to $\mu$? When I describe the combination of the syrup and the oatmeal, would this create a new density at time $t$? This is confusing, as I would assume the oatmeal and syrup have their own densities, so I am unsure which measure the transformation is ergodic with respect to.

You need an action $T$ to describe mixing. Integration is one possibility, but is rather degenerate. Measure preserving tells us how $\mu(T)$ acts, which is a global concern, while ergodicity tells us how subspaces act. I'm not a huge fan of the definition your source gives, but it essentially comes down to saying that given a set $A$, the flow into $A$ came from and is exactly $A$ iff $A$ has full or null measure. – Brevan Ellefsen, Nov 13 '22 at 19:40
For example, if we break a vial of gas in a box and track a nondegenerate subspace of the box, then the gas must "flow" out of the subspace unless we had chosen the whole box. A degenerate case could be the boundary, where the notion of flow can be hazy. In any case, this "flowing" is the action, and it's an action that's ergodic. To muddy the water a bit, we do call the measure with respect to which $T$ is ergodic an "ergodic measure", which might have confused you. – Brevan Ellefsen, Nov 13 '22 at 19:41
To actually answer the question, in the analogy the "syrup" is a subspace of the "oatmeal", so it's with respect to the oatmeal measure that mixing the syrup throughout the mixture is "ergodic" (the analogy arguably better describes a mixing system rather than an ergodic system, though these are related notions) – Brevan Ellefsen, Nov 13 '22 at 19:45
@BrevanEllefsen So everything exists say in $\mathbb{R}^3$, and oatmeal measure, say $\mu_{\text{oatmeal}}$, is a measure on $\mathbb{R}^3$? Thus, our measure space is $(\mathbb{R}^3, \mathcal{B}_{\mathbb{R}^3}, \mu_{\text{oatmeal}})$? Would this mean that to describe where the oatmeal is, we don't necessarily say it is in some definitive set, but rather we say in this region, this is how much oatmeal there is? Would we be able to describe the set in which the oatmeal lies by say the support of $\mu_{\text{oatmeal}}$? – random_0620, Nov 13 '22 at 19:49
The oatmeal just inherits the Lebesgue measure from ambient space. The property of playing nicely with the ambient measure (the "density" part of the quote) is just stating that $T$ is measure preserving (sortaish, at least up to injectivity). You could describe the support, but for the purpose of defining ergodicity it shouldn't matter where the oatmeal is or how we choose to formalize or describe it. (all that matters is that it's compact, which is generally a requirement to give a well-defined probability measure) – Brevan Ellefsen, Nov 13 '22 at 20:01
@BrevanEllefsen So the interior of the bowl is the oatmeal with Lebesgue measure and I don't need to think about a mass measure here? It's more the densities w.r.t. that notion of volume which describe the two? – random_0620, Nov 13 '22 at 20:03

Lee Mosher · Accepted Answer · 2022-11-13T20:32:48.740

4

When you pour a glop of syrup into your oatmeal, the contents of the bowl can be modelled as a set $X$ broken into a disjoint union of two measurable subsets $X = S \sqcup O$: $S$ represents the glop of syrup; and $O$ represents the original oatmeal.

Letting $\chi_S,\chi_O : X \to \{0,1\}$ be the characteristic functions, we have $\chi_S(x)+\chi_O(x)=1$ for all $x \in X$.

The volume measure on $X$ can be modelled as Lebesgue measure $\mu$ restricted to $X$, and normalized so that $\mu(X)=1$. You could then express both the syrup and the oatmeal as densities $\chi_S d\mu$, $\chi_O d\mu$.

The transformation itself is being modelled (somewhat unrealistically) as a measure preserving bijection $T : X \to X$.

I do not know what you mean by $f$, it is not otherwise mentioned in your post. But what I'll say is that $\mu$ itself is invariant under the transformation $T$, and so $\mu(A) = \mu(T(A))$ for all $A$, which can be written as $\int_A d\mu = \int_{T(A)} d\mu$.

One key point of ergodicity, as applied to this "syrup/oatmeal" picture, is that neither $S$ nor $O$ is invariant under $T$: neither $\mu(T^{-1}(S) \Delta S)$ nor $\mu(T^{-1}(O) \Delta O)$ is zero. To put it another way neither of the two characteristic functions $\chi_S$, $\chi_O$ is invariant under $T$.

Nonetheless, as one iterates $T$ more and more (as one stirs the oatmeal more and more), the real key point is that the sequence of functions $\chi_S \circ T^{-n}$ has a limit in some appropriate sense, that limit is $T$-invariant, and that limit is actually the constant function $\mu(S)$ with associated distribution $\mu(S) d\mu$ ($T$ will "distribute the syrup evenly throughout"). For this to work one has to take the limit of the function sequence $\chi_S \circ T^{-n}$ very carefully: for a merely ergodic $T$ it is the averages $\frac{1}{N}\sum_{n=0}^{N-1} \chi_S \circ T^{-n}$ that converge, usually in $L^2(X)$ or $L^1(X)$.

By the way, I have written my answer under the assumption that $T$ is a bijection, which allows me to be careless and write things like $\chi_S \circ T^{-n}$. One can certainly be more careful and rewrite all of this in the general case that $T$ is not a bijection.
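
To make the "turns of the mixer" concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration and is not part of the answer: the bowl $X$ is taken to be the unit torus $[0,1)^2$, the stirring map $T$ is Arnold's cat map (an area-preserving bijection that is in fact mixing, which is stronger than ergodic), the glop $S$ is the square $[0, 0.1)^2$, and the syrup is represented by randomly sampled particles. The helper names `cat_map`, `stir` and `syrup_fraction_per_cell` are hypothetical.

```python
import random

def cat_map(x, y):
    """One turn of the mixer: Arnold's cat map on the unit torus [0,1)^2.
    The matrix [[2, 1], [1, 1]] has determinant 1, so area is preserved."""
    return (2.0 * x + y) % 1.0, (x + y) % 1.0

def stir(points, turns):
    """Apply the map `turns` times to every syrup particle."""
    for _ in range(turns):
        points = [cat_map(x, y) for x, y in points]
    return points

def syrup_fraction_per_cell(points, grid=4):
    """Fraction of the syrup particles lying in each cell of a grid x grid partition."""
    counts = [[0] * grid for _ in range(grid)]
    for x, y in points:
        i = min(int(x * grid), grid - 1)  # guard against rounding up to `grid`
        j = min(int(y * grid), grid - 1)
        counts[i][j] += 1
    total = len(points)
    return [[c / total for c in row] for row in counts]

rng = random.Random(0)  # fixed seed so the run is reproducible
# Assumed initial glop S = [0, 0.1)^2, sampled by 20000 particles.
syrup = [(0.1 * rng.random(), 0.1 * rng.random()) for _ in range(20000)]

before = syrup_fraction_per_cell(syrup)
after = syrup_fraction_per_cell(stir(syrup, 12))

for row in after:
    print(" ".join(f"{frac:.3f}" for frac in row))
```

Before stirring, all of the syrup sits in a single grid cell. After a dozen turns, each of the 16 cells should hold roughly $1/16$ of the particles, which is the particle version of $\chi_S \circ T^{-n}$ approaching a constant density.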

edited Nov 13 '22 at 20:32

answered Nov 13 '22 at 19:51

Lee Mosher


Why do we consider $T^{-n} \circ \chi_S$ and not $T^{-n} \circ \chi_X$, since we are stirring both the oatmeal and the syrup? Would this correspond to at every point there is both oatmeal and syrup so it's meaningless? Also, how is this well defined since $T^{-n}:X\to X$ but $\chi_X : X\to \{0,1\}$? – random_0620 Nov 13 '22 at 20:09
Did you mean $\chi_S \circ T^{-n}$? – random_0620 Nov 13 '22 at 20:14
Ah, yes, I'll fix that. – Lee Mosher Nov 13 '22 at 20:26
Regarding your comment about $\chi_X \circ T^{-n}$ versus $\chi_S \circ T^{-n}$, note that $\chi_X$ is the constant function with value $\chi_X(x)=1$ for all $x$. So also $\chi_X \circ T^{-n}(x)=1$ for all $x \in X$. This is a very boring situation. – Lee Mosher Nov 13 '22 at 20:31
On the other hand the sequence of functions $\chi_S$, $\chi_S \circ T^{-1}$, $\chi_S \circ T^{-2}$, $\chi_S \circ T^{-3}$,... is very interesting: $\chi_S d\mu$ is the original distribution of the syrup after it has first plopped into the oatmeal; $\chi_S \circ T^{-1} d\mu$ is the distribution of the syrup after one turn of the mixer; $\chi_S \circ T^{-2} d \mu$ is the distribution of the syrup after two turns of the mixer, and so on. – Lee Mosher Nov 13 '22 at 20:35
How does this indicator function know where the new elements of $S$ ended up? Wouldn't it not pick up the new locations and simply just "light up" over the set $S$ where it started? – random_0620 Nov 13 '22 at 20:48
That's a straightforward set-theoretical issue. For example, $\chi_S(x)=1$ if and only if $x \in S$, by definition. Next, $\chi_S \circ T^{-1}(x) = 1$ if and only if $\chi_S(T^{-1}(x)) = 1$ if and only if $T^{-1}(x) \in S$ if and only if $x \in T(S)$ if and only if, after one turn of the mixer, the subset representing the new position of the glop of syrup includes the point $x$. – Lee Mosher Nov 13 '22 at 20:57
Here, perhaps, is one other set theoretic point to understand about modelling this situation: the set $S$ represents the initial glop of syrup; the set $T(S) = \{T(x) \mid x \in S\}$ represents where the syrup has moved to after one turn of the mixer; ... ; for general $n \ge 1$ the set $T^n(S) = \{T^n(x) \mid x \in S\}$ represents where the syrup has moved to after $n$ turns of the mixer. – Lee Mosher Nov 13 '22 at 21:27
Is it true that $\chi_{T^n(S)}(x) = \chi_S(T^{-n}(x))$? – random_0620 Nov 13 '22 at 21:30
Something like that, but not exactly that: $\chi_{T^n(S)}(x)=1$ if and only if $x \in T^n(S)$ if and only if $T^{-n}(x) \in S$ if and only if $\chi_S(T^{-n}(x))=1$. Therefore, $\chi_{T^n(S)}(x) = \chi_S(T^{-n}(x))$. – Lee Mosher Nov 13 '22 at 21:33
This makes much more sense thank you very much. So if the transformation is ergodic, in the limit, the density will take over the whole space uniformly, rather than clumping up at some other location. – random_0620 Nov 13 '22 at 21:35
Although again I am assuming $T$ is invertible to allow me to write $T^{-n}(x)$; with care something similar can be derived in general. – Lee Mosher Nov 13 '22 at 21:35
Yes, your last comment is exactly the point. – Lee Mosher Nov 13 '22 at 21:35
One more question, if I were to look at the oatmeal in a physical setting without approximations, would the distribution be an atomic measure? Since oatmeal is made up of atoms and particles in a physical setting. Of course it would be insane to model this way, and would be much better to just say the density of the oatmeal is uniform and nonatomic throughout the bowl. – random_0620 Nov 13 '22 at 21:36
That's quite a bit beyond my expertise. I don't know the source of that quote, but I'm quite sure the whole "oatmeal, syrup" thing was just a way to try to impart some intuition to the concept of ergodicity. After all, most mathematical models of physical phenomena are just that, models. – Lee Mosher Nov 13 '22 at 22:38
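
The identity $\chi_{T^n(S)}(x) = \chi_S(T^{-n}(x))$ from the comments above can be checked exhaustively in a toy model. The sketch below is only an illustration under assumed choices that do not come from the thread: $X = \{0,\dots,11\}$ with normalized counting measure, $T(x) = x + 5 \bmod 12$ (a single 12-cycle, hence ergodic for this measure), and $S = \{0,1,2\}$. All helper names are hypothetical.

```python
from fractions import Fraction

N = 12
X = range(N)
S = {0, 1, 2}  # assumed initial glop of syrup, mu(S) = 3/12

def T(x):
    """One turn of the mixer: a single 12-cycle, since gcd(5, 12) = 1."""
    return (x + 5) % N

def T_inv(x):
    """Inverse of T."""
    return (x - 5) % N

def iterate(f, n, x):
    """Apply f to x exactly n times."""
    for _ in range(n):
        x = f(x)
    return x

def chi(subset, x):
    """Characteristic function of `subset` evaluated at x."""
    return 1 if x in subset else 0

def mu(subset):
    """Normalized counting measure on X."""
    return Fraction(len(subset), N)

def image(subset, n):
    """T^n(subset): where the syrup sits after n turns of the mixer."""
    return {iterate(T, n, x) for x in subset}

# chi_{T^n(S)}(x) == chi_S(T^{-n}(x)) for every x and a range of n.
identity_holds = all(
    chi(image(S, n), x) == chi(S, iterate(T_inv, n, x))
    for n in range(2 * N)
    for x in X
)

# T preserves the measure of S at every step, although S itself moves.
measure_preserved = all(mu(image(S, n)) == mu(S) for n in range(2 * N))

# Averages of chi_S o T^{-n} over one full cycle, computed at each point x.
cesaro = [
    Fraction(sum(chi(S, iterate(T_inv, n, x)) for n in range(N)), N)
    for x in X
]

print(identity_holds, measure_preserved, set(cesaro))
```

In this finite model the set $S$ is not invariant (its image moves around the cycle), yet the average of $\chi_S \circ T^{-n}$ over one full cycle equals $\mu(S)$ at every point, which is the discrete counterpart of the syrup being spread evenly.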

score 1 · Answer 2 · answered Nov 14 '22 at 02:33

I would like to make a few comments complementary to Prof. Mosher's answer.

It might be helpful to consider the following characterization of ergodicity of $(\mu,T)$: Given two measurable subsets $A,B\subseteq X$, the measure $\mu(T^{-n}(A)\cap B)$ (which models the chance of a point $x\in X$ that starts somewhere in $B$ to end up somewhere in $A$ at time exactly $n\in\mathbb{Z}_{\geq1}$ under the time evolution $T$) is comparable to $\mu(A)\mu(B)$ for large $n$, on average, that is, $(\mu,T)$ is ergodic iff

$$\forall A,B\in\Sigma: \lim_{n\to \infty} \dfrac{1}{n}\sum_{k=0}^{n-1}\left[\,\mu(T^{-k}(A)\cap B)-\mu(A)\mu(B)\,\right]=0.$$

The term over which we average here can be thought of as a "correlation" or "covariance"; the fact that it decays in time in some sense means that events get independent asymptotically. (see my answer at What's so special about standard deviation? for more on this.)

I should remark that in the above characterization of ergodicity if one replaces the square brackets with absolute value, one obtains a stronger property called "weak mixing", and if one further drops the averages one obtains an even stronger property called "strong mixing" (the "mixing hierarchy" goes further than that, but the definitions get more sophisticated, at least with this formalism.)
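
The gap between ergodicity and mixing can be seen numerically. The following sketch uses an example that is my own assumption rather than something taken from this answer: the rotation $T(x) = x + \alpha \bmod 1$ on the circle with $\alpha = (\sqrt{5}-1)/2$, which is ergodic for Lebesgue measure but not mixing, together with the arcs $A = B = [0, 0.25)$. The function names are hypothetical. Since $T^{-k}(A)$ is again an arc, $\mu(T^{-k}(A)\cap B)$ can be computed directly as an arc overlap.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2  # assumed irrational rotation angle

def line_overlap(s1, e1, s2, e2):
    """Length of the intersection of the intervals [s1, e1) and [s2, e2)."""
    return max(0.0, min(e1, e2) - max(s1, s2))

def arc_overlap(start_a, len_a, start_b, len_b):
    """Lebesgue measure of the intersection of two arcs on the circle R/Z
    (arc lengths are assumed to be at most 1)."""
    d = (start_a - start_b) % 1.0  # start of arc a relative to arc b
    return (line_overlap(d, d + len_a, 0.0, len_b)
            + line_overlap(d - 1.0, d - 1.0 + len_a, 0.0, len_b))

def correlation(k, a=(0.0, 0.25), b=(0.0, 0.25)):
    """mu(T^{-k}(A) & B) - mu(A) * mu(B) for the rotation T(x) = x + ALPHA mod 1.
    Arcs are given as (start, length); T^{-k}(A) is A shifted back by k * ALPHA."""
    start_a, len_a = a
    start_b, len_b = b
    shifted_start = (start_a - k * ALPHA) % 1.0
    return arc_overlap(shifted_start, len_a, start_b, len_b) - len_a * len_b

n = 5000
terms = [correlation(k) for k in range(n)]
cesaro_average = sum(terms) / n                   # the averaged quantity
late_max = max(abs(t) for t in terms[n // 2:])    # size of late individual terms

print(f"Cesaro average of correlations: {cesaro_average:.5f}")
print(f"largest |correlation| for k >= n/2: {late_max:.5f}")
```

The average of the correlations is close to zero, as ergodicity requires, while individual correlations stay large for arbitrarily late times (the arc keeps returning close to its starting position), so the un-averaged limit used for strong mixing fails for this map.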

There is something to be said about the time parameter being discrete. One could make sense of this by referring to the fact (?) that human perception is discrete (biologically there is a minimum time length below which we don't perceive), and ergodic theory is supposed to model (human) observation. Or else one could think that for one reason or another (due to the fault or low precision of the instruments, or the cost etc.) we make stroboscopic observations. Of course mathematically there is a well-developed ergodic theory of continuous time (and beyond).

This is more of a historical comment. Arguably one of the earliest such heuristics regarding mixing properties is from Halmos's Lectures on Ergodic Theory (p.37), where he mixes vermouth and gin. There is a nice discussion of this in Brown's Ergodic Theory and Topological Dynamics (p.15); here is an excerpt:

To borrow an illustrative example from Halmos [32], suppose that a mixture is made containing 90% gin and 10% vermouth. If the process of stirring the mixture is ergodic, then after sufficient stirring any portion of the container will contain on the average (with respect to the number of stirrings) about 10% vermouth.

The correspondence is as follows: $A$ represents an anonymous part of the container one is observing, $B$ represents that part of the container where the vermouth is originally located. Thus $\mu(B)=0.1$. In order for the observation to be nontrivial, say $\mu(A)>0$, so that the observed part has positive volume. Then renormalizing the correlation above, we get that for $n$ large, on average

$$\dfrac{\mu(T^{-n}(A)\cap B)}{\mu(A)}\approx \mu(B) = 0.1,$$

that is, approximately 10% of the part we are observing will be occupied by vermouth. (How large $n$ ought to be depends on $A,B$ and how good of an approximation one wants, assuming $(\mu,T)$ is fixed.)
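
As a sketch of the renormalization step, assuming $\mu(A)>0$ and, for the first equality only, that $T$ is invertible (an extra assumption made here for readability): the part of the observed region $A$ occupied by vermouth after $k$ stirrings is $A\cap T^{k}(B)$, and measure preservation gives

$$\mu(A\cap T^{k}(B)) = \mu\big(T^{-k}(A\cap T^{k}(B))\big) = \mu(T^{-k}(A)\cap B).$$

Dividing the averaged characterization of ergodicity by $\mu(A)$ then yields

$$\lim_{n\to\infty}\dfrac{1}{n}\sum_{k=0}^{n-1}\dfrac{\mu(T^{-k}(A)\cap B)}{\mu(A)} = \dfrac{\mu(A)\mu(B)}{\mu(A)} = \mu(B) = 0.1,$$

which is the precise sense of "on average (with respect to the number of stirrings)" in the excerpt.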

For the record, this heuristic only started making sense to me after getting accustomed to being able to think of the direction of time consistently (formally the difference between the past and the future may be confusing, and it depends on the interpretation).
