
Let $(S_1, \Sigma_1)$ and $(S_2, \Sigma_2)$ be measurable spaces. The idea of a function $f$ that is measurable with respect to $\Sigma_1,\Sigma_2$ is the following: $f: S_1 \to S_2$ has to be such that $$\forall E_2: f^{-1}(E_2) \in \Sigma_1, $$ where $ E_2 \subset \Sigma_2, E_1 \subset \Sigma_1 $.


Another author looks at elements of $S_1$, let's say $x$, and says that the function $f$ is measurable if: $$\{x: f(x) \in E_2 \} \in \Sigma_1 $$


My question is: WHY do we have $f$ from $S_1$ to $S_2$, and not $f$ from $E_1$ to $E_2$?!

Or, if we wish to travel between sets, then the inverse function $f^{-1}$ should go from $S_2$ to $S_1$.

I made a sketch of the first definition of this measurable function, and my picture does not seem to be coherent with what I naturally want to find.

I mean, to me it would be natural to define $f^{-1}$ so that it goes back along the same route by which $f$ came ($S_2 \to S_1$).

Wikipedia and the few books I have been reading do not shed light on this topic beyond what is already unclear to me. Other googling did not help either. Thank you.



UPDATE: Thank you all very much. All three answers complement each other and give me a wider picture of what to think about. I have been thinking about this the whole day, and will keep thinking for a while, but I still have trouble, and many ideas are still mixed up in my head.

To summarize, here are the take-home messages I got:

1) From basic set theory: we first define a function on the elements of some set, and from it we then get image and preimage functions on sets.

2) There are no restrictions (existence of an inverse, injectivity, surjectivity or bijectivity) on a measurable function $f$.

3) There was a great insight into why exactly the preimage is what we have to use in building the mapping. I agree with it somewhere in my heart, but I am still trying, and failing, to make it understandable to my brain. Below I post more about what I understand and what I do not.

4) What about the second definition, where we have just the image? I mean, are these definitions equivalent? It does not use the preimage function.

5) I am also trying to find an analogue in probability theory, where we have a measurable space $(\Omega,\mathcal{F})$ and a measurable function (random variable) $X$ that is a mapping $X: \Omega \to \mathcal{R}$, where $(\mathcal{R},\sigma(\mathcal{R}))$ is another measurable space.

My question here is the following. Let $w \in \Omega$ with $X(w) = r$, where $r \in \mathcal{R}$, so that $X^{-1}(r) = w$. Here we do not use any correspondence between sets of the $\sigma$-algebras. Am I right? I omitted which $r$ and $w$ these should be because I am afraid I would write something stupid =)

6) And to conclude: the general problem/question for me was, and remains, that I keep imagining that a measurable function should come with some kind of correspondence between image and preimage, but, as I read in the answers, this is not enough (or it is not the right reason). Maybe it is possible to give an example or picture explaining why the function (image) goes from set $X$ to set $Y$, while the preimage function does not necessarily work with the same “matter”, in our case the sets of the $\sigma$-algebras generated on the initial sets. (In short, part of my question is: why is a measurable function not an inverse function that maps between the same sets $X$ and $Y$?)

Am I right that this definition of a measurable function ($\forall E_2: f^{-1}(E_2) \in \Sigma_1$) allows certain sets $E_1$ to have no correspondence with any of the sets in the $\sigma$-algebra $\Sigma_2$? If yes, is this the point where $f$ may map $E_1$ to the empty set $\emptyset$?

FIN1. Is the following mnemonic mathematically correct:

A function $f$ from a set $X$ to a set $Y$ (here I mean that it maps every element $x$ of $X$ to an element $y$ of $Y$) is measurable if for every set in $\sigma(Y)$ the preimage lies in $\sigma(X)$.

FIN2. And again:

For every function $f$, subset $A$ of the domain and subset $B$ of the codomain we have $A \subset f^{-1}(f(A))$ and $f(f^{-1}(B)) \subset B$.

If $f$ is injective we have $A = f^{-1}(f(A))$, and if $f$ is surjective we have $f(f^{-1}(B)) = B$.
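These two inclusions can be checked by brute force on a tiny example. The following is a minimal Python sketch, assuming finite sets and a function stored as a dict; the names `image`, `preimage`, `subsets`, `S1`, `S2` and `f` are hypothetical and chosen only for illustration.

```python
from itertools import chain, combinations

# Hypothetical example: f is neither injective (1 and 2 collide) nor surjective ("c" is never hit).
S1 = {1, 2, 3}
S2 = {"a", "b", "c"}
f = {1: "a", 2: "a", 3: "b"}


def image(f, A):
    """Direct image f(A) = {f(x) : x in A}."""
    return {f[x] for x in A}


def preimage(f, B):
    """Preimage f^{-1}(B) = {x : f(x) in B}; no inverse of f is needed."""
    return {x for x in f if f[x] in B}


def subsets(s):
    """All subsets of a finite set s."""
    items = list(s)
    return [set(c) for c in chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]


# The inclusions hold for every subset, even though f is neither injective nor surjective.
assert all(A <= preimage(f, image(f, A)) for A in subsets(S1))
assert all(image(f, preimage(f, B)) <= B for B in subsets(S2))

# ...but equality fails here: both inclusions can be strict.
print(sorted(preimage(f, image(f, {1}))))         # [1, 2]
print(sorted(image(f, preimage(f, {"a", "c"}))))  # ['a']
```

Strictness on the left comes from the collision $f(1)=f(2)$ (failure of injectivity), and on the right from "c" not being hit (failure of surjectivity).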

So, in order to understand this better I wanted at least to show myself visually that this holds.

I have included a picture of my thoughts, which gives me the completely opposite result. Should I post this as a separate question?

Ievgenii
  • Correction: For $f$ to be measurable you must have $f^{-1}(E_2)\in\Sigma_1$ for all $E_2\in\Sigma_2$. (Notation: $f^{-1}(E_2)=\{x\in S_1:f(x)\in E_2\}$.) – John Dawkins Oct 31 '15 at 19:06
  • You see, I have no idea why we have such a mapping. =) – Ievgenii Oct 31 '15 at 19:11
  • Note that $S_i$ is an element of the respective $\sigma$-algebra. If a function maps some set to another, it also maps subsets of this set to subsets of the other set. – Dirk Oct 31 '15 at 19:57

3 Answers


John Dawkins gives a nice explanation in his answer, but I'd like to mention some further details. I believe you are confused mainly due to notation, or a so-called "notational abuse" we like to commit, namely we use the same notation for a function that maps "points" to "points" and for its associated image function that maps "point sets" to "point sets". I'll leave what a "point" means vague; for our purposes it is sufficient to keep in mind that if $x$ is a point in some set $X$, then $\{x\}$ is a member of the power set $\mathcal{P}(X)$ of $X$, and is a point set. In any case, I believe your problem consists of the following parts:

  1. Basic set theory, difference between point function $f(x)$ and its associated set functions $f(A)$ and $f^{-1}(B)$;
  2. Definition of a $\sigma$-algebra and a measurable function;
  3. Why a measurable function is defined in terms of the preimage function rather than the image function.

1. First let's talk about some set-theoretical basics. Let $X$ be a nonempty set. Then we call the collection $\mathcal{P}(X)$ of all subsets of $X$ the power set of $X$. Note that the members of $\mathcal{P}(X)$ are the subsets of $X$ (so your drawing might be a little off unless I interpreted it incorrectly).

Now let $X$ and $Y$ be two nonempty sets and let $f$ be a function from $X$ to $Y$. If $A\in\mathcal{P}(X)$ and $B\in\mathcal{P}(Y)$ (so that $A$ is a subset of $X$ and $B$ is a subset of $Y$), then we denote by $f(A)$ the set of all $f(x)$'s where $x$ is from $A$ and by $f^{-1}(B)$ the set of all $x$'s for which $f(x)$ is a member of $B$, i.e.,

$$f(A):=\{f(x)\in Y|x\in A\}\in\mathcal{P}(Y),$$

$$f^{-1}(B):=\{x\in X|f(x)\in B\}\in\mathcal{P}(X).$$
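Both set functions are computed from the point function alone. A minimal Python sketch, assuming a finite domain and $f$ stored as a dict; `image`, `preimage` and the example $f(x)=x^2$ are hypothetical choices for illustration.

```python
def image(f, A):
    """f(A) := {f(x) in Y | x in A}, computed from the point function f (a dict)."""
    return {f[x] for x in A}


def preimage(f, B):
    """f^{-1}(B) := {x in X | f(x) in B}; this never uses an inverse of f."""
    return {x for x in f if f[x] in B}


# Hypothetical example: X = {-2, ..., 2}, f(x) = x^2 with values in Y = {0, ..., 4}.
X = {-2, -1, 0, 1, 2}
f = {x: x * x for x in X}  # not injective, so f has no inverse

print(sorted(image(f, {-2, 2})))    # [4]
print(sorted(preimage(f, {1})))     # [-1, 1]: two points, not one
print(sorted(preimage(f, {2, 3})))  # []: nothing maps to 2 or 3
```

The preimage of a nonempty set can be empty or contain several points, which is one way to see that $f^{-1}(B)$ is a set-level object rather than an inverse of $f$.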

Note that the $f$ in the expressions "$f(x)$" and "$f(A)$" actually denotes different mathematical objects, even though they are related very naturally (of course one might inquire about how one knows $x$ and $A$ are not of the same hierarchy (indeed, why?)). In some texts the notations $f[A]$ and $f^{-1}[B]$ are used when the argument of $f:X\to Y$ is a subset of the set on which it is defined, that is, $X$. Also keep in mind that I did not claim anything about injectivity of $f$, so $f^{-1}$ does not mean the inverse of $f:X\to Y$, nor does it actually need the existence of the inverse in order to make sense.

When sets are considered together with their power sets, associated with $f:X\to Y$ we immediately see two functions between the power sets. The image function $I_f:\mathcal{P}(X)\to\mathcal{P}(Y)$ takes any member $A$ of $\mathcal{P}(X)$ and maps it to $f(A)$ as defined above, and the preimage function $P_f:\mathcal{P}(Y)\to\mathcal{P}(X)$ takes any member $B$ of $\mathcal{P}(Y)$ and maps it to $f^{-1}(B)$, i.e.,

$$I_f:\mathcal{P}(X)\to\mathcal{P}(Y), I_f(A):=f(A),$$

$$P_f:\mathcal{P}(Y)\to\mathcal{P}(X), P_f(B):=f^{-1}(B).$$

I'll leave it to you to prove that these two functions (between power sets) are well-defined, i.e., they are indeed functions. As a further remark, if $f$ does have an inverse $f^{-1}$, then $P_f$ is the inverse of $I_f$ (why?) (recall that this means that $P_f \circ I_f$ is the identity function of $\mathcal{P}(X)$ and $I_f\circ P_f$ is the identity function of $\mathcal{P}(Y)$).
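The remark that $P_f$ inverts $I_f$ when $f$ has an inverse can be checked by brute force on small finite sets (a check, not a proof). A minimal Python sketch, where `power_set`, `I`, `P` and `P_inverts_I` are hypothetical names standing for $\mathcal{P}$, $I_f$ and $P_f$:

```python
from itertools import chain, combinations


def power_set(s):
    """All subsets of a finite set s, as frozensets."""
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]


def I(f, A):
    """Image function I_f(A) = f(A)."""
    return frozenset(f[x] for x in A)


def P(f, B):
    """Preimage function P_f(B) = f^{-1}(B)."""
    return frozenset(x for x in f if f[x] in B)


def P_inverts_I(f, X, Y):
    """True iff P_f o I_f is the identity on P(X) and I_f o P_f is the identity on P(Y)."""
    return (all(P(f, I(f, A)) == A for A in power_set(X))
            and all(I(f, P(f, B)) == B for B in power_set(Y)))


X, Y = {1, 2, 3}, {"a", "b", "c"}
bijective = {1: "a", 2: "b", 3: "c"}
not_injective = {1: "a", 2: "a", 3: "b"}

print(P_inverts_I(bijective, X, Y))      # True
print(P_inverts_I(not_injective, X, Y))  # False
```

The non-injective map already fails at $A=\{1\}$, since $P_f(I_f(\{1\}))=\{1,2\}$.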

Since we only need to know $f$, $X$ and $Y$ to know $I_f$ and $P_f$, we denote these last two functions by $f$ and $f^{-1}$ as well (this is the notational abuse I was talking about) (one other reason is that there is an everlasting shortage of ink). As an exercise you could compare the meanings of $f^{-1}(y)$ and $f^{-1}(\{y\})$ for some $y\in f(X)$, when $f$ is injective and when it is not injective.


2. So far we only have sets and their power sets, whose cardinalities, for instance, we can compare. However, we would like to talk about stronger objects, that is to say, objects that are more than sets: sets that have a "structure" of sorts, e.g. an operation (like addition) or a metric (a set with a structure is sometimes called a space); and we would like to compare them. It turns out that specifying some subsets of the power set is a nice way to introduce a structure on a set, and investigating how the image function and the preimage function behave when restricted to those specified subsets helps us compare the respective structures of the sets.

Let $S$ be a nonempty set. A subset $\Sigma_S$ of $\mathcal{P}(S)$ (so a collection of subsets of $S$) is a $\sigma$-algebra if $S\in\Sigma_S$ and $\Sigma_S$ is closed under complements and countable unions. The pair $(S,\Sigma_S)$ is a measurable space. If $(S_1,\Sigma_1)$ and $(S_2,\Sigma_2)$ are two measurable spaces and $f:S_1\to S_2$ is a function, then $f$ is measurable if
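For a finite $S$, closure under countable unions reduces to closure under unions of two sets, because a countable family of subsets of a finite set has only finitely many distinct members. Under that finiteness assumption, here is a minimal Python sketch of the $\sigma$-algebra axioms; `is_sigma_algebra` and the example collections are hypothetical.

```python
def is_sigma_algebra(S, Sigma):
    """Check the sigma-algebra axioms for a collection Sigma of subsets of a FINITE set S."""
    S = frozenset(S)
    Sigma = {frozenset(E) for E in Sigma}
    if any(not E <= S for E in Sigma):  # every member must be a subset of S
        return False
    if S not in Sigma:                  # S itself belongs to Sigma
        return False
    closed_complements = all(S - E in Sigma for E in Sigma)
    # On a finite S, pairwise unions suffice (finite unions follow by induction).
    closed_unions = all(E | F in Sigma for E in Sigma for F in Sigma)
    return closed_complements and closed_unions


S = {1, 2, 3, 4}
print(is_sigma_algebra(S, [set(), {1, 2}, {3, 4}, S]))  # True
print(is_sigma_algebra(S, [set(), {1}, S]))             # False: the complement {2, 3, 4} is missing
```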

$$\forall E_2\in\Sigma_2(\subseteq\mathcal{P}(S_2)):f^{-1}(E_2)(=P_f(E_2))\in\Sigma_1(\subseteq\mathcal{P}(S_1)).$$

Observe that the parentheses include information we have already established (I wrote them again for convenience). Also note that this is equivalent to requiring that $P_f:\Sigma_2\to\Sigma_1$ is well-defined (why?).
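On finite spaces the definition can be checked directly by computing $P_f(E_2)$ for each $E_2\in\Sigma_2$ and asking whether it lands in $\Sigma_1$. A minimal Python sketch, with hypothetical helpers `preimage` and `is_measurable` and made-up example spaces:

```python
def preimage(f, B):
    """f^{-1}(B) = {x : f(x) in B}, returned as a frozenset so it can be looked up in Sigma1."""
    return frozenset(x for x in f if f[x] in B)


def is_measurable(f, Sigma1, Sigma2):
    """True iff f^{-1}(E2) is in Sigma1 for every E2 in Sigma2, i.e. P_f maps Sigma2 into Sigma1."""
    Sigma1 = {frozenset(E) for E in Sigma1}
    return all(preimage(f, E2) in Sigma1 for E2 in Sigma2)


# Made-up spaces: Sigma1 cannot tell 1 from 2, nor 3 from 4; Sigma2 is the full power set of S2.
S1 = {1, 2, 3, 4}
Sigma1 = [set(), {1, 2}, {3, 4}, S1]
S2 = {"low", "high"}
Sigma2 = [set(), {"low"}, {"high"}, S2]

f = {1: "low", 2: "low", 3: "high", 4: "high"}   # constant on the blocks {1, 2} and {3, 4}
g = {1: "low", 2: "high", 3: "high", 4: "high"}  # separates 1 from 2

print(is_measurable(f, Sigma1, Sigma2))  # True, although f is not injective
print(is_measurable(g, Sigma1, Sigma2))  # False: g^{-1}({"low"}) = {1} is not in Sigma1
```

Only preimages are ever computed; no inverse of $f$, and no image of a set in $\Sigma_1$, is needed.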


3. I believe Rudin's presentation of the subject in his Real and Complex Analysis is in accordance with the relatively abstract framework. He declares on p. 8 the following:

The class of measurable functions plays a fundamental role in integration theory. It has some basic properties in common with another most important class of functions, namely, the continuous ones. [...] Our presentation is therefore organized in such a way that the analogies between the concepts topological space, open set and continuous function, on the one hand, and measurable space, measurable set, and measurable function, on the other, are strongly emphasized.

For more concrete reasons why we use the preimage function you can have a look at this math.SE thread. Admittedly the use of preimages in the definition was intuitive to me, but I'll think about it, and if I come up with an alternative explanation I'll add it to my answer.

Alp Uzman

Let me pick up where A. Alp Uzman left off: Why are we using the preimage function and not the image function?

OK, remember we have sets $S$ with an additional structure $\Sigma$ which is given by a set of subsets. Let's ignore the specific case and only remember that $\Sigma$ gives us special subsets.

Indeed, let's look at the minimal case: Assume that $\Sigma$ specifies exactly one subset. Let's call the elements of that single subset "special".

Now we want to define "special functions" from $(S_1,\Sigma_1)$ to $(S_2,\Sigma_2)$.

At the set level, as A. Alp Uzman described, we have two associated functions available, the image function and the preimage function.

Now your first impulse would probably be to demand that the image function maps the special set of the domain to the special set of the codomain. Now that would of course imply that special elements are mapped to special elements. However the condition does not say what happens to non-special elements. Indeed, even a function that maps the complete domain to the special elements of the codomain would fit that definition.

Now consider the condition that the preimage of the special set of the codomain is the special set of the domain. By the definition of the preimage function, this also implies that special elements are mapped to special elements. However, now additionally the non-special elements must also be mapped to non-special elements of the codomain (because if they were mapped to special elements, they'd be in the preimage of the codomain's special set, violating that the preimage is the special set of the domain).

Quite obviously, it makes more sense to call a function that maps special elements to special elements and non-special elements to non-special elements a "special function", as it actually preserves speciality. But this condition is enforced by using the preimage function, not the image function.
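The contrast between the two conditions can be made concrete with one special subset on each side. A minimal Python sketch; the spaces, the maps `const` and `faithful`, and the helper `conditions` are all hypothetical examples.

```python
def image(f, A):
    return {f[x] for x in A}


def preimage(f, B):
    return {x for x in f if f[x] in B}


# Hypothetical "special spaces": each structure is a single special subset.
S1, special1 = {1, 2, 3, 4}, {1, 2}
S2, special2 = {"s", "t"}, {"s"}


def conditions(f):
    """Return (image condition holds, preimage condition holds) for f: S1 -> S2."""
    image_ok = image(f, special1) == special2        # f(special1) == special2
    preimage_ok = preimage(f, special2) == special1  # f^{-1}(special2) == special1
    return image_ok, preimage_ok


const = {x: "s" for x in S1}                 # everything, special or not, goes to "s"
faithful = {1: "s", 2: "s", 3: "t", 4: "t"}  # special -> special, non-special -> non-special

print(conditions(const))     # (True, False)
print(conditions(faithful))  # (True, True)
```

The constant map passes the image condition while sending non-special elements to a special one; the preimage condition rejects it.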

So the preimage function gives a "more precise" condition. This can also be seen from the fact that for any $A\subset S_1$ we only have $f^{-1}(f(A))\supset A$, where the inclusion can be strict, whereas for any $B\subset S_2$ we have $f(f^{-1}(B)) = B\cap f(S_1)\subseteq B$, so no point outside $B$ is ever picked up. So if we put conditions on the image function, those conditions can "bleed" out of the specified sets on the source, while conditions on the preimage function are precise in both directions.
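A related way to see why preimage conditions do not "bleed": the preimage commutes with complements, $f^{-1}(S_2\setminus B)=S_1\setminus f^{-1}(B)$, while the image in general does not. A minimal Python sketch on a made-up finite example (all names hypothetical):

```python
from itertools import chain, combinations

S1 = {1, 2, 3}
S2 = {"a", "b", "c"}
f = {1: "a", 2: "a", 3: "b"}  # hypothetical f: not injective, not surjective


def image(f, A):
    return {f[x] for x in A}


def preimage(f, B):
    return {x for x in f if f[x] in B}


def subsets(s):
    items = list(s)
    return [set(c) for c in chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]


# Preimage respects complements for every B: what is outside B pulls back to what is outside f^{-1}(B).
assert all(preimage(f, S2 - B) == S1 - preimage(f, B) for B in subsets(S2))

# The image does not: with A = {1}, f(S1 \ A) = {"a", "b"} but S2 \ f(A) = {"b", "c"}.
A = {1}
print(sorted(image(f, S1 - A)), sorted(S2 - image(f, A)))  # ['a', 'b'] ['b', 'c']
```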

That's why it makes more sense to define property-preserving maps using the preimage function. For our "special space", special functions are those where the preimage of the special set is special. In topology, continuous functions are those where the preimage of an open set is open. And in measure theory, measurable functions are those where the preimage of a measurable set is measurable.

celtschk

In measure theory (or probability theory) to study the behavior of a function like $f$ we often look at sets of the form $\{x\in S_1:f(x)\in E_2\}$, and it is useful to have a briefer notation for such a set, whence $f^{-1}(E_2)$. And this notion is more than just notation: the set mapping $f^{-1}$ so defined maps the power set of $S_2$ into the power set of $S_1$, and a function $f:S_1\to S_2$ is $\Sigma_1/\Sigma_2$ measurable provided $f^{-1}$ maps $\Sigma_2$ into $\Sigma_1$.

And there is a certain coherence to the notation. We have the direct image $f(E_1)$ of a subset of $S_1$, defined by $f(E_1)=\{f(x): x\in E_1\}$. If $f$ happens to be one-to-one (and onto $E_2$), then $f^{-1}(E_2)$ as defined in the preceding paragraph is the same as the direct image of $E_2$ under the point mapping $f^{-1}$ (here the inverse function of $f$). The set mapping $f^{-1}$ makes sense even when $f$ is not one-to-one. The dual use of the notation $f^{-1}$ (sometimes a point mapping, sometimes a set mapping) may be a source of confusion, but usually context indicates the proper interpretation.
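To make the two readings of $f^{-1}$ concrete, here is a minimal Python sketch with a made-up one-to-one `f` and a non-injective `g`; `preimage` and `inverse_point_map` are hypothetical helper names.

```python
def preimage(f, B):
    """Set mapping f^{-1}(B) = {x : f(x) in B}; defined for any f."""
    return {x for x in f if f[x] in B}


def inverse_point_map(f):
    """Point mapping f^{-1}; only exists when f is one-to-one."""
    inv = {y: x for x, y in f.items()}
    if len(inv) != len(f):  # two points shared a value, so one was overwritten
        raise ValueError("f is not one-to-one, so it has no inverse point map")
    return inv


f = {1: "a", 2: "b", 3: "c"}  # one-to-one
g = {1: "a", 2: "a", 3: "b"}  # not one-to-one

E = {"a", "c"}
inv = inverse_point_map(f)
print(preimage(f, E) == {inv[y] for y in E})  # True: both readings of f^{-1}(E) agree
print(sorted(preimage(g, {"a"})))             # [1, 2]: the set mapping still makes sense for g
```

Calling `inverse_point_map(g)` raises `ValueError`, while `preimage(g, ...)` works for every subset of the codomain.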

John Dawkins