If I initially introduce random variables with upper case letters, does it make sense to use the notation $\mathbb{E}\left[ p(x \mid z ) \right]$?

Question

Suppose that you initially use upper case letters to denote r.v.s. For example, you could say

Consider the r.v.s $X_1, \dots, X_n$ and $Z_1, \dots, Z_m$ where the $Z_i$s give rise to $X_j$s

Would it then make sense to use $$\mathbb{E}\left[ p(x_1, \dots, x_n \mid z_1, \dots, z_m ) \right]$$? In other words, would these notations be consistent with each other? And why?

Or should I say instead

Consider the r.v.s $x_1, \dots, x_n$ and $z_1, \dots, z_m$ where the $z_i$s give rise to $x_j$s

that is, with lower case letters? And why?

Or maybe I should use the notation

$$\mathbb{E}\left[ p(X_1, \dots, X_n \mid Z_1, \dots, Z_m ) \right]$$

if I say

Consider the r.v.s $X_1, \dots, X_n$ and $Z_1, \dots, Z_m$ where the $Z_i$s give rise to $X_j$s

???

NOTE: I assume that the expected value operator only gets r.v.s as input. ALSO, my goal is to have consistent notation, i.e. if I use upper case letters to denote r.v.s in one place, I want ALWAYS to use upper case letters to denote r.v.s

score 1 · Answer 1 · answered Jul 27 '20 at 18:35

Mathematical writing is case (and font) sensitive. This allows one, for instance, to have a collection of sets, $\mathcal{A}$, an element $A$ of $\mathcal{A}$ and an element $a$ of $A$. The three objects $a, A, \mathcal{A}$ are different. We use the same letter to remind ourselves that there is a relationship between them. So we might say $a \in A, b \in B$ but saying $a \in B, b \in A$ is weird.

In probability, one often (but not always) uses upper case letters to denote random variables and lower case letters to denote real numbers (or other non-random values). If you are using this convention, then $p(x_1,\dots,x_n|z_1,\dots,z_m)$ is a constant and hence $$\mathbb{E}[p(x_1,\dots,x_n|z_1,\dots,z_m)] = p(x_1,\dots,x_n|z_1,\dots,z_m) $$ in the same way that $\mathbb{E}[7] = 7$.
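To make the distinction concrete, here is a minimal numerical sketch. The toy model is purely an assumption for illustration (it appears neither in the question nor in the linked paper): $Z$ is Bernoulli with $\mathbb{P}(Z = 1) = 0.3$, and $X$ given $Z = z$ is Bernoulli with success probability $0.9$ if $z = 1$ and $0.2$ if $z = 0$. The helper names `p_x_given_z` and `expectation_over_z` are hypothetical.

```python
import math

# Hypothetical toy model (an assumption, not from the question or the paper):
# Z ~ Bernoulli(0.3); X | Z=z ~ Bernoulli(0.9) if z == 1, else Bernoulli(0.2).
P_Z = {0: 0.7, 1: 0.3}
P_X1_GIVEN_Z = {0: 0.2, 1: 0.9}  # probability that X = 1 given Z = z


def p_x_given_z(x: int, z: int) -> float:
    """Conditional pmf p(x | z) evaluated at fixed numbers x and z."""
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1.0 - p1


def expectation_over_z(f) -> float:
    """E[f(Z)]: average f over the distribution of Z."""
    return sum(P_Z[z] * f(z) for z in P_Z)


x, z = 1, 1

# Lower-case z is a fixed number, so p(x | z) is a constant
# and its expectation is that same constant.
constant = p_x_given_z(x, z)
e_constant = expectation_over_z(lambda _z: constant)

# Upper-case Z is random, so p(x | Z) is a random variable;
# its expectation is a weighted average over the values of Z.
e_random = expectation_over_z(lambda z_val: p_x_given_z(x, z_val))

print(constant, e_constant, e_random)
```

In this sketch the first expectation simply returns the constant $p(x \mid z)$, while the second averages over $Z$ and gives a different number, so the two notations are not interchangeable.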

If you had defined your random variables with upper-case letters, I would most likely assume that $x_1,\dots,x_n,z_1,\dots,z_m$ are non-random values associated with the random variables; for example, the relationship might be $\mathbb{P}(X_i = x_i) = 1/2$. But before that, I would be confused about why $x_1,\dots,x_n,z_1,\dots,z_m$ haven't been defined.

Otherwise, you should define your random variable as $X$ and write $\mathbb{E} [X]$ or define it as $x$ and write $\mathbb{E}[x]$ but you should not mix upper and lower cases for the same object.
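
As a worked illustration of keeping the cases consistent (a sketch only, assuming $Z$ has a density $p_Z$ and writing $p_{X \mid Z}$ for the conditional density, which is notation not used in the question): with the data $x$ fixed and $Z$ random,

$$\mathbb{E}\left[ p_{X \mid Z}(x \mid Z) \right] = \int p_{X \mid Z}(x \mid z)\, p_Z(z)\, dz = p_X(x).$$

Here the upper-case $Z$ inside the expectation is the random variable, the lower-case $z$ is only the dummy variable of integration, and $x$ is a fixed number.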

Trevor Gunn

I am specifically talking about using densities, which behave as random variables, inside expected values. Please, see this https://math.stackexchange.com/q/3770993/168764. Your intent is good but it doesn't clear my doubts at all. I am trying to understand why the notation in https://arxiv.org/pdf/1601.00670.pdf makes sense and if I can introduce r.v.s as upper case letters and then use $\mathbb{E}\left[ p(x \mid z) \right]$, where $p(x \mid z) $ is not supposed to be "constant" because otherwise some reasonings in that paper don't seem to make sense (or maybe I am wrong). – Jul 27 '20 at 18:39
In other words, suppose I use $\mathbb{E} \left[ p(x \mid z) \right]$, where $x$ is fixed (i.e. the data) but $p(x \mid z)$ is a random variable (because otherwise how could we take its expectation), how should I introduce or denote the random variables associated with the random variable $p(x \mid z) $? Which letters should I use? Upper $Z$, lower $z$, or maybe something else? That's my question. – Jul 27 '20 at 18:46
@nbro Ok, but that's not what you wrote in your question. You wrote (paraphrasing): if my random variables are defined as $X, Z$, does it make sense to use $x, z$ in the expectation or should I use $X, Z$? Also, as far as I can tell, the paper you are reading does not use upper case letters at all, so I'm confused why you are even asking about it. And if your real issue isn't upper versus lower case but $x$ fixed versus $z$ random, then how does this question differ from your previous one? – Trevor Gunn Jul 27 '20 at 18:50
Because I want to use upper case letters to denote random variables (as opposed to the paper), but I don't understand if that will be consistent with the notation of the paper where they use lower case letters inside the expectation. Is it clear now? – Jul 27 '20 at 18:51
@nbro In my answer I have explained that the case matters and if you want to write $X_1,\dots,X_n,Z_1,\dots,Z_m$ for your random variables then you need to write $p(X_1,\dots,X_n|Z_1,\dots,Z_m)$ or otherwise define $x_1,\dots,x_n$ and/or $z_1,\dots,z_m$. If you're asking what would be consistent with the paper: the answer is neither. The paper is using $p(\mathbf{x}|\mathbf{z})$ to mean the conditional distribution of $\mathbf{x}|\mathbf{z}$, not the distribution $p$ applied to the random variables $\mathbf{x}$ and $\mathbf{z}$. I.e. $p(\mathbf{x}|\mathbf{z})$ is a distribution, not a r.v. – Trevor Gunn Jul 27 '20 at 19:00
See https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence#Definition Notice that the input is two distributions not two random variables. Perhaps you would be more comfortable using the notation $p_{X|Z}$ or something instead. – Trevor Gunn Jul 27 '20 at 19:00
The thing that you forgot is that $p(x \mid z)$ is supposed to be a likelihood, so $x$ is fixed. So, how does your notation with upper case letters adapt to this case? Also, why is the input "two distributions not two random variables"? Input to what? – Jul 27 '20 at 19:01
@nbro With $z$ being observed, and $X$ being random, that would be $p_{X|z}$ which is the function taking an outcome $x$ for $X$ to $p_{X|z}(x) = p(x,z)/p(z)$ (the paper swaps the $x$s and $z$s here). I am using lower case to denote that $z$ is an outcome (e.g. a non-random real number) and $X$ is in upper case to denote that it is a random variable. What I call $p_{X|z}$, the paper calls $p(\mathbf{x}|\mathbf{z})$ (again, they swap $x$ and $z$). – Trevor Gunn Jul 27 '20 at 19:06
Why did you swap? I am talking about the likelihood. The swap only confuses me. Can you please explain how it works for the likelihood? Why did you avoid $p_{x \mid Z}$? – Jul 27 '20 at 19:08
@nbro I swapped because you swapped. You were the one who wrote $p(x|z)$ when the paper writes $p(z|x)$. And sorry but I don't have more time to spend on this. Read the definition on Wikipedia https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence#Definition and if you still are confused, make a new question because I consider this one answered. – Trevor Gunn Jul 27 '20 at 19:24
In the paper, they also have $p(x \mid z)$ and not just $p(z \mid x)$. But, ok, if you don't have more time, I understand. – Jul 27 '20 at 19:26
