What is wrong with the following statement?

"If $x$ is a continuous random variable with probability density function $f(x)$, the probability that it lies in $(x_1,x_2)$ is $$P(x_1<x<x_2)=\int_{x_1}^{x_2}f(x)\,\mathrm dx.$$"

Wherein lies the benefit of introducing an additional variable, $X$, for a random variable?

A moderator deleted a similar posting yesterday without giving any reason why. Please give me an answer or a reason for refusing to answer, this time.

  • Yes, that follows from the standard definition of a probability density function in the standard context. Trusting that the upper limit of your integral was meant to be $x_2$ – lulu Mar 15 '22 at 22:14
  • Not sure what this has to do with naming anything. – lulu Mar 15 '22 at 22:18
  • @lulu The question is about the abuse of notation in using $x$ both for the random variable and for the variable of integration. – Misha Lavrov Mar 15 '22 at 22:18
  • @MishaLavrov. Really? That's a pretty standard abuse of notation. Especially for someone who is just learning the basics. What makes you think that was the point here? – lulu Mar 15 '22 at 22:20
  • @lulu The question "Wherein lies the benefit of introducing an additional variable, $X$, for a random variable?" is what makes me think that. – Misha Lavrov Mar 15 '22 at 22:21
  • @MishaLavrov but no such variable was even introduced. Well, maybe you are right. Yes, it would be better to call the dummy variable something else. – lulu Mar 15 '22 at 22:22
  • @JeremyRiley Perhaps you could comment here? Are you just asking about the wisdom of using the same symbol for the integration variable and for the random variable you are observing? If so, then of course it is always poor practice to use the same symbol to denote two different things. – lulu Mar 15 '22 at 22:35
  • Thank you all for your contributions: I learned something. Though I still don't see the pitfalls, so long as we differentiate the naming of the boundaries (such as $x_1$ & $x_2$ in this example) from the variable argument of the probability density or mass function which, in my mind, is the random variable. – Jeremy Riley Mar 16 '22 at 06:54

1 Answer

There is nothing factually wrong with the statement; however, what you wrote would generally be considered poor notation. This is because you have used the same symbol in two places in the same expression to denote different things, i.e., the left-hand side uses $x$ to denote a random variable while the right-hand side uses $x$ to denote a variable of integration.

Writing mathematics is about clearly communicating ideas, and this abuse of notation would likely introduce confusion without any offsetting benefit, such as making the expression more compact. For this reason, modern statistical notation uses capital letters to denote random variables, making them visually distinct from deterministic variables such as those used in an integral.
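
As a sketch of that convention, the statement from the question could be rewritten with $X$ for the random variable. The names $t$ for the integration variable and $f_X$ for the density are assumptions chosen here for illustration; other choices work equally well as long as the random variable and the dummy variable get different symbols.

$$P(x_1<X<x_2)=\int_{x_1}^{x_2}f_X(t)\,\mathrm dt.$$

Here only $X$ is random; $x_1$, $x_2$, and $t$ are ordinary deterministic quantities.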

  • Makes sense. Though my scheme would work if I had named the integration variable something else, that would defeat the intended avoidance of a new variable. – Jeremy Riley Mar 16 '22 at 07:08
  • I had an afterthought. What is the lowercase variable of the probability distribution function called? Is it correct to state that the probability density (or mass) function is a "probability distribution function" of the random variable? Aren't we then implying that the random variable is the argument to such a function? – Jeremy Riley Mar 16 '22 at 12:43
  • @JeremyRiley The lowercase $x$ in the density function $f(x)$ is called an independent variable and it is deterministic (nonrandom) unlike the random variable $X$. This is precisely why we should use the notation I mentioned in my answer because using $x$ to denote a random variable makes it look like $f(x)$ is a function of a random variable, which it is not. – Aaron Hendrickson Mar 16 '22 at 14:18
  • Some call the function argument a "value" of the random variable. It is deterministic only in that it determines a probability, which is inherently not deterministic. Anyway, you helped clarify my confused mind quite a bit and I have already upvoted all of your answers. Thanks. – Jeremy Riley Mar 16 '22 at 16:24
  • Another afterthought. A random variable is never the independent variable of a distribution function, such as a PDF, PMF, or CDF. The independent variable in such functions is a value or a bound of the random variable. These functions are alternatively expressed as the probability of a relation (equality or inequality, or both) between the random variable and these bounds, e.g., $$P(a<X<b)=F_{X}(b)-F_{X}(a).$$ – Jeremy Riley Mar 19 '22 at 10:33
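
The identity in the last comment can be checked numerically for a continuous random variable. The following is a minimal sketch, not anything taken from the thread: it assumes $X$ is standard normal purely for illustration, and the helper names `pdf`, `cdf`, and `integrate` are hypothetical. Note that the function arguments `t` and `x` are plain numbers, never the random variable itself.

```python
import math


def pdf(t):
    """Density f_X(t) of a standard normal X (assumed distribution)."""
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)


def cdf(x):
    """Distribution function F_X(x) = P(X <= x) of a standard normal X."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def integrate(f, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


# Deterministic bounds a < b (arbitrary example values).
a, b = -1.0, 2.0

# P(a < X < b) computed two ways: via the CDF and via the density.
by_cdf = cdf(b) - cdf(a)
by_integral = integrate(pdf, a, b)

print(f"F_X(b) - F_X(a)       = {by_cdf:.6f}")
print(f"integral of f_X on (a, b) = {by_integral:.6f}")
```

Both printed values should agree to several decimal places, which is the content of $P(a<X<b)=F_X(b)-F_X(a)=\int_a^b f_X(t)\,\mathrm dt$ for a continuous $X$.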