
I refer here to a simple linear regression whose true representation is given by the equation: $y_i=x_i'\beta+u_i$,

where, as usual, $x_i$ is a $K\times 1$ vector of explanatory (independent) variables, $\beta$ is a $K\times 1$ vector of parameters, and $u_i$ is the error term of the $i$th observation; by construction of the true representation, $u_i$ is considered to be independent of $x_i$.

I have trouble understanding why the conditional expectation $E[u_i\mid x_i]=0$ is considered stronger (or more natural) than the unconditional expectation $E[u_i]=0$. Almost every textbook uses the conditional version instead of the unconditional one.

The reason for my difficulties is the following argument:

Firstly, it is clear to me that, by the Law of Total Iteration, $E[E[u_i\mid x_i]] = E[u_i] = 0$. Hence the conditional zero-mean assumption implies the unconditional one.

But don't we expect the error term $u_i$ to be independent of $x_i$ by construction? If that is the case, would not $E[u_i]=0$ imply $E[u_i\mid x_i]=0$? Or, in other words, if we assume the error term to be independent of the explanatory variable, does the distinction between unconditional and conditional expectation really matter here?
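
To spell out the step I have in mind: if $u_i$ and $x_i$ are independent, then conditioning on $x_i$ should not change the distribution of $u_i$, and in particular not its mean, so that
$$E[u_i\mid x_i]=E[u_i]=0.$$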

I think there is a similar argument for unconditional vs. conditional homoskedasticity. Textbooks use the conditional version, i.e. $V[u_i\mid x_i]=\sigma^2$, which is preferred to the unconditional variance $V[u_i]=\sigma^2$. Their difference should not matter if the error term is independent of the explanatory variable, should it?

Given a linear regression, I cannot find an example, a graphical representation, or even a good story of how an error term can be unconditionally mean-zero but conditionally biased. If I have a theoretical scatter plot of an error term with mean zero, how am I supposed to slice out the $X$-dependent part and say, "look here, the error is indeed biased," when by definition the error term exists only because of the regression setup in the first place?

Maybe I have difficulties understanding the conditional expectation fully. If $X$ is given, or in other words, if I have knowledge of $X$, how can this change my expectation of $u_i$ when both are considered independent? What does this conditional expectation really mean, and how does it improve my understanding of the underlying regression compared with the unconditional one?

Sorry if this seems confusing and is probably a stupid question; I have been pondering these concepts for a while and cannot find an illustrative answer that I am happy with.

Majte
  • I think that belongs on stats; I put it there – Majte Mar 21 '13 at 21:37
  • "Law of total expectation" and "law of iterated expectations" are both conventional names for the same thing, but "law of total iteration" (with no mention of expectation) seems to confuse those with each other. $\qquad$ – Michael Hardy Jul 22 '16 at 18:22

3 Answers


Indeed, if $E[u\mid x]=0$ then $E[u]=0$, because $E[u]=E[E[u\mid x]]$. In the other direction, $E[u]=0$ does not imply $E[u\mid x]=0$, as the case $x=u$ shows, since $E[u\mid u]=u$. But if $x$ and $u$ are independent, then $E[u\mid x]=E[u]$, hence the implication [if $E[u]=0$ then $E[u\mid x]=0$] becomes true.
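
To see this numerically, here is a minimal Python sketch of the extreme case $x=u$ (the standard normal draws and the slice $x>1$ are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(100_000)  # E[u] = 0 by construction
x = u                             # extreme case: the regressor is the error itself

print(u.mean())          # ~0: the unconditional mean E[u]
print(u[x > 1].mean())   # ~1.5: E[u | x > 1], far from zero
```

The unconditional mean is (approximately) zero, yet conditioning on any slice of $x$ values moves the mean of $u$, exactly because $E[u\mid x]=u\neq 0$.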

Did
  • ohh great, thank you! So it does not make sense to impose a so-called "stronger" condition if we assume that in the true relationship we have a priori independence of $x_i$ and $u_i$? So what's the fuss about the conditional expectation? Is it because of the conditional variance, i.e. $x_i$ and $u_i$ can be independent but the conditional variance can still depend on $x$? What I am trying to say is, I don't understand the reasons why we use the conditional statements here.. how would this practically ever matter? – Majte Mar 21 '13 at 22:36
  • Well, there are plenty of other statistical relationships between $u$ and $x$ than the two extremes of being a function of it (and then $E[u\mid x]=u$) or being independent of it (and then $E[u\mid x]=E[u]$). – Did Mar 21 '13 at 23:35
  • ohh I see.. well, in this case I just need to continue to learn and believe you for now :) thanks! – Majte Mar 21 '13 at 23:44

Perhaps we should add another consideration here: is the explanatory variable in the model deterministic or stochastic?

In a model with a deterministic, or non-stochastic, explanatory variable, the assumptions $E(u)=0$ and $\operatorname{Var}(u)=\sigma^2$ are enough. To quote William H. Greene (2011): "The assumption of nonstochastic regressors at this point would be a mathematical convenience. With it, we could use the results of elementary statistics to obtain our results by treating the vector $x_i$ simply as a known constant in the probability distribution of $y_i$." Additionally, the unconditional expectation suffices here because, when the error term and the explanatory variable are independent, $E(u\mid x)=E(u)$.

In a model with a stochastic explanatory variable, on the other hand, the conditional moments $E(u\mid x)$ and $\operatorname{Var}(u\mid x)$ are what is needed. If, in the stochastic-regressor case, we assume that the regressors and the error term are independently distributed, the OLS estimators are still unbiased, but they are no longer efficient (Gujarati and Porter, 2008). A small simulation illustrating the unbiasedness claim follows.
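
Here is a small Monte Carlo sketch in Python (the true values $\beta_0=1$, $\beta_1=2$, the sample size, and the number of replications are arbitrary), with the stochastic regressor redrawn in every replication:

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1 = 1.0, 2.0            # arbitrary true parameters
n, reps = 200, 5_000
slopes = np.empty(reps)

for r in range(reps):
    x = rng.standard_normal(n)     # stochastic regressor, redrawn each sample
    u = rng.standard_normal(n)     # error term, independent of x
    y = beta0 + beta1 * x + u
    X = np.column_stack([np.ones(n), x])
    slopes[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]   # OLS slope estimate

print(slopes.mean())   # ~2.0: the OLS slope is unbiased when x and u are independent
```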

Here are more excerpts on the matter:

"Values taken by the regressor X may be considered fixed in repeated samples (the case of fixed regressor) or they may be sampled along with the dependent variable Y (the case of stochastic regressor). In the latter case, it is assumed that the X variable(s) and the error term are independent, that is, cov (X_i , u_i) = 0." (Damodar and Porter, 2008)

"Realistically, we have to allow the data on x_i to be random the same as y_i , so an alternative formulation is to assume that x_i is a random vector and our formal assumption concerns the nature of the random process that produces x_i. If x_i is taken to be a random vector, then Assumptions 1 through 4 (the Classical Linear Regression Assumptions) become a statement about the joint distribution of y_i and x_i." (Green, 2011)

References: Greene, W. H., 2011. Econometric Analysis, 7th ed. Prentice Hall. Gujarati, D. N., Porter, D. C., 2008. Basic Econometrics. McGraw-Hill Education.



Here is another answer to your question. You've correctly noted that the condition $E[u\mid x]=0$, called "mean independence" of $u$ and $x$, is weaker than full-on independence of $u$ and $x$. Bruce Hansen's notes on econometrics give a particularly simple counterexample. (The link to the book is here: http://www.ssc.wisc.edu/~bhansen/econometrics/. See pages 17-18.) Suppose $u=ex$ where $x$ and $e$ are independent standard normals, i.e. $x\sim N(0,1)$ and $e\sim N(0,1)$. Then, conditional on $x$, $u$ has the distribution $N(0,x^2)$. Therefore $E[u\mid x]=0$, but $u$ is obviously not independent of $x$.
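
To make the counterexample concrete, here is a minimal simulation sketch in Python (the sample size and the cutoff $|x|>2$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(500_000)
e = rng.standard_normal(500_000)
u = e * x                          # Hansen's example: u = ex with e, x independent

far = np.abs(x) > 2                # condition on x being far from zero
print(u[far].mean())               # ~0: mean independence, E[u | x] = 0
print(u[far].var())                # ~5.7: conditional variance is large for large |x| ...
print(u[~far].var())               # ~0.8: ... and small for small |x|, so u depends on x
```

The conditional mean of $u$ is zero on every slice of $x$, but the conditional variance grows with $x^2$, which is exactly the dependence that mean independence allows.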

This simple counterexample illustrates how, in practice, the error terms might be *dependent* on a regressor variable but still *mean*-independent of it. In particular, the conditional distribution $u\mid x$ might depend on $x$ even though $E[u\mid x]=0$. Your problem may be that you're only imagining uncorrelated *normal* distributions. If $x$ and $y$ are jointly normal but uncorrelated, then the conditional distribution of $u\mid x$ (where $u=y-E[y\mid x]$) doesn't depend on $x$: it's always the (centered) marginal distribution of $y$. In that special case, $u$ and $x$ are actually independent.

The point is that, in general, running a regression amounts to using $E[y\mid x]$ to predict $y$ in an arbitrary joint distribution (the normality assumption is used only for inference), and all you need to "assume," in that context, is that $u$ is mean-independent of $x$. (I put scare quotes around "assume" because it's not really an assumption: if you *define* the error term as $u := y - E[y\mid x]$, then it's a simple *consequence*, not an assumption, that $E[u\mid x]=0$ [and from this it also follows fairly straightforwardly that $u$ is uncorrelated with any function of $x$, though of course zero correlation doesn't imply independence]. The real assumption you make in running a basic regression is that $E[y\mid x]$ is linear.) You don't need to assume the stronger condition that $u$ and $x$ are actually independent: nothing in the theory of regression requires it, so why impose it, if it unreasonably restricts the class of problems you can consider?
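
Written out, the "consequence, not assumption" point is a two-line derivation:
$$E[u\mid x] = E\big[\,y - E[y\mid x]\,\big|\,x\,\big] = E[y\mid x] - E[y\mid x] = 0,$$
and then, for any function $f$ of the regressors, by iterated expectations,
$$E[u\,f(x)] = E\big[\,f(x)\,E[u\mid x]\,\big] = 0.$$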