I refer here to a simple linear regression whose true representation is given by the equation: $y_i=x_i'\beta+u_i$,
where as usual $x_i$ is a $K\times1$ vector of explanatory variables, $\beta$ is a $K\times1$ vector of parameters, and $u_i$ is the error term of the $i$th observation; by construction of the true representation, $u_i$ is considered to be independent of $x_i$.
I have trouble understanding why the conditional mean-zero assumption $E[u_i\mid x_i]=0$ is considered stronger (or more natural) than the unconditional one, $E[u_i]=0$. Almost every textbook uses the conditional version rather than the unconditional one.
The reason for my difficulties is the following argument:
Firstly, it is clear to me that by the Law of Iterated Expectations, $E[E[u_i\mid x_i]] = E[u_i]=0$. Hence the conditional assumption implies the unconditional one.
But don't we expect the error term $u_i$ to be independent of $x_i$ by construction? If that is the case, would not $E[u_i]=0$ imply $E[u_i\mid x_i]=0$? In other words, if we assume the error term to be independent of the explanatory variables, does the distinction between the unconditional and the conditional expectation really matter here?
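To spell out the step I have in mind: if $u_i$ is independent of $x_i$, then conditioning on $x_i$ carries no information about $u_i$, so
$$E[u_i \mid x_i] = E[u_i] = 0,$$
and the two assumptions would seem to collapse into one.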
I think a similar argument applies to unconditional vs. conditional homoskedasticity. Textbooks use the conditional version, i.e. $V[u_i\mid x_i]=\sigma^2$, which is preferred to the unconditional variance $V[u_i]=\sigma^2$. Their difference should not matter if the error term is independent of the explanatory variables, should it?
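Spelling this step out as well, independence of $u_i$ and $x_i$ would seem to give
$$V[u_i \mid x_i] = E[u_i^2 \mid x_i] - \big(E[u_i \mid x_i]\big)^2 = E[u_i^2] - \big(E[u_i]\big)^2 = V[u_i],$$
so the conditional and unconditional variances coincide.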
Given a linear regression, I cannot find an example, a graphical representation, or even a good story of how an error term can be unconditionally mean-zero but conditionally biased. If I have a theoretical scatter plot of an error term with mean zero, how am I supposed to slice out the $X$-dependent part and say: look here, the error is indeed biased, when by definition the error term exists only because of the regression setup in the first place?
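For concreteness, here is a minimal simulation sketch of the only case I can picture (the distributions, parameter values, and variable names are just my own illustrative choices, with an intercept and one regressor): the error is drawn independently of $x$, and the binned conditional means of the error come out flat at zero, so I cannot see where a conditional bias could even enter.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# True model: y_i = beta0 + beta1 * x_i + u_i, with u_i drawn
# independently of x_i (illustrative parameter choices)
beta0, beta1 = 1.0, 2.0
x = rng.normal(0.0, 1.0, n)
u = rng.normal(0.0, 1.0, n)  # independent of x by construction
y = beta0 + beta1 * x + u

# Slice the support of x into decile bins and estimate E[u | x]
# by the mean of u within each bin
edges = np.quantile(x, np.linspace(0.0, 1.0, 11))
which = np.digitize(x, edges[1:-1])  # bin index 0..9 for each point
for k in range(10):
    m = u[which == k].mean()
    print(f"decile {k}: mean(u | x in decile) = {m:+.4f}")

# All binned means hover around 0: with u independent of x,
# the conditional mean cannot drift across the deciles of x
```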
Maybe I do not fully understand conditional expectation. If $X$ is given, or in other words, if I have knowledge of $X$, how can this change my expectation of $u_i$ when the two are considered independent? What does this conditional expectation really mean, and how does it improve my understanding of the underlying regression compared with the unconditional one?
Sorry if this seems confusing and is probably a stupid question - I have been pondering these concepts for a while and cannot find an illustrative answer that I am happy with.