For a general understanding of variance - or of its square root, the standard deviation - see this question. As for Brownian motion:
One important property of variances is this: if $X,\,Y$ are uncorrelated (as happens e.g. if they're independent), then $\operatorname{Var}(X+Y)=\operatorname{Var}X+\operatorname{Var}Y$. Thus when we add many small uncorrelated contributions, we expect the variance of the sum to be proportional to the number of such terms. This is why Brownian motion has a variance proportional to the time elapsed.
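A quick simulation makes this concrete (a sketch with made-up step sizes, not anything canonical): model a Brownian path as a sum of many small independent Gaussian steps, so after $n$ steps of variance $s^2$ each, the position should have variance $n s^2$, i.e. proportional to the "time" elapsed.

```python
import random

random.seed(0)

# Each path is a sum of n_steps independent Gaussian increments of
# standard deviation step_sd; variance additivity predicts the final
# position has variance n_steps * step_sd**2.
n_paths = 20000
n_steps = 100
step_sd = 0.1

finals = [sum(random.gauss(0, step_sd) for _ in range(n_steps))
          for _ in range(n_paths)]

# Empirical variance of the final positions across all paths.
mean = sum(finals) / n_paths
var = sum((x - mean) ** 2 for x in finals) / n_paths

print(var)  # should be close to n_steps * step_sd**2 = 1.0
```

Doubling `n_steps` (double the elapsed time) roughly doubles the empirical variance, exactly as the additivity argument predicts.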
With regard to the question's edit, the motivation for squaring is partly the geometric interpretation in terms of inner products you'll find at the link above, partly the additivity we've just discussed, and partly that squares are almost always far more mathematically tractable. Related to that last point is the fact that least-squares minimisation is an especially tractable regression technique for model fitting. Why is this related? Because the error terms in such a fit should be compared with their null-hypothesis standard deviation.
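To illustrate that tractability (with hypothetical data, just for the sketch): minimising the sum of squared errors for a straight line $y = a + bx$ has a closed-form solution, something no other power of the errors offers so cleanly.

```python
# Hypothetical data points for a straight-line fit y = a + b*x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1]

n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n

# Closed-form least-squares estimates: setting the derivatives of the
# sum of squared errors to zero gives these formulas directly.
b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
     / sum((x - mx) ** 2 for x in xs))
a = my - b * mx

print(a, b)  # slope b = 0.99, intercept a = 0.04 for this data
```

Minimising, say, the sum of absolute errors instead has no such closed form and requires iterative methods, which is one concrete sense in which squaring is the more tractable choice.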