Your question seems to imply that least squares regression is the only method to fit a linear model. As mentioned in other answers, there are other perfectly legitimate methods that can be used to fit a linear predictor. A common thread of these methods is that they are tractable, i.e., there are concrete steps that can be taken to find the actual solution (or rather, an approximation to the solution within some acceptable tolerance).
Tractability is not an intrinsic property of the method. What is tractable at any given time depends on the state of technological developments. If some day quantum computing becomes part of standard technological development, then the list of tractable methods will be greatly expanded.
In the times of Gauss and Euler, the list of tractable methods was far more limited than it is today, and least squares was a technological advance with lasting consequences.
A second important quality of whatever method one chooses is effectiveness. Gauss's use of least squares helped him make important, accurate predictions in the context of astronomical observations. I speculate that, faced with Gauss's success, researchers wanted to know how he did it, rather than why he did what he did.
A third feature of fitting methods that tilts the scale in favour of least squares is interpretability. We seek models to abstract patterns from observations, so that we can understand differences and make predictions. The theoretical framework of least squares provides guidance in model building. Lately I had the chance to apply minimization of the sum of squared errors with a penalty on the sum of the absolute values of the parameters, known as the LASSO. The model was selected by cross-validation. None of the niceties regarding the significance of the coefficients, which are at the core of least squares, are immediately available. For some time, much of observational science consisted of finding statistical significance in model parameters, because producing a model was the goal of data analysis. Of late, there has been increasing interest in using models to predict expected outcomes, which has resulted in the addition of various model-fitting methods to the analyst's toolbox.
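For concreteness, here is a minimal numpy sketch of the LASSO objective being minimized by proximal gradient descent (ISTA) — one of several ways to fit it. The data, the fixed penalty `lam = 0.1`, and the function name are invented for illustration; in a real application the penalty would be chosen by cross-validation, as described above.

```python
import numpy as np

# Synthetic data: 5 predictors, 2 of which are irrelevant (hypothetical example).
rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
true_beta = np.array([2.0, -1.5, 0.0, 0.0, 0.5])
y = X @ true_beta + 0.1 * rng.normal(size=n)

def lasso_ista(X, y, lam, n_iter=2000):
    """Minimize (1/2n)||y - Xb||^2 + lam*||b||_1 by proximal gradient (ISTA)."""
    n, p = X.shape
    beta = np.zeros(p)
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        z = beta - step * grad
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return beta

beta_hat = lasso_ista(X, y, lam=0.1)
# The l1 penalty drives the irrelevant coefficients to (near) zero,
# at the price of some shrinkage bias in the nonzero ones.
```

Note how the fitted coefficients are shrunk toward zero: this is exactly the property that complicates the classical significance story for the coefficients.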
Since I digressed, I will summarize my answer:
* least squares regression is not the only method in use to fit linear models
* the methods in use are those that are tractable (they can be implemented), effective (they solve the problem at hand), and yield interpretable results (when the need arises)
As for the popularity of least squares:
* from the mid-1700s to the time of widespread availability of computing machines, least squares regression was the state of the art in linear model fitting (disregard the objections of the Bayesians: they had conjugate pairs, but not until the late 20th century could they handle more general parameter priors)
* least-squares regression, when its assumptions are met, provides a framework that can be used for guidance in model building
Now I'll digress again by addressing objections:
Objector: ...but least squares minimizes a function that is differentiable...
Answer 1: So? Convex minimization is well developed, and numerical methods are available even for non-differentiable objectives.
Answer 2: it is 2016; enough with eighteenth-century technology.
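To make Answer 1 concrete, here is a small numpy sketch of least absolute deviations (LAD) regression — a non-differentiable but convex objective — fitted by iteratively reweighted least squares. IRLS is just one of several numerical routes (linear programming is another); the data and the smoothing constant `eps` are made up for illustration.

```python
import numpy as np

# Synthetic line y = 1 + 2x with a few gross outliers that would drag OLS.
rng = np.random.default_rng(1)
n = 60
x = rng.uniform(-3, 3, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=n)
y[:5] += 8.0  # contaminate 5 of 60 observations
X = np.column_stack([np.ones(n), x])

def lad_irls(X, y, n_iter=100, eps=1e-8):
    """Approximately minimize sum |y_i - x_i'b| by iteratively
    reweighted least squares with weights w_i = 1/|r_i|."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from OLS
    for _ in range(n_iter):
        r = y - X @ beta
        w = np.sqrt(1.0 / np.maximum(np.abs(r), eps))[:, None]
        beta = np.linalg.lstsq(w * X, (w[:, 0]) * y, rcond=None)[0]
    return beta

beta_lad = lad_irls(X, y)
# beta_lad stays close to the true (1, 2) despite the outliers.
```

The point is not that LAD beats least squares, only that a non-differentiable convex criterion poses no computational obstacle today.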
Objector: ...but p-values, where art thou?
Answer: if you need p-values to publish, then use least squares. You can also use other methods of model fitting and estimate the distribution of the parameter estimates through, for example, bootstrapping. If what you need are predictions, then you need not worry about p-values. Use statistical methods to ensure your models are stable and the results reproducible. The importance of p-values in the scientific literature has been overplayed, whether through dishonesty or ignorance. The loss of significance, or of the strength of relations, in successive repetitions of many experiments is a well-documented fact, caused by models driven by p-value significance.
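The bootstrapping suggestion can be sketched in a few lines of numpy: resample the cases with replacement, refit, and take percentiles of the refitted coefficients. The data are simulated, and the fitting function here happens to be least squares, but any of the fitting methods discussed above could be dropped in.

```python
import numpy as np

# Hypothetical data: true intercept 0.5, true slope 1.5.
rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
y = 0.5 + 1.5 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

def fit(X, y):
    # ordinary least squares here, but this could be any fitting method
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Case resampling: refit on n rows drawn with replacement, many times.
boot = np.array([fit(X[idx], y[idx])
                 for idx in (rng.integers(0, n, size=n) for _ in range(1000))])

slope_ci = np.percentile(boot[:, 1], [2.5, 97.5])  # 95% percentile interval
```

The percentile interval gives a distribution-free summary of the sampling variability of the estimate, which is the role p-values and standard errors play in the least-squares framework.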
Objector: ... but all the hordes trained in least squares...?
Answer: (speechless)
Objector: ...but should we dispose of least squares in our model-building pursuits?
Answer: No. There is nothing intrinsically wrong with least squares. It applies when the hypotheses underlying the method hold (namely, Gaussian distribution of the residuals, iid observations), and in any case least squares gives the BLUE, which is often all you need.
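And part of why least squares has been so durable is its tractability: when its assumptions hold, the estimate has a closed form. A minimal numpy sketch, with simulated data where the assumptions are satisfied by construction:

```python
import numpy as np

# Simulated setting where the OLS assumptions hold:
# iid Gaussian errors and a correctly specified linear mean.
rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)
y = 2.0 - 0.7 * x + rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), x])

# Closed-form least-squares solution via the normal equations,
# beta_hat = (X'X)^{-1} X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

No iterations, no tuning parameters: solving one small linear system recovers the coefficients, which is precisely the "concrete steps to find the actual solution" that made the method a technological advance in Gauss's day.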
Hope that helps. Thanks for the question.