Because both of them are useful.
Since you explicitly mentioned the square function, I want to give some examples. The main idea is that the non-differentiability of $|\cdot|$ at the origin is what makes it useful in minimization problems.
Estimators
We know that the arithmetic mean $\hat{\mu}=\frac{1}{n}\sum_{i=1}^n x_i$ solves
$$\min_{\mu} \,\sum_{i=1}^n (x_i-\mu)^2,$$
but it is less well known that the median solves
$$\min_{\mu} \,\sum_{i=1}^n |x_i-\mu|.$$
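A quick numerical check of this claim, using plain NumPy (the grid search over candidate $\mu$ values is only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(size=1000)   # a skewed sample, so mean and median differ

# Evaluate both loss functions on a grid of candidate mu values.
mus = np.linspace(x.min(), x.max(), 2001)
sq_loss  = np.array([np.sum((x - m) ** 2) for m in mus])
abs_loss = np.array([np.sum(np.abs(x - m)) for m in mus])

print("argmin of sum (x_i - mu)^2:", mus[sq_loss.argmin()], "  mean  :", x.mean())
print("argmin of sum |x_i - mu|  :", mus[abs_loss.argmin()], "  median:", np.median(x))
```

Up to the grid resolution, the two minimizers land on the sample mean and the sample median, respectively.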
Signal Processing
Let's use image processing as an example. Suppose $g$ is a given, noisy image. We want to find some smoother image $f$ which looks like $g$.
The harmonic $L^2$ minimization model solves
$$-\Delta f + f = g,$$
and solving this PDE turns out to be equivalent to solving a minimization problem (the PDE is its Euler–Lagrange equation):
$$\min_{f} \,\left(\int_{\Omega} (f(x,y)-g(x,y))^2 \,dx\,dy + \int_{\Omega} |\nabla f(x,y)|^2 \,dx\,dy\right).$$
An enhanced version is the ROF (Rudin–Osher–Fatemi) model. It solves
$$\min_{f} \,\left(\frac{1}{2} \int_{\Omega} (f(x,y)-g(x,y))^2 \,dx\,dy + \lambda \int_{\Omega} |\nabla f(x,y)| \,dx\,dy\right).$$
Notice that, for appropriate $\lambda$, these two models differ only by a square. Another remark is that $|\cdot|$ denotes the Euclidean norm here, since the argument $\nabla f$ is a vector; the idea still applies, because the norm is non-differentiable at the origin.
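Here is a rough sketch of both models, assuming scikit-image is available. The `h1_denoise` helper (which solves the PDE above with a weight $\lambda$ via the FFT, under periodic boundary conditions) and all parameter values are purely illustrative; `denoise_tv_chambolle` is one standard implementation of ROF-type total-variation denoising.

```python
import numpy as np
from skimage import data, restoration, util

# Noisy test image g: scikit-image's built-in "camera" photo plus Gaussian noise.
g = util.random_noise(util.img_as_float(data.camera()), mode="gaussian", var=0.01)

def h1_denoise(g, lam=2.0):
    """Solve -lam * Laplacian(f) + f = g with periodic boundaries via the FFT.
    This is the Euler-Lagrange equation of the squared-gradient model above."""
    ny, nx = g.shape
    ky = 2 * np.pi * np.fft.fftfreq(ny)
    kx = 2 * np.pi * np.fft.fftfreq(nx)
    KX, KY = np.meshgrid(kx, ky)
    return np.real(np.fft.ifft2(np.fft.fft2(g) / (1.0 + lam * (KX**2 + KY**2))))

f_h1 = h1_denoise(g, lam=2.0)                            # quadratic penalty: smooths but blurs edges
f_tv = restoration.denoise_tv_chambolle(g, weight=0.1)   # TV penalty (ROF): smooths while keeping edges
```

The squared penalty on $\nabla f$ spreads smoothing everywhere, so edges get blurred, while the absolute-value penalty tolerates a few large jumps, which is exactly the edge-preserving behaviour the ROF model is known for.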
Model Selection
In the classical model selection problem, we are given a set of predictors and a response (in vector form). We want to decide which predictors are useful. One way is to choose a "good" subset of predictors. Another way is to shrink the regression coefficients.
The classical regression model solves the following minimization problem:
$$\min_{\beta_0,...,\beta_p} \sum_{i=1}^n (y_i-\beta_0-\sum_{j=1}^p \beta_j x_{ij})^2$$
Ridge regression solves the following:
$$\min_{\beta_0,...,\beta_p} \sum_{i=1}^n (y_i-\beta_0-\sum_{j=1}^p \beta_j x_{ij})^2+\lambda \sum_{j=1}^p \beta_j^2,$$
so that a larger $\beta_j$ incurs a larger penalty.
Another version is Lasso, which solves
$$\min_{\beta_0,...,\beta_p} \sum_{i=1}^n (y_i-\beta_0-\sum_{j=1}^p \beta_j x_{ij})^2+\lambda \sum_{j=1}^p |\beta_j|.$$
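A small sketch of the contrast, using scikit-learn (the data here are synthetic, and the `alpha` values, which play the role of $\lambda$, are only illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta_true = np.array([3.0, -2.0, 1.5] + [0.0] * (p - 3))   # only the first 3 predictors matter
y = X @ beta_true + rng.normal(scale=1.0, size=n)

ols   = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=0.3).fit(X, y)

print("OLS  :", np.round(ols.coef_, 2))
print("Ridge:", np.round(ridge.coef_, 2))   # all coefficients shrunk toward zero, none exactly zero
print("Lasso:", np.round(lasso.coef_, 2))   # irrelevant coefficients typically driven exactly to zero
```

The squared penalty shrinks all coefficients smoothly, whereas the non-differentiable absolute-value penalty can push coefficients exactly to zero, which is why the Lasso performs model selection as a by-product of the fit.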