I'll look at this question only from the perspective of classical propositional logic and classical first-order logic, not taking into account all the other kinds of exotic logics out there.
As you said, a formal language $\mathcal L$ is a subset of the set of all strings over an alphabet $\Sigma$, which is commonly written as $\Sigma^*=\bigcup_{n\in\mathbb N}\Sigma^n$.
In the case of classical propositional logic we want to reason about formulas, that is, strings over a certain alphabet, built from a set of atomic propositional variables, say $Var=\{p_0,p_1,\dots\}$, using Boolean connectives. For this, we may define an alphabet $\Sigma_p=Var\cup\{(,),\land,\lor,\neg\}$ consisting of the variables and a bunch of auxiliary symbols we want to use later in our formulas. The question that remains is how we "filter" the correct formulas, like $(p_0\lor p_1)$, about which we would like to reason later, out of the whole set $\Sigma_p^*$, which contains a lot of nonsensical strings, e.g. $((\lor\neg$.
In other words, we may phrase the question as how to obtain the language of classical propositional logic, say $\mathcal L_p$, from the set of strings $\Sigma_p^*$. This can, for example, be achieved by specifying $\mathcal L_p$ to be the smallest subset $X$ of $\Sigma_p^*$ satisfying the following closure conditions.
- $Var\subseteq X$,
- If $\phi,\psi\in X$, then $(\phi\land\psi),(\phi\lor\psi),\neg\phi\in X$.
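The closure conditions above translate directly into a recursive membership test. Here is a minimal sketch in Python (the concrete string representation, with variables written `p0`, `p1`, …, is an assumption for illustration):

```python
import re

def is_formula(s: str) -> bool:
    """Check membership in L_p via the closure conditions:
    variables p0, p1, ... are formulas; if phi, psi are formulas,
    then so are ¬phi, (phi∧psi) and (phi∨psi)."""
    if re.fullmatch(r"p\d+", s):        # base case: Var ⊆ X
        return True
    if s.startswith("¬"):               # negation clause
        return is_formula(s[1:])
    if s.startswith("(") and s.endswith(")"):
        depth = 0
        for i, ch in enumerate(s):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch in "∧∨" and depth == 1:   # main connective found
                return is_formula(s[1:i]) and is_formula(s[i + 1 : -1])
    return False

print(is_formula("(p0∨p1)"))   # True
print(is_formula("((∨¬"))      # False
```

Note how "smallest subset" corresponds to the recursion bottoming out: a string is accepted only if it can be justified by finitely many applications of the clauses.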
There are of course many other ways to define a language for classical propositional logic. On the basis of this language, however, we can then move on to specify things like semantic and syntactic consequence (and all the other things commonly studied in logic).
Note that you may just as well add other logical symbols like $\rightarrow$ and $\leftrightarrow$ to the alphabet and extend the language via additional closure clauses for $\mathcal L_p$. I chose to restrict myself to a subset of connectives that is sufficient to specify the other connectives semantically by abbreviations (if we later have such a thing as a semantics).
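To make the abbreviation idea concrete, one common choice is to treat the missing connectives as notational shorthand:
$$(\phi\rightarrow\psi):\equiv(\neg\phi\lor\psi),\qquad(\phi\leftrightarrow\psi):\equiv((\phi\rightarrow\psi)\land(\psi\rightarrow\phi)).$$
These definitions agree with the usual truth tables, so nothing expressible is lost by keeping the official alphabet small.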
We may proceed in a similar way for classical first-order logic, starting by defining a base alphabet (which is now a little more complex).
For this, let $\sigma=\sigma_f\cup\sigma_r\cup\sigma_c$ be a so-called signature: the set of function ($\sigma_f$), relation ($\sigma_r$) and constant symbols ($\sigma_c$) of the first-order language we want to define. We also define a function $\mathrm{ar}:\sigma\to\mathbb N$ (the so-called arity function, telling you, well, what arity each symbol has). Every first-order language is relative to such a signature. We commonly set $\mathrm{ar}(c)=0$ for $c\in\sigma_c$, i.e. constant symbols have arity $0$.
We also fix a set of variables $V=\{x_0,x_1,\dots\}$. Then, we can define the alphabet of first-order logic over the signature $\sigma$ as
$$\Sigma_\sigma=V\cup\sigma\cup\{(,),\neg,\lor,\land,=,\forall,\exists\}$$
where it is also sometimes customary to leave out the equality symbol.
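For concreteness, a standard example (the signature of ordered arithmetic, chosen here purely for illustration) would be
$$\sigma_f=\{+,\cdot\},\quad\sigma_r=\{<\},\quad\sigma_c=\{0,1\},\qquad\mathrm{ar}(+)=\mathrm{ar}(\cdot)=\mathrm{ar}(<)=2,$$
so that $\Sigma_\sigma$ contains everything needed to write strings like $\forall x_0\,\exists x_1\,(x_0<x_1)$.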
Again, we now pose the question of how we can distill the desired language of first-order logic over $\sigma$, say $\mathcal{L}(\sigma)$, from the set of all strings $\Sigma_\sigma^*$.
For this, you commonly go on to specify what you mean by terms over the signature $\sigma$ and then define the formulas in $\mathcal{L}(\sigma)$ from that.
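This two-stage definition (terms first, then formulas built on top of them) can be sketched as follows. The tiny signature used here, with one binary function symbol `f`, one binary relation symbol `R` and a constant `c`, as well as the string representation, are assumptions for illustration only:

```python
import re

ARITY = {"f": 2, "R": 2, "c": 0}   # the arity function ar

def split_args(s: str):
    """Split a comma-separated argument list at top parenthesis level."""
    args, depth, start = [], 0, 0
    for i, ch in enumerate(s):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            args.append(s[start:i])
            start = i + 1
    args.append(s[start:])
    return args

def is_term(s: str) -> bool:
    """Terms over sigma: variables x0, x1, ..., constants,
    and f(t1,...,tn) for terms t1, ..., tn with n = ar(f)."""
    if re.fullmatch(r"x\d+", s) or s == "c":
        return True
    m = re.fullmatch(r"f\((.*)\)", s)
    if m:
        args = split_args(m.group(1))
        return len(args) == ARITY["f"] and all(is_term(t) for t in args)
    return False

def is_atomic(s: str) -> bool:
    """Atomic formulas: R(t1,...,tn) and equations t1=t2."""
    m = re.fullmatch(r"R\((.*)\)", s)
    if m:
        args = split_args(m.group(1))
        return len(args) == ARITY["R"] and all(is_term(t) for t in args)
    if "=" in s:
        left, _, right = s.partition("=")
        return is_term(left) and is_term(right)
    return False

print(is_atomic("R(f(x0,c),x1)"))   # True
```

The full set $\mathcal L(\sigma)$ is then obtained by closing the atomic formulas under $\neg$, $\land$, $\lor$ and the quantifier clauses $\forall x_i\,\phi$ and $\exists x_i\,\phi$, exactly as in the propositional case.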
Let me know in the comments if I should elaborate on that.