The usual formal approach in textbooks is to treat syntactic classes such as formulas or terms as subsets of the set of all strings (lists) of symbols drawn from a small vocabulary (yes, the set of all lists is a free monoid, but that fact doesn't add much value in this context: you just need to know that you can construct new lists from old by prefixing or postfixing a list with a symbol or by concatenating two lists). The subset is typically defined by saying that it is the smallest set closed under certain constructions.
So, for example, we might have a vocabulary comprising variables $x_1, x_2, \ldots$, a single (binary) operator symbol $+$ together with brackets and commas as punctuation symbols. We could then define the set $\cal T$ of all terms to be the smallest set of strings of these symbols that:
- Contains each string "$x_i$" comprising a single variable.
- Contains "$(t_1+t_2)$" whenever it contains $t_1$ and $t_2$ (here I have taken $t_1$, prefixed it with "(", postfixed it with "+", concatenated the result with $t_2$ and then postfixed that with ")").
Here, for readability, I am letting $\LaTeX$ typeset the strings with spaces around the symbols; these spaces are to be ignored. So "$x_1$", "$x_2$" and "$(x_1+x_2)$" are all terms, as is "$((x_1+(x_2+x_3))+x_4)$", but "$+x_1$", "$x_1+x_2$", "$((x_1+x_2))$" and "$(x_1+x_2)+x_3$" are not.
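If a computational reading helps, here is a minimal Haskell sketch of this string-level definition. It is purely illustrative and not part of the formal development; all the names (`Symbol`, `Str`, `isTerm`, `splitsOnPlus`) are ones I have made up:

```haskell
-- Illustrative sketch only; the names here are mine, not standard.
-- The vocabulary: variables x_i, the operator '+', and the two brackets.
data Symbol = Var Int | Plus | LParen | RParen
  deriving (Eq, Show)

-- A "string" in the formal sense is a list of symbols.
type Str = [Symbol]

-- Membership in the smallest set T: a string is a term iff it is a single
-- variable, or it has the shape ( t1 + t2 ) with t1 and t2 themselves terms.
isTerm :: Str -> Bool
isTerm [Var _] = True
isTerm (LParen : rest)
  | not (null rest), last rest == RParen =
      any (\(l, r) -> isTerm l && isTerm r) (splitsOnPlus (init rest))
isTerm _ = False

-- All ways of writing a string as l ++ [Plus] ++ r.
splitsOnPlus :: Str -> [(Str, Str)]
splitsOnPlus s = [ (take i s, drop (i + 1) s) | (i, Plus) <- zip [0 ..] s ]
```

With this, `isTerm [LParen, Var 1, Plus, Var 2, RParen]` is `True` while `isTerm [Var 1, Plus, Var 2]` is `False`, matching the examples above. The function naively tries every possible split on a "+"; the unique readability property discussed next is what guarantees that at most one split can succeed.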
Typically, formal reasoning about syntactic classes is done at the lowest level by complete induction on the length of the strings. With the above definition, an important property you might want to prove of $\cal T$ is that any term $t \in \cal T$ is either a single variable or has the form $(t_1+t_2)$ for some uniquely determined $t_1, t_2 \in \cal T$. (The definition is very precise about the placement of brackets to ensure this.) Having proved a result like that, you know that every term is either atomic (a variable) or is uniquely represented by combining two sub-terms with the operator symbol (together with some brackets that are just there to make things unambiguous). So you can prove properties of terms at a slightly more abstract level by induction over the structure of a term: i.e., by showing that a property holds of atomic terms and holds of any term if it holds of its sub-terms.
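Continuing the (purely illustrative) Haskell sketch above, unique readability is exactly what lets you write a function that recovers the decomposition of a term. The name `decompose` and the error handling are my own choices, and the function assumes its input really is a term; the top-level "+" is found by bracket counting, since it is the unique occurrence of the operator at nesting depth zero:

```haskell
-- Illustrative sketch, reusing Symbol and Str from above.
-- A term is either a single variable (Left i) or splits as ( t1 + t2 )
-- for exactly one pair of sub-term strings (Right (t1, t2)).
decompose :: Str -> Either Int (Str, Str)
decompose [Var i]         = Left i
decompose (LParen : rest) = go (0 :: Int) [] (init rest)  -- drop the final ')'
  where
    go 0 acc (Plus : r)   = Right (reverse acc, r)        -- '+' at depth 0
    go d acc (LParen : r) = go (d + 1) (LParen : acc) r
    go d acc (RParen : r) = go (d - 1) (RParen : acc) r
    go d acc (s : r)      = go d (s : acc) r
    go _ _   []           = error "decompose: not a term"
decompose _               = error "decompose: not a term"
```

For example, on the string for $((x_1+x_2)+x_3)$ it returns the pair of strings for $(x_1+x_2)$ and $x_3$; structural induction over terms corresponds to recursion through this decomposition.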
As has been suggested in the comments, in an introductory course you may be able to skip over some of these details, e.g., by simply adopting the more abstract point of view, where a term is viewed as a tree with leaves labelled by atomic symbols and nodes labelled by operator symbols (and this is how you would represent syntax in a computer implementation of formal syntax). With such an approach, you would use informal linear representations of the trees like "$(t_1 + t_2) + t_3$", where the brackets are just there to show the order of construction of the tree. You would also very likely adopt conventions for leaving those brackets out, as we do in arithmetic and most programming languages.
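For what it's worth, here is how that tree view might look in an implementation, again as a hedged sketch with made-up names (`Term`, `V`, `Add`, `render`):

```haskell
-- Illustrative sketch only. The tree view of terms: leaves are labelled by
-- variables, internal nodes by the (binary) operator.
data Term = V Int           -- the atomic term x_i
          | Add Term Term   -- the compound term built from two sub-terms
  deriving (Eq, Show)

-- A typical definition by structural recursion: render a tree back into the
-- fully bracketed linear notation used earlier.
render :: Term -> String
render (V i)     = "x" ++ show i
render (Add s t) = "(" ++ render s ++ "+" ++ render t ++ ")"
```

For instance, `render (Add (Add (V 1) (V 2)) (V 3))` produces the string "((x1+x2)+x3)", and every function defined by this kind of structural recursion comes with a matching proof principle: structural induction over `Term`.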
[Disclaimer: I have allowed the set of atomic symbols in my example to be infinite: some authors would object to that and require me to work with a finite vocabulary, so that my infinite set of variables would be defined as strings of symbols of some particular form, e.g., the letter $x$ followed by a string of decimal digits (just as happens in the definitions of programming languages).]