This reference to sets in "the smallest set $X$ such that..." can easily be circumvented.
You can define a proposition as any assemblage that appears in a sequence of assemblages $A_1, A_2, \dots, A_n$, in which each assemblage is formed from the preceding ones in certain well-defined ways, as in Definition 2.1.2. Then whenever you need to use a proposition in your formal language, you can require that a suitable finite sequence of assemblages first be exhibited.
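To make "exhibiting a suitable finite sequence" concrete, here is a minimal sketch in Python of what checking such a formation sequence amounts to. The formation rules below (atoms are propositions; if $A$ and $B$ are propositions, so are $\neg A$ and $A \wedge B$) are illustrative stand-ins of my own choosing, not the exact clauses of Definition 2.1.2.

```python
# Sketch: verifying a formation sequence for propositional formulas.
# Assumed rules (not the book's Definition 2.1.2): atoms are propositions;
# if A and B are propositions, so are ("not", A) and ("and", A, B).

ATOMS = {"p", "q", "r"}

def is_formation_sequence(seq):
    """Each entry must be an atom or be built from entries appearing EARLIER."""
    for i, a in enumerate(seq):
        earlier = seq[:i]
        if a in ATOMS:
            continue
        if isinstance(a, tuple) and a[0] == "not" and a[1] in earlier:
            continue
        if isinstance(a, tuple) and a[0] == "and" and a[1] in earlier and a[2] in earlier:
            continue
        return False
    return True

# "p and (not q)" is a proposition because it ends a formation sequence:
print(is_formation_sequence(["p", "q", ("not", "q"), ("and", "p", ("not", "q"))]))  # True
# Here the parts were never exhibited first, so the check fails:
print(is_formation_sequence([("and", "p", "q")]))  # False
```

The point is only that "being a proposition" is certified by a finite, mechanically checkable object, with no appeal to "the smallest set such that...".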
The problem is more substantial when you seek to prove propositions about assemblages, propositions, formal proofs, etc. In this case, you must decide whether the meta-theory you are working in is some form of set theory, which is easiest for purposes of exposition, or some form of arithmetic. Much of what is proved in logic about formal proofs, etc., could be formulated in fairly weak systems of arithmetic (where integers correspond via an effective bijection to assemblages). The main problem is that few people have the stomach for the details involved, and certainly, no one would want to inflict all of that on a beginner.
Edit: Your comments clarify what your concerns are. Before answering the questions, I think it would be clearest if I talked a little bit about the ways you can look at a system of logic as a foundation for mathematics.
In the most basic use of logic, you are merely setting up a system of symbols to formalize the statements and proofs of ordinary mathematical argument, say with set theory as a foundation. This is a purely syntactic level.
You have a certain intuitive conception of what the statements are meant to represent, and you need to check two things. First, you want to satisfy yourself that the methods of deduction in your system are valid, i.e. that the system is sound: applied to true statements, they should yield only true statements. Second, you want to check that every form of argument ordinarily considered valid in mathematics has a counterpart in the formal deductive system, i.e. that the system is complete.
At this purely syntactic stage, you have no formalized concept of the "meaning" of a statement. Therefore, the two points above cannot be proved in any rigorous sense. Soundness can be checked only by convincing yourself that, according to your intuitive sense of what the statements are supposed to "mean," the rules of deduction are acceptable. Completeness, at this stage, can be thought of as experimental in the sense that no one has observed a mathematical proof regarded (by humans) as valid that could not, with sufficient effort, be formalized in the system.
Of course, the forms of reasoning humans consider valid are based on our innate conceptions of logic.
Now let's think about your concern about a vicious circle. At this stage, you have a formal system of proof that you're satisfied with. There's nothing to stop you now from starting to use it to prove theorems of set theory. You don't really need to know any metatheorems to make use of it this way, so there can be no vicious circle.
In practice, you may need to prove a few metatheorems to convince yourself of the intuitive soundness and completeness of your system. Mostly, these will consist of various shortcuts that you can employ in formal proofs.
All of these basic metatheorems can be formalized in a fairly weak system of arithmetic (including some basic form of induction). But if your purpose is really to convince yourself of the two points I mentioned, whose content is intuitive anyway, this shouldn't matter, since all you need as you go along is to check that the metatheorems use no more than intuitively obvious facts about arbitrary finite sequences of characters written down on paper.
The next stage in developing logic is semantic, in the sense that you are concerned with giving a formal definition of the "meaning" of statements and proving metatheorems including, at minimum, formal equivalents of the two points above.
Let's think about what's involved at this stage. For the sake of simplicity, let's say that, instead of set theory, you're concerned with groups or, more generally, with structures consisting of a bunch of objects and a binary operation on them.
It won't have escaped your attention that the word "bunch" above is a cop-out to avoid using the word "set." There's really no way around the fact that a group is bound to be, in some sense at least, something like a set, as is the group operation, which is a "bunch" of triples.
Even set theory, ultimately, is about a "bunch" of objects called sets and a binary relation denoted $\in$ between them, itself a "bunch" of pairs of objects.
I think this makes it clear that the natural medium in which to formulate semantic questions is going to be akin to some version of set theory.
How is this done? Let's return to the example of groups. Now some logicians, for good reasons, like to consider a group to be a set $G$ together with a binary operation (the group operation), a unary operation (the taking of inverses), and a distinguished constant (the unit element), with a matching symbol in the language for each. But to keep things simple, let's say that the language of groups contains just a single binary operation symbol. (There are some technicalities here regarding the status of the binary relation $=$, but they are of no real consequence. For definiteness, we'll assume that $=$ is always interpreted by actual equality on the underlying set of the structure.)
Now you define a structure ${\cal M} = (M,\cdot)$ in the language of groups to be a set $M$ together with a binary operation $\cdot$ on $M$. (I'll just write $M$ for $\cal M$ below.) You define, by induction on the complexity of a formula $\varphi(v_1, \dots, v_n)$ with free variables $v_1, \dots, v_n$, the concept of such a structure $M$ satisfying $\varphi$ when $v_1, \dots, v_n$ are replaced with particular elements $x_1, \dots, x_n$ of $M$.
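A few of the inductive clauses, written with the usual satisfaction symbol $\models$, and with $t^M[x_1,\dots,x_n]$ denoting the value in $M$ of a term $t$ (built up from variables using $\cdot$) when $v_1, \dots, v_n$ take the values $x_1, \dots, x_n$:

$$
\begin{aligned}
M \models (t_1 = t_2)\,[x_1,\dots,x_n] \ &\text{ iff }\ t_1^M[x_1,\dots,x_n] = t_2^M[x_1,\dots,x_n],\\
M \models (\varphi \wedge \psi)\,[x_1,\dots,x_n] \ &\text{ iff }\ M \models \varphi\,[x_1,\dots,x_n] \text{ and } M \models \psi\,[x_1,\dots,x_n],\\
M \models (\neg \varphi)\,[x_1,\dots,x_n] \ &\text{ iff }\ M \not\models \varphi\,[x_1,\dots,x_n],\\
M \models (\exists w\,\varphi)\,[x_1,\dots,x_n] \ &\text{ iff }\ M \models \varphi\,[x_1,\dots,x_n,y] \text{ for some } y \in M.
\end{aligned}
$$

Each clause appeals only to formulas of lower complexity, which is what legitimizes the induction.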
A statement is just a formula $\varphi$ with no free variables, and a theory is a set $T$ of statements. For example, $T$ could be the set of axioms for groups. A structure $M$ is a model of $T$ if it satisfies each statement in $T$.
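For instance, in the one-operation language just described, $T$ could consist of the following two statements (one of several equivalent axiomatizations):

$$
\forall x\,\forall y\,\forall z\;\bigl((x \cdot y)\cdot z = x \cdot (y \cdot z)\bigr),
\qquad
\exists e\,\forall x\;\bigl(e \cdot x = x \,\wedge\, x \cdot e = x \,\wedge\, \exists y\,(x \cdot y = e \,\wedge\, y \cdot x = e)\bigr).
$$

A structure $(M,\cdot)$ is then a model of this $T$ exactly when $\cdot$ is a genuine group operation on $M$.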
At this point we can begin to think about how to formalize the two points from before. Let a theory $T$ and a statement $\varphi$ be given. Then we'll say that $\varphi$ is a syntactic consequence of $T$ if there is a formal proof (according to your system) of $\varphi$ using only axioms in $T$ as assumptions. We'll say that $\varphi$ is a semantic consequence of $T$ if every model of $T$ also satisfies $\varphi$.
Now the soundness of your system of proof means that whenever a statement $\varphi$ is a syntactic consequence of a theory $T$, it is also a semantic consequence of $T$. The completeness of the system is the converse, namely that whenever $\varphi$ is a semantic consequence of $T$, it is also a syntactic consequence of $T$.
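In the notation usually used for these two relations ($T \vdash \varphi$ for syntactic consequence, $T \models \varphi$ for semantic consequence), the two properties read:

$$
\text{Soundness:}\quad T \vdash \varphi \ \Longrightarrow\ T \models \varphi,
\qquad\qquad
\text{Completeness:}\quad T \models \varphi \ \Longrightarrow\ T \vdash \varphi.
$$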
Gödel's completeness theorem states that certain well-known systems of proof, which are fairly easily shown to be sound, are also complete. In other words, if you choose one of these systems, then you have a good system of formal proofs that allows you to deduce a statement if and only if you ought to be able to deduce it.
Now if we switch from group theory to set theory, we'll say that a structure in the language of set theory is a set $M$ together with a binary relation $E$ on $M$, which is meant to represent set membership (but which need not be anything like the actual $\in$ relation between elements of $M$). If the pair $(M,E)$ satisfies, say, the axioms of ZFC, then we'll say it's a model of ZFC.
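For example, $M = \{a,b,c\}$ with $E = \{(a,b),\,(b,c)\}$ is a perfectly good structure in the language of set theory; it just isn't a model of ZFC (it already fails the pairing axiom, for instance).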
So here you have to face the reality that in the universe you're working in (at Level 2), you are using some version of set theory, and some of the formal objects (at Level 1) within your theory are models $(M,E)$ of some version $T$ of set theory (perhaps quite a different one), with $T$ also being a formal object of the theory. It can be a bit disconcerting to manipulate similar objects at Levels 1 and 2 simultaneously.
Now I'll try to answer your questions.
- I can use naive set theory as my meta theory to develop Mathematical logic, right? (In this way I would be able to use sets, functions, relations, and other “structures” because they come from it)
The answer is yes. But think about what you mean by "develop mathematical logic."
If what you mean is to set up the basic syntactic system in which you will prove mathematical statements, then you need, at most, some very obvious statements about finite sequences of characters on paper. While set theory is a convenient language in which to discuss this, it isn't necessary.
If what you want to do is more elaborate, for example to prove rigorously various facts about what can or can't be proved from particular sets of axioms, then the specific set of axioms of your metatheory may be very relevant. For example, if your metatheory is ZFC + (There exists a strongly inaccessible cardinal), then you can prove that the theory ZFC is consistent (by exhibiting a model of ZFC). But if your metatheory is ZFC, then either your metatheory is inconsistent or your metatheory cannot prove that the theory ZFC (a formal object within the metatheory) is consistent. (This is a consequence of Gödel's incompleteness theorem.)
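For the first of these claims, the standard witness is a level of the cumulative hierarchy: if $\kappa$ is strongly inaccessible, then $(V_\kappa, \in)$ is a model of ZFC, so

$$
\mathrm{ZFC} + \text{``there is a strongly inaccessible cardinal''} \ \vdash\ \mathrm{Con}(\mathrm{ZFC}),
$$

while, by Gödel's second incompleteness theorem, $\mathrm{ZFC} \nvdash \mathrm{Con}(\mathrm{ZFC})$ unless ZFC is inconsistent.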
- Back to sets: We already use some ideas from logic to work in naive set theory.. wouldn’t that be circular?
I think what you mean here is that, for example, we use the idea of conjunction in defining the intersection of two sets, etc. Please correct me if I'm wrong.
This is not a problem because such uses of conjunction ultimately amount to using the conjunction symbol in certain places in formal proofs. As long as you are obeying the rules of formal proofs, including the rules for manipulations involving the conjunction symbol, you are following the rules of the game.
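For instance, in a natural-deduction style presentation the rules governing the conjunction symbol are just

$$
\frac{\varphi \qquad \psi}{\varphi \wedge \psi}\ (\wedge\text{-introduction}),
\qquad
\frac{\varphi \wedge \psi}{\varphi}\,,\quad \frac{\varphi \wedge \psi}{\psi}\ (\wedge\text{-elimination}),
$$

so when you pass from "$x \in A$ and $x \in B$" to "$x \in A \cap B$" (which by definition abbreviates $x \in A \wedge x \in B$), you are, at the formal level, just applying $\wedge$-introduction.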
Of course, what you are really doing is carrying out arguments which, psychologically speaking, are in the semantic realm, but which could be reduced with some work to formal syntactic proofs.
- I was intrigued by the ”arithmetic way”. Can you please give an example of how that can be done? Or some literature?
With respect to syntax, the arithmetization of formal proofs will necessarily be discussed in any source that presents a detailed proof of Gödel's incompleteness theorem. Two possible sources would be Mathematical Logic by Cori and Lascar, or Fundamentals of Mathematical Logic by Hinman. The theory of recursive functions is fundamental for this.
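To give the flavor of the arithmetization, here is one classical coding scheme, sketched in Python (my own illustration, not the specific encoding used in either book): assign a code $c(s)$ to each symbol $s$ and code the string $s_1 s_2 \dots s_n$ as $p_1^{c(s_1)} p_2^{c(s_2)} \cdots p_n^{c(s_n)}$, where $p_k$ is the $k$-th prime.

```python
# Sketch of a classical Gödel numbering: a formula, viewed as a finite string
# of symbols, is coded by a single positive integer via prime-power exponents.
# The symbol table is an arbitrary illustrative choice.

def primes():
    """Naive generator of 2, 3, 5, 7, ... (adequate for short strings)."""
    n = 2
    while True:
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            yield n
        n += 1

SYMBOL_CODE = {"(": 1, ")": 2, "~": 3, "&": 4, "v": 5, "p": 6, "q": 7}

def godel_number(symbols):
    """Code the string s_1 ... s_n as p_1**c(s_1) * p_2**c(s_2) * ... * p_n**c(s_n)."""
    n = 1
    for p, s in zip(primes(), symbols):
        n *= p ** SYMBOL_CODE[s]
    return n

# The propositional formula (p & q), as the symbol string "(", "p", "&", "q", ")":
print(godel_number(["(", "p", "&", "q", ")"]))  # = 2**1 * 3**6 * 5**4 * 7**7 * 11**2
```

Since prime factorization is unique, the string can be recovered from its code, and syntactic operations on formulas and proofs become arithmetical operations on their codes; the theory of recursive functions is what makes "arithmetical" precise here.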
On the other hand, I'm just not familiar at all with the ways in which semantic logical concepts (most naturally developed within set theory) can be transposed into arithmetic. For example, I don't know how complex a system of arithmetic you would need in order to formulate and prove meaningful versions of Gödel's completeness theorem. In any case, I anticipate that this area must be highly technical. If all you care about is soundness, I suspect that you can get by with less. Once you've learned enough logic (using set theory as your metatheory), you could ask a new question specifically about this.
The bottom line, then, is that there is no vicious circle if all you want to do is use your system of formal proofs to write down mathematics. However, you must take a certain amount of mathematics (a subset of ZF) on faith if you wish to prove that your system is sound and complete.