As you pointed out, to prove the Cantor-Schröder-Bernstein theorem, one needs to prove the following lemma:
If $A_1 \subset B \subset A$ and $|A_1|=|A|$, then $|B|=|A|$.
According to the hypothesis, there exists some one-to-one mapping $f$ from $A$ onto $A_1$. So, we need a one-to-one mapping $g$ from $A$ onto $B$. How can we find that?
One may think that we should embed the set $B$ in the set $A$ by inclusion and take the required mapping to be the inverse of the inclusion mapping, that is, the mapping$$h:A \to B \\ h(x)=x.$$This mapping is one-to-one, but it cannot be defined on the whole set $A$, because for $x \in A-B$ the value $h(x)=x$ is not contained in the set $B$.
One may think that we should use the mapping $f$ as our required mapping, that is,$$k:A \to B \\ k(x)=f(x).$$This mapping is one-to-one, but it cannot cover the whole set $B$ because the range of the function $f$ is the set $A_1 \subset B$ and there may be some $x\in B$ which is not in the set $A_1$.
As we see, there are two extreme approaches for finding the required mapping: the first one covers the whole set $B$ but is not defined on the whole set $A$, and the second one is defined on the whole set $A$ but does not cover the whole set $B$. So, it is natural to expect that, to find the required mapping, we should use both mappings simultaneously, so that both sets $A$ and $B$ are covered in a one-to-one manner.
But, there exists a problem. If we use both the mappings simultaneously, for example $g: A \to B \quad g(x)=\begin{cases}f(x) & \text{if } x\in C; \\ x & \text{if } x \in A-C \end{cases}$ for some subset $C \subset A$, then we may miss either the onto property or the one-to-one property, because the ranges of the pieces of $g$ may overlap.
So, our original problem is reduced to finding some subset $C \subset A$ such that the ranges of the pieces of the mapping are disjoint and the union of them is equal to the set $B$. Now, how to find such a $C$?
Here is an idea. Since the function $h(x)=x$ cannot be defined on $C_0=A-B$, as explained above, let us map this subset of $A$ by the function $k(x)=f(x)$ into the set $B$. So, we obtain the mapping$$g_0(x)= \begin{cases}f(x) & \text{if } x \in C_0; \\ x & \text{if }x \in A-C_0 \end{cases}.$$But, we have missed the one-to-one property because the ranges of the pieces overlap (In fact, $f[C_0]$ is contained in the range of the second one, since $f[C_0] \subset f[A] \subset A-C_0$).
So, we need to remove the problematic points $C_1=f[C_0]$ from the domain of the second piece (since the domain and the range of the function $h(x)=x$ are the same) to retain the one-to-one property. However, since we need to define the mapping $g$ on the whole set $A$, we need to add such points to the domain of the first piece. So, we obtain the mapping$$g_1(x)= \begin{cases}f(x) & \text{if } x \in C_0 \cup C_1; \\ x & \text{if }x \in A-(C_0 \cup C_1) \end{cases}.$$But, we have missed the one-to-one property because the ranges of the pieces overlap (In fact, $f[C_1]$ is contained in the range of the second one, since $f[C_1]=f^2[C_0] \subset f^2[A] \subset A-(C_0 \cup C_1)$).
So, we need to remove the problematic points $C_2=f[C_1]$ from the domain of the second piece (since the domain and the range of the function $h(x)=x$ are the same) to retain the one-to-one property. However, since we need to define the mapping $g$ on the whole set $A$, we need to add such points to the domain of the first piece. So, we obtain the mapping$$g_2(x)= \begin{cases}f(x) & \text{if } x \in C_0 \cup C_1 \cup C_2; \\ x & \text{if }x \in A-(C_0 \cup C_1 \cup C_2) \end{cases}.$$
$$\vdots \qquad \vdots \qquad \vdots$$
But, once again, we have missed the one-to-one property, because the ranges of the pieces of $g_{n-1}$ overlap (in fact, $f[C_{n-1}]$ is contained in the range of the second one, since $f[C_{n-1}]=f^n[C_0] \subset f^n[A] \subset A-(C_0 \cup C_1 \cup \cdots \cup C_{n-1})$).
So, we need to remove the problematic points $C_n=f[C_{n-1}]$ from the domain of the second piece (since the domain and the range of the function $h(x)=x$ are the same) to retain the one-to-one property. However, since we need to define the mapping $g$ on the whole set $A$, we need to add such points to the domain of the first piece. So, we obtain the mapping$$g_n(x)= \begin{cases}f(x) & \text{if } x \in C_0 \cup C_1 \cup \cdots \cup C_n; \\ x & \text{if }x \in A-(C_0 \cup C_1 \cup \cdots \cup C_n) \end{cases}.$$
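To see concretely why each $g_n$ still fails to be one-to-one, here is a small Python sketch on a hypothetical instance that is not part of the original argument: $A=\{0,1,2,\dots\}$, $B=A-\{0\}$, and $f(x)=2x+2$, so that $A_1=f[A]=\{2,4,6,\dots\}$ and every $C_n$ is a single point. The helper names `chain_sets`, `g_n`, and `collision` are illustrative only. For $c \in C_n$, the first piece sends $c$ to $f(c)$, while $f(c) \in C_{n+1}$ is still handled by the identity piece, so the two points collide.

```python
# Hypothetical instance (not from the text):
# A = {0, 1, 2, ...}, B = A - {0}, f(x) = 2x + 2, so A_1 = f[A] = {2, 4, 6, ...}.


def f(x: int) -> int:
    return 2 * x + 2


def chain_sets(n: int) -> list:
    """Return [C_0, C_1, ..., C_n] with C_0 = A - B = {0} and C_{k+1} = f[C_k]."""
    sets = [{0}]
    for _ in range(n):
        sets.append({f(x) for x in sets[-1]})
    return sets


def g_n(n: int, x: int) -> int:
    """The n-th attempt: f on C_0 ∪ ... ∪ C_n, the identity elsewhere."""
    first_piece = set().union(*chain_sets(n))
    return f(x) if x in first_piece else x


def collision(n: int) -> tuple:
    """Return two distinct points with the same g_n-value.

    For c in C_n, g_n(c) = f(c); but f(c) lies in C_{n+1}, outside the
    domain of the first piece, so g_n(f(c)) = f(c) as well.
    """
    (c,) = chain_sets(n)[-1]   # in this instance each C_k is a singleton
    return c, f(c)


for n in range(4):
    x, y = collision(n)
    print(n, x, y, g_n(n, x), g_n(n, y))   # n, x, y, g_n(x), g_n(y)
```

Each repair moves the collision one step further along the chain $0 \mapsto 2 \mapsto 6 \mapsto 14 \mapsto \cdots$ but never removes it, which is why only the "limit" set $C$ works.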
$$\vdots \qquad \vdots \qquad \vdots$$
This pattern motivates us to define the mapping $g$ as follows.$$g(x)=\begin{cases}f(x) & \text{if } x \in C; \\ x & \text{if } x \in A-C \end{cases}, \qquad C= \bigcup_{n=0}^{\infty }C_n.$$Note that $f[C]=\bigcup_{n=1}^{\infty }C_n$, so $C=C_0 \cup f[C]$; moreover, $C_0 \cap f[C]=\varnothing$ because $f[C] \subset A_1 \subset B$. Hence $f[C]=C-C_0 \subset C$. Each piece of $g$ is one-to-one, and the ranges of the pieces, $f[C]$ and $A-C$, are disjoint, so $g$ is one-to-one. Moreover, the union of the ranges is $(C-C_0) \cup (A-C)=A-C_0=B$, so $g$ is onto $B$.
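As a sanity check of the final $g$, the following Python sketch uses a hypothetical instance (not from the original): $A=\{0,1,2,\dots\}$, $B=A-\{0\}$, $f(x)=2x+2$. It builds the part of $C$ lying in a finite window $\{0,\dots,N\}$ and verifies that $g$ is one-to-one there and hits every element of $B$ up to $N$. Restricting to a window is an assumption that works here only because $f(x)>x$: the preimage of any $y \le N$ is at most $y$. The names `chain_union` and `make_g` are illustrative.

```python
# Hypothetical instance (not from the text):
# A = {0, 1, 2, ...}, B = A - {0}, f(x) = 2x + 2, so A_1 = {2, 4, 6, ...}.


def f(x: int) -> int:
    return 2 * x + 2


def chain_union(bound: int) -> set:
    """Return the elements of C = C_0 ∪ C_1 ∪ ... that are <= bound.

    Here f(x) > x, so once a chain element exceeds bound, all later ones do too.
    """
    c, current = set(), {0}          # C_0 = A - B = {0}
    while current:
        c |= current
        current = {f(x) for x in current if f(x) <= bound}   # next C_n, truncated
    return c


def make_g(bound: int):
    """g = f on C and the identity on A - C, for inputs x <= bound."""
    c = chain_union(bound)
    return lambda x: f(x) if x in c else x


N = 1000
g = make_g(N)
images = [g(x) for x in range(N + 1)]

assert len(set(images)) == len(images)        # one-to-one on {0, ..., N}
assert set(range(1, N + 1)) <= set(images)    # every b in B with b <= N is hit
assert 0 not in images                        # all values lie in B
print(sorted(chain_union(100)))               # [0, 2, 6, 14, 30, 62]
```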
Addendum
Looking at how the sets $C_n$ are constructed, one may think that the existence of the set $C$ (and so the proof of the theorem) relies on the existence of an infinite set like $\mathbb{N}$, which is needed to define the sets $C_n$ recursively. In this addendum we show that this is not the case.
In fact, to obtain the bijective mapping $g$, all we need is a set $C \subset A$ with $C=(A-B) \cup f[C]$, that is, a fixed point of the map $X \mapsto (A-B) \cup f[X]$. The existence of such a set can be guaranteed by applying a fixed-point theorem (the Knaster-Tarski theorem) to a monotone function on sets, as follows.
Let $F: \mathcal{P}(A) \to \mathcal{P}(A)$ be monotone, i.e., if $X \subset Y$, then $F(X) \subset F(Y)$ ($\mathcal{P}(A)$ is the power set of $A$). Consider the set $T= \{ X \subset A \mid F(X) \subset X \}$. It can be easily seen that $\overline{X}=\bigcap T$ is the least fixed point of $F$ (Proof: $F(A) \subset A$, so $A \in T$; hence $T \neq \varnothing$ and $\overline{X}=\bigcap T$ is defined. For any $X \in T$ we have $\overline{X} \subset X$, so by monotonicity $F(\overline{X}) \subset F(X) \subset X$. Since this holds for every $X \in T$, we get $F(\overline{X}) \subset \bigcap T=\overline{X}$, so $\overline{X} \in T$.
Since $F$ is monotone and $F(\overline{X}) \subset \overline{X}$, we have $F(F(\overline{X})) \subset F(\overline{X})$, so $F(\overline{X}) \in T$.
Since $F(\overline{X}) \in T$ and $\overline{X} \subset X$ for every $X \in T$, we have $\overline{X} \subset F(\overline{X})$. Thus, $F(\overline{X})=\overline{X}$.
If $X'$ is any fixed point of $F$, i.e., $F(X')=X'$, then $X' \in T$, so $\overline{X} \subset X'$. Hence $\overline{X}=\bigcap T$ is the least fixed point of $F$).
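The proof above can be checked by brute force on a finite universe. The Python sketch below (the universe `U`, the map `h`, and the set `S` are hypothetical choices) forms $T=\{X \mid F(X) \subset X\}$ for a monotone $F(X)=S \cup h[X]$, of the same shape as $(A-B) \cup f[X]$, intersects it, and confirms that the result is a fixed point contained in every fixed point. The map $h$ is deliberately not injective: this part of the argument uses only monotonicity.

```python
from itertools import chain, combinations


def all_subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(s)
            for s in chain.from_iterable(combinations(items, r)
                                         for r in range(len(items) + 1))]


def intersection_of_T(universe, F):
    """Intersect T = {X ⊂ universe : F(X) ⊂ X}, as in the proof above."""
    result = frozenset(universe)      # start from the whole universe
    for X in all_subsets(universe):
        if F(X) <= X:                 # X belongs to T
            result &= X
    return result


# Hypothetical example: F(X) = S ∪ h[X] is monotone because X ⊂ Y implies h[X] ⊂ h[Y].
U = frozenset(range(6))
h = {0: 1, 1: 2, 2: 0, 3: 4, 4: 4, 5: 3}   # not injective; not needed here
S = frozenset({3})


def F(X):
    return S | frozenset(h[x] for x in X)


X_bar = intersection_of_T(U, F)
fixed_points = [X for X in all_subsets(U) if F(X) == X]

assert F(X_bar) == X_bar                          # X_bar is a fixed point
assert all(X_bar <= X for X in fixed_points)      # ... and the least one
print(sorted(X_bar))                              # [3, 4]
print(sorted(sorted(X) for X in fixed_points))    # [[0, 1, 2, 3, 4], [3, 4]]
```

Enumerating all subsets is exponential, so this only illustrates the definition of $\bigcap T$; it is not a practical way to compute fixed points.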
Consider the function $F(X)=(A-B)\cup f[X]$ on $\mathcal{P}(A)$. Clearly, it is monotone, so the set $C=\overline{X}$ defined above is its least fixed point; in particular, $C=(A-B) \cup f[C]$.
Now, we can easily see that the mapping $g:A \to B$ defined by$$g(x)=\begin{cases}f(x) & \text{if } x\in C; \\ x & \text{if } x \in A-C \end{cases}$$ is one-to-one and onto the set $B$. Each piece of $g$ is one-to-one, so we only need to check that the ranges $f[C]$ and $A-C$ of the two pieces have union $B$ and are disjoint. Using $C=(A-B) \cup f[C]$, $A-(A-B)=B$, and $f[C] \subset A_1 \subset B \subset A$, we get$$\begin{align}f[C] \cup (A-C) & =f[C] \cup (A-((A-B) \cup f[C])) \\ & = f[C] \cup ((A-(A-B)) - f[C]) \\ & =f[C] \cup (B-f[C]) \\ & =B \end{align}$$and$$\begin{align}f[C] \cap (A-C) & =f[C] \cap (A-((A-B) \cup f[C])) \\ & = f[C] \cap ((A-(A-B)) - f[C]) \\ & =f[C] \cap (B-f[C]) \\ & = \varnothing. \end{align}$$
Now, the least fixed point of the function $F$ can be obtained recursively as follows.
Clearly the function $F$ is continuous, meaning that for any nondecreasing sequence $\langle X_i \mid i \in \mathbb{N} \rangle$ of subsets of $A$ (i.e., $X_i \subset X_j$ whenever $i \le j$), we have$$F \left ( \bigcup_{i \in \mathbb{N}}X_i \right ) = \bigcup_{i \in \mathbb{N}} F \left ( X_i \right ).$$(Indeed, images preserve unions, so $(A-B) \cup f\left[\bigcup_{i} X_i\right] = \bigcup_{i} \left((A-B) \cup f[X_i]\right)$.) Let us define recursively $X_0=\varnothing$, $X_{i+1}=F(X_i)$ and then define $\overline{X}=\bigcup_{i \in \mathbb{N}}X_i$. The sequence $\langle X_i \mid i \in \mathbb{N} \rangle$ is nondecreasing: $X_0=\varnothing \subset X_1$, and if $X_i \subset X_{i+1}$, then $X_{i+1}=F(X_i) \subset F(X_{i+1})=X_{i+2}$ by monotonicity. So we have$$\begin{align}F \left ( \bigcup_{i \in \mathbb{N}} X_i \right ) & = \bigcup_{i \in \mathbb{N}}F(X_i) \\ & = F(X_0) \cup F(X_1) \cup F(X_2) \cup \cdots \\ & = X_1 \cup X_2 \cup X_3 \cup \cdots \\ & = \bigcup_{i \in \mathbb{N}} X_i \qquad (\text{since } X_0=\varnothing). \end{align}$$Thus, $\overline{X}=\bigcup_{i \in \mathbb{N}}X_i$ is a fixed point of $F$.
Now, if $X'$ is any fixed point of $F$, then, since $F$ is monotone, induction gives$$\varnothing \subset X' \quad \Rightarrow \quad X_1=F(\varnothing ) \subset F(X')=X' \\ X_1 \subset X' \quad \Rightarrow \quad X_2=F(X_1) \subset F(X')=X' \\ \vdots \qquad \vdots \qquad \vdots \\ X_{n-1} \subset X' \quad \Rightarrow \quad X_n =F(X_{n-1}) \subset F(X')=X' \\ \vdots \qquad \vdots \qquad \vdots$$So, $\overline{X}=\bigcup_{i \in \mathbb{N}} X_i \subset X'$. Thus, $\overline{X}$ is the least fixed point of $F$.
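The recursive construction can be sketched in Python as follows. To keep everything finite, this sketch assumes the hypothetical instance $A=\{0,1,2,\dots\}$, $B=A-\{0\}$, $f(x)=2x+2$ and truncates $F(X)=(A-B) \cup f[X]$ to the window $\{0,\dots,N\}$; the truncated map is still monotone, and since $f(x)>x$ its least fixed point is exactly $C \cap \{0,\dots,N\}$. In this instance the successive differences $X_{i+1}-X_i$ recover the sets $C_i$.

```python
# Hypothetical instance: A = {0, 1, 2, ...}, B = A - {0}, f(x) = 2x + 2,
# truncated to the window {0, ..., N} so that every set stays finite.
N = 100


def f(x: int) -> int:
    return 2 * x + 2


def F(X: frozenset) -> frozenset:
    """Truncated version of F(X) = (A - B) ∪ f[X]; monotone in X."""
    return frozenset({0}) | frozenset(f(x) for x in X if f(x) <= N)


def kleene_chain(step, max_steps: int = 10_000) -> list:
    """Return [X_0, X_1, ..., X_k] with X_0 = ∅ and X_{i+1} = step(X_i),
    stopping at the first X_k with step(X_k) == X_k."""
    stages = [frozenset()]
    for _ in range(max_steps):
        nxt = step(stages[-1])
        if nxt == stages[-1]:
            return stages
        stages.append(nxt)
    raise RuntimeError("the sequence did not stabilize within max_steps")


stages = kleene_chain(F)
C = stages[-1]
layers = [sorted(b - a) for a, b in zip(stages, stages[1:])]   # the sets C_i
print(sorted(C))   # [0, 2, 6, 14, 30, 62]
print(layers)      # [[0], [2], [6], [14], [30], [62]]
```

On a finite window the nondecreasing chain must stabilize after finitely many steps; on the infinite set $A$ itself, the fixed point is reached only as the union over all $i \in \mathbb{N}$.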
Hence, we conclude that the least fixed point $C$ of the function $F(X)=(A-B) \cup f[X]$ must be of the form$$\begin{align}C & =(A-B) \cup ((A-B) \cup f[A-B]) \cup ((A-B) \cup f[A-B] \cup f[f[A-B]]) \cup \cdots \\ & = C_0 \cup (C_0 \cup C_1) \cup (C_0 \cup C_1 \cup C_2) \cup \cdots \\ & = \bigcup_{n \in \mathbb{N}}C_n\end{align}$$(here we used $X_1=C_0$, $X_{i+1}=(A-B) \cup f[X_i]$, and the fact that images preserve unions, $f[X \cup Y]=f[X] \cup f[Y]$), which was already obtained from our original argument.
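For a hypothetical concrete instance (not from the original), take $A=\mathbb{N}=\{0,1,2,\dots\}$, $B=A-\{0\}$, and $f(x)=2x+2$, so that $A_1=\{2,4,6,\dots\} \subset B \subset A$. The formula $C=\bigcup_n C_n$ then gives an explicit bijection:$$\begin{align}C_0 &= A-B=\{0\}, \qquad C_{n+1}=f[C_n], \\ C_n &= \{2^{n+1}-2\} \qquad (\text{by induction, since } f(2^{n+1}-2)=2^{n+2}-2), \\ C &= \{0,2,6,14,30,\dots\}, \\ g(x) &= \begin{cases}2x+2 & \text{if } x=2^{n+1}-2 \text{ for some } n \in \mathbb{N}; \\ x & \text{otherwise.}\end{cases}\end{align}$$So $g$ moves each point of the chain $0 \mapsto 2 \mapsto 6 \mapsto 14 \mapsto \cdots$ one step along the chain and fixes every other natural number; in particular, nothing is mapped to $0$, and every element of $B$ is hit exactly once.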
Therefore, the existence of the set $C$ can be established without assuming the existence of an infinite set such as $\mathbb{N}$. Now, if $A$ is a finite set, then $A_1=A$ (a finite set is not equinumerous with any of its proper subsets), so $B=A$, $A-B=\varnothing$, and $C=\varnothing$, since $F(\varnothing)=(A-B) \cup f[\varnothing]=\varnothing$. But if $A$ is an infinite set (so the existence of an infinite set has already been assumed in our theory), then the set $C$ is constructed from an infinite chain of sets, as explained above.