I will use Figure 4 from the paper (CutQC: Using Small Quantum Computers for Large Quantum Circuit Evaluations) as the example, since drawing out a brand new circuit would be too much.

So you have subcirc1 and subcirc2. You will run subcirc1 three times:
- The first time, run it as is, with no additional gates, and record the probability vector. This single run covers both the I and Z measurement bases.
- The second time, add an H gate to the third qubit of subcirc1 before measuring. This measures that qubit in the X basis.
- The third time, add an Sdg gate followed by an H gate to the third qubit before measuring. This measures that qubit in the Y basis.
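The effect of those extra gates can be checked on a single qubit. The sketch below is plain Python with hand-written 2x2 matrices (the helper names `apply` and `measure_probs` are my own, not from the paper or any library). It shows that H before a computational-basis measurement reads out the X basis, and that Sdg followed by H reads out the Y basis.

```python
import math

R = 1 / math.sqrt(2)
H = [[R, R], [R, -R]]       # Hadamard
SDG = [[1, 0], [0, -1j]]    # S-dagger

def apply(gate, state):
    """Multiply a 2x2 gate into a single-qubit state vector."""
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]

def measure_probs(state):
    """Outcome probabilities of a computational (Z) basis measurement."""
    return [abs(amp) ** 2 for amp in state]

plus = [R, R]           # |+>, the +1 eigenstate of X
plus_i = [R, 1j * R]    # |i>, the +1 eigenstate of Y

# Run 1: no extra gates. |+> looks like a fair coin in the Z basis.
z_run = measure_probs(plus)
# Run 2: H first. |+> now gives outcome 0 with (numerically) certainty.
x_run = measure_probs(apply(H, plus))
# Run 3: Sdg then H. |i> now gives outcome 0 with (numerically) certainty.
y_run = measure_probs(apply(H, apply(SDG, plus_i)))

print(z_run, x_run, y_run)
```

In each rotated run, outcome 0 corresponds to the +1 eigenvalue of the Pauli being measured and outcome 1 to the -1 eigenvalue.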
You will run subcirc2 four times, initializing its first qubit with |0$\rangle$, |1$\rangle$, |+$\rangle$ and |i$\rangle$ in turn. Now all the circuit runs are complete, and you should have 7 different probability vectors: 3 from subcirc1 and 4 from subcirc2.
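For reference, here is one way the four initial states can be prepared starting from |0$\rangle$: nothing, X, H, and H followed by S. This is a common choice and an assumption on my part, not something spelled out above; the sketch is plain Python and the helper name `apply` is hypothetical.

```python
import math

R = 1 / math.sqrt(2)
X = [[0, 1], [1, 0]]
H = [[R, R], [R, -R]]
S = [[1, 0], [0, 1j]]

def apply(gate, state):
    """Multiply a 2x2 gate into a single-qubit state vector."""
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]

zero = [1, 0]
init_states = {
    "0": zero,                       # |0>: no gate
    "1": apply(X, zero),             # |1>: X
    "+": apply(H, zero),             # |+>: H
    "i": apply(S, apply(H, zero)),   # |i>: H, then S
}

for label, vec in init_states.items():
    print(label, vec)
```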
For subcirc1, only the first and second qubits belong to the final output. The third qubit is the one on the cut, so it does not appear in the output state. For a particular output state (the paper uses |01010$\rangle$ as the example), the least significant two digits come from subcirc1 and the next three come from subcirc2.
In the term P$_{1,1}$, they add the two subcirc1 probabilities that are relevant for the desired output state. P$_{1,1}$ corresponds to the [Tr(AI) + Tr(AZ)] part of term A$_1$ from equation 2. They then multiply this by the P$_{2,1}$ term, which is the probability of state 010 from subcirc2 when its first qubit is initialized with |0$\rangle$.

That product is the first of four such terms; the other three are built the same way from the remaining measurement-basis and initialization runs. Summing the four products and dividing by 2 gives the probability of the state |01010$\rangle$. Repeating this for each of the other output states gives the full probability distribution of the uncut circuit.
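To make the reconstruction concrete, here is a minimal sketch on a much smaller cut than Figure 4: subcirc1 is a 2-qubit state (one output qubit `a`, one cut qubit `c`), and subcirc2 is a single-qubit unitary `V` acting on the cut wire. The amplitudes, the angles in `V`, and all function names are made up for illustration, and the probabilities are exact rather than sampled. It sums the four Pauli-basis terms, divides by 2, and compares the result with simulating the uncut circuit directly.

```python
import cmath
import math

R = 1 / math.sqrt(2)
H = [[R, R], [R, -R]]
SDG = [[1, 0], [0, -1j]]

def matvec(g, v):
    return [g[0][0] * v[0] + g[0][1] * v[1], g[1][0] * v[0] + g[1][1] * v[1]]

def apply_to_cut(gate, psi):
    """Apply a 1-qubit gate to the cut qubit; psi is indexed as psi[2*a + c]."""
    out = []
    for a in (0, 1):
        out += matvec(gate, psi[2 * a:2 * a + 2])
    return out

def probs(vec):
    return [abs(x) ** 2 for x in vec]

# Upstream state (arbitrary example amplitudes, normalized).
raw = [0.6, 0.2 + 0.3j, -0.4j, 0.5 - 0.1j]
norm = math.sqrt(sum(abs(x) ** 2 for x in raw))
psi = [x / norm for x in raw]

# Downstream unitary (arbitrary example angles).
t, phi = 0.7, 0.4
V = [[math.cos(t), -math.sin(t) * cmath.exp(1j * phi)],
     [math.sin(t), math.cos(t) * cmath.exp(1j * phi)]]

# Three upstream runs: as is (I and Z), with H (X), with Sdg then H (Y).
p_z = probs(psi)
p_x = probs(apply_to_cut(H, psi))
p_y = probs(apply_to_cut(H, apply_to_cut(SDG, psi)))

# Four downstream runs, one per initial state of the cut wire.
inits = {"0": [1, 0], "1": [0, 1], "+": [R, R], "i": [R, 1j * R]}
q = {label: probs(matvec(V, state)) for label, state in inits.items()}

def reconstruct(a, b):
    """Probability of upstream output a and downstream output b."""
    tr_i = p_z[2 * a] + p_z[2 * a + 1]
    tr_z = p_z[2 * a] - p_z[2 * a + 1]
    tr_x = p_x[2 * a] - p_x[2 * a + 1]
    tr_y = p_y[2 * a] - p_y[2 * a + 1]
    terms = [
        (tr_i + tr_z) * q["0"][b],
        (tr_i - tr_z) * q["1"][b],
        tr_x * (2 * q["+"][b] - q["0"][b] - q["1"][b]),
        tr_y * (2 * q["i"][b] - q["0"][b] - q["1"][b]),
    ]
    return sum(terms) / 2

# Direct simulation of the uncut circuit for comparison.
exact = probs(apply_to_cut(V, psi))
for a in (0, 1):
    for b in (0, 1):
        print(a, b, reconstruct(a, b), exact[2 * a + b])
```

The first entry of `terms` plays the role of P$_{1,1}$ times P$_{2,1}$: the I and Z results from the unmodified upstream run, multiplied by the downstream run initialized with |0$\rangle$.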
Leave a comment if anything in my explanation was confusing or if I missed something.
Best.