I have a question which I suppose is quite basic.
Let's say I want to estimate the average of an observable that is a sum of non-commuting Pauli strings on $N_q$ qubits,
$$ \langle O\rangle = \sum_{i=1}^{N_p}\langle P_i\rangle, $$
where each $P_i$ is a Pauli string, by sampling from a circuit some number of times.
Since each shot is independent, if there is one Pauli in the sum, the estimation error $\epsilon = \sigma(\hat O)/\langle O\rangle$ will scale as $\sqrt{p(1-p)/N_s}$, where $p$ is the probability of obtaining the outcome $P_i = +1$ on a given shot and $N_s$ is the number of shots per Pauli (up to a constant factor from converting the binomial estimate of $p$ into the $\pm 1$ expectation value).
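For concreteness, here is the single-Pauli bookkeeping I have in mind, writing $x_k \in \{0,1\}$ for whether shot $k$ returned the $+1$ eigenvalue:
$$ \hat p = \frac{1}{N_s}\sum_{k=1}^{N_s} x_k, \qquad \widehat{\langle P_i\rangle} = 2\hat p - 1, \qquad \sigma\big(\widehat{\langle P_i\rangle}\big) = 2\sqrt{\frac{p(1-p)}{N_s}}. $$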
Furthermore, since the Paulis don't commute, I need a change of basis to measure each one separately, which means modifying my circuit; the number of distinct circuits I need to execute is therefore $N_p$. Since loading a new circuit takes time on quantum hardware, I pay a penalty for increasing $N_p$, so I was wondering whether I can reduce $N_s$ as $N_p$ grows.
Now, my question is: How does $\epsilon$ scale with $N_p$?
And, more specifically, since the estimator of $\langle O\rangle$ is just a sum of sample means of i.i.d. binary variables, is it correct to say that the error should actually scale as $(N_sN_p)^{-\frac{1}{2}}$ rather than as $N_s^{-\frac{1}{2}}$?
For example, if I want a 1% error and need 2000 shots when there is one Pauli in the sum, is it correct that with 10 Paulis I still need only 2000 shots overall (200 for each Pauli)?
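To make the question concrete, here is a quick Monte Carlo sanity check one could run (a minimal sketch, assuming every $P_i$ independently returns $+1$ with the same probability $p$; the value $p = 0.8$, the repetition count, and the helper name `relative_error` are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_error(Np, Ns, p=0.8, reps=4000):
    """Empirical relative error sigma(O_hat)/<O> for Np identical Paulis,
    each measured with Ns shots; each shot gives +1 with probability p."""
    # Shape (reps, Np, Ns): independent 0/1 outcomes, mapped to +/-1.
    outcomes = 2 * rng.binomial(1, p, size=(reps, Np, Ns)) - 1
    O_hat = outcomes.mean(axis=2).sum(axis=1)  # per repetition: sum of shot averages
    O_exact = Np * (2 * p - 1)                 # true <O> in this toy model
    return O_hat.std() / O_exact

total_shots = 2000                             # fixed overall shot budget
for Np in (1, 10, 100):
    Ns = total_shots // Np
    print(f"Np={Np:3d}, Ns={Ns:4d}, eps={relative_error(Np, Ns):.4f}")
```

If the $(N_sN_p)^{-\frac{1}{2}}$ scaling is right, the printed $\epsilon$ should stay roughly constant as $N_p$ grows at fixed total budget $N_sN_p$.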
Note that this argument says the error does not depend on the number of qubits directly, which seems surprising.