
Using classical shadows (or refer to this post for the basics of classical shadows), we can predict linear functions like $\operatorname{Tr}(O\hat{\rho})$ with the following number of copies (from the referenced paper): $$ 2\log(2M/\delta)\cdot\frac{34}{\epsilon^2}\left\|O-\frac{\operatorname{Tr}(O)}{2^n}\mathbb{I}\right\|^2_{\mathrm{shadow}}\tag{1} $$
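As a sanity check on my reading of Eq. (1), here is a small Python sketch (my own, not from the paper) that just evaluates the bound; all input values are placeholders:

```python
import numpy as np

def copies_needed(M, delta, epsilon, shadow_norm_sq):
    """Evaluate Eq. (1): copies needed to predict M linear functions to
    additive error epsilon with failure probability at most delta."""
    return 2 * np.log(2 * M / delta) * (34 / epsilon**2) * shadow_norm_sq

# hypothetical values, just to get a concrete number
print(copies_needed(M=1, delta=0.01, epsilon=0.1, shadow_norm_sq=1.0))
```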

Suppose I want to measure the fidelity of quantum states: $\rho$ is a pure target state, $\hat{\rho}$ is the actual state, and I want to predict the fidelity between $\rho$ and $\hat{\rho}$. For pure $\rho$ this fidelity is $\operatorname{Tr}(\rho\hat{\rho})$, which can easily be deduced from the general definition $F(\rho,\sigma)=\operatorname{Tr}\sqrt{\rho^{1/2}\sigma\rho^{1/2}}$. So I can replace the $O$ in the linear function $\operatorname{Tr}(O\hat{\rho})$ with $\rho$, which turns the linear function into $\operatorname{Tr}(\rho\hat{\rho})$, i.e., the quantum state fidelity.
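As a quick numerical sanity check (my own NumPy sketch, not from the referenced paper; with the square-root convention above one gets $F(\rho,\sigma)^2=\operatorname{Tr}(\rho\sigma)$ for pure $\rho$, which is the quantity the paper treats as the fidelity):

```python
import numpy as np

def psd_sqrt(M):
    """Principal square root of a Hermitian PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

rng = np.random.default_rng(0)
d = 4  # hypothetical dimension

# random pure state rho = |psi><psi|
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())

# random mixed state sigma
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
sigma = A @ A.conj().T
sigma /= np.trace(sigma).real

# general definition: F = Tr sqrt( sqrt(rho) sigma sqrt(rho) )
F = np.trace(psd_sqrt(psd_sqrt(rho) @ sigma @ psd_sqrt(rho))).real

# for pure rho, F^2 equals Tr(rho sigma)
print(np.isclose(F**2, np.trace(rho @ sigma).real))  # True
```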

And there is a result in this paper (the referenced paper): Eq. (S16) says that in certain cases we can achieve $\|O-\operatorname{Tr}(O)/2^n\,\mathbb{I}\|^2_{\mathrm{shadow}}\le 3\operatorname{Tr}(O^2)$, which should be $3\operatorname{Tr}(\rho^2)=3$ in my case of predicting the fidelity, since $\rho$ is pure. So now Eq. (1) becomes $$ 2\log(2M/\delta)\cdot\frac{34}{\epsilon^2}\cdot 3, $$ which has nothing to do with the dimension of the quantum state. That seems really strange to me, so am I making a mistake somewhere?
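Numerically, with placeholder values for $M$, $\delta$, $\epsilon$ (again my own illustration), the dimension-independence is explicit:

```python
import numpy as np

# Eq. (1) with the bound ||O - Tr(O)/2^n I||_shadow^2 <= 3 (pure-state O = rho)
M, delta, epsilon = 1, 0.01, 0.1   # hypothetical values
N = 2 * np.log(2 * M / delta) * (34 / epsilon**2) * 3
print(N)  # ~1.1e5 copies, with no dependence on the number of qubits n
```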


References

H.-Y. Huang, R. Kueng, J. Preskill, Predicting Many Properties of a Quantum System from Very Few Measurements, Eq. (S13)

narip

1 Answer


Without checking your arguments in detail: there is a fundamental reason why this scaling can indeed be fine, and it is not strange at all.

The point is that you estimate with additive precision, but your fidelity could be exponentially small. E.g. if the fidelity is $10^{-8}$, you need a much smaller error to resolve it compared to a fidelity of, say, $1/2$. Thus, estimating with relative precision would be much better, but this would then mean that your sample complexity has to scale with the dimension (in the worst case).
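To make this concrete, here is a small sketch (hypothetical numbers, using the Hoeffding-type sample bound $N \geq \frac{1}{2}\log(2/\delta)\,\varepsilon^{-2}$ that appears below) of how the sample count explodes when you ask for relative precision on a tiny fidelity:

```python
import numpy as np

def hoeffding_samples(epsilon, delta=0.01):
    """Samples needed for additive error epsilon with confidence 1 - delta."""
    return np.log(2 / delta) / (2 * epsilon**2)

# 10% *relative* precision means epsilon = 0.1 * F, so the cost grows like 1/F^2
for F in [0.5, 1e-8]:
    print(F, hoeffding_samples(0.1 * F))
# F = 0.5   -> ~1e3 samples
# F = 1e-8  -> ~3e18 samples
```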

Moreover, I think the question is not really about shadow tomography. If $\rho$ is a pure state, then the fidelity is nothing but the Born probability in a projective measurement, say in the computational basis. So this is just estimating probabilities in disguise. Then something like Hoeffding's inequality gives you an identical scaling in error and confidence level, namely $$ N \geq \frac{1}{2}\log(2/\delta)\,\varepsilon^{-2}. $$ Note that this does not depend on the dimension! But still, this is only an additive estimate.
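As an illustration, here is a minimal simulation of exactly this (my own sketch; `p_true` is a made-up Born probability, not tied to any particular state):

```python
import numpy as np

rng = np.random.default_rng(1)

# Estimate a Born probability p by repeated projective measurement.
# Sample size from Hoeffding: N >= log(2/delta) / (2 eps^2), dimension-independent.
p_true, epsilon, delta = 0.3, 0.01, 0.01
N = int(np.ceil(np.log(2 / delta) / (2 * epsilon**2)))

outcomes = rng.random(N) < p_true   # simulate N binary measurement outcomes
p_hat = outcomes.mean()
print(N, abs(p_hat - p_true) <= epsilon)  # holds with prob. >= 1 - delta
```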

Markus Heinrich
  • +1. Hoeffding's inequality does recover roughly the same scaling if one assumes that the random variable $X=\operatorname{tr}(O \hat{\rho})$ is bounded; the authors instead used a "median-of-means" estimator, which just exchanges information about the bounds of $X$ for knowledge of $\operatorname{Var}[X] \leq \lVert O \rVert_{\mathrm{shadow}}^2$. But there's no significant difference in the intuition about the concentration behavior. – forky40 Oct 11 '21 at 16:42
  • @forky40 Sure, I know. My point was that the same scaling occurs when one wants to estimate probabilities (which are obviously bounded, as is the fidelity). So there is nothing odd about the non-appearance of the dimension here. – Markus Heinrich Oct 12 '21 at 05:46
  • @MarkusHeinrich What does additive precision mean? – Sherlock Oct 19 '21 at 14:34
  • @Sherlock Additive error means that your estimate, say $\hat X$, is known to be at most $\varepsilon$ away from the true value $X$, i.e. $| \hat X - X | \leq \varepsilon$. In contrast, estimating with relative/multiplicative error means that $| \hat X - X | \leq \varepsilon |X|$. The second notion is clearly much stronger. – Markus Heinrich Oct 20 '21 at 07:05
  • Markus Heinrich, I see. @forky40 If the author of the post's statement is correct, I can choose $O=\begin{pmatrix}1&0\\0&0\end{pmatrix}$ to get $\operatorname{Tr}(\rho O)=\rho_{11}$, and $O=\begin{pmatrix}0&0\\0&1\end{pmatrix}$ to get $\rho_{22}$, and choose $O=\sigma_x$ and $O=\sigma_y$ to get $\rho_{12}$ and $\rho_{21}$, which would only need on the order of $\log d$ samples, where $d$ is the dimension of the density matrix. But as far as I know, doesn't the best full tomography need $\operatorname{rank}(\rho)\,d$? – Sherlock Oct 22 '21 at 06:24
  • @Sherlock Why do you think this is $O(\log d)$? If you estimate the matrix entries of a state $\rho$ directly, you obviously need $d^2$ measurements as there are $d^2$ entries ... here $4=2^2$. For every measurement, you might get away with a $d$-independent number of samples, but still, you have $d^2$ measurements! – Markus Heinrich Oct 22 '21 at 07:27
  • @MarkusHeinrich But in the author's post, he mentioned that the total sample complexity is $2\log(2M/\delta)\frac{34}{\epsilon^2}\|O-\operatorname{Tr}(O)/2^n\,\mathbb{I}\|^2_{\mathrm{shadow}}$, and we only need $d^2$ kinds of $O$, so replacing $M$ with $d^2$ we get $2\log(2d^2/\delta)\frac{34}{\epsilon^2}\|O-\operatorname{Tr}(O)/2^n\,\mathbb{I}\|^2_{\mathrm{shadow}}$. – Sherlock Oct 22 '21 at 08:36