Verifying Merkle root correctness without completely reconstructing the tree

Question

Lets say Alice has a list of values, and Bob sends Alice a Merkle root that he claims is for this list of values. The Merkle tree construction algorithm is mutually known of course.

Alice can then pick arbitrary values from the list and ask Bob for their Merkle proof.

Lets say Alice wants to avoid constructing the whole tree to verify Bob's Merkle root. How sure can she be that Bob's Merkle root is correct after Bob has supplied N Merkle proofs? In other words what is the probability that Bob has lied about the Merkle root after N queries?

EDIT:

I have an answer in mind but I don't think it's right. Lets say we have a tree with M nodes in total (counting all layers), and each proof in that tree contains N hashes as data points in the proof. If Alice asks for K distinct proofs, then she can be sure the root is correct with a certainty of (N*K)/M.

Seems naive and like this would only be a lower bound though, since there's more to it than that. The tree is made of hashes that have preimage resistance, so it's not a mere question of counting known data points vs. total data points... or is it?

EDIT #2:

Another formulation:

Given a known vector of arbitrary values V and a Merkle root R constructed by building a Merkle tree for V, if I have N Merkle proofs how sure can I be that R was constructed honestly and correctly from V?

Obviously I could rebuild the tree since I have (or can compute) all of V, but lets say I don't want to because it's time consuming etc.

No, just a question that came up in a discussion with a friend of mine about proof of space systems. — Adam Ierymenko, Jun 17 '21 at 18:48
In a second read, this question is missing details. Is Bob allowed add additional values? What if Bob deletes one value and you never asked that? Use counterargument to arrive the result! I wonder what you and your friend thought? — kelalaka, Jun 17 '21 at 18:56
No, Bob can't add values because that would change the Merkle root. Lets say I know or am able to compute all the values in the list. How can I tell that Bob did likewise and constructed his tree honestly? If I ask him for proofs, how sure am I after N proofs? Seemed interesting to us because it could be a form of proof-of-space-time if the list is expensive to construct. — Adam Ierymenko, Jun 17 '21 at 18:59
But Bob calculates the root, not Alice, so during the Merkle-Tree setup, Bob can add/delete/modify values. — kelalaka, Jun 17 '21 at 19:13
Yes. Alice wants to get Bob to prove that all values were included without having to calculate all values or construct the whole tree, so she picks values at random and asks for proofs. Any proof that contains a node that conflicts with the same value in another proof proves that Bob is being dishonest. How many queries to reach, say, a 99% certainty that Bob didn't cheat? — Adam Ierymenko, Jun 17 '21 at 19:19
One more thing that may not have been clear: when Bob sends the root to Alice, he commits to it. He can't change the root so he can't actually add or remove values. What he could have done though is cheat by making an incomplete tree or using a few incorrect values. — Adam Ierymenko, Jun 17 '21 at 22:33

score 1 · Accepted Answer · answered Jun 18 '21 at 03:29

In other words what is the probability that Bob has lied about the Merkle root after N queries?

Well, here is one way Bob could cheat; he could take the list of $M$ values $$V_1, V_2, ..., V_M$$ and instead of forming a Merkle tree based on that, select a value $c$ index and a value $V'_c \ne V_c$, and instead form the list $$V_1, V_2, ..., V_{c-1}, V'_c, V_{c+1}, ..., V_M$$ and form the Merkle tree based on that list (and give Alice the computed root, which is with quite high probability different from the root from the original list).

Then, when Alice gives him the $K$ indices to base the $K$ proofs on, he forms them based on the modified list and the tree he computed. He will be detected as cheating only if one of the indices she asked for happens to be the index $c$; if none of the indices she asked for happened to be for index $c$, then the proofs she would receive are all valid and consistent with each other (as they correspond to a self-consistent Merkle tree, with the revealed leafs being the values she expects)

The probability of Bob being caught cheating, that is, the probability that one of the indices Alice picked is $c$, is $K/M$. Hence the detection probability against any strategy is no more than that (it might be less if there is an even better strategy Bob could employ).

So, to detect Bob cheating with probability 0.99, we have $K > 0.99M$; at that point, it'll actually be less computation for Alice if she were just to recompute the tree herself.

Seemed interesting to us because it could be a form of proof-of-space-time if the list is expensive to construct

Now, looking at the problem as a proof-of-work is a bit more interesting (proof-of-space doesn't work, as the tree and authentication paths can be computed without increasing the amount of computation required significantly); obviously, the cheating strategy I gave above doesn't break that problem, as Bob is doing the full amount of work. If I had more time, I'd try to investigate that more...

Verifying Merkle root correctness without completely reconstructing the tree

1 Answers1