15

Can somebody tell me why in dBFT the threshold for malicious nodes is 1/3? That seems pretty arbitrary. I have no problem if it is arbitrary, but is it?

Note: I'm not referring to Bitcoin. I chose this forum because there is no game-theory or Antshares/NEO forum.

Antoine Poinsot
Ini

3 Answers

23

We have a mathematical proof that to tolerate n malicious nodes, you need 2n + 1 good nodes. The full proof is found in G. Bracha and T. Rabin, Optimal Asynchronous Byzantine Agreement, TR#92-15, Computer Science Department, Hebrew University. It's also well known in the industry. It is not possible for an asynchronous system to provide both safety (the guarantee that all non-malicious nodes will eventually agree on what progress was made) and liveness (the ability to continue to make forward progress) with more than this number of malicious failures.

You can trivially ensure safety by simply making no forward progress at all. And you can trivially make forward progress unsafely by just letting each node do whatever they want. Neither of these modes of operation are useful.

Let's take a step back to make this answer more helpful:

Why do you need a distributed agreement algorithm at all? You need one wherever there is more than one way a system could validly make forward progress and all the participants must agree on which one to take.

Consider a simple example: I have $10 in the bank, and I write two $10 checks, one to Alice and one to Bob. Either one alone is valid, but we can't let them both go through.
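As a quick illustration of why the two checks conflict, here is a minimal Python sketch (the names and amounts follow the example above; the code is mine, not from the answer):

```python
# With a $10 balance, either check alone is valid, but not both.
balance = 10
checks = [("Alice", 10), ("Bob", 10)]

cleared = []
for payee, amount in checks:
    if amount <= balance:      # each check is individually valid...
        balance -= amount
        cleared.append(payee)  # ...but clearing one invalidates the other

print(cleared)  # only the first check clears; agreeing on the order is the whole problem
```

A central authority would simply fix the iteration order; without one, the participants must agree on it.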

If we had a central authority, they could just clear whichever one they saw first. But what if we don't want a central authority or don't want a single point of failure? And what if we have potentially malicious participants?

Well, you could just sort the checks after representing them as binary data. But that's where the asynchronous component bites us. When do we sort them? Say I see both checks and sort them. How do I know that one second later I won't see a third check that sorts first? And maybe someone else already saw that one. Ouch!

So, we have the following requirements:

1) Our system is asynchronous.

2) Some participants may be malicious.

3) We want safety, that is, we do not want one honest participant honoring one check and one honest participant honoring the other.

4) We want liveness, that is, it's not fair just saying we never clear any checks. Sure, that's safe, but not useful. We want to be sure that we eventually agree on which checks to clear.

So, now the question arises -- how many dishonest participants can we tolerate in our asynchronous system and still guarantee both safety and liveness?

As a simple way to get the gist of the proof, though it is not rigorous:

Suppose we have n nodes of which h are honest and d are dishonest. Obviously, n = h + d. Now the system needs to come to consensus on which of two checks to clear.

Think about the case where all the honest nodes are evenly split about the two directions the system could make forward progress. The malicious nodes could tell all the honest nodes that they agree with them. That would give h/2 + d nodes agreeing on each of two conflicting ways the system could make forward progress.

In this case, the honest nodes must not make forward progress or they will go in different directions, losing safety. Thus, the number of nodes required to agree before we can make forward progress must be greater than half the number of honest nodes plus the number of malicious nodes, or we lose safety.

If we call t the threshold required to make forward progress, that gives us: t > (h/2) + d. This is the requirement for safety.

But the malicious nodes could also fail to agree at all. So the number of nodes required to agree before we can make forward progress must be no more than the number of honest nodes or we lose liveness.

This gives us t <= h. Or h >= t. This is the condition for liveness.

Combining the two results, we get:

h >= t > (h/2) + d
h > (h/2) + d
(h/2) > d
d < (h/2)

Thus the number of faulty nodes we can tolerate is less than half the number of honest nodes. Thus we cannot tolerate 1/3 or more of the nodes being dishonest or we lose either safety or liveness.
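The two inequalities can be checked numerically. A minimal Python sketch (the `thresholds` helper is a hypothetical name, not from any BFT library), assuming n = h + d nodes and an integer quorum t:

```python
def thresholds(h, d):
    """Return every quorum t that preserves both safety and liveness.

    Safety requires  t > h/2 + d  (a split of the honest nodes plus all
    dishonest nodes must not reach quorum on both sides).
    Liveness requires t <= h      (the honest nodes alone must be able
    to reach quorum, since dishonest nodes may refuse to participate).
    """
    return [t for t in range(1, h + d + 1) if t > h / 2 + d and t <= h]

# With 100 nodes: d < h/2 leaves at least one workable quorum...
print(thresholds(h=67, d=33))   # [67]
# ...but at d >= h/2 (i.e. 1/3 or more of all nodes) none survives.
print(thresholds(h=66, d=34))   # []
```

With n = h + d, the condition d < h/2 is exactly d < n/3, which is where the 1/3 threshold comes from.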

David Schwartz
  • Can you provide a link? Your description could not clarify it for me. I also do not know what is meant by directions. In a peer-to-peer network there is no direction. – Ini Aug 30 '17 at 13:08
  • What do you want to tell me with that? – Ini Aug 30 '17 at 13:13
  • yes. I was asking for a link to "G. Bracha and T. Rabin, Optimal Asynchronous Byzantine Agreement, TR#92-15, Computer Science Department, Hebrew University". "Think about the case where all the non-faulty nodes are evenly split about two directions the system could make forward progress." I do not know what is a DIRECTION in a peer to peer system. – Ini Aug 30 '17 at 13:16
  • Can you maybe assign a number or letter to the different kinds of non-faulty nodes in your example? In your example, one half does something else than the other half, but you do not assign different characters to them, which makes it pretty hard to understand. – Ini Aug 30 '17 at 13:30
  • I cannot figure out what this sentence should mean: "Think about the case where all the non-faulty nodes are evenly split about two directions the system could make forward progress." If a non-faulty node goes in a different direction than another non-faulty node, isn't it then a faulty node? – Ini Aug 30 '17 at 13:34
  • Why would they disagree? Because they received a different message from a speaker? Like in Antshares examples? https://github.com/neo-project/docs/blob/master/en-us/node/consensus.md – Ini Aug 30 '17 at 13:41
  • Ok, I'll study that a bit now. Give me some time. Maybe I'll have another question, but I'll confirm this as the right answer after I've thought a bit about it. – Ini Aug 30 '17 at 13:44
  • how is forward progress defined? Is it any progress regardless of whether it is disruptive or not? Can you also define the terms safety, liveness? Maybe with a reference to a speaker scenario like in Antshares? – Ini Aug 30 '17 at 13:51
  • Is that a correct definition of forward progress: forward progress is made after consensus is reached? – Ini Aug 30 '17 at 14:02
  • @Invader For formal proof purposes, forward progress is usually defined as ruling out at least one future state of the system that was previously possible. For purposes of a particular algorithm, it's usually defined in an algorithm-specific way, such as confirming one transaction. – David Schwartz Aug 30 '17 at 14:13
  • In the example where faulty nodes agree with all non-faulty nodes (which go in two directions), safety is not there, because half of the non-faulty nodes have not agreed with the other half. "safety - the guarantee that all non-malicious nodes will eventually agree on what progress was made". Can you explain what I did not grasp there? What happens if the non-faulty nodes do not agree? Would the system make forward progress anyway because the threshold is reached? – Ini Aug 30 '17 at 14:15
  • Also I do not understand why a non-faulty note should go in the same direction as a faulty one. A non-faulty node would always reject non-valid transactions/blocks. Or in other words if a faulty node agrees with a non-faulty nodes direction why is that a malicious action of the faulty-node? – Ini Aug 30 '17 at 14:20
  • How does this relate to PoW? I mean in PoW we can have d > (h/2) and everything still works fine, but it arguably takes longer to achieve finality. In PoW, if d is too big, then there is the possibility of a chain rewrite. – Ini Apr 27 '18 at 17:24
  • @Invader Everything doesn't still work fine. It may still work fine or it may fail. There's a statistical chance you're fine and a statistical chance the dishonest nodes will win. PoW sacrifices safety to tolerate more dishonest nodes, and that's a perfectly reasonable choice to make. – David Schwartz Apr 27 '18 at 17:34
  • So this formula d < (h/2) actually says that you can create a system that has safety (does not produce forks) and liveness in case you have d < (h/2), but you cannot create such a system with the same properties with d >= (h/2)? – Ini Apr 27 '18 at 18:07
  • Or in other words. You cannot create a system that has safety and liveness and d >= (h/2) byzantine nodes. True? – Ini Apr 27 '18 at 18:20
  • @Invader Right. You can pick a threshold and then compute when you lose safety and when you lose liveness. But no threshold will preserve both safety and liveness once you exceed 1/3 failed nodes. – David Schwartz Jul 07 '18 at 23:12
  • Thank you. So in this case, the common belief that Bitcoin is secure as long as the majority (1/2) of mining power is honest should be wrong, shouldn't it? Thanks – Questioner Nov 08 '18 at 10:37
  • @sas Bitcoin isn't secure even if all mining power is honest. Consider, for example, if a network failure splits the network in two. Both sides will produce blocks every 20 minutes on average, and both sides will eventually have sufficient confirmations to rely on transactions that could conflict from one side to the other. The applicability of these theoretical results to practical systems is not always simple or direct. – David Schwartz Nov 08 '18 at 17:55
  • @David Schwartz , With considering the well-known paper titled: "Impossibility of Distributed Consensus with One Faulty Process" (https://apps.dtic.mil/dtic/tr/fulltext/u2/a132503.pdf) showing that no completely asynchronous consensus protocol can tolerate even a single unannounced process death, can we still assume that the network is asynchronous ? As in that case the network cannot tolerate even one faulty node. Thank you. – Questioner Feb 26 '20 at 09:32
  • @Questioner It's the difference between theory and practice. Bitcoin has zero partition tolerance, so in theory it's completely useless. But it works perfectly fine in practice, even if the network partitions. – David Schwartz Feb 26 '20 at 17:59
1

According to Wikipedia's article on Byzantine fault tolerance, that is only true when messages can be forged. The reason is that showing a solution exists to the whole Byzantine Generals Problem can be reduced to showing a solution would exist to the problem of one General and two Lieutenants, and without signatures each Lieutenant cannot tell whether the messages it receives are really from the General or were forged by the other Lieutenant.

However, for unforgeable messages (or where forgeries are infeasible) you could have an arbitrary number of traitors. Quote:

"A second solution requires unforgeable message signatures. For security-critical systems, digital signatures (in modern computer systems, this may be achieved in practice using public-key cryptography) can provide Byzantine fault tolerance in the presence of an arbitrary number of traitorous generals."

Source: https://en.m.wikipedia.org/wiki/Byzantine_fault_tolerance

To address the derivation given by @DavidSchwartz for example:

"Think about the case where all the honest nodes are evenly split about the two directions the system could make forward progress. The malicious nodes could tell all the honest nodes that they agree with them. That would give h/2 + d nodes agreeing on each of two conflicting ways the system could make forward progress."

If each honest node only accepted signed communications from others, then by gossiping, the honest nodes would learn of the malicious nodes' duplicity and stop listening to them. An algorithm could theoretically be designed that would automatically maintain both safety and liveness in each subset of honest participants whose connections form a connected graph. (Obviously, if their only links are through malicious participants, they cannot get any messages through to each other reliably.)
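A minimal sketch of the equivocation-detection idea described above (the signatures are simulated here as plain (signer, statement) pairs; a real system would use public-key signatures such as Ed25519, and `detect_equivocators` is a hypothetical helper name, not part of any deployed protocol):

```python
def detect_equivocators(gossip):
    """gossip: list of (signer, statement) pairs collected from peers.

    A signer observed with two different signed statements is provably
    dishonest: the pair of conflicting signatures is the proof, and it
    can itself be gossiped so other honest nodes stop listening too.
    """
    seen, dishonest = {}, set()
    for signer, statement in gossip:
        if signer in seen and seen[signer] != statement:
            dishonest.add(signer)   # conflicting signed statements found
        seen.setdefault(signer, statement)
    return dishonest

# D told one honest node "blue" and another "red":
gossip = [("B", "blue"), ("D", "blue"), ("D", "red")]
print(detect_equivocators(gossip))  # {'D'}
```

Note this only detects equivocation after the fact; as the comment thread below discusses, knowing when gossiping is "done" is itself an agreement problem.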

In the XRP consensus protocol, these signatures do constitute proof of malfeasance by the actual validators. The trick is figuring out how a group can achieve consensus as to who is a validator and who is not. Gossiping proof of malfeasance should be enough to stop an honest node from listening to that validator. As noted in their whitepaper, they value correctness before consensus.

  • @DavidSchwartz let me know if I made any mistake here, but I think that unforgeable signatures are a get-out-of-jail-free card allowing both liveness and progress in theory. There may, however, be an additional theoretical result on lower bounds that I (and the editors of the wikipedia article) are not aware of. – Gregory Magarshak Dec 06 '17 at 16:27
  • I've yet to see any scheme proposed that had anything like that capability. Your gossip scheme doesn't work -- you just create a new Byzantine agreement problem of how to know when you're done gossiping. Do you use gossip to solve that agreement problem? – David Schwartz Apr 27 '18 at 22:48
  • No, we simply have the sender of the transaction make a final endorsement before it is posted to the ledger, but after it is signed by a supermajority of validators. If the supermajority is not reached before a timeout (e.g. because of a netsplit) the sender can just submit the transaction to a different group of validators (we implemented sharding by transaction) so maybe that group will come to a consensus first. The sender then simply endorses one approved transaction and ignores the other. This is only possible because the sender has ultimate say over if they want to send something. – Gregory Magarshak May 23 '18 at 23:41
  • I don't see how this helps you. The sender can sign two conflicting transactions if they want to. Say there's a split of honest validators, the dishonest validators sign both sides of the split, and the sender signs both transactions too. The sender already has to be dishonest for a double spend anyway. – David Schwartz May 24 '18 at 08:16
  • If there is a split of honest validators in a consensus group, then at least one of the subgroups after the split won’t be able to achieve a supermajority, and the one that does sign a supermajority (assuming there is one) needs only ONE honest validator to detect that a dishonest validator has given (and signed) two different answers for the whole thing to be rejected by any self-interested recipient. – Gregory Magarshak May 25 '18 at 16:13
  • The key why any of this works is that in a crypto-currency, the recipients are ultimately the final arbiters of what consensus result they consider untainted (valid and unforked history). Every time A pays B, B is ultimately either going to accept the payment or not. They can take their sweet time looking for claims of violation and forks and investigate them all. The goal is simply to build a ledger that maximizes the probability of surfacing this information. – Gregory Magarshak May 25 '18 at 16:15
  • I don't think you can actually achieve the results you think you can achieve. If you wait X time to accept a payment, someone can release conflicting signatures also after X time. Now you might accept a payment others don't. It just isn't nearly as easy as it seems and you wind up just pushing the weakness to another corner. – David Schwartz May 25 '18 at 18:44
  • It doesn’t matter if someone releases conflicting signatures. B waits for a supermajority of the validators for token T to SIGN a statement saying they didn’t see a double-spend, then A endorses the transfer. As long as the majority of the honest nodes have signed the statement, they won’t change it later, and even if they do, B can produce the statement they signed contradicting themselves. The whole point is that there can be more than 33% dishonest nodes and it will still work. – Gregory Magarshak May 27 '18 at 03:21
  • I don't see how. Your quorum can't be more than 66% because otherwise with 34% dishonest nodes you make no forward progress. So say you have 66% honest nodes and they're split 50/50. Each txn gets 33% from its half of the honest nodes and 33% from the dishonest nodes, so each side meets your quorum. The dishonest nodes talk only to the sender, the sender signs both (he's trying to double spend, so of course he does), and everything now goes to two receivers, each accepting the conflicting txn with 66% and a sender sign. No honest node (but recips) ever see the conflict. – David Schwartz May 27 '18 at 08:23
  • Why can’t the quorum be more than 66%? That is the whole point. If the group makes no forward progress, the sender simply issues another transaction to another group, and endorses whichever group does reach a consensus first. If the other group ever does reach a consensus also, it is dropped by the sender. The 33% nodes may be simply netsplit and not malicious. Re-read what I originally wrote in my second comment on this thread. – Gregory Magarshak May 28 '18 at 14:18
  • If the quorum is more than 66%, then you can't make forward progress if 33% of the nodes are dishonest. They just refuse to reach a quorum and you're dead. Tolerating 33% or more dishonest nodes necessarily caps the quorum at 66%. (As my answer explains.) – David Schwartz May 28 '18 at 16:32
  • I just addressed that above. Let them refuse to reach a quorum. The sender A simply issues another transaction to ANOTHER consensus group and endorses whichever group reaches a consensus first, ignoring the other. So yes one consensus group makes no forward progress... and so what? – Gregory Magarshak May 29 '18 at 18:55
  • Then what if the sender endorses both consensus groups? You keep pushing the problem someplace else. If we're talking about resisting a double spend, which we are, the sender is necessarily dishonest. I seems like you didn't read and understand my answer that shows that you have to pick a quorum that protects against both failure modes. – David Schwartz May 29 '18 at 19:13
  • The honest participants don't know, for a given transaction, whether the sender is attempting a double spend or the dishonest nodes are attempting to fail to reach a quorum. You need to pick a quorum that protects against both threats. And the quorum must be large enough that the sender can't send two conflicting txns to two different groups and get them both approved. – David Schwartz May 30 '18 at 13:31
  • It seems to me that you didn't understand my answer, which makes me suspect there is some misunderstanding. I believe it is that in your conception the consensus is always global about all transactions in a block, and in mine it's about each token. The sender may be dishonest, but they'd only be hurting themselves by paying twice for the same goods. Let me make it simple: you pull out a credit card and the transaction stalls for an hour. So you pull out another credit card and it goes through. When the first credit card transaction completes successfully, you don't endorse it or dispute it. – Gregory Magarshak May 30 '18 at 21:42
  • What if the two transactions spend precisely the same asset? And if you're going to argue that you can't have two different consensus groups for the same asset, then you can't tolerate losing 1/3 or more of any one consensus group without at least one asset becoming unusable. – David Schwartz May 31 '18 at 15:17
  • Maybe I'm misunderstanding you. If you want to sketch out your proposed protocol in more detail, I'd be happy to review it. – David Schwartz May 31 '18 at 15:29
  • OK I will write it up on our forum and link you to it. Then you can comment there (easier and probably more appropriate than here). – Gregory Magarshak Jun 01 '18 at 03:42
  • Okay @DavidSchwartz I have updated our article on Consensus here, as promised: https://forum.intercoin.org/t/intercoin-technology-consensus/80 . I took the time to describe it in much more detail, and also compare it to other systems including XRP Consensus Process. Please take your time to understand the system it's describing, and I'd be happy to see what you have to say. You can make an account easily on that forum, and post your review as a comment there, because it probably doesn't belong on stackoverflow. – Gregory Magarshak Jun 20 '18 at 03:39
  • See my comments. That's a consensus design that could only be used for incredibly simple applications. For example, you couldn't use it to support a decentralized exchange like the XRP Ledger has or smart contracts like Ethereum has. – David Schwartz Jun 21 '18 at 00:48
  • Ideally, people should all learn about falsification of claims they care about, and make up their own minds who to believe. That’s how people reliably arrive at the truth. Having a Ministry of Truth — whether made up of PoW, PoS or representatives etc. — creates an attractive honeypot for hacking. It’s also much more centralized. But in this stackoverflow question my main point is just that there is a way to do BFT consensus in the face of MORE than 33% malicious nodes. – Gregory Magarshak Jun 24 '18 at 11:18
  • Except there isn't. It doesn't work with more than 33% malicious nodes. If the malicious nodes take both sides, you have two majorities. Nothing in your scheme stops malicious nodes from taking both sides. – David Schwartz Jun 24 '18 at 16:15
  • Both sides of what? There is only disproof. There is no voting. If malicious nodes SAY two different things then the other nodes GOSSIP that and proceed accordingly! For example if a malicious or not malicious node actually showed a fork then it doesn’t matter that it also said there was no fork to someone else. EVEN ONE node can prove there was a fork! – Gregory Magarshak Jun 25 '18 at 22:21
  • If there is only disproof, there is no proof and nobody can ever accept a txn as valid because they have no idea what disproof other people might have. Without proof of the absence of disproof, you can never rely on a transaction at all. If I have no way to know that disproofs don't exist, I can't rely on a transaction because others might have disproofs. How can Alice rely on a txn from Bob if Charlie may have a disproof? Alice must somehow determine that Charlie has no disproof -- that's the other side. – David Schwartz Jun 26 '18 at 16:57
  • By the same token you can ask, how do I know the nodes that arrived at a consensus are the whole network and not a small subnetwork of a much larger one which will eventually reach a conflicting consensus? This is more about the step prior to consensus - namely selection of the set of nodes that will matter in the first place, for token T or transaction X. For us we use Kademlia but a naive implementation would just be to list the IP addresses of the nodes directly in the token history itself. Either way you need to have this list. Ripple has the UNL. – Gregory Magarshak Jul 02 '18 at 06:13
  • But XRP doesn't rely on 100% agreement on the nodes that matter, and it has only a single network that's human-managed. It's not clear how you can have sufficient agreement to allow a deterministic algorithm to winnow that set down to an agreed consensus set. XRP has a working, deployed system that's fully specified. We had to solve every problem to do that. You can't handwave solutions to the hard problems and compare that to actual implementations. – David Schwartz Jul 02 '18 at 21:28
  • Are you imagining some global consensus on which nodes matter for some particular asset or transaction? If so, how is that global consensus achieved? XRP uses humans because it only has to do that once and has an algorithm that doesn't require perfect agreement to allow the system to evolve without a central authority. You can't begin by assuming the hard problem is solved, then compare your solution to the easy remaining problems to the solutions to the hard problem itself that you began by assuming was solved! – David Schwartz Jul 02 '18 at 21:29
  • It is achieved using a technique called Kademlia which is used in DHTs. The SAFE Network uses it in much the same way as we do: each token has a group of the closest N computers in XOR space to watch it. But there are other ways to do it as well. That consensus is not global, not all nodes know every output. – Gregory Magarshak Jul 03 '18 at 22:17
  • Download the SAFE Network and run it yourself, it is a working implementation. They solve many related problems, including transferring ownership of files and proving that they are still being stored without revealing their contents. – Gregory Magarshak Jul 03 '18 at 22:20
  • You might want to add a link to Lamport's paper on Byzantine Fault Tolerance https://people.eecs.berkeley.edu/~luca/cs174/byzantine.pdf . The algorithm you're talking about is called SM in the paper. – qbt937 Aug 12 '18 at 06:01
  • This answer is incorrect. It is now well understood that Byzantine agreement requires f < n/3 under partial synchrony or asynchrony even with digital signatures, but can be solved with f < n/2 under synchrony with digital signatures. See e.g. https://eprint.iacr.org/2017/307.pdf – qweruiop Aug 14 '18 at 07:54
  • Maybe you are right, but please define "Byzantine agreement" rigorously so we can evaluate the claim. It seems that if n/3 < f < n/2 it can be solved at least with synchrony and digital signatures as you said. Beyond n/2 you can't know whether there was a netsplit and a larger consensus will override yours, so you can't make forward progress if malicious nodes fake a netsplit. – Gregory Magarshak Aug 14 '18 at 19:37
  • Also how do you explain this, then? https://www.trustnodes.com/2018/08/10/vitalik-buterin-proposes-consensus-algorithm-requires-1-honest ... Leslie Lamport proposed it actually. – Gregory Magarshak Aug 14 '18 at 19:39
  • Also @qbt937 here is what Lamport's paper that you linked says: Now that we have introduced signed messages, our previous argument that four generals are required to cope with one traitor no longer holds. In fact, a three-general solution does exist. We now give an algorithm that copes with m traitors for any number of generals. (The problem is vacuous if there are fewer than m + 2 generals.) – Gregory Magarshak Dec 26 '19 at 06:40
  • @GregoryMagarshak Yes, which agrees with your answer. Though I fail to see what point you are making in your comment. – qbt937 Dec 26 '19 at 08:22
1

The intuition here: could two nodes get to the point where the majority (i.e. both nodes) agree? If one of the nodes is malicious, then no: that node can always vote against the honest node.

Could three? Seems like they might - couldn't the two honest nodes figure out who the third dishonest node is? But in fact, they cannot - the dishonest node D can tell honest node A one thing, and honest node B another thing.

Let's say they are trying to agree on the color of the sky.

Honest Node A hears from honest node B: "I think the sky is blue, and node D says the sky is blue too". But A hears from dishonest node D: "I think the sky is red, and node B thinks the sky is red too". What is A gonna believe?

In the case of 4 nodes, if there's only 1 dishonest node, node A would hear two versions of "the sky is blue" and one version of "the sky is red" - so A would go with the sky being blue.

Essentially, if a malicious 1/3rd of the nodes can counteract the vote of an honest 1/3rd, then you need one more node as a tie-break. Hence 3f + 1 nodes, where f is the number of dishonest nodes and 2f + 1 is the number of honest nodes.
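The four-node tally above can be sketched as follows (the node names and the `reports_to_A` structure are illustrative only, not a real protocol):

```python
from collections import Counter

# n = 3f + 1 = 4 nodes with f = 1 traitor. Node A tallies what it hears.
reports_to_A = {
    "A": "blue",   # A's own observation
    "B": "blue",   # honest node B
    "C": "blue",   # honest node C
    "D": "red",    # dishonest node D lies to A
}

tally = Counter(reports_to_A.values())
decision, votes = tally.most_common(1)[0]
print(decision, votes)  # blue 3 -- the f liars cannot outvote 2f + 1 honest nodes
```

With only three nodes (drop C), the tally A hears could be 2-to-1 either way depending on what D tells whom, which is why the tie-breaking fourth node is needed.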

Yaoshiang