Current location - Quotes Website - Signature design - The core problem of distributed system-Byzantine problem and its algorithm
The core problem of distributed system-Byzantine problem and its algorithm
Byzantine problem is more extensive, which discusses the consistency problem in the scene where a few nodes are allowed to do evil (the news may be forged). Byzantine Fault Tolerance (BFT) algorithm discusses how to realize the knowledge of * * * system under Byzantine conditions.

Before the issue of Byzantine generals, there was a paradox between the two generals: the two generals had to agree on whether to attack or retreat through messengers, but the messengers might get lost or be blocked by the enemy (the news was lost or forged). How to reach an agreement? According to FLP impossibility principle, there is no universal solution to this problem.

Byzantine problem, also known as Byzantine general problem, is a fictional model put forward by Leslie Lamport and other scientists in 1982 to explain the consistency problem. Byzantium was the capital of the ancient Eastern Roman Empire. Due to the vast territory, many generals guarding the border (multiple nodes in the system) need to send messages through messengers to reach some consistent decisions. But because there may be traitors among generals (the nodes in the system are wrong), these traitors will try to send different messages to different generals in an attempt to interfere with the achievement of knowledge. The problem in Byzantium is how to get loyal generals to act in unison under such circumstances.

It is pointed out that for Byzantine problem, if the total number of nodes is n and the number of mutinous generals is f, the problem will be solved when N≥3F+ 1, which is guaranteed by BFT algorithm.

For example, when N=3 and F= 1.

The proposer is not a traitor. If the proposer makes a proposal, the traitor can claim that he has received the opposite order. Then the third person (loyal minister) received two opposite messages and could not judge who was the traitor, so the system could not reach an agreement.

The proposer is a traitor, and two opposing proposals are sent to two other people respectively, and the other two people both receive two opposing messages, so it is impossible to judge who is the traitor and the system cannot reach an agreement.

More generally, when the proposer is not a traitor and puts forward the proposal information 1, there will be N-F confirmation information 1 and f uncertain information (possibly 0 or 1, assuming that the traitor will try to interfere with the consensus) for the cooperator, n-f >; F, that is, the N & gt2F case reached an agreement.

When the promoter is a traitor, he will try to make the opposite proposal to N-F partners. Judging from the partners who receive 1, there will be (N-F)/2 messages 1 and (N-F)/2 messages 0 in the system; According to the partners who receive 0, there will be (N-F)/2 messages 0 and (N-F)/2 messages 60s1in the system; There is also F- 1 uncertain information. If the cooperators want to reach an agreement, they should make a further judgment on the obtained message, ask others for the message value of a suspected object, and take the majority as the information value of the suspect. This process can be further recursive.

Leslie Leslie Lamport and others proved in the article "Reaching an Agreement in the Case of Faults" that when the number of traitors is less than 1/3, there is an effective Byzantine fault-tolerant algorithm (in the worst case, F+ 1 round interaction is required). On the other hand, if there are too many defectors, exceeding 1/3, there is no guarantee of consistent results.

So, when the traitor exceeds 1/3, is there a possible solution?

Imagine f traitors and l loyalists. Traitors who deliberately do bad things can give wrong results or not respond. At some point, if F traitors don't respond, L loyalists can get the right result by taking the majority. When F traitors gave a malicious proposal and F loyalists went offline, the remaining L-F loyalists could not mix with traitors at this time, and it was still necessary to ensure that most of them could get the correct results. Therefore, l-f >; F, that is, L & gt2F or NF & gt2F, so the overall scale n of the system should be greater than 3F.

The number of nodes in the Byzantine system can be guaranteed to be at least 4, and at this time, at most 1 bad nodes are allowed.

3. Byzantine fault-tolerant algorithm

Byzantine Fault Tolerant (BFT) is a fault-tolerant algorithm for Byzantine problems, which solves how to realize * * * knowledge when network communication is reliable but nodes may fail. Byzantine fault-tolerant algorithm was first discussed in the paper "Polynomial Algorithm of Byzantine Protocol" published by Leslie Lamport and others in 1980, and then a lot of improvement work appeared. There has always been a high complexity problem in solving Byzantine problems until the PBFT algorithm was put forward.

In 1999, the practical Byzantine fault-tolerant (PBFT) algorithm proposed by Castro and Liskov in their paper "Practical Byzantine Fault-tolerant and Active Recovery" is optimized on the basis of previous work, which reduces the complexity of Byzantine fault-tolerant algorithm from exponential level to polynomial level for the first time and has been widely used. When the total number of failed nodes does not exceed 1/3, the security and activity can be guaranteed at the same time.

PBFT algorithm uses cryptography related technologies (RSA signature algorithm, message authentication coding and digest) to ensure that the message transmission process is not tampered with and destroyed.

The basic process of this algorithm is as follows:

Firstly, a node is selected as the main node by rotating or random algorithm, and then it is called view); As long as the master node does not switch;

In the view, the client will request

All nodes process the completion request, and the processing result will be

Indeed; The preparation and submission phases ensure that the confirmation requests between different views remain in order:

Pre-preparation stage: the master node assigns a suggestion number to the request received from the client, and then sends out a pre-preparation message.

Preparation stage: after receiving the preparation message, the replica node checks whether the message is legal, and if it passes the check, it sends the preparation message to other nodes.

Submission stage: broadcast the submission message to tell other nodes that the proposal n in view V is ready ... If at least 2 f+ 1 verification submission messages are collected, the proposal is passed.

The specific implementation also includes view switching, checkpoint mechanism and so on. And you can refer to the content of the paper by yourself.