The Byzantine Empire, the Eastern Roman Empire, had huge wealth and had long been interested in its neighbors. For this reason, it sent 10 armies to surround this enemy. Although this enemy is not as powerful as the Byzantine Empire, it is enough to withstand the simultaneous attacks of five conventional Byzantine armies. For some reasons, these 10 armies cannot be gathered together to break through at a single point, but must attack simultaneously in separate encirclement states. Any one of their armies has no chance of winning if they attack alone. Unless at least 6 armies attack at the same time, they can capture the enemy country. They are scattered around the enemy country and rely on communications troops to communicate with each other to negotiate attack intentions and attack time. The problem that troubles these generals is that they are not sure whether there are traitors among them. The traitors may change the intention or time of the attack without authorization. In this state, could the Byzantine generals find a distributed protocol that would allow them to negotiate remotely and win the battle? This is the famous Byzantine Generals Problem in distributed systems, which means that messages can not only be lost, delayed, replayed, but also forged.
The PBFT (Practical Byzantine Fault Tolerance) algorithm was proposed by Miguel Castro and Barbara Liskov in 1999. It solved the problem of inefficiency of the original Byzantine fault-tolerant algorithm and reduced the algorithm complexity from exponential to polynomial. level, making the Byzantine fault-tolerant algorithm feasible in practical system applications.
PBFT is a state machine copy replication algorithm, which generally includes three protocols: consistency protocol (agreement), checkpoint protocol (checkpoint) and view change protocol (view change). The algorithm must satisfy the following two properties:
Safety: safety means nothing bad happens. Liveness: liveness means that something good eventually happens.
In a In a Byzantine system, to tolerate f Byzantine node errors, the number of replicas must be at least 3f+1, which is a prerequisite for security. Due to network delays or downtime, there are f nodes in the system that do not reply (f nodes include Byzantine nodes and non-Byzantine nodes, and in the worst case, f nodes are all non-Byzantine nodes). There may be f among the remaining 2f+1 responses. Byzantine nodes, thus obtaining n-2f>f, that is, the number of non-Byzantine nodes in the response is greater than the number of Byzantine nodes (f+1>f).
If the algorithm does not rely on synchronization to provide security, it must rely on synchronization to provide activity, otherwise it violates the FLP theorem (in an asynchronous communication scenario, even if only one process fails, no algorithm can guarantee that non-failed processes can reach consensus. sex). System activity can be guaranteed when Byzantine nodes do not exceed f and delay(t) is bounded. delay(t) represents the time interval from message sending to reception.
In a view, a primary will be selected from replicas, and the remaining replicas are called backups. If the behavior of the master node is abnormal, perform a view change to change the master. ?
The game starts by sending a request from the client to the primary. State machine operation, timestamp. The game ends after receiving at least f+1 replicas responses from the client. View number, timestamp, client identity, replica number, request result. why f+1? Because there are f Byzantine nodes among the 2f+1 committed that appear to agree to the request, but in fact will not reply to the request at all
3.1 The big show------three-phase agreement
Pre-prepare:
Primary assigns a sequence number n to the client request and discovers pre-prepared messages to all backups. View number, summary of the message.
Prepare:
Backups accept prepared messages if the following conditions are met: 1. The client request and prepared message have correct signatures.
2. The current view number is v. ? 3.Backups has never received a pre-preparation request with sequence number n but different digest in current view v. ? 4.h If the above conditions are met, backups receives the pre-preparation message, enters the prepare phase, broadcasts the preparation message to other nodes, and writes the pre-preparation and preparation messages Enter the log. commit: If backups receive 2f including its own prepare message consistent with the pre-prepared message, the request message and the pre-prepared message have the same view v and sequence number n, and After the relevant message has been written to the log, it enters the commit phase and broadcasts a confirmation message to other nodes. If each node receives 2f+1 identical commits messages, it sends a reply message to the client. 3.2 Garbage Collection ? PBFT is a state machine replica replication algorithm. Replicas will record executed messages in local logs. In order to save memory, a mechanism is needed to clean up log. When will it be cleaned? It is unwise to execute it after each operation because it is more resource-intensive. It can be cleaned regularly, such as every 100 times. We call the state executed after the request a checkpoint; the checkpoint with proof is called a stable certificate. When the node receives 2f+1 checkpoint messages, it can be proved that the stable checkpoint is correct. Log messages before the stable checkpoint can be deleted. When cleaning the checkpoint, replica i broadcasts a checkpoint protocol to other replicas, which is the last correctly executed request sequence number and its current status summary. If each replica receives 2f+1 checkpoint messages with the same sequence number and digest, then each replica can clear the log information with a sequence number less than or equal to n. The checkpoint protocol is also used to update horizontal lines. The low horizontal line is equal to the sequence number of the latest stable checkpoint, and the high horizontal line is the log size. 3.3 View changes When the master node hangs up, or during the commit phase some nodes receive 2f+1 commits, and some do not receive 2f+1 commits, resulting in inconsistent status. These conditions require changes to the view to provide system liveness and security. When the request times out, the backup node enters view v+1 and broadcasts the view change message. Stable checkpoint sequence number, is a stable checkpoint proof, is a set that contains a set of messages related to the request (the requested sequence number is greater than ). Contains 2f+1 identical preparation messages. ?When the master node of view v+1 receives 2f identical view change messages and broadcasts new view messages to other replicas, is 2f+1 view change messages. The calculation rules of are as follows: 1. Determine the serial number and. where is equal to the stable checkpoint sequence number in , and is equal to the maximum prepare message sequence number in . ? 2. The master node allocates a pre-prepare message to each sequence number n between and . If contains the combination corresponding to n, the corresponding prepared message is (that is, the request corresponding to sequence number n has 2f+1 prepare messages, and this request is still submitted in the new view). If does not contain the corresponding combination of n, the null message is submitted as , that is, no processing is performed. After receiving the new view message, the replica broadcasts a prepare message, enters v+1, and the view change is completed.