The most basic technical principle of PoW algorithm is to use hash algorithm. Assuming that the hash value Hash(r) is found, if the original data is r(raw), the operation result is R(Result).
R = hash (r)
The characteristic of Hash function hash () is that the result R can be obtained for any input value R, and it cannot be inferred from R. When the original data R changes 1 bit, the value of the result R changes completely. In the PoW algorithm of Bitcoin, the algorithm difficulty d and the random value n are introduced, and the following formula is obtained:
Rd = hash (r+n)
The formula requires that when the random value n is filled in, the first d bytes of the calculation result Rd must be 0. Because the result of hash function is unknown, every miner has to do a lot of calculations to get the correct result. After the calculation results are broadcast to the whole network, other nodes only need to do a hash operation to check. PoW algorithm uses this method to make the calculation consume resources, and the verification is only needed once.
?
PoS algorithm requires the node verifier to pledge a certain amount of funds to qualify for mining and packaging, and the regional chain system adopts random method when selecting packaging nodes. The more money a node pledges, the greater the probability that it will be selected as a packaging block.
In POS mode, each coin generates 1 coin age every day. For example, if you hold 100 coins for 30 days, then your coin age is 3000. At this time, if you verify a POS block, your currency age will be cleared, and you will get the corresponding digital currency interest from the block.
The process of a node blocking through PoS algorithm is as follows: To become a blocking node, an ordinary node must first pledge its own assets. When it is its turn to block, it will broadcast the block package to the whole network, and other verification nodes will verify the legitimacy of the block.
?
DPoS algorithm is similar to PoS algorithm, and shares and rights are also pledged.
But the difference is that DPoS algorithm adopts the way of entrustment pledge, which is similar to the way of selecting N super nodes by universal suffrage.
Voters vote for a node. If a node is selected as an accounting node, then this accounting node can often repay its voters in any way after receiving a reward.
These N bookkeeping nodes will be blocked in turn, and the nodes will supervise each other. If they do evil, the mortgage will be deducted.
By trusting a small number of honest nodes, unnecessary steps in the process of group signature can be removed and the transaction speed can be improved.
?
Byzantine problem:
Byzantium was the capital of the ancient Eastern Roman Empire. In order to defend each fief, an army led by a single general was stationed, and the generals could only send messages by courier. In a war, all generals must reach an understanding of * * * and decide whether to fight * * *.
But there may be traitors in the army, and these people will affect the understanding of the generals. The problem of Byzantine generals refers to how the remaining generals reach a unanimous decision when one of them is known to be a traitor.
BFT:
BFT is Byzantine fault tolerance, and Byzantine fault tolerance technology is a fault tolerance technology in the field of distributed computing. Byzantine hypothesis is a model of the real world. Due to hardware errors, network congestion or interruption, and malicious attacks, computers and networks may have unpredictable behaviors. Byzantine fault-tolerant technology is designed to deal with these abnormal behaviors and meet the specification requirements of the problems to be solved.
Byzantine fault-tolerant system;
Fault nodes are called Byzantine nodes, and normal nodes are non-Byzantine nodes.
Assuming that the distributed system has n nodes and the Byzantine nodes of the whole system are not more than m (n ≥ 3m+ 1), the Byzantine fault-tolerant system needs to meet the following two conditions:
In addition, the Byzantine fault-tolerant system needs to achieve the following two indicators:
PBFT is a practical Byzantine fault-tolerant algorithm, which solves the problem of low efficiency of the original Byzantine fault-tolerant algorithm. The time complexity of this algorithm is O (n 2), which makes it possible to solve the Byzantine fault-tolerance problem in practical system applications.
?
PBFT is a state machine replication algorithm. All copies are operated during view rotation. The master node is determined by the view number and the number of nodes, that is, the master node p = v mod |R|. V: number of views, |R| number of nodes, and p: number of main nodes.
The * * * identification process of PBFT algorithm is as follows: the client initiates a message request and broadcasts it to each replica node, and one of the master nodes initiates a pre-prepare message and broadcasts it. Other nodes get the original message and send the preparation message after the verification is completed. Each node receives the 2f+ 1 prepare message, which means that it is ready and sends the submit message. When the node receives 2f+ 1 submission messages and the client receives f+ 1 same reply messages, it means that the request initiated by the client has reached the whole network.
The specific process is as follows:
Client c sends
After receiving the request from the client, the master node needs to submit the following contents:
A. Whether the signature of the client request message is correct.
Illegal drop request. The correct request is assigned a number n, which is mainly used to sort the client's requests. And then broadcast a
Replica node I receives the pre-preparation message from the master node and needs to submit the following contents:
A. Whether the signature of the prepared message of the master node is correct.
B whether the current replica node has received the prepared message with the same V number and different signatures.
Whether the abstracts of c.d. and m are consistent.
Whether d. n is in the interval [h, H].
Illegal drop request. Correct request, replica node I sends the message.
When the master node and the replica node receive the preparation message, they need to submit the following contents:
A. Whether the signature of the message prepared by the replication node is correct.
B whether the current replica node receives n in the same view v.
Whether c. n is in the interval [h, H].
Is the D.D. the same as the D in the currently received preparatory document?
Illegal drop request. If the replica node I receives 2f+ 1 verified preparation messages, it sends the message < COMMIT, v, n, d, i>, and v, n, d, i are the same as the above preparation messages. & ltCOMMIT, v, n, d, i> Make the signature of the replica node i. Record the submit message in the log to recover the unfinished request operation during the view change. Record the preparation messages sent by other replica nodes in the log.
When the master node and the replica node receive the submit message, they need to submit the following contents:
A. Whether the signature of the submission message of the replica node is correct.
B whether the current replica node receives n in the same view v.
Whether the abstracts of c.d. and m are consistent.
Whether d. n is in the interval [h, H].
Illegal drop request. If the replica node I receives 2f+ 1 verification submission messages, it means that most nodes in the current network have reached * * * awareness, run the operation o requested by the client, and return.
?
If the master node is evil, it may assign the same serial number to different requests, or not assign serial numbers, or make adjacent serial numbers discontinuous. The backup node should be responsible for actively checking the legality of these serial numbers.
If the master node disconnects or does evil and does not broadcast the client's request, the client sets a timeout mechanism, and if it times out, it broadcasts the request message to all replica nodes. The replica node detects that the master node is evil or offline, and starts the view change protocol.
To view the change agreement:
Replica node broadcast
When the master node p = v+ 1 mod |R| receives 2f valid view change messages, it broadcasts them.
The replica node receives the new view message from the master node and verifies its validity. If it is valid, enter the v+ 1 state and start the pre-prepared message processing flow in O.
?
In the above algorithm flow, in order to ensure that the previous request can be restored in the process of view change, each replica node records some messages in the local log, and the replica node needs to clear the previously requested messages after executing the request.
The simplest method is to synchronize the knowledge of the current state again after replying to the message, which is expensive, so you can synchronize the state again after executing multiple requests k (for example, 100). The state synchronization message is a checkpoint message.
Replica node I sends
This is an ideal situation. In fact, after the replica node I sends the checkpoint message to other nodes, other nodes have not completed k requests, so they will not respond to I's request immediately, and will move forward at their own pace, but the checkpoints sent at this time have not yet formed stability.
In order to prevent the processing request of I from being too fast, the above-mentioned high and low water level interval [h, H] is set to solve this problem. The low water level H is equal to the number of the last stable checkpoints, and the high water level H = h+L, where L is the value we specify, which is equal to the integer multiple of the number of requests processed periodically by checkpoint, and can be set to L = 2K. When the replica node I processes the request exceeding the high water level H, it will stop at this time, wait for the stable checkpoint to change, and then move on.
?
In the blockchain scenario, it is generally suitable for private chain and alliance chain scenarios that require strong consistency. For example, in the blockchain super ledger project led by IBM, PBFT is an optional protocol. In the Fabric project of Hyperledger, the * * * recognition module is designed as a pluggable module, which supports * * * recognition algorithms such as PBFT and Raft.
?
?
Raft is based on a Leader-driven * * * knowledge model, in which an excellent Leader will be selected, and the Leader will be fully responsible for managing the cluster, and the Leader will be responsible for managing the replication logs between all nodes of the Raft cluster.
?
In the following figure, the leader of the cluster (S 1) will be selected during the startup process, and all commands/requests from the client will be serviced. All nodes in the Raft cluster maintain a distributed log (replication log) to store and submit commands (log entries) issued by clients. The leader accepts log entries from the client and copies them among all followers (S2, S3, S4, S5) in the Raft cluster.
In Raft cluster, the minimum number of nodes is needed to provide the expected level of knowledge assurance, which is also called arbitration. The minimum number of votes required to perform an operation in a Raft cluster is (N/2+1), where n is the total number of members in the group, that is, at least half of the votes are cast, which is why the cluster nodes are usually odd. So in the above example, we need at least 3 nodes to have * * * knowledge guarantee.
If the legal arbitration node is unavailable for any reason, that is, the voting does not exceed half, no agreement can be reached in this negotiation, and a new log cannot be submitted.
?
Data storage: Tidb/TiKV
WSJ: Chandler of Alibaba
Service discovery: consulting &; etcd
Cluster scheduling: HashiCorp Nomad
?
Only failed nodes (CFT) can be accommodated, but not evil nodes.
Sequential voting can only be applied serially, so its performance is poor in high concurrency scenarios.
?
Raft solves the problem of distributed knowledge by solving three main sub-problems around leader election, managing distributed logs and security functions of algorithms.
When we start a new Raft cluster or a leader is unavailable, a new leader will be elected through negotiation among all member nodes in the cluster. Therefore, in a given instance, the nodes of a Raft cluster can be in any of the following states: followers, candidates or leaders.
When the system starts, all nodes are followers. If they don't receive the leader's heartbeat signal for a period of time, followers will turn into candidates; .
If a candidate node receives the votes of most nodes, it can be transformed into a leader, and the remaining candidate nodes return to the follower state;
Once the leader discovers that the $ TERM of a leader node in the system is higher than himself, he will become a follower.
Raft uses a heartbeat-based RPC mechanism to detect when to start a new election. During normal times, the leader will send heartbeat messages to all available followers on a regular basis (in fact, the log may be sent with the heartbeat). Therefore, other nodes start from the slave state and remain in the slave state as long as it receives the periodic heartbeat from the current leader.
When the follower times out, it will start the election process in the following ways:
According to the responses received by the candidate nodes from other nodes in the cluster, three election results can be obtained.
* * * The realization of knowledge algorithm is generally based on replication state machine. What is a replication state machine?
Simply put: the same initial state+the same input = the same end state. Different nodes should use the same deterministic function to process input without introducing uncertain values, such as local time. It is a good idea to use replication log, which has the characteristics of persistence and order preservation and is the cornerstone of most distributed systems.
With the Leader, all concurrent requests of the client can form an orderly log (status) sequence on the Leader side to indicate the processing order of these requests. Then, the leader sends his own log sequence to his followers to maintain the global consistency of the whole system. Note that it is not strong consistency, but final consistency.
A log consists of log entries with an ordered number (log index). Each log entry contains the term number ($ TERM) when it was created, and the data contained in the log. The data contained in the log can be of any type, from simple type to blockchain. Each log entry can be represented by a [$ term, index, data] sequence pair, where $ term stands for term, index stands for index number and data stands for log data.
The leader tried to execute the replication command on most nodes in the cluster. If the replication is successful, the command will be submitted to the cluster and the response will be sent back to the client. Similar to two-phase submission (2PC), but different from 2PC, the leader only needs more than half of the nodes to agree (in working state).
Both the leader and the follower may collapse, so the log maintained by the follower may have the following situation compared with the leader.
When the Leader is inconsistent with the Follower, the leader forces the Follower to copy its own log, and the leader will try from back to front, and try the previous log entry (decreasing the nextIndex value) after each Appendextension fails, until each follower's log consistent position point is successfully found (based on the above two guarantees), and then overwrite the follower's entries one by one. Therefore, missing or extra entries may persist for multiple terms.
?
Require the candidate's log to be at least as up-to-date as other nodes. Otherwise, the follower node will not vote for the candidate.
This means that each submitted entry must exist in at least one of these servers. If the candidate's log is at least as up-to-date as other logs in most logs, it will save all submitted entries and avoid log rollback events.
In other words, during any term, at most one leader can be elected. This is very important. In a replication set, there can only be one leader at any time. At the same time, there is a leader in the system, and this leader is called brain division, which is a very serious problem and will lead to the loss of data coverage. In raft, this property is guaranteed by two points:
Therefore, there must be only one leader in a certain term.
?
When the state of nodes in a cluster changes (cluster configuration changes), the system is prone to system failures. Therefore, in order to prevent this, Raft uses a method called two stages to change the cluster members. Therefore, in this method, before the new member configuration is realized, the cluster is first changed to an intermediate state (called joint knowledge). Joint knowledge enables the system to respond to client requests even when switching between configurations, and its main purpose is to improve the usability of distributed systems.