In the digital-signature-based method, the data owner generates a signature for each record in the database and gives both the data and the signatures to the cloud service provider. A query returns the matching records together with their signatures, and the user checks the correctness and completeness of the records against those signatures. This method requires one signature operation per record and is therefore very expensive.
In the Merkle-hash-tree-based method, the data owner constructs a Merkle hash tree (MHT) over the records in the database, signs the root node, and gives the tree and the signed root to the cloud service provider. A query returns the matching records together with the related nodes of the Merkle hash tree, and the user recomputes the hashes along the path up to the root and compares the result with the signed root. Because this method needs many hash operations but only one signature, generating and verifying a Merkle hash tree is much more efficient than the first method. However, a Merkle hash tree is a balanced binary tree, so its depth grows logarithmically (base 2) with the number of records, and the cost of constructing verification objects and answering queries is still high.
The probability-based method relies on sampling verification and cross verification, and includes the challenge-response method, the pseudo-tuple insertion method, and the double encryption method. Compared with the two methods above, it is the most efficient and can meet most application requirements, but it gives only probabilistic assurance and cannot guarantee that every violation is detected.
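The Merkle hash tree verification described above can be illustrated with a minimal sketch. It is not the construction of any particular system: the function names, the use of SHA-256, the 0x00/0x01 prefixes that separate leaf hashes from inner-node hashes, and the padding of odd levels by repeating the last node are all assumptions made for illustration. The signature on the root is omitted; the verifier is simply assumed to hold a root whose signature it has already checked. The sketch covers only the correctness of a single record, not the completeness of a query result.

```python
import hashlib

def leaf_hash(record: bytes) -> bytes:
    # Prefix 0x00 marks a leaf (an assumed domain-separation convention).
    return hashlib.sha256(b"\x00" + record).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    # Prefix 0x01 marks an inner node.
    return hashlib.sha256(b"\x01" + left + right).digest()

def build_levels(records):
    """Return all tree levels, leaves first; levels[-1][0] is the root."""
    if not records:
        raise ValueError("at least one record is required")
    levels = [[leaf_hash(r) for r in records]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2 == 1:
            # Pad an odd level by repeating its last node (a simplification).
            cur = cur + [cur[-1]]
            levels[-1] = cur
        levels.append([node_hash(cur[i], cur[i + 1])
                       for i in range(0, len(cur), 2)])
    return levels

def make_proof(levels, index):
    """Provider side: collect the sibling hashes on the path to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        index //= 2
    return proof

def verify(record, proof, trusted_root):
    """User side: recompute the path and compare with the trusted root."""
    node = leaf_hash(record)
    for sibling, sibling_is_left in proof:
        node = node_hash(sibling, node) if sibling_is_left else node_hash(node, sibling)
    return node == trusted_root

records = [b"rec-0", b"rec-1", b"rec-2", b"rec-3", b"rec-4"]
levels = build_levels(records)
root = levels[-1][0]          # the owner would sign this value
proof = make_proof(levels, 2)
print(verify(b"rec-2", proof, root))
print(verify(b"forged", proof, root))
```

The proof holds one sibling hash per tree level, which is why the size of the verification object and the verification cost grow with the depth of the tree.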
All three methods can verify data integrity in the cloud. However, when users store tens of gigabytes of data or more, checking integrity by migrating the data out of and back into the cloud storage system incurs the provider's data transfer fees, which rise with the data volume. The transfers also consume a large share of the users' network bandwidth and reduce network utilization. This situation places a new requirement on data integrity verification in cloud storage: the integrity of the stored data should be verified directly in the cloud computing environment, without first downloading the data to the client and uploading it again once the client has completed the verification. A more serious problem is that users have no view of the whole data set: they do not know which physical servers store their data or where those servers are located, and the data set may change dynamically and frequently, which renders traditional integrity-assurance techniques ineffective. Data integrity verification in the cloud computing environment is therefore an urgent problem, and solving it is a prerequisite for the wide adoption of cloud computing.
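One way to picture verification without a full download is the sampling style of challenge-response check mentioned earlier. The following is a minimal sketch under stated assumptions, not a published scheme: the data is assumed to be split into fixed blocks, the owner is assumed to keep a secret key and store an HMAC-SHA-256 tag next to each block, and all function names are hypothetical. Only the sampled blocks and their tags cross the network.

```python
import hashlib
import hmac
import secrets

def make_tag(key: bytes, index: int, block: bytes) -> bytes:
    # Binding the block index into the tag stops the provider from
    # answering a challenge for block i with some other intact block.
    message = index.to_bytes(8, "big") + block
    return hmac.new(key, message, hashlib.sha256).digest()

def prepare(key: bytes, blocks):
    """Owner side: pair every block with its tag before uploading."""
    return [(block, make_tag(key, i, block)) for i, block in enumerate(blocks)]

def challenge(num_blocks: int, sample_size: int):
    """Verifier side: pick distinct random block indices."""
    return secrets.SystemRandom().sample(range(num_blocks), sample_size)

def respond(stored, indices):
    """Provider side: return the challenged blocks with their tags."""
    return [stored[i] for i in indices]

def check(key: bytes, indices, response) -> bool:
    """Verifier side: recompute each tag and compare in constant time."""
    if len(response) != len(indices):
        return False
    return all(hmac.compare_digest(make_tag(key, i, block), tag)
               for i, (block, tag) in zip(indices, response))

key = secrets.token_bytes(32)                      # kept by the owner only
stored = prepare(key, [bytes([i]) * 64 for i in range(100)])
indices = challenge(len(stored), 10)
print(check(key, indices, respond(stored, indices)))
```

If a fraction f of the blocks is damaged and c blocks are sampled, a single challenge detects the damage with probability of roughly 1 - (1 - f)^c, which makes concrete why this style of check is efficient but not exhaustive. The sketch does not handle dynamic updates to the data set.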