Using cryptographic principles to explain why HTTPS is secure

Symmetric encryption

We want to encrypt a piece of text. The process may be like this:

Plaintext "Hello World" + key "111222" --[algorithm]--> ciphertext "gsHgw832iSI"

The decryption process is like this:

Ciphertext "gsHgw832iSI" + key "111222" --[algorithm]--> plaintext "Hello World"

The key point is that encryption and decryption use the same key and the same algorithm (run in reverse), which is why this is called "symmetric" encryption.
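As a minimal sketch of the idea, the toy cipher below adds the key bytes to the plaintext to encrypt and subtracts them to decrypt. The cipher itself is an assumption chosen for illustration (it is not the algorithm that produced the ciphertext shown above, and it is not secure); it only shows that one key and one reversible algorithm serve both directions.

```python
from itertools import cycle


def encrypt(plaintext: bytes, key: bytes) -> bytes:
    # Add each key byte to the matching plaintext byte (mod 256), repeating the key.
    return bytes((p + k) % 256 for p, k in zip(plaintext, cycle(key)))


def decrypt(ciphertext: bytes, key: bytes) -> bytes:
    # The reciprocal step: subtract the same key bytes.
    return bytes((c - k) % 256 for c, k in zip(ciphertext, cycle(key)))


key = b"111222"
ciphertext = encrypt(b"Hello World", key)
recovered = decrypt(ciphertext, key)
print(recovered)
```

Decrypting with any other key gives a different result, which is the whole reason the key has to stay secret.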

The most secure symmetric encryption algorithm: the one-time pad

Is there an absolutely secure encryption algorithm? Yes!

If each character of the plaintext is mapped to a character of the ciphertext through one character of the key, the key is chosen at random and is as long as the text, and the key is used only once, then the key is vividly called a "one-time pad".

Because a one-time pad is as long as the plaintext and ciphertext and must be discarded after use, the key never repeats, so the scheme cannot be broken even in theory. Suppose you guess a key and obtain a readable plaintext. A different key would yield a different readable plaintext, so how can you tell which one is the real message? In fact, for a fixed ciphertext, every plaintext of the same length has some key that produces it. Trying to crack it is therefore pointless.
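The argument can be checked with a short sketch. It assumes the pad is applied with byte-wise XOR, which is a common way to realize a one-time pad but is not stated in the text; the decoy message is made up for illustration.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    # A one-time pad needs a key exactly as long as the text.
    if len(a) != len(b):
        raise ValueError("key and text must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


plaintext = b"Hello World"
key = secrets.token_bytes(len(plaintext))  # random, same length, used once
ciphertext = xor_bytes(plaintext, key)
recovered = xor_bytes(ciphertext, key)

# For any other message of the same length there is a key that "explains"
# the very same ciphertext, so an attacker cannot tell which one is real.
decoy = b"Attack Dawn"
decoy_key = xor_bytes(ciphertext, decoy)
decoy_result = xor_bytes(ciphertext, decoy_key)
print(recovered, decoy_result)
```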

However, the amount of data we need to encrypt is usually large. To encrypt a 1 GB file, the key must also be 1 GB, and transmitting and storing such a key is impractical. One-time pads are therefore rarely usable in practice.

Compromise: Symmetric encryption with fixed key length

An algorithm whose key is shorter than the plaintext and ciphertext can be broken in theory. For a fixed ciphertext, it is no longer true that an arbitrarily chosen plaintext has a matching key: the same key must encrypt every block, so if the candidate plaintext is not the original, no single key will work for all blocks.
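A small sketch makes the block argument concrete. It assumes a toy cipher that XORs the text with a repeating 4-byte key (an illustration only, not a real block cipher) and computes which key each block would require for a given candidate plaintext.

```python
from itertools import cycle


def xor_repeat(data: bytes, key: bytes) -> bytes:
    # Toy cipher: XOR the data with the key, repeating the key as needed.
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


def implied_block_keys(ciphertext: bytes, candidate: bytes, block_size: int) -> set:
    # For each block, compute the key that would turn the candidate into the ciphertext.
    keys = set()
    for i in range(0, len(ciphertext), block_size):
        c = ciphertext[i:i + block_size]
        p = candidate[i:i + block_size]
        keys.add(bytes(x ^ y for x, y in zip(c, p)))
    return keys


key = b"K3y!"                       # 4-byte key
plaintext = b"Hello World!"         # 12 bytes, i.e. three blocks
ciphertext = xor_repeat(plaintext, key)

real_keys = implied_block_keys(ciphertext, plaintext, len(key))
decoy_keys = implied_block_keys(ciphertext, b"Attack Dawn!", len(key))
print(len(real_keys), len(decoy_keys))
```

The true plaintext implies one key shared by all blocks, while the decoy would need a different key per block, so it can be ruled out.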

But if the key is long enough, the cipher can be treated as unbreakable in practice. Experts consider a 112-bit key secure enough, because brute-forcing it would take an astronomical amount of time. Rounding up, 128 bits is the usual choice. AES is today's standard symmetric encryption algorithm.
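To get a feel for "astronomical", the sketch below estimates the average brute-force time for a few key lengths. The rate of 10^12 guesses per second is an assumption picked for illustration, not a measured figure.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
GUESSES_PER_SECOND = 10**12  # assumed attacker speed, for illustration only


def years_to_search(key_bits: int, rate: int = GUESSES_PER_SECOND) -> float:
    # On average an attacker has to try half of the key space.
    return (2 ** key_bits / 2) / rate / SECONDS_PER_YEAR


for bits in (64, 112, 128):
    print(f"{bits}-bit key: about {years_to_search(bits):.3g} years")
```

Under this assumption a 64-bit key falls in well under a year, while 112 and 128 bits land on the order of 10^13 and 10^18 years respectively.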

Big question: How to send the key

Suppose A wants to send a confidential document to B and encrypts it with key K. Sending the ciphertext to B is no problem. The question is how to tell B the key K so that B can decrypt it.

If A sends the key in the clear, anyone can eavesdrop on it, and anyone who has also intercepted the ciphertext can then decrypt the file.

Could A encrypt key K with a new key? Of course not, because the new key would have to be transmitted too.

Eventually A comes up with a way: save the key on a USB drive and hand it to B in person. This works and does "solve" the key delivery problem. But on reflection, A could just as well hand B the files themselves on the USB drive, so why bother encrypting at all? More importantly, if people still had to meet in person to exchange information today, wouldn't that be a return to a primitive era?

This is the famous key exchange problem. For a long time it seemed to have no solution.

Until asymmetric encryption algorithms appeared.

Asymmetric encryption

If there were an algorithm in which encryption and decryption use different keys (i.e., an asymmetric one), could it solve the key exchange problem?

The answer is yes. B first sends the encryption key to A, A encrypts the file with it and sends the ciphertext to B, and B decrypts with the decryption key. The whole process works neatly: because the encryption key can only encrypt, intercepting it does not let anyone decrypt the file.

So we can call the encryption key the "public key" and the decryption key the "private key".

On reflection, though, this mechanism places very high demands on the algorithm:

The public key and the private key must correspond one to one through some function.

Yet deriving the private key from the public key must be extremely difficult.

An algorithm built on prime numbers is the simplest one that meets these requirements. It rests on this mathematical fact: given two large primes, computing their product is easy, but given only the product, finding the two primes is extremely hard (no efficient general method is publicly known). For example, ask a computer to factor this large number: 1522605027922533360535618378132637429718068114961380688657908494580122963258952897654000350692006139 = 37975227936943673922808872755445627854565536638199 × 40094690950920881030683735292761468389214899724061

Any computer can multiply the two primes in under a microsecond, but factoring the product takes a long time. And this composite is only 330 bits (100 decimal digits). The composites actually used in RSA are typically 2048 bits, and factoring one is estimated to take tens of billions of years, so it is considered very safe.
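The easy direction is simple to confirm. The sketch below multiplies the two primes quoted above and compares the result with the composite; it does not attempt the hard direction.

```python
p = 37975227936943673922808872755445627854565536638199
q = 40094690950920881030683735292761468389214899724061
n = 1522605027922533360535618378132637429718068114961380688657908494580122963258952897654000350692006139

product = p * q  # the easy direction: effectively instant

print(product == n)                 # do the factors match the composite?
print(len(str(n)), n.bit_length())  # size in decimal digits and in bits
```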

Now imagine treating the two large primes as the private key and their product as the public key. Given matching encryption and decryption algorithms, the key exchange problem is solved. RSA is based on this idea, although the real implementation is somewhat more involved: the public and private keys are not the product and the primes themselves but are derived from them through further transformations, which are not covered here.
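For readers curious about those transformations, here is a minimal sketch of textbook RSA with deliberately tiny primes. The primes 61 and 53, the exponent 17, and the message 65 are assumptions for illustration; real RSA uses far larger primes and adds padding, both omitted here.

```python
from math import gcd

p, q = 61, 53              # tiny primes, for illustration only
n = p * q                  # modulus, shared by both keys
phi = (p - 1) * (q - 1)

e = 17                     # public exponent, must be coprime with phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)        # private exponent (modular inverse, Python 3.8+)


def encrypt(m: int, public_key: tuple) -> int:
    exponent, modulus = public_key
    return pow(m, exponent, modulus)


def decrypt(c: int, private_key: tuple) -> int:
    exponent, modulus = private_key
    return pow(c, exponent, modulus)


message = 65                                # must be smaller than n
ciphertext = encrypt(message, (e, n))       # anyone can do this with the public key
recovered = decrypt(ciphertext, (d, n))     # only the private key holder can do this
print(ciphertext, recovered)
```

Computing `d` requires `phi`, which in turn requires knowing `p` and `q`. That is where the difficulty of factoring `n` protects the private key.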

But is this key exchange really free of problems? No, it is not!

Digital signature

One extremely important issue remains unresolved: who can prove that it was really B who sent the public key to A? If anyone can impersonate B, A is easily fooled. A may encrypt the file with an impostor's public key and send it to B, and the impostor intercepts it in transit. The result is that B cannot read the file, while the malicious party decrypts it.

It seems we need to imitate something with a long history in everyday life: the signature.

Let's first look at the properties of a traditional signature:

Only the signer can produce the signature (the pattern of the handwriting).

Outsiders cannot imitate it, but they can check it. In short, it is hard to forge but easy to verify.

Does this remind you of asymmetric encryption? "Easy in one direction, hard in the reverse" is exactly its characteristic. Conventional asymmetric encryption is "encrypt with the public key, decrypt with the private key". If we reverse it to "encrypt with the private key, decrypt with the public key", we find to our surprise that it matches perfectly:

Only the signer can sign a message, by encrypting it with the private key.

Outsiders can check the signature by decrypting it with the public key. If decryption yields the expected message, the signature is genuine; otherwise it is forged.

This is a digital signature.
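The reversal can be sketched with a toy RSA key. The key values below (modulus 3233 built from the primes 61 and 53, public exponent 17, private exponent 2753) and the message are assumptions for illustration; a real signature scheme uses large keys, hashing, and padding.

```python
# Toy RSA key pair: N = 61 * 53, and E * D leaves remainder 1 modulo (61 - 1) * (53 - 1).
N, E, D = 3233, 17, 2753


def sign(message: int, d: int = D, n: int = N) -> int:
    # "Encrypt with the private key": only the key owner can compute this.
    return pow(message, d, n)


def verify(message: int, signature: int, e: int = E, n: int = N) -> bool:
    # "Decrypt with the public key" and compare with the claimed message.
    return pow(signature, e, n) == message


message = 1234                        # must be smaller than N
signature = sign(message)
print(verify(message, signature))     # genuine message
print(verify(message + 1, signature)) # tampered message
```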

Only one link remains: who certifies that a public key really belongs to B? Checking signatures is meaningful only if an authoritative organization tells everyone whom each public key represents.

This is one of the most intricate problems in applied cryptography, and it prompted the ideas of the "digital certificate" and the "public key infrastructure".

Digital certificates, public key infrastructure

Such authoritative organizations are called CAs (Certificate Authorities). Their job is to issue certificates that state whom each public key represents. The "whom" can be an individual, an organization, or another CA. In this way a huge "public key infrastructure" (PKI) takes shape.

                  Root CA
                     |
          -----------------------
          |                     |
      Level 2 CA            Level 2 CA
          |                     |
     ----------             Personal
     |        |
Level 3 CA  Level 3 CA
                 |
            ----------
            |        |
        Personal  Personal

Suppose each certificate contains the following:

Subject (the issuer's public key and the public key of the party the certificate is issued to)

The issuer's digital signature over the subject, made with the issuer's private key

Then, once the root CAs are fixed, a huge chain of trust is established.

The certificate of a root CA is called a root certificate, and it is preinstalled by the operating system vendor. Windows, macOS, and Linux each trust dozens to hundreds of root CAs by default.
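The chain of trust can be sketched in a few dozen lines. Everything here is a simplified assumption for illustration: the `Certificate` fields follow the description above rather than any real certificate format, the keys are toy RSA keys built from Mersenne primes, the names such as "example.com" are hypothetical, and the set `trusted` stands in for the operating system's root store.

```python
import hashlib
from dataclasses import dataclass
from math import gcd


def make_key(p: int, q: int):
    """Build a toy RSA key pair (public, private) from two distinct primes."""
    n, phi = p * q, (p - 1) * (q - 1)
    e = 65537
    while gcd(e, phi) != 1:  # pick an exponent that is invertible mod phi
        e += 2
    return (e, n), (pow(e, -1, phi), n)


def digest(data: bytes, n: int) -> int:
    # Toy shortcut: reduce the SHA-256 digest so it fits below the small modulus.
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


@dataclass(frozen=True)
class Certificate:
    name: str            # human-readable label, for illustration only
    subject_key: tuple   # public key (e, n) being certified
    issuer_key: tuple    # public key (e, n) of the issuer
    signature: int       # issuer's signature over the three fields above


def body(name, subject_key, issuer_key) -> bytes:
    return f"{name}|{subject_key}|{issuer_key}".encode()


def issue(name, subject_key, issuer_key, issuer_private) -> Certificate:
    d, n = issuer_private
    signature = pow(digest(body(name, subject_key, issuer_key), n), d, n)
    return Certificate(name, subject_key, issuer_key, signature)


def signature_ok(cert: Certificate) -> bool:
    e, n = cert.issuer_key
    expected = digest(body(cert.name, cert.subject_key, cert.issuer_key), n)
    return pow(cert.signature, e, n) == expected


def verify_chain(chain, trusted_root_keys) -> bool:
    """chain[0] is the leaf; chain[-1] must be a trusted, self-signed root."""
    root = chain[-1]
    if root.issuer_key != root.subject_key or root.subject_key not in trusted_root_keys:
        return False
    # Each certificate must name the next certificate's key as its issuer.
    if any(c.issuer_key != parent.subject_key for c, parent in zip(chain, chain[1:])):
        return False
    return all(signature_ok(c) for c in chain)


# Mersenne primes, used only because they are easy to write down.
M61, M89, M107, M127 = (2**k - 1 for k in (61, 89, 107, 127))
root_pub, root_priv = make_key(M107, M127)
ca2_pub, ca2_priv = make_key(M89, M107)
site_pub, _ = make_key(M61, M89)
rogue_pub, rogue_priv = make_key(M61, M127)

root = issue("Root CA", root_pub, root_pub, root_priv)   # self-signed
ca2 = issue("Level 2 CA", ca2_pub, root_pub, root_priv)
site = issue("example.com", site_pub, ca2_pub, ca2_priv)
rogue_root = issue("Rogue CA", rogue_pub, rogue_pub, rogue_priv)
fake_site = issue("example.com", site_pub, rogue_pub, rogue_priv)

trusted = {root_pub}  # stands in for the operating system's root store
print(verify_chain([site, ca2, root], trusted))
print(verify_chain([fake_site, rogue_root], trusted))
```

The rogue chain is internally consistent, with every signature valid, yet it is rejected because its root is not in the trusted set. That is why the choice of root CAs matters so much.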

When developing software we can also create our own root CA by making the issuer's public key and the subject's public key the same. The operating system recognizes such a certificate (a "self-signed certificate") as a root certificate, and you then have to tell the operating system to trust it. But be careful: never send such a certificate to strangers and ask them to click "trust", because doing so puts them at risk. After all, we cannot force others to trust us.

HTTPS

Finally we can return to HTTPS. For most programmers the protocol details matter less than the conclusion: after the sections above we can be convinced that the security of HTTPS is technically guaranteed. Users can be sure that the website they are visiting is the genuine site and not one impersonated by an attacker, and that all transmitted data is encrypted and cannot be read by outsiders.
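In practice a program rarely performs these checks by hand. As a sketch, Python's standard `ssl` module builds a client context that by default requires a certificate chaining to a trusted root and matching the host name. The helper `fetch_certificate_subject` is a hypothetical name introduced here and needs network access when called.

```python
import socket
import ssl

# The default client context loads the system's trusted root certificates.
context = ssl.create_default_context()

print(context.verify_mode == ssl.CERT_REQUIRED)  # certificate chain must validate
print(context.check_hostname)                    # certificate must match the host name


def fetch_certificate_subject(hostname: str, port: int = 443):
    # Connects over TLS; raises ssl.SSLCertVerificationError if validation fails.
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["subject"]
```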

How does a CA issue a certificate to a website? First you generate a private key and a public key on your own computer and send the public key to the CA. The CA then verifies that you own the website, using one of several methods:

It sends an email to webmaster@<your domain> or postmaster@<your domain>. If you can receive it, you are the owner.

It asks you to add a file to a specific directory of the website. If you can, you are the owner.

It asks you to add a record to the DNS of your domain. If you can, you are the owner.

After verification, the CA will issue the certificate to you.
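The file-based method can be sketched as a challenge and response. This is a simulation under stated assumptions: a dictionary stands in for the files a website serves, and the path and function names are hypothetical rather than taken from any real CA's procedure.

```python
import hmac
import secrets

CHALLENGE_PATH = "/ca-validation/token.txt"  # hypothetical path chosen by the CA


def ca_create_challenge() -> str:
    # The CA picks an unguessable token and asks the applicant to publish it.
    return secrets.token_urlsafe(32)


def ca_check_challenge(site_files: dict, path: str, expected_token: str) -> bool:
    # The CA fetches the file from the site and compares it with its own token.
    served = site_files.get(path, "")
    return hmac.compare_digest(served.encode(), expected_token.encode())


token = ca_create_challenge()
owner_site = {CHALLENGE_PATH: token}  # the real owner can place the file
other_site = {}                       # someone without control of the site cannot

owner_ok = ca_check_challenge(owner_site, CHALLENGE_PATH, token)
other_ok = ca_check_challenge(other_site, CHALLENGE_PATH, token)
print(owner_ok, other_ok)
```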

Performance optimization

Asymmetric encryption is roughly 100 times (in software) to 1,000 times (in hardware) slower than symmetric encryption. Using only the encryption, decryption, and signature methods described above would perform very poorly. This is where another cryptographic tool comes in: the hash.

A hash goes by many names: fingerprint, digest, checksum. It refers to a mathematical method in which a very small amount of information stands in for ("is equivalent to") a large amount. This sounds strange, but fingerprints have exactly this property. A fingerprint identifies a person, and conversely a person determines his fingerprints. The beauty is that a fingerprint contains far less information than a whole person, yet still so much that duplicates do not occur in practice.

For example, let's compute the hash of this text: A hash function is any function that can be used to map data of arbitrary size to data of fixed size. The values returned by a hash function are called hash values, hash codes, digests, or simply hashes. One use is a data structure called a hash table, widely used in computer software for rapid data lookup. Hash functions accelerate table or database lookup by detecting duplicated records in a large file. An example is finding similar stretches in DNA sequences. They are also useful in cryptography. A cryptographic hash function allows one to easily verify that some input data maps to a given hash value, but if the input data is unknown, it is deliberately difficult to reconstruct it (or equivalent alternatives) by knowing the stored hash value. This is used for assuring integrity of transmitted data, and is the building block for HMACs, which provide message authentication.

Using the SHA-256 hash algorithm, the resulting hash is: 4da07b60cb90742026d7fd9ece673bfad677422e8261c1cc29ff00d0d6be4b7a

No matter how large the input is, even 100 GB, its SHA-256 value is always 256 bits (64 hexadecimal digits). Hashing is much faster than asymmetric encryption, so in practice the message is hashed first and the hash is signed, rather than the whole message. This greatly speeds things up without compromising security.
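Both points can be sketched with the standard `hashlib` module: the digest length does not depend on the input size, and only the digest goes through the slow private-key operation. The key below is a toy built from the two primes factored earlier in this article; the starting exponent 65537 and the unpadded "textbook" signing are assumptions for illustration, not a production scheme.

```python
import hashlib
from math import gcd

# The digest is 64 hex digits whether the input is 11 bytes or a million.
small = hashlib.sha256(b"Hello World").hexdigest()
large = hashlib.sha256(b"x" * 1_000_000).hexdigest()
print(len(small), len(large))

# Toy key from the two primes factored earlier (330-bit modulus, larger than any digest).
P = 37975227936943673922808872755445627854565536638199
Q = 40094690950920881030683735292761468389214899724061
N = P * Q
PHI = (P - 1) * (Q - 1)
E = 65537
while gcd(E, PHI) != 1:  # make sure the exponent is invertible mod PHI
    E += 2
D = pow(E, -1, PHI)      # private exponent (Python 3.8+)


def digest_int(message: bytes) -> int:
    return int.from_bytes(hashlib.sha256(message).digest(), "big")


def sign(message: bytes) -> int:
    # Hash first, then apply the private key to the 256-bit digest only.
    return pow(digest_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    return pow(signature, E, N) == digest_int(message)


document = b"A very long document..." * 10_000
signature = sign(document)
print(verify(document, signature), verify(document + b"!", signature))
```

However long the document is, the private-key operation always works on a 256-bit number, so the cost of signing stays nearly constant.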