
Digital Signatures, HMACs and Hashing

11/08/2018

Recently, I have been plagued by the differences between hashes, HMACs and digital signatures. More importantly, I was asked about the difference between a hash function and an HMAC during an interview and was unable to answer the question. So, this post explains hashing, HMACs and digital signatures, along with the differences between them. Hopefully, when you are asked about these in an interview, you will answer correctly!

Background

There are a few key terms that should be known before reading this article. I will briefly explain these.

Symmetric vs Asymmetric Encryption

[Figures: Symmetric Encryption; Asymmetric Encryption. Both images are from cheapsslshop.com.]

Symmetric encryption takes a value, m, and encrypts it with a key, K. To decrypt it, the same key K is used to get back the original value. A basic example of a symmetric cipher is the Caesar cipher.
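To make the idea concrete, here is a minimal Python sketch of the Caesar cipher, assuming it shifts only ASCII letters and leaves every other character alone. The names `caesar`, `encrypt` and `decrypt` are illustrative. The thing to notice is that the same key K both encrypts and decrypts.

```python
import string

def caesar(text: str, key: int) -> str:
    """Shift each ASCII letter by `key` positions; other characters pass through."""
    result = []
    for ch in text:
        if ch in string.ascii_uppercase:
            result.append(chr((ord(ch) - ord("A") + key) % 26 + ord("A")))
        elif ch in string.ascii_lowercase:
            result.append(chr((ord(ch) - ord("a") + key) % 26 + ord("a")))
        else:
            result.append(ch)
    return "".join(result)

def encrypt(m: str, k: int) -> str:
    return caesar(m, k)

def decrypt(c: str, k: int) -> str:
    # The same key K undoes the shift: this is what makes the cipher symmetric.
    return caesar(c, -k)

K = 3
c = encrypt("Attack at dawn", K)
print(c)              # Dwwdfn dw gdzq
print(decrypt(c, K))  # Attack at dawn
```

With only 26 possible keys, this cipher can be brute-forced instantly, so it is purely a teaching example.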

Asymmetric encryption (otherwise known as public key cryptography) uses two different keys: a public key and a private key. Data is encrypted with the public key and decrypted with the private key. An example of this is the RSA algorithm.
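The two-key idea can be illustrated with the classic textbook RSA numbers (p = 61, q = 53, e = 17). This is a sketch only: real RSA uses primes hundreds of digits long and adds padding (such as OAEP), both of which are omitted here. It assumes Python 3.9+ for `pow(e, -1, phi)` and the type hints.

```python
# Toy RSA with tiny primes, for illustration only (real keys use 2048+ bit moduli).
p, q = 61, 53
n = p * q                  # public modulus: 3233
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, must be coprime with phi
d = pow(e, -1, phi)        # private exponent: modular inverse of e mod phi

public_key = (e, n)
private_key = (d, n)

def rsa_encrypt(m: int, key: tuple[int, int]) -> int:
    exp, mod = key
    return pow(m, exp, mod)   # c = m^e mod n

def rsa_decrypt(c: int, key: tuple[int, int]) -> int:
    exp, mod = key
    return pow(c, exp, mod)   # m = c^d mod n

m = 65
c = rsa_encrypt(m, public_key)
print(d, c)                          # 2753 2790
print(rsa_decrypt(c, private_key))   # 65
```

Anyone may hold `public_key` and encrypt, but only the holder of `private_key` (in particular, `d`) can decrypt.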

Both symmetric and asymmetric encryption have their roles and their issues. In the early days of cryptography, the big problem was: how do you share the key? This plagued people for generations! The key cannot be sent alongside the message... Nor can it be sent in a separate letter, as that letter can still be intercepted and read... This is where public key cryptography comes into play. Because the public key is meant to be known and freely available, the sender simply encrypts the data with the receiver's public key. If a malicious adversary intercepts the message, it does not matter, as only the receiver holds the private key needed to decrypt it. The key exchange problem was solved with the creation of public key cryptography.

Even though public key cryptography solved the key exchange problem, symmetric ciphers are still used today, because most asymmetric encryption algorithms are extremely expensive to compute. In RSA, for example, encryption computes $c = m^e \bmod n$ and decryption computes $m = c^d \bmod n$. The cost comes from modular exponentiation on very large numbers: the modulus n is typically 2048 bits or more, and the private exponent d is about as large as n. So, encrypting all data with public key cryptography is not a viable option, and symmetric ciphers are still used to encrypt the message itself. In practice, asymmetric encryption is used to exchange a key, and a symmetric algorithm then uses that key to encrypt the data. Once the receiver has the key, an adversary who never obtained it has very little chance of reading the message.
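A toy sketch of this hybrid approach: the receiver's (tiny, insecure) RSA key wraps a fresh session key, and a Caesar shift stands in for the fast symmetric cipher. Both ciphers are placeholders chosen for readability; a real system would use something like RSA-OAEP or an elliptic-curve key exchange together with AES. The function names `send` and `receive` are hypothetical.

```python
import secrets

# --- toy RSA key pair belonging to the receiver (tiny primes, illustration only) ---
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))

# --- toy symmetric cipher: Caesar shift over ASCII letters ---
def caesar(text: str, key: int) -> str:
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + key) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

def send(message: str) -> tuple[int, str]:
    # Sender: pick a fresh session key and wrap it with the receiver's PUBLIC key
    # (the slow asymmetric step, done once), then encrypt the bulk message with
    # the fast symmetric cipher.
    session_key = secrets.randbelow(25) + 1       # 1..25
    wrapped_key = pow(session_key, e, n)          # asymmetric step
    ciphertext = caesar(message, session_key)     # symmetric step
    return wrapped_key, ciphertext

def receive(wrapped_key: int, ciphertext: str) -> str:
    # Receiver: unwrap the session key with the PRIVATE key, then decrypt symmetrically.
    session_key = pow(wrapped_key, d, n)
    return caesar(ciphertext, -session_key)

wrapped, ct = send("Meet me at noon")
print(receive(wrapped, ct))   # Meet me at noon
```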

Public Key Infrastructure (PKI)

The PKI is the system that issues and manages public keys, along with the certificates (digitally signed statements) that bind those keys to entities. By doing this, it creates a trustworthy environment for using the cryptographic systems. To sum it up, the PKI ensures that an entity claiming to own a public key, which others will use to encrypt data sent to it, really is the entity it claims to be. The best-known implementation is the one used by web browsers for TLS (Transport Layer Security), where certificate authorities vouch for the public keys of websites.

Hashing, HMAC and Digital Signatures

In order to have a consistent way to discuss the differences between the three methods, I will use the example of transferring a file throughout the post.

Hashing

[Figure: Hashing]

In general, hashing is a way to map data of arbitrary size to a fixed-size value (as shown in the image). In terms of security, a hash function allows someone to easily verify that two values are the same without actually knowing the value itself. Cryptographic hash functions are typically described as 'one-way' functions: computing the hash of a value is easy, but going from the hash back to the original value is computationally infeasible. Another interesting property of hash functions can be seen in the picture: changing a single bit of the input dramatically changes the output (the avalanche effect). A few examples of cryptographic hash functions are SHA-2, MD5 and SWIFFT (though MD5 is now considered broken and should not be used for security).
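Both properties can be observed with Python's standard `hashlib` module. This sketch, assuming SHA-256 as the hash function, checks that inputs of wildly different lengths produce 64-hex-character digests and counts how many output bits flip when a single input bit changes. The helper names `sha256_hex` and `differing_bits` are introduced here for illustration.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Inputs of very different lengths map to digests of the same fixed size.
print(len(sha256_hex(b"a")), len(sha256_hex(b"a" * 1_000_000)))   # 64 64

# Avalanche effect: "cat" and "bat" differ in a single bit ('c' = 0x63, 'b' = 0x62),
# yet roughly half of the 256 output bits are expected to change.
h1 = hashlib.sha256(b"cat").digest()
h2 = hashlib.sha256(b"bat").digest()
print(differing_bits(b"cat", b"bat"), differing_bits(h1, h2))
```

The first number printed on the last line is 1; the second is typically somewhere near 128.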

Hashing is used to ensure the integrity of data. With a few modifications (salts), hashes can also be used to store passwords securely: a password can easily be checked against the stored hash, yet recovering the original password from the hash is nearly impossible (assuming a strong password).
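A minimal sketch of salted password storage using only the Python standard library. The post mentions salts; this example goes a step further and uses PBKDF2 (`hashlib.pbkdf2_hmac`), a salted and deliberately slow construction, instead of a single plain hash. The iteration count is an assumption to be tuned for your hardware, and the function names are illustrative.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000   # assumption: a deliberately slow work factor

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest). A fresh random salt makes identical passwords hash differently."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    # Re-derive with the stored salt and compare in constant time.
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))   # True
print(verify_password("Tr0ub4dor&3", salt, stored))                    # False
```

Only the salt and digest are stored; the password itself never is.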

In the file transfer example, the sender would send the file alongside the hash of the file. The receiver would then apply the same hash function (as the sender) to the received file. If the two hashes are the same, the file has not been modified in transit (but see the caveat under Authenticity below).
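The file-transfer check might look like the following sketch, assuming SHA-256 and an in-memory file (a file on disk would be hashed the same way, from its bytes). The function names are hypothetical.

```python
import hashlib

def file_digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Sender: ship the file together with its hash.
file_bytes = b"example file contents"
sent = (file_bytes, file_digest(file_bytes))

# Receiver: recompute the hash with the same function and compare.
def receiver_check(received_file: bytes, received_hash: str) -> bool:
    return file_digest(received_file) == received_hash

print(receiver_check(*sent))                                   # True
print(receiver_check(b"example file contents, edited", sent[1]))  # False
```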


HMAC

A message authentication code (MAC) is a small value used to authenticate a message, meaning to confirm that the message came from the actual sender. An HMAC (Hash-based Message Authentication Code) is a message authentication code that combines a secret cryptographic key with a hash function to verify both the integrity and the authenticity of a message.
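Under the hood, HMAC does not encrypt anything. RFC 2104 defines it as H((K' ⊕ opad) ∥ H((K' ⊕ ipad) ∥ m)), where K' is the key padded to the hash's block size and ipad/opad are the constant bytes 0x36 and 0x5C repeated. The sketch below implements that formula by hand for SHA-256 and checks it against Python's standard `hmac` module; in real code you should simply call `hmac.new`.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes input in 64-byte blocks

def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC per RFC 2104: H((K' xor opad) || H((K' xor ipad) || m))."""
    if len(key) > BLOCK_SIZE:                 # keys longer than a block are hashed first
        key = hashlib.sha256(key).digest()
    key = key.ljust(BLOCK_SIZE, b"\x00")      # then zero-padded to the block size (K')
    inner_pad = bytes(b ^ 0x36 for b in key)  # K' xor ipad
    outer_pad = bytes(b ^ 0x5C for b in key)  # K' xor opad
    inner = hashlib.sha256(inner_pad + message).digest()
    return hashlib.sha256(outer_pad + inner).digest()

key, msg = b"shared secret", b"the file contents"
print(hmac_sha256(key, msg) == hmac.new(key, msg, hashlib.sha256).digest())   # True
```

The nested structure is what lets the key influence the result without the hash ever being "encrypted".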

In the example, the sender and the receiver first share a secret key. The sender then computes the HMAC of the file: the key and the file are fed together through a strong cryptographic hash function (the hash is not encrypted; the key is mixed into the hash computation itself). The resulting value, V1, is sent along with the file. The receiver computes the HMAC of the received file with the same shared key to get V2. If V1 and V2 are the same, then we know that the file has not been tampered with and, because only the two parties know the secret key, that it came from the expected sender. HMACs are used to verify both the integrity and the authenticity of the file.
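In code, the exchange might look like this minimal sketch using Python's standard `hmac` module with SHA-256. It assumes the key has already been shared securely; `sender` and `receiver` are illustrative names. `hmac.compare_digest` is used instead of `==` so that the comparison does not leak timing information.

```python
import hashlib
import hmac
import secrets

shared_key = secrets.token_bytes(32)   # agreed on in advance by sender and receiver

def sender(file_bytes: bytes, key: bytes) -> tuple[bytes, bytes]:
    tag = hmac.new(key, file_bytes, hashlib.sha256).digest()   # V1
    return file_bytes, tag

def receiver(file_bytes: bytes, tag: bytes, key: bytes) -> bool:
    expected = hmac.new(key, file_bytes, hashlib.sha256).digest()   # V2
    return hmac.compare_digest(expected, tag)   # constant-time V1 == V2 check

data, v1 = sender(b"example file contents", shared_key)
print(receiver(data, v1, shared_key))                   # True
print(receiver(b"tampered contents", v1, shared_key))   # False
```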

Digital Signatures

[Figure: Diagram of a digital signature (from Docusign)]

A digital signature is a scheme for demonstrating the authenticity of a digital message or document. Digital signatures use asymmetric cryptographic schemes.

Within the realm of digital signatures, a PKI (Public Key Infrastructure) is needed to ensure that a public key belongs to the correct entity. For example, Amazon has a public key that is used to encrypt data being sent to it. However, what stops a malicious actor from giving a victim a public key and claiming that it is Amazon's? If this happened, the malicious actor could read the messages the victim meant for Amazon. Preventing this is the reason for the PKI system.
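A toy sketch of how a PKI blocks this attack: a certificate authority (CA), whose public key is assumed to be pre-installed on the victim's machine, signs the pair (name, public key), and the client only accepts keys that carry a valid CA signature. The certificate format, the Mersenne primes used as key material, and unpadded textbook-RSA signing are all simplifications for illustration (real certificates are X.509 with padded signatures); the signing step itself is the one described in the next paragraph.

```python
import hashlib

E = 65537  # public exponent shared by all toy key pairs

def make_keypair(p: int, q: int) -> tuple[tuple[int, int], tuple[int, int]]:
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))
    return (E, n), (d, n)   # (public key, private key)

def h(data: bytes, n: int) -> int:
    # SHA-256 digest reduced into [0, n) so textbook RSA can operate on it.
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def sign(data: bytes, private: tuple[int, int]) -> int:
    d, n = private
    return pow(h(data, n), d, n)

def verify(data: bytes, sig: int, public: tuple[int, int]) -> bool:
    e, n = public
    return pow(sig, e, n) == h(data, n)

def cert_body(name: str, public: tuple[int, int]) -> bytes:
    # Hypothetical certificate format: just the subject name and its public key.
    return f"{name}|{public[0]}|{public[1]}".encode()

# Toy key material (known primes, trivially breakable).
ca_public, ca_private = make_keypair(2**89 - 1, 2**107 - 1)
amazon_public, _ = make_keypair(2**61 - 1, 2**127 - 1)

# The CA verifies Amazon's identity out of band, then signs (name, key).
amazon_cert = ("amazon.com", amazon_public,
               sign(cert_body("amazon.com", amazon_public), ca_private))

def accept(cert) -> bool:
    # The client trusts a key only if the CA's signature over (name, key) checks out.
    name, public, sig = cert
    return verify(cert_body(name, public), sig, ca_public)

# An attacker can generate their own key pair but cannot get the CA to sign it as
# amazon.com, so reusing Amazon's certificate signature with a different key fails.
attacker_public, _ = make_keypair(2**61 - 1, 2**107 - 1)
forged_cert = ("amazon.com", attacker_public, amazon_cert[2])

print(accept(amazon_cert))   # True
print(accept(forged_cert))   # False
```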

In terms of the example, the sender needs an asymmetric scheme that supports digital signatures, such as RSA. Once the keys are created, the fun begins! To start with, the sender hashes the file. This is where HMACs and digital signatures differ: instead of using a shared symmetric key, the sender signs the hash with their private key (which only they know). The receiver then applies the sender's public key to the signature to recover the signed hash, V1 (with RSA, verification is the public-key operation applied to the signature). The receiver then hashes the file themselves to get V2. If V1 and V2 are the same, we know a couple of things: the file has not been tampered with, and it was signed by the holder of the sender's private key. This demonstrates integrity, authenticity and non-repudiation for the file. More on these three terms below...
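The signing and verification flow can be sketched with textbook RSA over a SHA-256 hash. This is an illustration under loud assumptions: the primes are the well-known Mersenne primes 2^61 - 1 and 2^89 - 1 (trivially factorable), the hash is simply reduced modulo n, and no padding is applied. Real systems use schemes such as RSA-PSS or ECDSA through a vetted library.

```python
import hashlib

# Toy RSA key pair for the SENDER (illustration only, not secure).
p, q, e = 2**61 - 1, 2**89 - 1, 65537
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent, known only to the sender

def digest_as_int(data: bytes) -> int:
    # Hash the file, then reduce the digest into [0, n) so RSA can operate on it.
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def sign(data: bytes) -> int:
    return pow(digest_as_int(data), d, n)   # uses the PRIVATE key

def verify(data: bytes, signature: int) -> bool:
    v1 = pow(signature, e, n)    # recover the signed hash with the PUBLIC key (e, n)
    v2 = digest_as_int(data)     # receiver hashes the file itself
    return v1 == v2

file_bytes = b"example file contents"
sig = sign(file_bytes)
print(verify(file_bytes, sig))             # True
print(verify(b"tampered contents", sig))   # False
```

Only `sign` touches `d`; anyone holding `(e, n)` can run `verify`.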



Differences

Even though the differences may seem subtle, they are substantial. They come down to three properties: integrity, authenticity and non-repudiation. The definitions of these three concepts came from here.

 
Security Goal      Hash   HMAC   Digital Signature
Integrity          Yes    Yes    Yes
Authenticity       No     Yes    Yes
Non-repudiation    No     No     Yes

Integrity

Integrity is defined as "Can the recipient be confident that the message has not been modified?" All three of the methods described above ensure the integrity of the file because if the file is altered, then the hash function will give a different value in return. The integrity problem is the easiest part to solve.

Authenticity

Authenticity is defined as "Can the recipient be confident that the message originates from the sender?" A hash function can show that the file has not been modified. However, nothing prevents a malicious man-in-the-middle from intercepting the message and its hash and replacing both with their own; the receiver's check will still pass. Because of this, a hash function alone does not provide authenticity. Both the HMAC and the digital signature provide this protection, because the keys used to produce the tag or signature are unknown to the malicious adversary. Both show that the message came from the expected sender.
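The attack can be demonstrated in a few lines, assuming the attacker can rewrite everything in transit but does not know the shared HMAC key (so the best they can do is guess one). The plain-hash check passes on the forged message; the HMAC check does not.

```python
import hashlib
import hmac
import secrets

shared_key = secrets.token_bytes(32)   # known to sender and receiver, NOT to the attacker

# What the sender originally transmits.
original = b"pay Alice $10"
sent_hash = hashlib.sha256(original).hexdigest()
sent_tag = hmac.new(shared_key, original, hashlib.sha256).digest()

# The attacker intercepts the message and replaces it, along with the hash and tag.
forged = b"pay Mallory $10,000"
forged_hash = hashlib.sha256(forged).hexdigest()   # anyone can compute a plain hash
guessed_key = secrets.token_bytes(32)              # the attacker has to guess the key
forged_tag = hmac.new(guessed_key, forged, hashlib.sha256).digest()

# The receiver checks the forged message both ways.
hash_ok = hashlib.sha256(forged).hexdigest() == forged_hash
hmac_ok = hmac.compare_digest(
    hmac.new(shared_key, forged, hashlib.sha256).digest(), forged_tag)
print(hash_ok, hmac_ok)   # True False
```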

Non-Repudiation

We define non-repudiation as "If the recipient passes the message and the proof to a third party, can the third party be confident that the message originates from the sender?" The only one of the three that can prove this is the digital signature. This is because an HMAC uses a symmetric key, while a digital signature uses an asymmetric key pair. With an HMAC, the recipient holds the same key as the sender, so they can alter the message however they please and compute a valid HMAC for it. The recipient could then claim the sender wrote whatever they want. However, with public key cryptography, only the sender has the private key. In practice, this means that only the sender can produce a valid signature! Because the recipient cannot forge signatures, a valid signature can only have come from the sender, which provides non-repudiation. Wow, public key cryptography is amazing!
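The problem with HMACs for non-repudiation can be sketched directly. Since the sender and recipient hold the same key (here a random 32-byte key, an assumption for the sketch), a tag computed by one is indistinguishable from a tag computed by the other.

```python
import hashlib
import hmac
import secrets

shared_key = secrets.token_bytes(32)   # held by BOTH sender and recipient

def tag(message: bytes) -> bytes:
    return hmac.new(shared_key, message, hashlib.sha256).digest()

def verify(message: bytes, t: bytes) -> bool:
    return hmac.compare_digest(tag(message), t)

# The sender really sent this...
genuine = (b"I owe you $5", tag(b"I owe you $5"))
# ...but the recipient, who has the same key, can mint a valid tag for any message.
fabricated = (b"I owe you $5,000", tag(b"I owe you $5,000"))

# A third party given the key sees two equally valid messages and cannot tell
# which one the sender produced and which one the recipient made up.
print(verify(*genuine), verify(*fabricated))   # True True
```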

Issues

Digital Signature Size

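One known issue with public key cryptography is how large the keys tend to be. Unlike MACs and hash functions, which produce a small fixed-length value (32 bytes for SHA-256 or HMAC-SHA256), a digital signature can be much larger (an RSA signature is as long as the modulus, e.g. 256 bytes for a 2048-bit key), and for some schemes and encodings its size varies. Further, because of the large numbers involved, computing a digital signature is much slower than computing a hash or HMAC.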

Scaling

To be a viable option, an algorithm must scale to wide adoption. Public key cryptography scales quite well here: each signer needs only a single key pair. The signer keeps the private key, and anyone can verify signatures with the public key, which the PKI takes care of distributing. HMACs, however, do not scale well, because the symmetric key must be known by both parties. This means that every sender/receiver pair must have its own secret key. The difference is that with public key cryptography a sender needs just two keys (public and private), while with HMACs the sender needs n keys, where n is the number of recipients.
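The single-sender view generalizes to a whole network. As a rough worked example (an extension of the argument, assuming every party may need to talk to every other party), pairwise symmetric keys grow as n(n - 1)/2 while public key cryptography needs one key pair per party.

```python
from math import comb

def symmetric_keys_needed(parties: int) -> int:
    # Every pair of parties needs its own shared secret: n choose 2 = n(n - 1)/2.
    return comb(parties, 2)

def key_pairs_needed(parties: int) -> int:
    # Each party needs one key pair; its public key can be published via the PKI.
    return parties

for n in (10, 1_000, 1_000_000):
    print(n, symmetric_keys_needed(n), key_pairs_needed(n))
# 10 45 10
# 1000 499500 1000
# 1000000 499999500000 1000000
```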

Replay

These data protection algorithms work extremely well. However, they all fail on one problem: replay. A replay attack is defined as "a malicious adversary sending the same message, found by passive listening on original messages." The fix for this uses timestamps and nonces (numbers used once) to ensure the message is the request that the user intended to send. The timestamp and nonce must themselves be covered by the HMAC or signature, so that an attacker cannot change them. Then, if the message is sent again, the receiver will reject it for having been used previously.
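A minimal sketch of the nonce-and-timestamp fix, using an HMAC to protect the message. The five-minute window, the in-memory `seen_nonces` set and the message layout are all assumptions for illustration; a real server would persist nonces and expire them once they fall outside the time window.

```python
import hashlib
import hmac
import secrets
import time

shared_key = secrets.token_bytes(32)
MAX_AGE_SECONDS = 300            # assumption: accept messages up to 5 minutes old
seen_nonces: set[bytes] = set()  # nonces already accepted by the receiver

def _authenticated_bytes(nonce: bytes, timestamp: int, body: bytes) -> bytes:
    # Fixed-length nonce (16 bytes) and timestamp (8 bytes) keep the layout unambiguous.
    return nonce + timestamp.to_bytes(8, "big") + body

def make_message(body: bytes) -> tuple[bytes, int, bytes, bytes]:
    nonce = secrets.token_bytes(16)
    timestamp = int(time.time())
    # The nonce and timestamp are covered by the tag, so an attacker cannot alter them.
    t = hmac.new(shared_key, _authenticated_bytes(nonce, timestamp, body), hashlib.sha256).digest()
    return nonce, timestamp, body, t

def accept(nonce: bytes, timestamp: int, body: bytes, t: bytes) -> bool:
    expected = hmac.new(shared_key, _authenticated_bytes(nonce, timestamp, body),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, t):
        return False                  # tampered or forged
    if abs(time.time() - timestamp) > MAX_AGE_SECONDS:
        return False                  # too old (or from the future)
    if nonce in seen_nonces:
        return False                  # replayed
    seen_nonces.add(nonce)
    return True

msg = make_message(b"transfer $10 to Alice")
print(accept(*msg))   # True
print(accept(*msg))   # False: the same nonce was already seen
```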

Conclusion

I would like to give a major shout-out to my amazing cryptography professor, Paul De Palma (Gonzaga U), for his incredible job teaching me the basics of cryptography! With just the knowledge from his class, I was able to do extremely well on a set of cryptography-based interview problems.

Digital signatures, hashes and HMACs all have great use cases when used in the proper setting. A practical example is TLS (Transport Layer Security, used by browsers). Entity (such as a company) authentication is performed using a digital signature, while the messages themselves are protected using a fast MAC function. Hope you enjoyed! Cheers from Maxwell Dulin (ꓘ).