The OpenPGP Modification Detection Code is Actually Good

I once worked for a company that had a strange and intriguing dilemma. They had a popular Product. Marketing determined that the popularity was due to the fact that the Product lasted significantly longer than competing products. No one in the company had the faintest idea why that was the case. The design did not differ in any obvious way from the design used by the competition. While I was there, an engineering project was initiated with the hope of understanding why the Product was better. I left the company before any definite result. For all I know the mystery still remains.

The situation with the OpenPGP modification detection code (MDC) very much reminds me of the story of the Product. Legend has it that the MDC was created as something of an afterthought. It works very well, but it is not obvious why it does. I have never seen a comprehensive explanation. Here I will attempt to produce one.

When OpenPGP is used for something like email, the messages are authenticated directly with a signature, so the MDC is not relevant in the most common use case and is, in that sense, not that important. It would still simplify things and eliminate much pointless discussion if the MDC could in fact be shown to be strong. It would eliminate having to go through the more obscure uses of OpenPGP to determine how applicable the MDC is to each.

If you were to encrypt, say, a message text, an attacker would have no way to determine what was said in that message just by looking at the encrypted result. That, after all, is the whole point of encrypting it in the first place. An attacker might still be able to make changes to the text if they can get access to it along the way. If they know what text is encrypted they might, by duplicating, deleting and moving the encrypted text, be able to change the meaning of the text. Many encryption systems allow an attacker to selectively flip bits in the eventual decrypted message. By that I mean that an attacker can change a 0 to a 1 or a 1 to a 0. The attacker can't tell what a bit is to start with by looking at the encrypted message, but if they know what text is encrypted they can change it to whatever they want by picking the right bits to flip. The MDC is used to detect any sort of malicious change to OpenPGP encrypted messages and files.

We will start with a simple example in the style of the MDC and then improve it. For the sake of this example we will first assume that some sort of block¹⁾ cipher mode is used that allows any bit to be flipped in the message. The popular counter block cipher mode has that property²⁾. This is our oversimplified MDC:

We create this by hashing the message. Then we append the hash to the end of the message. After that we encrypt everything; message and hash.

Let's consider the easiest situation for the attacker and assume they know the entire message. Then the attacker can hash that known message and will then know what the hash was before encryption. As a result they can modify the hash to any value they want by flipping bits as required. So the attacker can change the message to anything they want without restriction and can change the hash so that their changes would not be detected. If they can somehow arrange to have an appropriate hash encrypted in the message they can chop off the end of the message without detection.
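To make that attack concrete, here is a minimal Python sketch. It is not OpenPGP code: the keystream is a toy built from SHA-256 that merely stands in for a counter mode cipher, and the names `toy_keystream`, `naive_seal`, `naive_open` and `forge` are made up for this illustration. It assumes the attacker knows the whole original message and wants a replacement of the same length.

```python
import hashlib

HASH_LEN = 20  # SHA-1 digest size in bytes


def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy counter-style keystream: SHA-256(key || nonce || counter).
    An assumption for illustration only, NOT a real cipher."""
    stream = b""
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return stream[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def naive_seal(key: bytes, nonce: bytes, message: bytes) -> bytes:
    """The oversimplified scheme: hash the message, append the hash, encrypt both."""
    plaintext = message + hashlib.sha1(message).digest()
    return xor(plaintext, toy_keystream(key, nonce, len(plaintext)))


def naive_open(key: bytes, nonce: bytes, ciphertext: bytes):
    """Decrypt and report (message, hash_matches)."""
    plaintext = xor(ciphertext, toy_keystream(key, nonce, len(ciphertext)))
    message, stored_hash = plaintext[:-HASH_LEN], plaintext[-HASH_LEN:]
    return message, hashlib.sha1(message).digest() == stored_hash


def forge(ciphertext: bytes, known_message: bytes, new_message: bytes) -> bytes:
    """The attacker's step. Note that the key is never used: flipping the bits
    where old and new plaintext differ is enough."""
    if len(known_message) != len(new_message):
        raise ValueError("this sketch only handles same-length replacements")
    old_plain = known_message + hashlib.sha1(known_message).digest()
    new_plain = new_message + hashlib.sha1(new_message).digest()
    return xor(ciphertext, xor(old_plain, new_plain))


if __name__ == "__main__":
    key, nonce = b"not known to the attacker", b"per-message nonce"
    original = b"Pay Alice 100 dollars"
    ciphertext = naive_seal(key, nonce, original)
    forged = forge(ciphertext, original, b"Pay Mallory 9 dollars")
    print(naive_open(key, nonce, forged))
```

The forged ciphertext opens as the attacker's message with the hash check passing, which is exactly the failure described above.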

We can make it harder to maliciously reduce the length of the message by adding an explicit message length:

Now the attacker has to come up with a new encrypted length, and they don't have the encryption key.

It might be good to switch to a block encryption mode that is not so inherently easy to modify. The cipher feedback (CFB) mode imposes a penalty on modification in the form of completely unpredictable random garbage in the block after the modified block³⁾:

Now the hash only has to detect the random garbage triggered by the attempt to change the CFB protected data. Since predicting the random garbage would require knowledge of the encryption key, the attacker has no real way to fix either the garbage or the hash. Attempts to change the last block of the message will cause unpredictable damage to the hash itself.
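The CFB penalty can be seen in a small Python sketch. The block function here is a toy built from SHA-256, not a real cipher (CFB only ever uses the forward direction of the cipher, so a keyed hash is enough for illustration), and the names `toy_block_encrypt`, `cfb_encrypt` and `cfb_decrypt` are hypothetical.

```python
import hashlib

BLOCK = 16  # block size in bytes


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for a block cipher's forward direction. An assumption for
    illustration only, NOT a real cipher."""
    return hashlib.sha256(key + block).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    # zip() stops at the shorter input, which handles a short final block.
    return bytes(x ^ y for x, y in zip(a, b))


def cfb_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    previous, out = iv, b""
    for i in range(0, len(plaintext), BLOCK):
        cipher_block = xor(plaintext[i:i + BLOCK], toy_block_encrypt(key, previous))
        out += cipher_block
        previous = cipher_block  # feedback: the next block depends on this ciphertext
    return out


def cfb_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    previous, out = iv, b""
    for i in range(0, len(ciphertext), BLOCK):
        cipher_block = ciphertext[i:i + BLOCK]
        out += xor(cipher_block, toy_block_encrypt(key, previous))
        previous = cipher_block
    return out


if __name__ == "__main__":
    key, iv = b"toy key", bytes(BLOCK)
    plaintext = b"A" * BLOCK + b"B" * BLOCK + b"C" * BLOCK + b"D" * BLOCK
    tampered = bytearray(cfb_encrypt(key, iv, plaintext))
    tampered[BLOCK] ^= 0x01  # flip one bit in the second ciphertext block
    result = cfb_decrypt(key, iv, bytes(tampered))
    for i in range(0, len(result), BLOCK):
        print(result[i:i + BLOCK])
```

In this sketch the flipped bit shows up in the second plaintext block exactly where the attacker put it, the third block decrypts to garbage that depends on the key, and the first and fourth blocks are untouched. The garbage is what the hash then has to catch.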

Changing the last block of the hash would not cause any random garbage because there is no place for that random garbage to go. This means that at least a portion of the hash is modifiable. Let's fix that:

We have added some random data before the length. That random data is included in the hash. That means that the attacker can never know the entire message and will not know what the hash is to start with. As a result they will not be able to change it in a rational way by flipping bits.

This brings us to the end of our journey. The previous diagram represents the MDC (some irrelevant detail omitted). If you want to attack the MDC and modify a message without triggering the MDC you will have to deal with all of the following challenges:

Everything is encrypted. You will have to work through the encryption. You don't know the key.
The hash will detect your change directly.
CFB will cause unpredictable damage to the next block which will also be detected by the hash.
A changed length will be detected by the hash and will cause CFB damage. You don't know how to change it because of the encryption.
Even if you can somehow figure out how to make a rational change to the message/file or length you will not know how to change the hash due to the random data.
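The challenges above can be exercised against a toy version of the whole construction. This is a sketch under stated assumptions, not the OpenPGP wire format: the layout (16 random bytes, a 4 byte length, the message, then a SHA1 hash over everything before it) is simplified, the block function is a SHA-256 stand-in rather than a real cipher, the IV is fixed at zero because the random data already makes each message unique, and the names `mdc_seal` and `mdc_open` are hypothetical.

```python
import hashlib
import os

BLOCK = 16     # block size in bytes
LEN_FIELD = 4  # size of the toy length field
HASH_LEN = 20  # SHA-1 digest size


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for a block cipher's forward direction. NOT a real cipher."""
    return hashlib.sha256(key + block).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def cfb_encrypt(key: bytes, plaintext: bytes) -> bytes:
    previous, out = bytes(BLOCK), b""  # all-zero IV (an assumption of this sketch)
    for i in range(0, len(plaintext), BLOCK):
        previous = xor(plaintext[i:i + BLOCK], toy_block_encrypt(key, previous))
        out += previous
    return out


def cfb_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    previous, out = bytes(BLOCK), b""
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out += xor(block, toy_block_encrypt(key, previous))
        previous = block
    return out


def mdc_seal(key: bytes, message: bytes) -> bytes:
    """random data || length || message || hash(all of the preceding), CFB encrypted."""
    body = os.urandom(BLOCK) + len(message).to_bytes(LEN_FIELD, "big") + message
    return cfb_encrypt(key, body + hashlib.sha1(body).digest())


def mdc_open(key: bytes, ciphertext: bytes):
    """Return (message, intact). The message is returned even when the check
    fails, so a caller can still inspect suspect data."""
    plaintext = cfb_decrypt(key, ciphertext)
    if len(plaintext) < BLOCK + LEN_FIELD + HASH_LEN:
        return b"", False
    body, stored_hash = plaintext[:-HASH_LEN], plaintext[-HASH_LEN:]
    message = body[BLOCK + LEN_FIELD:]
    claimed_length = int.from_bytes(body[BLOCK:BLOCK + LEN_FIELD], "big")
    intact = hashlib.sha1(body).digest() == stored_hash and claimed_length == len(message)
    return message, intact


if __name__ == "__main__":
    key = b"shared secret"
    sealed = mdc_seal(key, b"Pay Alice 100 dollars")
    print(mdc_open(key, sealed))
    tampered = bytearray(sealed)
    tampered[BLOCK + LEN_FIELD + 4] ^= 0x01  # flip one bit inside the message
    print(mdc_open(key, bytes(tampered)))
    print(mdc_open(key, sealed[:-5]))  # chop off the end
```

The untouched ciphertext opens with the check passing. The bit flip and the truncation both come back flagged, because the attacker cannot predict the CFB garbage or the hash of data that starts with random bytes they never saw.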

This might seem clunky and redundant. The same result is achieved in multiple ways. But that doesn't reduce the security; it seems very unlikely that one method could be used to overcome another. It also makes sense in an OpenPGP context. All of this was preexisting in the OpenPGP standard:

The length is the regular OpenPGP packet length.
The CFB block mode is the standard mode used in OpenPGP.
The random data is used by the CFB block mode to prevent similar/identical messages/files from leaking data after encryption.

All that was required to make the MDC was the addition of a single hash. The MDC is actually an example of minimalist and appropriate design.

I am not a professional cryptographer, but the MDC seems pretty secure. No one can say for sure that the MDC is secure. Anyone can prove it is not by demonstrating that they can modify messages/files without tripping the MDC. As of 2022, in the 20 years that the MDC has existed, no one has managed to do this.

The combination of CFB and MDC is effectively authenticated encryption. It detects changes in messages based on the shared secret of the encryption key. There is a definition of authenticated encryption that makes refusal to release suspect data mandatory, but that is not relevant for the sort of offline applications that OpenPGP is used for. There is only one encrypted message/file available when working with an offline system. Eventually someone is going to have to look at a suspect message to try to determine if they are under some sort of attack. Someone might want to try to recover the data in a corrupted file. If you want to define CFB-MDC-NR (NR for No Release) for some situation where that would make sense then feel free to do so; there is nothing intrinsic to CFB-MDC that would prevent you from doing that.

Most authenticated encryption schemes use some sort of counter block encryption mode and as a result depend heavily on the implementation refusing to release data because counter mode is completely modifiable without penalty. In an offline encryption environment where such implementation behaviour can't be guaranteed, the inherent modification deterrence of the CFB mode becomes important. So the MDC is specifically suited to the offline encryption environment in a way that other schemes are not.

The MDC uses SHA1 for the hash. Not everyone knows that the discovered collision weakness in SHA1 is irrelevant to the MDC. I suppose you could redefine it as the “MDC hash” and include the weakness in the specification to prevent unnecessary angst. In general, the MDC is likely to be resistant to weaknesses in the hash because the hash is encrypted and randomized by the random data, which makes it very hard to mess with.

The MDC is secure and is well suited to the sort of offline encryption that the OpenPGP standard embodies. Proposals to add one or more authenticated encryption modes and deprecate the MDC don't make sense to me. We would be better off if we simply did nothing.

¹⁾ Things are usually encrypted in chunks (16 bytes long in most cases). The “blocks” referred to here are those chunks. This fact is ignored as much as possible to make the concepts clearer.

²⁾ Counter Block Cipher Mode

³⁾ See the Cipher Feedback article for a more detailed discussion.