The OpenPGP Modification Detection Code is Actually Good
I once worked for a company that had a strange and intriguing dilemma. They had a popular Product. Marketing determined that the popularity was due to the fact that the Product lasted significantly longer than competing products. No one in the company had the faintest idea why that was the case. The design did not differ in any obvious way from the design used by the competition. While I was there, an engineering project was initiated with the hope of understanding why the Product was better. I left the company before any definite result. For all I know the mystery still remains.
The situation with the OpenPGP modification detection code (MDC) very much reminds me of the story of the Product. Legend has it that the MDC was created as something of an afterthought1). It works very well, but it is not obvious why. I have never seen a complete explanation, so here I will attempt to produce one.
When OpenPGP is used for something like email, messages are normally authenticated directly with a signature, so the MDC is not relevant in the most common use case and is not that important there. It would still simplify things and eliminate much pointless discussion if the MDC could be shown to be strong, because no one would then have to go through the more obscure uses of OpenPGP to determine how applicable the MDC is to each.
If you were to encrypt, say, a message text, an attacker would have no way to determine what was said in that message just by looking at the encrypted result. That, after all, is the whole point of encrypting it in the first place. An attacker might still be able to make changes to the text if they can get access to it along the way. If they know what text is encrypted they might be able to change its meaning by duplicating, deleting and moving parts of the encrypted text. Many encryption systems also allow an attacker to selectively flip bits in the eventual decrypted message, that is, to change a 0 to a 1 or a 1 to a 0. The attacker can't tell what a bit is to start with by looking at the encrypted message, but if they know what text is encrypted they can change it to whatever they want by picking the right bits to flip. The MDC is used to detect any sort of malicious change to OpenPGP encrypted messages and files.
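To make the bit flipping concrete, here is a minimal Python sketch. The keystream function is a toy I made up for illustration (SHA-256 over the key and a counter); it stands in for any cipher mode that XORs a keystream into the data and is not meant as real encryption.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream (an assumption for this sketch): SHA-256 of key + counter."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def crypt(key: bytes, data: bytes) -> bytes:
    # Encryption and decryption are the same XOR with the keystream.
    return xor_bytes(data, keystream(key, len(data)))

key = b"known only to sender and receiver"
known = b"PAY 100 TO BOB"       # the attacker knows this text
wanted = b"PAY 900 TO BOB"      # and wants the receiver to see this
ciphertext = crypt(key, known)

# The attacker never touches the key: they flip exactly the bits
# in which the known text and the wanted text differ.
forged = xor_bytes(ciphertext, xor_bytes(known, wanted))

print(crypt(key, forged))  # b'PAY 900 TO BOB'
```

The attacker only needs the known text and the ciphertext. Nothing in the encryption itself notices the change.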
We will start with a simple example in the style of the MDC and then improve it. For the sake of this example we will first assume that some sort of block2) cipher mode is used that allows any bit to be flipped in the message. The popular counter block cipher mode has that property3). This is our oversimplified MDC:
We create this by hashing the message. Then we append the hash to the end of the message. After that we encrypt everything; message and hash. To check for modification we hash the message and compare that hash to the hash appended to the message.
Let's consider the easiest situation for the attacker and assume they know the entire message. Then the attacker can hash that known message and will then know what the hash was before encryption. As a result they can modify the hash to any value they want by flipping bits as required. So the attacker can change the message to anything they want without restriction and can change the hash so that their changes would not be detected. If their target message is shorter than the original they can just generate the hash early and drop the extra part. So this is not entirely secure.
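The attack on the oversimplified MDC can be sketched in a few lines of Python. As before, the keystream is a made-up stand-in for a counter-style mode, and the names `seal` and `unseal` are mine, not anything from OpenPGP.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream (an assumption for this sketch): SHA-256 of key + counter."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def seal(key: bytes, message: bytes) -> bytes:
    # Oversimplified MDC: append the hash, then encrypt message and hash.
    body = message + hashlib.sha1(message).digest()
    return xor_bytes(body, keystream(key, len(body)))

def unseal(key: bytes, ciphertext: bytes):
    body = xor_bytes(ciphertext, keystream(key, len(ciphertext)))
    message, stored = body[:-20], body[-20:]
    return message, hashlib.sha1(message).digest() == stored

key = b"known only to sender and receiver"
original = b"MEET AT NOON"
ciphertext = seal(key, original)

# The attacker knows the whole message, so they also know the hash.
target = b"MEET AT DAWN"
known_body = original + hashlib.sha1(original).digest()
wanted_body = target + hashlib.sha1(target).digest()
forged = xor_bytes(ciphertext, xor_bytes(known_body, wanted_body))

print(unseal(key, forged))  # (b'MEET AT DAWN', True)
```

The check passes on the forged message because the attacker was able to fix up the hash along with the text.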
It might be good to switch to a block encryption mode that is not so inherently easy to modify. The cipher feedback (CFB) block mode imposes a penalty on modification in the form of completely unpredictable random garbage in the block after the modified block4):
Now the hash only has to detect the random garbage triggered by the attempt to change the CFB protected data. Since predicting the random garbage would require knowledge of the encryption key, the attacker has no real way to fix either the garbage or the hash. Attempts to change the last block of the message will cause unpredictable damage to the hash itself. So the modification detection code and the cipher feedback block mode work together.
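The garbage penalty is easy to see in a small experiment. This sketch assumes a toy block function built from SHA-256 in place of a real block cipher; that works here because CFB only ever uses the cipher in the encrypt direction. The 16 byte block size and the all-zero IV are also just choices for the example.

```python
import hashlib

BLOCK = 16

def block_encrypt(key: bytes, block: bytes) -> bytes:
    """Toy stand-in for a block cipher (an assumption for this sketch)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]

def cfb_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    out, prev = b"", iv
    for i in range(0, len(plaintext), BLOCK):
        pad = block_encrypt(key, prev)
        prev = bytes(p ^ k for p, k in zip(plaintext[i:i + BLOCK], pad))
        out += prev          # the ciphertext block feeds the next pad
    return out

def cfb_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    out, prev = b"", iv
    for i in range(0, len(ciphertext), BLOCK):
        chunk = ciphertext[i:i + BLOCK]
        pad = block_encrypt(key, prev)
        out += bytes(c ^ k for c, k in zip(chunk, pad))
        prev = chunk
    return out

key = b"known only to sender and receiver"
iv = bytes(BLOCK)
plaintext = b"A" * 16 + b"B" * 16 + b"C" * 16
ciphertext = cfb_encrypt(key, iv, plaintext)

tampered = bytearray(ciphertext)
tampered[0] ^= 0x01          # flip one bit in the first block
recovered = cfb_decrypt(key, iv, bytes(tampered))

print(recovered[:16])    # b'@AAAAAAAAAAAAAAA'  (the chosen bit flipped)
print(recovered[16:32])  # unpredictable garbage
print(recovered[32:])    # b'CCCCCCCCCCCCCCCC'  (untouched)
```

The attacker gets the flip they asked for in the first block, but the second block turns into something they cannot predict without the key.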
Changing the last block of the hash would not cause any random garbage because there is no place for that random garbage to go. This means that at least a portion of the hash is modifiable. Let's fix that:
We have added some random data to the start of the message. The random data prefix is included in the hash. That means that the attacker can never know the entire message and as a result will not know what the hash is to start with. As a result they will not be able to change the hash in a rational way by flipping bits. So the hash is protected by first randomizing it and then encrypting it.
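A tiny Python sketch shows what the random prefix does to the hash. The 16 byte prefix length and the function name are assumptions for the example; encryption is left out here to keep the focus on the hash.

```python
import hashlib
import os

def body_with_prefix(message: bytes, prefix: bytes) -> bytes:
    # The hash covers the random prefix as well as the message.
    return prefix + message + hashlib.sha1(prefix + message).digest()

message = b"MEET AT NOON"       # fully known to the attacker
body_a = body_with_prefix(message, os.urandom(16))
body_b = body_with_prefix(message, os.urandom(16))

# The best the attacker can do without the prefix is hash the message alone.
attacker_guess = hashlib.sha1(message).digest()

print(body_a[-20:] == body_b[-20:])     # False: same message, different hash
print(attacker_guess == body_a[-20:])   # False: the guess is useless
```

Since the attacker does not know the starting value of the hash, they do not know which bits to flip to turn it into the hash they need.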
There are some mostly theoretical attacks that involve getting the victim to encrypt messages created by the attacker, so that the attacker can then modify them by chopping off the start and/or the end of the message without detection. The version of cipher feedback used by OpenPGP5) (OCFB) blocks that sort of attack by denying the attacker knowledge of the random prefix data and by requiring the key to create a new random prefix. This is the OCFB-MDC (OpenPGP Cipher FeedBack - Modification Detection Code) mode used by OpenPGP (irrelevant detail omitted):
Now both the hash and the random data are protected by first randomizing them and then encrypting them.
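Putting the pieces together, here is a rough Python sketch of the whole scheme. It is not an OpenPGP implementation. The block function is a toy stand-in for a real cipher, the names `seal` and `unseal` are mine, and the layout (two repeated bytes after the random block, a 0xD3 0x14 header in front of the hash, an all-zero IV) follows my reading of RFC 4880 rather than anything spelled out above.

```python
import hashlib
import hmac
import os

BLOCK = 16
MDC_HEADER = b"\xd3\x14"   # assumed MDC packet header, per my reading of RFC 4880

def block_encrypt(key: bytes, block: bytes) -> bytes:
    """Toy stand-in for a block cipher (an assumption for this sketch)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]

def cfb(key: bytes, data: bytes, decrypt: bool) -> bytes:
    out, prev = bytearray(), bytes(BLOCK)      # all-zero IV
    for i in range(0, len(data), BLOCK):
        chunk = data[i:i + BLOCK]
        pad = block_encrypt(key, prev)
        result = bytes(a ^ b for a, b in zip(chunk, pad))
        out += result
        prev = chunk if decrypt else result    # feedback is always ciphertext
    return bytes(out)

def seal(key: bytes, message: bytes) -> bytes:
    rnd = os.urandom(BLOCK)
    covered = rnd + rnd[-2:] + message + MDC_HEADER
    return cfb(key, covered + hashlib.sha1(covered).digest(), decrypt=False)

def unseal(key: bytes, ciphertext: bytes):
    body = cfb(key, ciphertext, decrypt=True)
    covered, stored = body[:-20], body[-20:]
    ok = covered.endswith(MDC_HEADER) and hmac.compare_digest(
        hashlib.sha1(covered).digest(), stored)
    return covered[BLOCK + 2:-2], ok

key = b"known only to sender and receiver"
ciphertext = seal(key, b"PAY 100 TO BOB")
print(unseal(key, ciphertext))   # (b'PAY 100 TO BOB', True)

# Flip one bit at every position in turn and count the undetected changes.
undetected = 0
for i in range(len(ciphertext)):
    tampered = bytearray(ciphertext)
    tampered[i] ^= 0x01
    if unseal(key, bytes(tampered))[1]:
        undetected += 1
print(undetected)                # 0
```

A loop like this proves nothing about a clever attacker, but it does show the hash and the feedback garbage catching every simple bit flip, including flips in the hash itself.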
If you want to attack OCFB-MDC and modify a message without triggering the MDC you will have to deal with the following challenges:
- Everything is encrypted. You will have to work through the encryption. You don't know the key.
- The hash will detect your change directly.
- OCFB will cause unpredictable damage to the next block which will also be detected by the hash.
- Even if you can somehow figure out how to make a rational change to the message/file you will not know how to change the hash due to the random data prefix.
- The random data prefix is very well protected by the OpenPGP version of cipher feedback (OCFB).
This might seem inelegant but it makes complete sense in the OpenPGP context. The following already existed in the OpenPGP standard:
- The OCFB block mode is the standard mode used in OpenPGP.
- The random data is used by the OCFB block mode to prevent similar/identical messages/files from leaking data after encryption.
All that was required to make the MDC was the addition of a single hash. The MDC is actually an example of minimalist and appropriate design.
I am not a professional cryptographer, but the MDC seems pretty secure. No one can say for sure that the MDC is completely secure. Anyone can prove that it is not by demonstrating that they can modify messages/files without tripping the MDC. In the 20 years that the MDC has existed (as of 2022) no one has managed to do this. I doubt that was because of a lack of effort. OpenPGP gets a fair bit of academic scrutiny.
The combination of OCFB and MDC is effectively authenticated encryption. It detects changes in messages based on the shared secret of the encryption key. There is a definition of authenticated encryption that makes refusal to release suspect data mandatory, but that is not relevant for the sort of offline applications that OpenPGP is used for. There is only one encrypted message/file available when working with an offline system. Eventually someone is going to have to look at a suspect message to try to determine if they are under some sort of attack. Someone might want to try to recover the data in a corrupted file. If you want to define OCFB-MDC-NR (NR for No Release) for some situation where that would make sense then feel free to do so; there is nothing intrinsic to OCFB-MDC that would prevent you from doing that.
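The difference between the two policies is small enough to show in code. This is a hypothetical sketch: `release`, `ModificationDetected` and the `no_release` flag are names I invented, and the MDC check itself is assumed to have already been done by whatever decrypted the data.

```python
class ModificationDetected(Exception):
    """Raised when the MDC check failed and the policy forbids release."""

def release(message: bytes, mdc_ok: bool, no_release: bool) -> bytes:
    if mdc_ok:
        return message
    if no_release:
        # The hypothetical OCFB-MDC-NR behaviour: suspect data never gets out.
        raise ModificationDetected("MDC check failed; data withheld")
    # Offline behaviour: the caller already knows the check failed and
    # gets the suspect data anyway, for inspection or recovery.
    return message

suspect = b"possibly damaged file contents"
print(release(suspect, mdc_ok=False, no_release=False))

try:
    release(suspect, mdc_ok=False, no_release=True)
except ModificationDetected as err:
    print("withheld:", err)
```

The policy sits entirely outside the cryptography, which is why either behaviour can be layered on top of the same OCFB-MDC data.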
Most authenticated encryption schemes use some sort of counter block encryption mode and as a result depend heavily on the implementation refusing to release suspect data because counter mode is completely modifiable without penalty. In an offline encryption environment where such implementation behaviour can't be guaranteed, the inherent modification deterrence of the OCFB mode becomes important. So the MDC is specifically suited to the offline encryption environment in a way that other schemes are not.
The MDC uses SHA1 as its hash. Not everyone knows that the discovered weakness in SHA1, which makes collisions easier to find, is irrelevant to the MDC. To prevent unnecessary angst, I suppose you could redefine it as the “MDC hash” and specify that it only needs to be irreversible. In general, the MDC is likely to be resistant to weaknesses in the hash because the stored hash is both randomized by the random data and encrypted, which makes it very hard to tamper with.
The MDC is secure and is well suited to the sort of offline encryption that the OpenPGP standard embodies. Proposals to add one or more authenticated encryption modes and deprecate the MDC don't make sense to me. We would be better off if we simply did nothing.
A Less Intuitive, More Technical Explanation
OCFB-MDC is a case of hash then encrypt. The cipher block mode is the modified version of cipher feedback used by OpenPGP (OCFB). The modification is the addition of a prefix block consisting of random data. The traditional CFB initialization vector (IV) is replaced by the encryption of a block of zeros. This serves to prevent an attacker from being able to get access to either the IV or the plaintext value of the random data prefix block.
The modification detection code (MDC) is a SHA1 hash of the random data prefix block and the plaintext message. The inclusion of the random data makes the MDC unpredictable and prevents known plaintext based modification.
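The construction can also be written out as equations. The notation is my own: E_K is the block cipher under key K, R is the random data prefix, M is the message, P_i and C_i are plaintext and ciphertext blocks, and δ is the set of bits an attacker flips. The extra bytes that OpenPGP adds around the prefix and the hash are left out.

```latex
\begin{align*}
P &= R \,\|\, M \,\|\, H, & H &= \mathrm{SHA1}(R \,\|\, M) \\
C_0 &= 0, & C_i &= P_i \oplus E_K(C_{i-1}) \quad (i \ge 1) \\
 & & P_i &= C_i \oplus E_K(C_{i-1}) \quad \text{(decryption)} \\[6pt]
C_i' &= C_i \oplus \delta \;\Longrightarrow & P_i' &= P_i \oplus \delta \\
 & & P_{i+1}' &= P_{i+1} \oplus E_K(C_i) \oplus E_K(C_i \oplus \delta)
\end{align*}
```

The last line is the garbage penalty: the change to block i+1 depends on E_K, so it cannot be predicted or cancelled without the key, and H cannot be adjusted to match because R is unknown.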
OCFB-MDC is immune to the classic attacks against hash then encrypt that involve getting the victim to encrypt an attack message that is later truncated to produce a second valid message.