So… I have spent significant time writing PGP advocacy articles. The process involves digging into an old system in an attempt to figure out how it works. I apparently do things like that now. As my knowledge grew the articles evolved from spin to solid argument. It turned out that there was a lot of wisdom embedded in OpenPGP. I have recently discovered that the encryption framework used in OpenPGP (OCFB-MDC) is good and appropriate for the application. I am also aware that there has been at least one proposal in the past and a present proposal (2022) to replace it with something else. I don't think there are any good reasons to do that … and what if I am right? It seems very possible to me that we have collectively forgotten (or never understood in the first place) why OCFB-MDC works. That suggests that the status quo should have a champion. That creates an obligation on my part. Hence this editorial.
OCFB stands for OpenPGP Cipher Feed Back. It is OpenPGP's improved version of the traditional cipher feedback block encryption mode5). It increases the difficulty of modification of messages/files by malicious entities. It is like a vault in a bank that increases the difficulty of modifying the location of the money. MDC stands for Modification Detection Code. It is the means to detect modification of the encrypted data. It is like the alarm on the bank vault. The two things work together and should only be used together. The combination should be considered a single OCFB-MDC mode and that is what I will be calling it here6). The OCFB-MDC mode is used when anything is encrypted in OpenPGP format. It has existed for over two decades (2022). So it is a very well established part of the standard. So we should expect that proposals to supersede it would come with strong justification for the change. This article takes the form of a search for such justification. On the journey I hope to show that there is nothing wrong with OCFB-MDC and that it is suited for OpenPGP in ways that typical proposals are not. Unfortunately, I have not been able to find any inclusive change rationale so this will mostly be based on what I think the arguments are as gleaned from discussions and blog posts.
We should be clear at the start about the cost of replacing OCFB-MDC with something else. That would cause hard to understand interoperability problems whenever the OpenPGP preference system fails to work7) or is not applicable8). Some implementations will simply never implement the replacement. We are still occasionally seeing interoperability problems caused by the use of OCFB (the thing before OCFB-MDC) two decades after it was superseded by OCFB-MDC. By the very nature of the sort of problem OpenPGP is intended to solve, there can be no central authority to drive change. Change takes a long time and it isn't possible to get rid of the old thing until everyone is done with their archive of encrypted email and/or files. So this is really about an addition to everything that is already there, not a replacement.
The following articles serve as references to this one. The reader is warned that they may contain PGP advocacy…
This discussion assumes a normal OpenPGP usage scope of encrypting messages (e.g. email) and files. This is an important qualification. Any tool can be shown to be deficient simply by expanding the use scope of that tool. My hammer does a terrible job of driving screws, but that does not mean it is broken…
Please forgive the length. Since I have gone to all the trouble to have positions on all these issues I can't not include them in an editorial where they are directly related.
It is often implied that OCFB-MDC is insecure in some way but there doesn't seem to be any actual evidence for this. Here is a collection of the most definite implications that I have stumbled across…
OCFB-MDC hashes the message/file first, then appends the hash and then encrypts the result. So it counts as a case of “hash then encrypt”.
These are the circumstances of the attack that invariably comes up when hash then encrypt with cipher feedback (CFB) is discussed:
So, yes, hash then encrypt is insecure, but very possibly only in cases where the attacker can get active cooperation from their victim and only for an attack that few would care about. So it is entirely possible that hash then encrypt is more than secure enough for practical purposes.
This is moot for the OpenPGP case. OpenPGP uses an improved form of CFB (OCFB) that is immune to this sort of attack. OCFB-MDC randomizes and then encrypts both the initial value (IV) and hash value. This prevents attacker access to either value. So the generic insecurity, such as it is, is not an argument that can be used to support a replacement of the OCFB-MDC mode in OpenPGP.
We should not fail to recognize that hash then encrypt has a significant advantage over other methods in that it protects the integrity check with the encryption. An attacker has to work through the encryption to attack the check.
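The hash then encrypt layout described above can be sketched in a few lines. This is only an illustration of my reading of RFC 4880 section 5.13, not code from any implementation: the function names are made up, a 16 byte cipher block is assumed, and the final encryption step is left out entirely. In real use the whole byte string returned by `build_protected_plaintext` would be encrypted, so the random prefix and the hash never appear in the clear.

```python
import hashlib
import hmac
import os

BLOCK_SIZE = 16           # assumption: a cipher with 128-bit blocks, such as AES
MDC_HEADER = b"\xd3\x14"  # MDC packet tag (0xD3) followed by its length (20)


def build_protected_plaintext(data: bytes) -> bytes:
    """Return prefix + data + MDC packet, ready to be encrypted as a whole."""
    prefix = os.urandom(BLOCK_SIZE)
    prefix += prefix[-2:]  # the last two random octets are repeated
    body = prefix + data + MDC_HEADER
    # The hash covers the random prefix, the data and the MDC header.
    return body + hashlib.sha1(body).digest()


def check_mdc(plaintext: bytes) -> bool:
    """Check the trailing MDC of an already decrypted byte string."""
    if len(plaintext) < BLOCK_SIZE + 2 + len(MDC_HEADER) + 20:
        return False
    body, digest = plaintext[:-20], plaintext[-20:]
    if not body.endswith(MDC_HEADER):
        return False
    return hmac.compare_digest(hashlib.sha1(body).digest(), digest)


if __name__ == "__main__":
    protected = build_protected_plaintext(b"attack at dawn")
    print(check_mdc(protected))
```

Note that an attacker working on the encrypted form has no direct access to any of the bytes handled here; that is the point made above about the integrity check being protected by the encryption.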
Chances are that no one has actually said exactly this but some arguments effectively come out to this.
First, OpenPGP, the standard, has authenticated encryption in the form of OCFB-MDC. One might not like typical implementation behaviour but we are discussing the standard here. Second, we don't have to go any further than OpenPGP messaging usage for an example of a case where authenticated encryption is not needed (in the way it is meant here). The authentication is done directly on the message using a signature. The encryption is irrelevant. It might not even exist.
According to a partially remembered email list post, the MDC only requires a hash that is non-reversible. That seems quite reasonable to me. So the currently known SHA1 collision weakness does not affect the security of OCFB-MDC in any way. I understand that the mere presence of something called SHA1 can cause problems in some situations, but such restrictions are not rational. If this is actually a problem then I suggest that the SHA1 used for OCFB-MDC be renamed to something like “The MDC Hash” and be respecified to only provide irreversibility.
Because OCFB-MDC protects the output of the hash by randomizing and then encrypting it, it seems reasonable to think that it is more resistant to weaknesses in the hash than the commonly used schemes that expose the output of the hash to the attacker. So replacing OCFB-MDC with such a scheme might increase the probability of trouble down the road. If we were to take a stand against harmful changes prompted by broken policies covering the use of certain cryptographic functions, we couldn't really find a better place to take that stand than with SHA1 usage in OCFB-MDC. It isn't that we got lucky somehow here, the collision resistance of the hash used in OCFB-MDC is simply irrelevant.
This came from a blog post called The PGP Problem. It turned out to be a misunderstanding caused by an incorrect specification. The author of the blog post misunderstood the discussion. There is no such weakness.
The previously mentioned blog post strongly implied that the MDC can be easily stripped. This is not true; stripping the MDC is actually hard and has an extremely low chance of success9).
Let's put that aside for now and assume that someone has discovered a practical MDC stripping attack. Then the argument would be that a much different format would be impossible to downgrade so the sender of a message would be assured that someone will not downgrade their integrity protected message to a non-integrity protected message.
After a, probably extensive, transition period, OCFB-MDC will be rare. Thus receiving implementations can accept non-integrity checked encrypted messages (OCFB) with a low probability that an attacker had downgraded it from OCFB-MDC. Wait; that doesn't sound right. Why were we doing this again? Presumably not to allow the omission of the integrity check.
If the receiver requires the MDC of the integrity checked version of the message then it can't accept the non-integrity checked version no matter where it comes from. It makes no real difference in practice if it has been downgraded.
The rest of this article will assume that OCFB-MDC is cryptographically secure. If you, the reader, disagree then please demonstrate a modification not caught by the MDC check and save us all a lot of time and trouble. Some theoretical preference for one method over another would have been good 20 years ago but at this point we would have to break it to discredit it. This is a very good conclusion. It means that the OCFB-MDC message format does not have to change or be replaced. The very best outcome of an attempt to change a long established standard is the realization that the standard is fine as it is.
So most of the rest of this will be about the idea that OpenPGP should not release data that is suspected of being maliciously modified. One could take that approach with OCFB-MDC by refusing to release data on an invalid MDC check, but I will try to show that that is something that should not be done as a matter of policy.
EFAIL was a mostly impractical vulnerability in email clients that involved leaking data by modifying encrypted HTML emails. It seems to be behind recent initiatives to replace OCFB-MDC. The reasoning seems to go something like this:
Without context this would seem odd. Any data leak can be prevented if something in the chain before the leak refuses to release the data. Why single out the cryptography code? Context to follow…
In the beginning all encryption was offline. You would encrypt something and send it off somewhere, possibly on paper. Eventually someone with the key would get access to the encrypted material and decrypt it. OpenPGP was designed to cover such cases as it was originally intended to be used to protect email which was in turn modelled off the offline medium of paper mail. OpenPGP represents an offline, non-connection oriented medium, even though it is typically used on networks.
Then came the age of networked computers. Now we wanted to encrypt not just messages but connections. Instead of just firing an encrypted message off into the void with the hope it would find its way to a person, now we needed to establish and maintain these encrypted connections between machines. This new medium led to some significant differences in the environment:
These differences made it so that various sorts of oracle attacks were now practical. Briefly, an oracle attack is where a system is tested in some way, usually repetitively, and the response is analyzed. The result can be information that the attacker is not supposed to have, such as unencrypted data or key material. These oracle attacks became a serious problem.
There were several principles/approaches developed in the war on oracle attacks. Here are two that are relevant here:
The second principle follows from the first; at the level of the cryptography the only thing that can be done on suspected modification in the absence of any application level context is nothing. We might not even try to decrypt the data so obviously we can't release it. We need to be able to check that the data is unmodified first so that something like “hash then encrypt” as used in OCFB-MDC would not work consistently for this approach; that would involve decrypting the data first. This might seem a little ham fisted but it comes from a sense of desperation. It turns out that it is virtually impossible to predict what an attacker might use for an oracle attack. The title of the well known “Cryptographic Doom Principle” is an accurate expression of the feeling of dread engendered by the oracle issue.
OpenPGP implementations are normally immune to oracle attacks, not because of any intrinsic properties but because those implementations are used in a non-connection medium. Messages are sent off into the void with no automated response possible. Files are encrypted and saved to a storage device. EFAIL was significant because it was a kind of oracle attack where the encrypted email message was modified in such a way that the HTML interpreter could be tricked into leaking it. That brings us to the actual argument: since OpenPGP has been shown to be susceptible to an oracle attack, then the principles that evolved to combat such attacks must apply. Therefore:
EFAIL uses encrypted email as the practical example. We will as well. So the fix for EFAIL attacks against encrypted email comes out to having the cryptographic code detect a possible modification and then failing with an error while not releasing any of the suspect data. That after all is the fix used for this sort of issue for online, connected media like TLS. A simple and thus appealing approach.
The email client application code receives this error instead of the decrypted message. Now what? What do we tell the user?
A miserable user experience. If I am told that I am under some sort of attack I will definitely need to see the attack message. Denying a machine the decrypted text is one thing. Denying an actual person the decrypted text is quite another. An offline non-connected medium is quite different from an online connected medium. Solutions that work for one will not always work for the other.
What might an approach for preventing EFAIL type attacks look like with a focus on usability? The MDC integrity check (or equivalent) is normally not used for email. So it would be good if we could continue to keep it out of the user's conceptual model. The user already knows about email signatures, so why not try an approach based on, you know, authenticity? Signatures are how authenticity is actually verified in encrypted email. EFAIL (and any other malicious modification) breaks authenticity so it breaks the signature. In encrypted email terms that means that any modified message will show up as encrypted but unsigned/unauthenticated/anonymous. That's fairly suspicious. Some email clients don't even allow the purposeful sending of such messages. Most strongly discourage the sending of such messages.
So why not treat this suspicious message with suspicion? If it is plain text then feed it into a very simple Unicode interpreter that first sanitizes the text and then only outputs straightforward glyphs in a way that can't deceive the user. If it is HTML then feed it into a very simple and straightforward HTML interpreter that merely picks out the text and sends that text to the previously mentioned Unicode interpreter. The user would only have access to the sanitized text. If they attempted to forward the suspicious message the user could be discouraged from doing that10). Any attachments would be tagged as suspicious so the user would be prevented from directly accessing them from the email client and prevented/discouraged from saving them to disk.
The user would see the plain text of the message with an explanatory message. The message would be something like “This suspicious unsigned/anonymous message is being displayed in safe mode”. That nicely ties into the concept of email signatures. The idea that an anonymous message should be treated as suspicious comes from general cultural context. So this is pretty good for usability.
OK, let's compare the effectiveness of the two proposals. First, refusing to release data that might have been maliciously modified…
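To give a feel for how little machinery the safe mode described above needs, here is a rough sketch using only the Python standard library. Everything in it is an assumption made for illustration: the function names are invented, and the rule of dropping every character in a Unicode "C" (control, format, etc.) category except newline and tab is just one plausible reading of "straightforward glyphs". Nothing is rendered and no URL is ever fetched, so there is no channel for the decrypted text to leak through.

```python
import unicodedata
from html.parser import HTMLParser

SAFE_MODE_BANNER = (
    "This suspicious unsigned/anonymous message is being displayed in safe mode"
)


class _TextOnly(HTMLParser):
    """Collect text content only; tags and their attributes are never acted on."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0  # greater than zero while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def sanitize_text(text: str) -> str:
    """Drop control and format characters (e.g. bidi overrides), keep \\n and \\t."""
    return "".join(
        ch for ch in text
        if ch in "\n\t" or not unicodedata.category(ch).startswith("C")
    )


def safe_mode_text(body: str, is_html: bool) -> str:
    """Return the text to show the user for a suspicious message."""
    if is_html:
        parser = _TextOnly()
        parser.feed(body)
        parser.close()
        body = "".join(parser.chunks)
    return sanitize_text(body)


if __name__ == "__main__":
    html = "<p>Hello <img src='http://attacker.example/?q='>world</p>"
    print(SAFE_MODE_BANNER)
    print(safe_mode_text(html, is_html=True))
```

A real client would want a more careful policy for scripts that depend on format characters, but the basic shape stays this simple.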
This approach prevents attacks based on modification of an encrypted message, and only such attacks. So basically EFAIL. It can't detect/prevent direct attacks on, say, the HTML interpreter because such attacks do not require modification of an encrypted message. Since we are dealing with anonymous/unsigned messages here an attacker can directly create all the attack messages they want without going through the bother of modifying an existing encrypted message. The refusal to release data scheme would happily pass these along to the HTML interpreter.
Second, my proposal based on the suspicion level of the encryption/signature status…
In practice the suspicion based approach will be as secure against EFAIL as withholding suspect data. That is because all the safe mode needs to accomplish is to avoid leaking decrypted message data. That is very easy to reliably accomplish.
The suspicion based approach can help with other classes of attack as it adds in the authentication status of the message to the integrity status. So we can treat messages from known correspondents differently than from anonymous entities. It is common that email systems filter incoming mail for the following sorts of attack:
If the incoming email is encrypted, then the email system can not filter it. This fact is sometimes used as an argument against encrypted email, particularly in large organizations with definite security policies. The suspicion based approach would mostly prevent the possibility of attack from anonymous entities while allowing normal email usage for messages from known correspondents. A corporate IT department might find encrypted email more acceptable if email was handled by clients following such a policy.
Since we are ignoring the MDC with the suspicion based approach, we can happily accept non-integrity checked OCFB (the thing before OCFB-MDC) messages as non-suspicious if they are signed. Not having to deal with such messages in a special way is a huge usability win for a user that actually gets one of these messages and can remain blissfully unaware of the fact.
I am not claiming here that my proposed suspicion based approach to an EFAIL fix is the best approach. I am not even claiming it is a good approach. What I am claiming is that it is a perfectly valid approach. If I as an application programmer in the middle of implementing such an approach discovered that the encryption code was going to blow up with an error on a failed integrity check I would not be at all happy while having to deal with this pointless error condition that I would have to explain to the user somehow. My position would likely be something to the effect that the OpenPGP standard was attempting to be helpful in a harmful way.
Preventing EFAIL by refusing to release data is a theoretically perfect fix for a very minor issue that causes larger practical problems. This seems to be an example of inappropriately applying wisdom learned from a newer medium to an older medium.
There is nothing from EFAIL that supports throwing out OCFB-MDC. Considering all the difficulty that the OCFB inherent modification deterrence caused the EFAIL researchers, EFAIL supports the retention of the OCFB part of OCFB-MDC. There are some proposals that effectively replace OCFB with a mode that has no real modification deterrence at all, possibly under the theory that messages/files will always be withheld on an integrity error so that it makes no difference. Under the bank vault analogy that would be like replacing the vault with an old wooden shed and installing an alarm in it. After all, both the shed and the vault can be broken into. The alarm makes both secure.
Oh, and the MDC was completely effective against EFAIL. The researchers were not able to overcome the MDC. They were reduced to listing the email clients where the possibility of using the MDC to prevent EFAIL existed.
So what about the business of “hash then encrypt”? No actions performed by OCFB-MDC on the reception of a message contributed to the EFAIL data leak in any way. That happened in the following HTML interpreter. So EFAIL says nothing about this.
Again, no one actually said this, but it sure was implied at times. This is just to suggest that there might be a political aspect to the proposals to supersede OCFB-MDC.
The GnuPG project refused to get involved with EFAIL; they just checked, found out there was nothing they could do, and mostly ignored the whole thing. For that entirely reasonable response they were chastised. Perhaps we have to issue a strongly worded press release whenever anything significant happens, in the style of modern day corporations. Silence apparently is a sign of guilt even in the absence of any evidence. Perhaps this editorial can serve as such a rebuttal to the claims of OCFB-MDC inferiority.
There are a couple of relevant references to attacks against OCFB-MDC in the security section of RFC-4880:
The first is the same sort of thing as EFAIL, although it predates EFAIL by 16 years11). Instead of tricking an HTML interpreter into sending the decrypted message to the attacker as in EFAIL, this attack instead tricks the user into sending the decrypted message back to the attacker. That message was disguised by flipping random bits to prevent the user from recognizing the message. The idea was that the user might return the disguised message to the sender with a query. This might seem like a trivial attack, but the researchers had to work around the inherent modification deterrence of OCFB which added significant difficulty. The existence of this attack does not support the replacement of OCFB-MDC for the same reason as in the EFAIL case: there is nothing that a replacement scheme could do that could not be done with OCFB-MDC.
The original researchers were not able (or didn't bother) to make the attack practical but there is no reason to think that this would be impossible or even all that hard. In the 20 years (2022) this attack has existed there seem to have been no attempts at actual use. We would very likely know that such an attack was attempted because in an offline non-connection medium like email the recipient always gets the attack message. We would know that it was an actual attack and not just random corruption of the message because of the work required to defeat OCFB. This is an important point. The deterrence against modification provided by OCFB does not end when an attacker works around it. That work then provides a sort of witness function that proves an attack. What attacker wants to send their victim incontrovertible evidence that they are under attack on the first attempt? Particularly in cases where that first attack was likely to fail?
OCFB-MDC has a convenience feature that allows an implementation to quickly check if a key submitted for decryption is correct for the encrypted material it is to be applied to. If an appropriate oracle system is set up, this check can be used repeatedly to eventually determine a couple of bytes of an encrypted block12). As mentioned in the paper itself, this attack is not applicable to OpenPGP used within its normal scope, but there are still some points of interest here…
This attack works on an error message oracle and probably would work as a timing oracle. This means that withholding suspect data would not prevent this attack. This is a good reminder that withholding suspect data is not some sort of panacea. There are many ways to leak data for use as an oracle.
Even if we accept that such an oracle is possible with some sort of usage wildly out of the scope of normal OpenPGP use then there is still no need to replace OCFB-MDC. To destroy the oracle all that is needed is to ignore the quick check. The actual message format would not change and interoperability would remain complete.
If we are wedded to the idea that OpenPGP should not release data that is suspected of being maliciously modified, then we probably support the idea that this should apply to large files encrypted in OpenPGP format. That quickly runs into trouble…
While not releasing the data, where do we keep it? Since this is decrypted from encrypted data it has to be a secure place to ensure an attacker doesn't get access to it and/or modify it. Since files can be arbitrarily large this storage location has to be arbitrarily large. So we have basically created a requirement for infinite secure storage.
So we decide to break the data into blocks while encrypting it and then we do whatever we are doing for an integrity check for each block. Now we have two problems:
First, we have to decide on a block size. That will depend on how much storage the entity decrypting the material can dedicate to the task of not releasing data. This has created a new requirement for that amount of storage that now goes with OpenPGP usage. If we make it really small then we increase the size of the message/file due to the overhead of the integrity check. If we make it really big then we prevent embedded/small systems from decrypting OpenPGP messages/files.
Second, we have increased the complexity of the structure of all OpenPGP encrypted messages/files. Currently there are two levels of blocking used in OpenPGP: The fixed length block encryption structure and the variable length blocking provided by the packet structure. This would add another level of blocking for a total of three. That provides attackers another set of block boundaries to work against and greatly increases the number of ways program logic can get things wrong. This would provide attackers with many encryption cycles to work against instead of just one. In general, greater complexity leads to less security and reliability.
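The block size trade-off mentioned above is easy to put numbers on. The following is just back of the envelope arithmetic; the 16 bytes of integrity data per block is an assumption picked for illustration and is not taken from any particular proposal, and the function name is made up.

```python
TAG_SIZE = 16  # assumption: bytes of integrity check data added per block


def blocking_cost(file_size: int, block_size: int, tag_size: int = TAG_SIZE):
    """Return (overhead_bytes, buffer_bytes) for a file split into checked blocks.

    overhead_bytes: extra bytes added to the encrypted file by the per-block checks
    buffer_bytes:   secure memory the decrypting side must hold before it may
                    release anything, i.e. one whole block
    """
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * tag_size, block_size


if __name__ == "__main__":
    one_gib = 1 << 30
    for block_size in (64, 4096, 1 << 26):
        overhead, buffer_needed = blocking_cost(one_gib, block_size)
        print(f"block={block_size:>9}  overhead={overhead:>10}  buffer={buffer_needed:>9}")
```

Under these assumptions a 1 GiB file with 64 byte blocks grows by 256 MiB (25%), while 64 MiB blocks add only 256 bytes but require the decrypting device to set aside 64 MiB of secure memory.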
Libraries will now have to present a different interface that takes this blocking into account. Applications will have to be rewritten.
Integrity check blocking means that an attacker can trivially modify an encrypted file simply by causing a bit error in the block before the block that will become the last one. This seems kind of ironic considering what this is ultimately all about.
It is not all that clear that blowing up with an error and not releasing suspect data is the optimal behaviour for suspected modification of large files. The vast majority of such errors are going to be caused by bad data that evolved naturally in the large file. A user might want to attempt recovery and so will have to be given access to the suspect data much like in the EFAIL case. I am not sure that many actual users were consulted here, this came out of nowhere. I strongly suspect that it entirely came out of the idea that OpenPGP should not release data that is suspected of malicious modification and is based on no practical need at all.
Since the code/libraries available for these proposed modes practically can't release unverified data due to how they do things, they will come with a maximum message size specification. So, in practice, some sort of blocking scheme for long files is mandatory. So this can be seen as an example of an awkward workaround caused by the inappropriate application of online connection oriented encryption modes to offline non-connection oriented media.
The fix requires zero effort. Just leave things as they are. The blocking issue serves as a serious argument against the replacement of OCFB-MDC.
OCFB-MDC is cryptographically secure in that a protected message can not be modified without detection. It provides the capability of the sort of thing normally referred to as authenticated encryption.
OCFB-MDC seems to work well in practice against modification even if the detection is ignored. I believe that is because:
The properties of the offline non-connection medium are helpful:
As a result, attacks that involve modification of the encrypted data likely have a higher cost than the potential benefit.
EFAIL was a successful attack against this traditional inherent modification deterrence for the case where the modification detection is ignored. The victim still ends up with incontrovertible proof of an attack, so the success here is partial.
As demonstrated, EFAIL can be solved in a valid way that is arguably superior to the approach of having the cryptography code refuse to release data.
The MDC seems to be able to reliably detect modification and can be used where required to reduce the probability of successful attacks involving such modification. Yes, it could be used as part of some policy that involves not releasing suspect data but I believe that such a policy used in an offline non-connection medium would ultimately come at the cost of usability. At any rate, there is no good reason to replace OCFB-MDC with something else and multiple reasons why that would be a bad idea.
The problems caused by these harmful proposals are not off in the future. They are causing trouble today even though none have made it into the OpenPGP standard (2022). I have personally been involved in an obscure and hard to fix interoperability problem caused by one of these proposed encryption modes that had somehow leaked out of an implementation. There was no evidence that the user had explicitly authorized the use of the mode. The GnuPG implementation actually had a possible remote code execution exploit13) caused by the implementation of some of the OCFB-MDC replacement candidates that existed at the time. The issue was quickly rectified but I could not ask for a better example of the risk associated with implementing new functionality.
The Snowden leak 14) had documents that suggested PGP encrypted material was on a short list of things that the NSA15) could not get access to. That likely was for passive eavesdropping but this still suggests that it might not be a good idea to tinker with the fundamental cryptography of OpenPGP without good reason.
Long existing standards should not be modified in incompatible ways unless things are fairly dire. We should remember that the “extend then extinguish” principle often used to degrade or destroy standards still applies, even if the intentions behind the extend part are good. Changes made without strong justification are particularly likely to cause trouble in an instance like OpenPGP where there is no strong single entity to enforce those changes. Long existing standards are not important because of their technical superiority. They are important simply because they are long existing standards. Even if there were some sort of weakness in OCFB-MDC the first approach should involve some sort of interoperable workaround and not complete replacement. But here we see attempts to replace a subsystem that is working well. None of the issues discussed here have caused an actual user of OpenPGP embarrassment or even inconvenience over the last 20 years (2022). How likely is it that we would be so lucky in the case of an OCFB-MDC replacement?
This editorial was created to support these points:
This quote seems relevant:
In the matter of reforming things, as distinct from deforming them, there is one plain and simple principle; a principle which will probably be called a paradox. There exists in such a case a certain institution or law; let us say, for the sake of simplicity, a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, 'I don't see the use of this; let us clear it away.' To which the more intelligent type of reformer will do well to answer: 'If you don't see the use of it, I certainly won't let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it.'
Chesterton, G. K. (1929)