Principles of the OpenPGP SEIP (OCFB-MDC) and SE (OCFB) Block Cipher Modes
The SEIP (Symmetrically Encrypted Integrity Protected) block cipher mode is important because it is in the OpenPGP standard. It has been in use for a quarter of a century now. There are ongoing efforts to add another block cipher mode to the OpenPGP standard, but because OpenPGP is mostly used for applications where the encrypted data stays around indefinitely, the current SEIP mode, for all we know, might remain important for another quarter century even if a new mode becomes popular enough to be usable.
It occurred to me recently that I had spent so much time defending the SEIP block cipher mode that I now understood it well enough to coherently explain it to others. So I'll attempt to do that. The hope here is to provide a solid introduction to this and related constructs. So let's go light on the politics this time and concentrate on the cryptography.
The OpenPGP SEIP mode is based on the well known cipher feedback (CFB) mode. CFB is a slightly modified version of the easier to understand output feedback (OFB) mode. So let's start there…
The Output Feedback (OFB) Block Cipher Mode
A block cipher takes 128 bits1) of data and transforms it into 128 bits of encrypted data using a key. That can be inconvenient if you have more than 128 bits of data to encrypt. So let's avoid the issue by using the block cipher to create a stream of bits (bitstream) that seem random to those who do not possess the key (pseudorandom). Then, perhaps inspired by the one-time pad encryption system, we use an exclusive-OR (XOR ⊕) function to combine the data (plaintext) and the seemingly random bits to create the encrypted data (ciphertext). This sort of thing is often called a stream cipher. Here is how OFB does this:
We start with a 128 bit Initialization Vector (IV) (any previously unused 128 bits) and encrypt it using the key to produce the first 128 bits of the Bitstream. Then we take that first segment of the Bitstream and encrypt it to produce the next segment. We keep feeding each new segment back into the encryption until we have a long enough Bitstream. Then we XOR the Bitstream with the Plaintext to produce the Ciphertext. The interesting thing here is that we are not actually using the encryption function to encrypt anything we care about. We are using the encryption function to do something else and then use that result to do the actual encryption. We don't even need a separate decryption function for OFB. When we want the plaintext back we just use the key to generate the Bitstream again and XOR it with the Ciphertext. That is based on a helpful property of the XOR function. If you XOR something with the same bitstream twice, you end up with what you started with.
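To make that concrete, here is a minimal OFB sketch in Python. It assumes AES (128 bit blocks) via the third-party cryptography package; the cipher choice and function names are just illustrative, not anything mandated by OpenPGP.

```python
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def aes_block(key, block):
    # One raw block cipher operation (AES used here as the example cipher).
    enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    return enc.update(block) + enc.finalize()

def ofb_encrypt(key, iv, plaintext):
    bitstream = b""
    segment = iv
    while len(bitstream) < len(plaintext):
        segment = aes_block(key, segment)   # feed each output segment back in
        bitstream += segment
    # XOR the Bitstream with the Plaintext; decryption is the exact same operation.
    return bytes(p ^ b for p, b in zip(plaintext, bitstream))
```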
The OFB mode has a fairly serious failure possibility. If you use the same IV and key on two or more messages then you get the same bitstream which causes, essentially, one-time pad key reuse. An unhelpful property of the XOR function is that when you XOR two ciphertexts that were produced with the same bitstream, the bitstream cancels out. You are left with the two plaintexts XORed together. Separating such combined plaintexts is usually not very hard.
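A tiny illustration of that cancellation, with a random value standing in for the reused bitstream (the messages are made up):

```python
import os

bitstream = os.urandom(16)          # stands in for a bitstream produced twice
p1 = b"attack at dawn!!"
p2 = b"retreat at dusk!"
c1 = bytes(a ^ b for a, b in zip(p1, bitstream))
c2 = bytes(a ^ b for a, b in zip(p2, bitstream))

# XOR the two ciphertexts: the bitstream cancels out, leaving p1 XOR p2.
assert bytes(a ^ b for a, b in zip(c1, c2)) == bytes(a ^ b for a, b in zip(p1, p2))
```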
There is a simple change we can make to OFB to mostly prevent this failure. The result is cipher feedback (CFB).
The Cipher Feedback (CFB) Block Cipher Mode
Instead of sending the last segment of the Bitstream along to the next stage, we send the last block of Ciphertext instead. This allows the Plaintext to modify the Bitstream, which means that the Bitstream will be different when the Plaintext is different. As a result, using the same IV and key on more than one message is much less disastrous. The attacker will only end up with one block (128 bits) worth of plaintext XORed together. That's much better than losing the whole thing.
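Here is the OFB sketch adjusted for CFB, again assuming AES via the cryptography package. The only change is what gets fed back; the library's own CFB mode is used as a cross-check.

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def aes_block(key, block):
    enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    return enc.update(block) + enc.finalize()

def cfb_encrypt(key, iv, plaintext, bs=16):
    out = b""
    feedback = iv
    for i in range(0, len(plaintext), bs):
        chunk = plaintext[i:i + bs]
        keystream = aes_block(key, feedback)
        block = bytes(p ^ k for p, k in zip(chunk, keystream))
        out += block
        feedback = block            # feed the Ciphertext back, not the Bitstream
    return out

key, iv = os.urandom(16), os.urandom(16)
msg = b"an example message longer than one block"
ref = Cipher(algorithms.AES(key), modes.CFB(iv)).encryptor()
assert cfb_encrypt(key, iv, msg) == ref.update(msg) + ref.finalize()
```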
There is an interesting question here. There is a block cipher mode called cipher block chaining (CBC) that was (and still is) much much more popular than CFB. The S/MIME email encryption scheme uses CBC mode for example. Why didn't the OpenPGP standard use CBC instead of CFB? It was, after all, what everyone else was using.
I encountered CFB well before I paid any attention to CBC. You like what you know, right? So for me, the question is, why did anyone use CBC instead of CFB? The two modes don't differ in any way that would make a significant difference. The few differences seem to favour CFB.
For one thing, CFB doesn't need padding in the same way that CBC does. For another, CBC has a subtle weakness that occurs when an attacker can predict the value of future IVs. CFB doesn't have that weakness2). TLS/SSL (Secure Sockets Layer) used CBC and had lots of problems with CBC padding … you know, the thing CFB doesn't need. There was also a TLS/SSL exploit that was based on predicting CBC IVs3). So it is interesting to think about an alternative reality where SSL used CFB instead of CBC. I doubt the differences between CBC and CFB would have made much difference for typical OpenPGP use so the situation here seems a bit ironic. OpenPGP ended up with unneeded advantages that TLS/SSL very much could have benefited from.
The currently popular counter block cipher mode (CTR) wasn't really a thing when the PGP protocols were being designed. CTR is faster than CFB for encryption in the sort of heavily optimized and multiprocessor environments that exist today. So the less optimized, single processor environment back in the day would likely not have caused counter mode to seem very attractive. CTR mode, like OFB mode, can leak the whole message when the key and IV are reused.
OpenPGP defines a prefix to generic CFB:
The original IV is replaced with zeros and the IV is shifted down to the first part of the message. Encrypting zeros is a common way to generate a new key from the existing key. For example, the popular Galois/counter (GCM) block cipher mode does this to generate the key for the GHASH used. Encrypting zeros can be thought of as hashing the key using the Davies-Meyer method. So we can redraw the diagram to make it easier to understand:
OpenPGP's CFB prefix, hash representation
The IV is encrypted by XORing it with the hash of the key (Key2). That means that the IV is encrypted in a different way than the rest of the message. That will be important later. The Check Value is the last 16 bits of the IV duplicated to provide a bit of redundancy. It is documented in the standard as a way to quickly determine if a user has entered the wrong passphrase. If the two check values do not match then it means that something went wrong with decryption and chances are it is a passphrase issue and an error can be returned without any further processing. So the prefix is the encrypted IV followed by the check value.
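Here is a small sketch of the prefix as described above (the "hash representation" view), again assuming AES and the cryptography package. Real OpenPGP packets add framing and a passphrase-to-key step that are ignored here.

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def aes_block(key, block):
    enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    return enc.update(block) + enc.finalize()

key = os.urandom(16)
iv = os.urandom(16)                             # any previously unused 128 bits

key2 = aes_block(key, bytes(16))                # "hash" the key by encrypting zeros
encrypted_iv = bytes(a ^ b for a, b in zip(iv, key2))   # the IV XORed with Key2

check_value = iv[-2:]                           # a duplicate of the IV's last 16 bits
# Under CFB the check value is encrypted with keystream derived from the encrypted IV.
encrypted_check = bytes(a ^ b for a, b in zip(check_value, aes_block(key, encrypted_iv)))

prefix = encrypted_iv + encrypted_check         # 128 + 16 = 144 bits
```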
The Symmetrical Encryption (SE) Block Cipher Mode
The SE mode existed before the SEIP mode. It provides no detection of modification in transit and would only be used in situations where such detection is provided by some other method or is not required in the first place. Say, signed email or some instant messaging scheme that signs each message (iMessage does this).
OpenPGP taught me that the CFB mode allows something called resynchronization. The SE mode does CFB resynchronization. The OpenPGP CFB prefix is 128+16=144 bits long. CFB doesn't need padding so we start by encrypting just those 144 bits. The extra bits from the second block encryption operation are just thrown away. Then we effectively start a whole new CFB operation using the most recent 128 encrypted bits as the IV. Now the start of the plaintext message is aligned with the blocks of the CFB cipher mode. I guess that could make the program structure simpler or something. It doesn't seem to have any security implications, one way or the other. I didn't really see the point of CFB resynchronization but it is always good to learn new things, right? Anyway, after the CFB resynchronization the rest is just generic CFB so we now know how SE mode works.
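Putting the prefix and the resynchronization together, a rough SE-mode encryption sketch might look like this (AES and the cryptography package assumed; OpenPGP packet framing and the passphrase-to-key step are omitted):

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def cfb_encrypt(key, iv, data):
    enc = Cipher(algorithms.AES(key), modes.CFB(iv)).encryptor()
    return enc.update(data) + enc.finalize()

def se_encrypt(key, plaintext):
    iv = os.urandom(16)
    # Encrypt just the 144-bit prefix (IV plus check value) starting from a zero IV.
    prefix = cfb_encrypt(key, bytes(16), iv + iv[-2:])
    # Resynchronize: the most recent 128 encrypted bits become the IV for a fresh CFB pass.
    body = cfb_encrypt(key, prefix[-16:], plaintext)
    return prefix + body
```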
There is a mysterious aspect to this. Why does the SE mode encrypt the IV in the prefix? On decryption the encrypted IV is used as is. To just get the plaintext decrypted data you don't even have to decrypt the IV. So this could be defined as generic CFB with a plain old boring IV as the prefix to the encrypted message. The only thing that uses the unencrypted IV is the check value and it would undoubtedly be possible to provide that bit of redundancy some other way.
Normally it is trivial to remove the first part of a CFB message, but if an attacker chopped off the first part of a SE message, the check value would break. But if this was an intended feature, the OpenPGP standard should mention that the check value is critical to security, and it doesn't. This isn't the sort of thing that anyone using encrypted email in the 1990s would care about anyway; messages were either signed and guaranteed intact or not.
It isn't even clear here why an IV, encrypted or not, is required at all for the SE formatted message. Generally, an IV is useful when the same key is used to encrypt two or more messages. Think TLS where thousands/millions of messages might be encrypted with the same key during a connection. In the sort of usage that OpenPGP targets there is normally only one message and that message is encrypted with a new/unique key. So from this it is possible to imagine that at one time the IV was set to zero simply because it was not considered useful. That is actually done in another part of the OpenPGP standard for that exact reason. Perhaps it was later considered important to add in an IV and it made the most sense at the time to put it as the first block of the plaintext.
I have drifted well into baseless speculation here. I shouldn't need to, or even be able to, indulge in such speculation. This seems to be an excellent example to support a principle near and dear to my heart. When creating a standard (or any specification really) you are not done once you have defined what it is and how it works. You should really also explain why it works the way it does. Explaining why puts you at risk of someone later coming along and successfully attacking your reasoning, but if you don't provide an explanation they will tend to assume that there wasn't any reasoning.
In fairness to those that created the OpenPGP standard, this originally came from the need to be compatible with a commercial software system called Pretty Good Privacy. It is possible that the standard writers simply didn't have access to the why. So the reasons might actually be lost to time. My experience is that that sort of thing is disturbingly common for the things we use.
The Symmetrical Encryption Integrity Protected (SEIP) Block Cipher Mode
The SE mode was great and all, but it became obvious that just signing messages/files was not everything that everyone wanted to do. You might not want to sign your messages all the time (deniability) but would like the recipient of such anonymous messages to be warned about modifications in transit. You might want files that are encrypted symmetrically with just a passphrase to be protected against undetected modification.
The chosen way to fulfill these types of needs was to build integrity protection into the block cipher mode. As a result the SEIP mode was created from the SE mode with the addition of a hash operation on the unencrypted IV and the plaintext message. The result of that hash, named the MDC (Modification Detection Code), was tacked on to the end of the IV and message before encryption. Oh, and the CFB resynchronization was dropped. … That's it. That's the whole thing.
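In code, the whole thing might look roughly like this (same AES/SHA-1/cryptography-package assumptions as the earlier sketches; the real SEIP packet also feeds a couple of MDC packet header octets into the hash and adds packet framing, both omitted here for clarity):

```python
import hashlib
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def cfb_encrypt(key, iv, data):
    enc = Cipher(algorithms.AES(key), modes.CFB(iv)).encryptor()
    return enc.update(data) + enc.finalize()

def seip_encrypt(key, plaintext):
    iv = os.urandom(16)
    protected = iv + iv[-2:] + plaintext            # unencrypted IV, check value, message
    mdc = hashlib.sha1(protected).digest()          # the Modification Detection Code
    # One CFB pass over everything, starting from a zero IV, with no resynchronization.
    return cfb_encrypt(key, bytes(16), protected + mdc)
```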
So we just drop a single hash operation into an already existing, perhaps crufty, construction? It's normally harder than that.
Protection
The SEIP scheme, when encrypting a message, first does everything concerned with modification detection and then encrypts the whole result. The encryption protects the modification detection which is really the SEIP defining characteristic. So it seems reasonable to look at this in terms of how well that protection actually works. Perhaps this will reveal other characteristics along the way.
The protection here is against discovery, modification or creation by an attacker. To avoid cluttering up the diagrams, let's just assume that the first 16 bits of the Plaintext are the check value.
Discovery
The protection against discovery simply comes from the fact that the encryption is applied last. So everything is encrypted and unavailable to the attacker.
Modification
Protection against modification
Flipping a bit in encrypted CFB (ciphertext) results in the corresponding bit being flipped in the decrypted message (plaintext). In practice this means that you can cause random damage, but if you want to make a meaningful modification you have to know the state of the bit before you flip it. That's why the Plaintext is not considered to be protected against modification: the attacker might know parts or all of it and can thus modify it in a rational way. This is where the MDC (Modification Detection Code) comes in. If we can't directly prevent modification, at least we can detect it.
The IV is a random value unknown to the attacker. So the IV is not modifiable in a rational way. The IV is included in the hash. So assuming reasonable hash properties, that means that the MDC will be unpredictable and thus not modifiable in a rational way. More on hash properties later…
A secret IV is a stricter requirement than with generic CFB (and SE mode) where the IV can be public and only needs to be unique for a particular key. Our new modification detection feature has come with a cost.
The MDC is not discoverable or modifiable. Based on that assumption, an attacker has no knowledge or control of the mechanism used to detect modifications. The encryption and secret IV work together to create this desirable situation. Once we understand and accept that the MDC is protected from attack, we only have to understand how the MDC detects modifications in a benign environment. We don't have to understand everything at once.
CFB has a property I am going to call damage amplification. If an attacker modifies an encrypted block of data, even just a single bit flipped, then when the message is decrypted the entire next block will be unpredictable garbage. CBC is much the same except that the modified block is destroyed and the corresponding bit is modified in the next block. ECB (electronic code book) is better than either in that it destroys the modified block and nothing anywhere is modified in a predictable way. OFB and counter modes have no damage amplification at all and allow any bit to be flipped with no further penalty.
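Both CFB properties (the predictable bit flip from above and the damage amplification) are easy to demonstrate, assuming AES via the cryptography package:

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key, iv = os.urandom(16), os.urandom(16)
plaintext = bytes(64)                               # four all-zero blocks, easy to inspect

enc = Cipher(algorithms.AES(key), modes.CFB(iv)).encryptor()
ct = bytearray(enc.update(plaintext) + enc.finalize())
ct[16] ^= 0x01                                      # flip one bit in the second ciphertext block

dec = Cipher(algorithms.AES(key), modes.CFB(iv)).decryptor()
out = dec.update(bytes(ct)) + dec.finalize()

assert out[:16] == bytes(16)                        # earlier blocks are untouched
assert out[16] == 0x01                              # the corresponding bit flips predictably
assert out[32:48] != bytes(16)                      # the next block is unpredictable garbage
assert out[48:] == bytes(16)                        # later blocks recover
```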
The CFB damage amplification has implications for the case where the attacker learns the secret IV. If the attacker modifies the encrypted IV then the hash will pick up the garbage in the next block. If the attacker modifies any of the plaintext blocks then the hash will pick up the garbage in the next block. For the last plaintext block the MDC itself is destroyed. So in practice, knowledge of the secret IV will not help an attacker escape modification detection. So the requirement that the IV be secret is resistant to this particular misuse.
Creation
Normally an attacker can't create a whole new message because they don't have access to the key. So let's kick things up a notch and assume that the attacker can cause the victim to encrypt a particular plaintext (encryption oracle). For an OpenPGP example, we could imagine that the attacker can cause the victim to encrypt a file (perhaps through some automated process) with at least a portion of that file under the attacker's control. The attacker would arrange the part they control to be a complete and valid message after it is encrypted4). Then the attacker would trim off the unneeded parts at the start and/or the end of the message to create a new, shorter message that passes the modification check.
This sort of attack is not possible in the case of SEIP. The attacker can get the victim to create a new MDC for them (the MDC is not protected against creation), but the secret IV is not encrypted in the same way as the Plaintext (and is thus protected). If the attacker embeds an entire valid message, IV and all, inside the message the victim creates for them, then the IV will eventually get decrypted in a different way than it was encrypted. Alternatively, if the attacker lets the victim encrypt the IV for them, then the attacker will not know what the IV was to start with (the victim provides it) and will not be able to create a valid MDC.
If the attacker knows the secret IV then they can probably create a new and valid message. So IV disclosure is significant at this level of threat. Still, the situation where the attacker can make the victim encrypt messages for them is fairly unlikely. It seems to me that, overall, SEIP handles IV misuse better than GCM5), considering how much worse GCM is than SEIP when an IV is reused.
The CFB damage amplification is beneficial here as well. If an attacker could somehow come up with an encrypted suffix with appropriately encrypted MDC, splicing that suffix on the end of an existing message to extend it would result in a block of damage. The MDC would pick that block up and fail the modification detection in a way the attacker could not anticipate.
Luck
So… The encrypted IV, which seemed mostly pointless when used for the SE mode, is security critical for the SEIP mode. We added one hash and suddenly everything works. That seems … lucky? Oh, and the thing about the CFB resynchronization? Dropping it for the SEIP mode drives the blocks out of alignment and makes it much harder to convert a SEIP message to a SE mode message. Oh, and the check value? It breaks (and thus fails) if you attempt such a conversion. It just happens to be in the right place to do that.
This all seems to be much too much luck. The SEIP mode ends up with no apparent cruft at all. This makes it seem that the SE mode was designed with the SEIP mode in mind. I must have missed some significant historical context.
MAC
Speaking of missing things, it had to be pointed out to me that the SEIP mode, which I had been thinking of as a hash then encrypt scheme, could be interpreted as a MAC then encrypt scheme6). A MAC is a way of detecting modifications using a secret key shared between the originator and receiver of a message. So the secret IV is the shared secret key here. From the perspective of a MAC then encrypt scheme things look like this (terminology inconsistent with the rest of the article; a code sketch follows the list):
- The key is hashed by encrypting zeros producing a new key.
- The new key is used to encrypt the secret MAC key.
- The encrypted MAC key is prefixed to the message.
- The plaintext is MACed and the MAC tag is the suffix to the message.
- The plaintext and MAC tag is encrypted via CFB using the encrypted MAC key as the IV.
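A sketch of that interpretation, following the list step by step, with the same AES/SHA-1/cryptography-package assumptions as before. For the same random IV it produces the same bytes as the earlier seip_encrypt sketch, just derived in a different order.

```python
import hashlib
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def aes_block(key, block):
    enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    return enc.update(block) + enc.finalize()

def cfb_encrypt(key, iv, data):
    enc = Cipher(algorithms.AES(key), modes.CFB(iv)).encryptor()
    return enc.update(data) + enc.finalize()

def mac_then_encrypt(key, plaintext):
    new_key = aes_block(key, bytes(16))                     # hash the key by encrypting zeros
    mac_key = os.urandom(16)                                # the secret MAC key (the IV)
    encrypted_mac_key = bytes(a ^ b for a, b in zip(mac_key, new_key))
    # "MAC" the plaintext (the check value is treated as part of the plaintext here).
    tag = hashlib.sha1(mac_key + mac_key[-2:] + plaintext).digest()
    # Prefix the encrypted MAC key; CFB-encrypt plaintext and tag using it as the IV.
    return encrypted_mac_key + cfb_encrypt(key, encrypted_mac_key,
                                           mac_key[-2:] + plaintext + tag)
```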
I have to admit that this seems more straightforward than all the stuff about protection. But be careful here. Most people are not very knowledgeable about the properties of a MAC buried under the encryption. A MAC is most often used where the material being MACed and the MAC tag are available to the attacker. That is not the case here. I would like to think that the protection discussion is better for ultimately understanding why things work the way they do versus what results things provide. That was certainly the case for me while developing the protection discussion.
It should be pointed out that SEIP, while using a hash, is not an HMAC as such things are normally understood. The most significant difference is that the internal state of the hash is disguised in a different way to prevent length extension attacks. An HMAC does this by hashing a secret key into the output of the hash of the protected material. SEIP instead encrypts the output of the hash of the protected material, thus preventing attacker access in the first place. It seems to me, at least, that it is easy to understand why encrypting the hash prevents length extension. Preventing attacker knowledge of a value is, after all, the primary function of encryption. That seems easier than understanding why rehashing the hash in an HMAC prevents length extension when we are ultimately using the output of the same type of hash we are protecting.
The topic of key committing block cipher modes is popular right now (2024)7). This property ends up being related to MAC behaviour in most contemporary block cipher modes so this is as good a place as any to have the discussion. A key committing block cipher mode will make it impossible for an attacker to produce a message that will pass the integrity check with two different keys. The SEIP mode is key committing for a simple, easy to understand, reason. SEIP decrypts first and then checks for modifications using the MDC (MAC then encrypt). Changing the key used to decrypt a message with SEIP changes, well, everything. The plaintext and the MDC change to entirely different values in a very complicated, unpredictable, way. The encrypted IV changes as well, but in a predictable way. Then the MDC check is applied to the wreckage. An attacker would have a very difficult time coming up with two keys that would cause the eventual MDC to pass. If the MAC had been done first on decryption (encrypt then MAC) as, say, GCM does, then the attacker would only have to figure out how to come up with two keys that would produce the same MAC value. With the GHASH used for GCM this is fairly easy. As a result, GCM is not key committing.
Hash Properties
The OpenPGP standard says this about the required properties of the hash used to generate the MDC:
It does not rely on a hash function being collision-free, it relies on a hash function being one-way.
This is a really good property for a block cipher mode that is intended to protect files/messages for an indefinite, possibly quite long, period of time, as hashes are often found to have weaknesses over time8). The SHA-1 hash used for the MDC is a good example here. It was found to be vulnerable to a collision attack. That made no difference whatsoever to the security of the MDC.
SEIP for TLS
Many people are familiar with connected, encrypted-pipe applications like TLS these days. So I think we could cover a lot of ground, quickly, if we imagined the use of SEIP in a TLS context.
OpenPGP is normally used for regular, traditional encryption applications where someone encrypts something and then transfers the result in time and/or space with the expectation that the desired recipient will decrypt it. The sort of thing that people have been doing for thousands of years. On decryption there are no special requirements. Any errors that occur can (and should) be delivered to the one doing the decryption in a detailed and hopefully understandable form. Any decrypted material should be ultimately available to the decrypting user to help them determine what went wrong.
With TLS things are not so straightforward. We need to ask the question: who is the decrypting user? Entities on the network send messages to an endpoint. If something goes wrong, then there will be a strong temptation to have the endpoint return error information to the entity that sent the message. After all, where else could we send it? That would mean sending diagnostic information across the network to an entity we don't know. That doesn't end well in practice…
An attacker can copy an encrypted message off the network, make changes to it, and see what sort of errors come back when they send it to the endpoint. More advanced attackers can note changes in the response time. If the errors/time depend on the plaintext then an attacker might be able to determine some aspect of the plaintext. With repeated trials they might get the whole thing.
A connection oriented protocol like TLS has a significant advantage over boring old regular encryption/decryption. If something goes wrong, you can just do the thing that went wrong again. The current TLS approach is to first check for any modifications when a message hits an endpoint and then abort if any are found. All a potential attacker gets is the logical equivalent of “something went wrong, please try again”.
Implementations of SEIP currently in use don't work that way at all. The previously discussed check value is a good example. It is specifically intended to bypass any other processing. An implementation decrypts the first block, examines the check value and aborts with an error if it is invalid. The check value is encrypted and thus counts as plaintext. An attacker can use the check value error to determine 16 bits worth of a block9) with repeated trials. Other error conditions can be used for the same sort of thing10). So if we want to use SEIP for TLS, we need to come up with a suitable implementation. Decryption would look something like this when a SEIP message was sent to an endpoint (sketched in code after the list):
- Decrypt the entire message.
- Check the decrypted message using the MDC.
- If the MDC fails then abort with a suitable error.
- If the MDC passes then proceed with processing the plaintext.
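A decryption sketch along those lines, paired with the earlier seip_encrypt sketch (same assumptions; the check value is deliberately not examined on its own, and the MDC comparison uses a constant-time helper):

```python
import hashlib
import hmac
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def cfb_decrypt(key, iv, data):
    dec = Cipher(algorithms.AES(key), modes.CFB(iv)).decryptor()
    return dec.update(data) + dec.finalize()

def seip_decrypt(key, ciphertext):
    # 1. Decrypt the entire message first.
    plain = cfb_decrypt(key, bytes(16), ciphertext)
    protected, mdc = plain[:-20], plain[-20:]
    # 2./3. Check the MDC and abort with one generic error on failure;
    # the check value is not inspected separately, so it cannot act as an oracle.
    if not hmac.compare_digest(hashlib.sha1(protected).digest(), mdc):
        raise ValueError("modification detected")
    # 4. Only now hand the plaintext (minus the 18-byte IV/check prefix) onward.
    return protected[18:]
```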
So we need to decrypt the message before we can check it for modifications (MAC then encrypt). The CBC padding oracle vulnerability is often used as an example of the weakness of such a scheme but, as already mentioned, it is specifically not relevant here. Instead let's consider timing. The MDC is based on SHA-1 which is inherently constant time due to the primitives it is constructed from. The decryption function, on the other hand, might not be constant time and could conceivably leak information about the plaintext due to timing differences when used in TLS. A block cipher mode that checked the MAC first and then refused to decrypt (encrypt then MAC) would be resistant to this sort of thing. The attacker would be restricted to messages generated legitimately and could not just generate their own.
We would preserve the authentication established at the start of the connection based on the shared key. This is normally how this is done in TLS where there is no separate authentication key. GCM in TLS is handled this way for example. When the SEIP mode was created, the specific objective was to handle the case of anonymous/unauthenticated/unsigned messages/files. So the term “modification detection” was used. When block cipher modes targeting things like TLS were created, it was assumed that every message would be authenticated. So the term “authenticated encryption” was used. There is no difference other than the terminology. GCM can be used in a modification detection situation in the same way that SEIP can be used in an authenticated encryption situation. This difference in terminology sometimes causes confusion. It is perfectly appropriate to categorize the SEIP mode as an example of authenticated encryption.
The check value no longer would have any, err, value and would not be checked so there would be a temptation to save the otherwise wasted space and start the plaintext 16 bits earlier. The secret IV would cause an efficiency issue in that an entire block worth of secret/random data would be required for each and every message in the TLS connection. There would be a strong motivation to only send the secret once at the start of the connection and use it for all the messages in the connection. The encrypted IV prefix could be replaced with a counter that incremented for each new message. At this point we end up with a simple CFB based encryption mode with a MAC for integrity.
So we see that we could use SEIP for TLS, but that is not what we would end up with. From this we see that SEIP mode is really just generic CFB mode with a prefix to allow the transport of a secret value used to detect modifications. This is something that would really only be done for an environment where we were only sending one message.
Up to this point we assumed a cipher with 128 bit blocks. The OpenPGP standard supports ciphers with 64 bit blocks. Use of a 64 bit cipher in a SEIP formatted message would result in the secret IV only being 64 bits long. So the strength of the modification detection would be reduced. It's not clear to me that anyone ever implemented SEIP with 64 bit blocks, or that such messages/files were ever generated. This is still an important point, as a dependency of modification detection strength on block size is not something that anyone would expect.