Introduction to Applied Cryptography and Steganography

One can set up a reasonably secure wireless or wired network without knowing which ciphers are used and how the passwords are encrypted. This, however, is not the approach we endorse or discuss here. Hacking is about understanding, not blindly following instructions; pressing the buttons without knowing what goes on behind the scenes is a path that leads nowhere. Besides, security and quality of service are tightly interwoven, and as you will see later in this chapter, an incorrect choice of cipher or of its implementation method can lead to a secure but sluggish and inefficient network. Although the security enhancements achieved are unlikely to be mentioned by the network users, low throughput and high delay will surely be reported to the IT team and, possibly, to management.

Before getting down to ciphers, modes, and protocols, let's get some definitions right.

Cryptography is the art and science of transforming data into a sequence of bits that appears random and meaningless to a side observer or attacker. Compressing data also removes its redundancy. However, whereas compressed data is easy to decompress, decrypting data requires the key that was used to bring the "randomness" to the plaintext. As a side note, because both encryption and compression increase the entropy of the data they process, encrypted data leaves a compressor nothing to remove and might actually expand in size after compression, which makes compressing it pointless. If you have to implement both encryption and compression of data, apply the compression first.

Cryptanalysis is the reverse engineering of cryptography—attempts to identify weaknesses of various cryptographic algorithms and their implementations to exploit them. Any attempt at cryptanalysis is defined as an attack. Exhaustive key search (or brute-forcing) is not a form of cryptanalysis, but it is still an attack!

Cryptology encompasses both cryptography and cryptanalysis and looks at mathematical problems that underlie them.

Cryptography provides data confidentiality, authentication, data integrity, and nonrepudiation services. Data availability can also be affected by cryptographic services, for example when improperly implemented cryptographic solutions push bandwidth consumption and packet delay above the acceptable limit. In addition, a local DoS attack requires the attacker to authenticate first, so strong authentication helps to protect availability. Many sources that claim that cryptography does not affect the availability part of the "CISSP triad" (confidentiality, integrity, availability) are therefore incorrect. Additionally, encrypted viruses that decrypt themselves to self-activate are common, as are backdoors that use encrypted channels of communication with crackers (most of the latest distributed DoS tools do). These are examples of Black Hat cryptography implementations. At the same time, secure authentication of access to antiviral software and encryption of virus signature databases can protect the antivirus software from tampering by both malware and malicious users. Thus, sources indicating that encryption has nothing to do with malware protection aren't exactly right either.

The first ciphers in use were simple substitution and transposition algorithms. Imagine that you have a pack of cards. Changing the position of cards in the pack in a predetermined way known to you but not others (one of the ways to cheat!) rather than just shuffling them would be an example of a transposition cipher. The cards remain the same, but their order is changed. Having an agreement that a king is really a jack, a 6 is an ace, or diamonds are now spades and vice versa are examples of substitution ciphers. Textbook examples of substitution ciphers are shift ciphers, in which the data is shifted by a predefined number of positions. For example, Caesar's cipher involves assigning a number to every letter and then shifting the position of each letter by a predefined number k (in Caesar's case, k = 3). Thus, A becomes D, B becomes E, and so on. A variety of Caesar's cipher called ROT13 is still used by some software and involves a shift by 13 characters: P = ROT13 (ROT13 (P)), so encrypting text with ROT13 twice gives you the original text.
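
The shift cipher just described can be sketched in a few lines of C. This is only an illustration under our own assumptions: the function names caesar_shift and rot13 are hypothetical, and the code works on the 26-letter English alphabet in ASCII rather than on Caesar's Latin one.

```c
#include <stddef.h>

/* Shift every ASCII letter in buf by k positions, wrapping around the
 * 26-letter English alphabet. Other characters are left untouched.
 * A negative k shifts backward, which undoes a positive shift. */
void caesar_shift(char *buf, int k)
{
    int shift = ((k % 26) + 26) % 26;   /* normalise k into 0..25 */

    for (size_t i = 0; buf[i] != '\0'; i++) {
        char c = buf[i];
        if (c >= 'A' && c <= 'Z')
            buf[i] = (char)('A' + (c - 'A' + shift) % 26);
        else if (c >= 'a' && c <= 'z')
            buf[i] = (char)('a' + (c - 'a' + shift) % 26);
    }
}

/* ROT13 is the same shift with k = 13. Because 13 + 13 = 26,
 * applying it twice returns the original text. */
void rot13(char *buf)
{
    caesar_shift(buf, 13);
}
```

Calling caesar_shift(text, 3) on a writable string encrypts it the way Caesar did, and caesar_shift(text, -3) decrypts it; with rot13 the same call does both jobs.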

The substitution and shift ciphers are easy to break. For example, if an opponent wanted to break Caesar's cipher, he or she could choose a single encrypted word from a long text, give it to 22 soldiers (because there are 23 letters in the Latin alphabet), and ask the first soldier to shift all letters in the word by one position, the second soldier by two positions, and so forth, obtaining the value of k in no time. In this case, the value of k is the key, and a very weak key indeed: a single integer modulo 23, which amounts to fewer than 5 bits of key material. To break more sophisticated substitution ciphers with a seemingly random agreement on which letter substitutes for another, as well as transposition ciphers, statistical cryptanalysis is used. Every language has a well-defined frequency distribution of letters, and by analyzing this distribution in a ciphertext, a machine can easily deduce the plaintext and, finally, the key. In a nutshell, the most common letter in English text is e, so the most common letter or symbol in a ciphertext derived from English plaintext most likely stands for e, and so on. Substituting digrams or trigrams (two- and three-letter sequences) was tried to bypass statistical analysis and failed; the frequencies of digrams and trigrams for various languages are now documented. In the case of encrypted source code, frequencies of various operators and statements from different programming languages are documented and used in conjunction with spoken language statistical analysis. For example, in C we would expect a high frequency of #define and #include occurrences at the beginning of the source code. Encrypted binaries have similar problems that make them vulnerable to statistical cryptanalysis: functions, loop structures, and so on. As for encrypted network traffic collected by tcpdump or some other (tcpdump-based or ridiculously expensive) network analyzer, need we mention the similarity and repetitiveness of fields in frames, packets, segments, and datagrams? We know the precise length of these fields and exactly where they are.
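
To make the statistical attack concrete, here is a minimal sketch that recovers the key of a shift cipher from letter frequencies alone. It rests on assumptions of ours: the function name guess_caesar_key is hypothetical, the plaintext is taken to be English over the 26-letter alphabet, and the most frequent ciphertext letter is simply assumed to stand for e. On short or unusual texts that guess can easily be wrong.

```c
#include <stddef.h>

/* Guess the shift k (0..25) of an English ciphertext by assuming that
 * its most frequent letter stands for 'e'. Letter case is ignored.
 * Returns -1 if the text contains no letters at all. */
int guess_caesar_key(const char *ciphertext)
{
    unsigned long count[26] = {0};
    int best = -1;

    /* Build the letter frequency table of the ciphertext. */
    for (size_t i = 0; ciphertext[i] != '\0'; i++) {
        char c = ciphertext[i];
        if (c >= 'A' && c <= 'Z')
            count[c - 'A']++;
        else if (c >= 'a' && c <= 'z')
            count[c - 'a']++;
    }

    /* Find the most frequent letter; the first one wins a tie. */
    for (int i = 0; i < 26; i++) {
        if (count[i] > 0 && (best < 0 || count[i] > count[best]))
            best = i;
    }
    if (best < 0)
        return -1;

    /* Plaintext 'e' has index 4; if it became 'best', k = best - 4. */
    return (best - 4 + 26) % 26;
}
```

A real tool would compare the whole frequency distribution, and digram and trigram statistics as well, rather than rely on a single letter.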

In attempts to create a cipher superior to substitution and transposition algorithms, various approaches have been tried. One working approach was concealment ciphers: security through obscurity that actually works. Historical tricks included invisible inks, grilles covering some characters but not others, and so on. More recently, spread spectrum military radio technology, now actively used by various 802.11 LANs and Bluetooth, came as an example of concealment security: weak wideband radio signals that appear to be nothing but noise to a casual radio frequency scanner. Unfortunately or not, due to compatibility and usability requirements, this security through obscurity does not work in our WLAN case. Besides, an attacker with a decent (expensive) spectrum analyzer can still detect and dissect spread spectrum signals. See http://www.tscm.com/spectan.html for some examples of detecting and analyzing signals from spread spectrum bugs.

Steganography is another new player in the concealment field. It is based on replacing the least significant bit in image, music, or video files with the concealed message data, using tools such as Steghide (http://steghide.sourceforge.net; see also http://www.outguess.org/detection.php for the opposite). Mimic functions are another form of steganography, an offspring of the "hardware" grilles mentioned earlier. These functions modify the message so that it appears to be something else, usually casual and inconspicuous. An example of something very casual and inconspicuous (if annoying!) constantly flowing through the Internet is SPAM. You can check http://www.spammimic.com or download a Perl script the site uses to hide the messages under the disguise of junk mail from http://packetstormsecurity.org/UNIX/misc/mimic.zip. Another example, somewhat close to steganography, is hiding suspicious traffic in data streams that do not usually raise network administrators' suspicions. A variety of backdoors use inconspicuous ICMP packets (e.g., echo reply) or IGMP traffic to hide a communication channel with the backdoor (e.g., http://packetstormsecurity.org/UNIX/penetration/rootkits/icmp-backdoor.tar.gz or http://packetstormsecurity.org/UNIX/penetration/rootkits/sneaky-sneaky-1.12.tar.gz). We have already mentioned using such backdoors to mask a wireless attacker behind a legitimate host in Chapter 8. Interestingly, similar covert channels can be employed to transmit highly confidential data over an insecure physical medium (wireless) as part of an advanced defense-in-depth strategy.
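
The least significant bit idea can be illustrated with a short sketch. It is a generic example of the technique and not a description of how Steghide or any other named tool works internally. The function names lsb_embed and lsb_extract are hypothetical, and the cover buffer is assumed to hold raw sample bytes (for instance, uncompressed pixel or audio data) with one message bit hidden per byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Hide msg_len bytes of msg in the least significant bits of cover,
 * one message bit per cover byte, most significant message bit first.
 * Returns 0 on success, -1 if the cover is too small. */
int lsb_embed(uint8_t *cover, size_t cover_len,
              const uint8_t *msg, size_t msg_len)
{
    if (msg_len > cover_len / 8)
        return -1;

    for (size_t i = 0; i < msg_len * 8; i++) {
        uint8_t bit = (uint8_t)((msg[i / 8] >> (7 - i % 8)) & 1u);
        cover[i] = (uint8_t)((cover[i] & 0xFEu) | bit);
    }
    return 0;
}

/* Recover msg_len bytes previously hidden by lsb_embed().
 * Returns 0 on success, -1 if the cover is too small. */
int lsb_extract(const uint8_t *cover, size_t cover_len,
                uint8_t *msg, size_t msg_len)
{
    if (msg_len > cover_len / 8)
        return -1;

    for (size_t i = 0; i < msg_len; i++) {
        uint8_t byte = 0;
        for (size_t b = 0; b < 8; b++)
            byte = (uint8_t)((byte << 1) | (cover[i * 8 + b] & 1u));
        msg[i] = byte;
    }
    return 0;
}
```

Each cover byte changes by at most one, which is why the modification is hard to notice by eye or ear, although statistical detection tools can still pick it up.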

Running key ciphers involve a sequence of physical actions to obtain the key. For example, an agreed-on message might say bk10.3L.15.36.9, which states "The key is in a book on shelf 10, 3 books to the left, page 15, 36th line, 9th word." You open the book and the word is, of course, "Microsoft" (no pun intended!). Although running key ciphers can be reasonably secure, they aren't really applicable in network and host security.

Finally, there is a perfect encryption scheme that cannot be broken, no matter how much processing power is at the attacker's disposal. Ironically, this scheme is of very little use for IT security, just like running key ciphers. You probably gathered that we are talking about one-time pads. A one-time pad is a large matrix of truly random data. Originally it was a one-time tape for teletype transmission. Each pad is XORed with plaintext to encrypt it and is used only once on both communication ends. Irrecoverable destruction of the pad follows use. Such a data transmission scheme is perfectly secure from the cryptanalysis viewpoint, provided that the entropy source for the pad is truly random. However, secure pad distribution and storage and sender–receiver synchronization prove a tremendously difficult task. Because the superpowers usually have sufficient resources to accomplish such a task, one-time pads were employed to secure the hotline between the Cold War giants and were frequently used by spies on both sides of the Iron Curtain. A Russian submarine radio operator in the movie K-19: The Widowmaker appears to use a one-time pad to encrypt his message before the radio transmission takes place.
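
The XOR step of a one-time pad is trivial to write down, which underlines that all the difficulty lies in producing, distributing, and destroying the pad. The sketch below covers only that XOR step; the function name otp_xor is hypothetical, and the pad is assumed to come from a truly random source that the code does not provide.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR len bytes of data with the pad, in place. The same call both
 * encrypts and decrypts. The pad must be at least len bytes long,
 * truly random, and never reused for another message. */
void otp_xor(uint8_t *data, const uint8_t *pad, size_t len)
{
    for (size_t i = 0; i < len; i++)
        data[i] ^= pad[i];
}
```

The sender calls otp_xor on the plaintext, the receiver calls it on the ciphertext with an identical copy of the pad, and both then destroy the used portion.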

Looking back at the options just presented, we are left with three choices. One choice is to continue fortifying substitution and transposition ciphers until their cryptanalysis becomes computationally unfeasible. Another choice is to come up with novel encryption schemes different from the classical methodologies described (we discuss this more when we come to asymmetric ciphers). Yet another choice is steganography. This chapter does not dwell on steganography because it is not widely used to secure wireless networks. However, stegtunnel from SYN ACK Labs (http://www.synacklabs.net/projects/stegtunnel/) is an interesting free tool one can employ for wireless traffic protection. If you have a particular interest in this subject, we suggest checking out a variety of online sources, such as http://www.cl.cam.ac.uk/~fapp2/steganography/ or http://www.jjtc.com/Steganography/, as well as books currently on the market (Information Hiding: Steganography and Watermarking - Attacks and Countermeasures by Johnson, Duric & Jajodia, 2000, Kluwer Academic Publishers, ISBN: 0792372042; Disappearing Cryptography: Information Hiding: Steganography & Watermarking by Wayner, 2002, Morgan Kaufmann, ISBN: 1558607692; and Information Hiding Techniques for Steganography and Digital Watermarking by Katzenbeisser, 2000, Artech House Books, ISBN: 1580530354). Now it is time to return to the substitution and transposition ciphers we started with.

Before dealing with the modern-day substitution and transposition cipher offspring, there is a common misconception to deal with first. This misconception is that you have to be a brilliant mathematician to understand cryptography. As far as our experience goes, understanding what a function is, and understanding binary arithmetic, matrices, modular arithmetic, and Boolean logic operators, will get you by without significant problems. Some revision of the latter is, perhaps, a good idea. We find truth tables to be particularly good for Boolean logic memory refreshment:


NOT ( ! in C) truth table is

   INPUT      OUTPUT

     1           0
     0           1


OR ( || in C, as in {if ((x<0) || (x>3)) y=10;} ) truth table is

     A   B      A || B

     1   1        1
     1   0        1
     0   1        1
     0   0        0

AND ( && in C, as in {if ((x>0) && (x<3)) y=20;} ) truth table is

     A   B      A && B

     1   1        1
     1   0        0
     0   1        0
     0   0        0

(Remember subnetting? The network address is IP & netmask, using the bitwise form of AND.)
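
The subnetting reminder can be written as a one-line C function. Note that on whole addresses the operation is the bitwise & rather than the logical &&, which would only yield 0 or 1. The function name network_address is hypothetical, and both arguments are assumed to be IPv4 values held in host byte order.

```c
#include <stdint.h>

/* Network address = IP address AND netmask, applied bit by bit.
 * Both arguments are IPv4 values in host byte order. */
uint32_t network_address(uint32_t ip, uint32_t netmask)
{
    return ip & netmask;
}
```

For example, 192.168.10.77 (0xC0A80A4D) with the netmask 255.255.255.0 (0xFFFFFF00) gives 0xC0A80A00, that is, 192.168.10.0.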



And finally, XOR (or eXclusive OR, ^ in C) truth table is

     A   B      A ^ B

     1   1        0
     1   0        1
     0   1        1
     0   0        0

     Note that:

     a ^ a = 0
     a ^ b ^ b = a

     or, if

     p ^ k = c

     then

     c ^ k = p

In layman's terms, XORing the same value twice restores the original value, pretty much like the double use of the ROT13 shift cipher mentioned earlier. In fact, some software vendors implement XORing with a secret key as a form of encryption. This is a grave mistake, and that kind of "encryption" would not be more secure than ROT13. All one needs to do is discover the length of the key by counting coincidences of bytes in the ciphertext. Then the ciphertext can be shifted by that length and XORed with itself, effectively removing the key.
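
The two steps of that attack can be sketched as follows. This is a minimal illustration under our own assumptions: the function names count_coincidences and strip_key are hypothetical, and the "encryption" is taken to be a short secret key repeated over the plaintext and XORed with it. Choosing the key length from the coincidence counts, and untangling the result afterward, are left to the analyst.

```c
#include <stddef.h>
#include <stdint.h>

/* Count positions where the ciphertext equals a copy of itself shifted
 * by `shift` bytes. With a repeating XOR key the count tends to peak
 * when `shift` is a multiple of the key length. */
size_t count_coincidences(const uint8_t *ct, size_t len, size_t shift)
{
    size_t hits = 0;

    if (shift == 0 || shift >= len)
        return 0;
    for (size_t i = 0; i + shift < len; i++)
        if (ct[i] == ct[i + shift])
            hits++;
    return hits;
}

/* XOR the ciphertext with itself shifted by key_len bytes. The key
 * bytes cancel out, leaving out[i] = p[i] ^ p[i + key_len]: plaintext
 * XORed with shifted plaintext, with no key material in it.
 * Writes len - key_len bytes to out and returns that count. */
size_t strip_key(const uint8_t *ct, size_t len, size_t key_len,
                 uint8_t *out)
{
    if (key_len == 0 || key_len >= len)
        return 0;
    for (size_t i = 0; i + key_len < len; i++)
        out[i] = ct[i] ^ ct[i + key_len];
    return len - key_len;
}
```

An analyst would call count_coincidences for shifts of 1, 2, 3, and so on, take the shift where the count stands out as the likely key length, and pass it to strip_key. What remains is plaintext XORed with itself, which yields to the statistical methods described earlier.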

However, XORing is used extensively by many strong ciphers as a part of their operation. When popular literature states that the key was "applied" to the plaintext, it actually means plaintext ^ key at some point. The main reason is that XORing the same data twice restores the original data, so both encryption and decryption software can use exactly the same piece of code to perform these tasks.

So, how does one go about creating strong "product ciphers" on the basis of insecure substitution and permutation ciphers and XORing?
