DI Management Home > Cryptography > Keys in Cryptography

An Introduction to Using Keys in Cryptography


Introduction

We get many queries from people about how to use keys in cryptography and how to represent them. This page is a simple introduction. If you take away nothing else, remember that a password is not a key.

Contents

What is a key?

A definition of a key from ISO/IEC 10116 (2nd edition): 1997 is

A sequence of symbols that controls the operation of a cryptographic transformation (e.g. encipherment, decipherment).

In practice a key is normally a string of bits used by a cryptographic algorithm to transform plain text into cipher text or vice versa. The key should be the only part of the algorithm that it is necessary to keep secret.

Key length

The key length is usually expressed in bits, 8 bits to one byte. Bytes are a more convenient form for storing and representing keys because most computer systems use a byte as the smallest unit of storage (the strict term for an 8-bit byte is octet). Just remember that most encryption algorithms work with bit strings. It's up to the user to pass them in the required format to the encryption function they are using. That format is generally as an array of bytes, but could be in hexadecimal or base64 format.

In theory, the longer the key, the harder it is to crack encrypted data. The longer the key, however, the longer it takes to carry out encrypion and decryption operations, although with modern computers, this is not normally an issue. Speed might be an issue with a smart card, for example.

Block cipher encryption algorithms like AES and Blowfish work by taking a fixed-length block of plaintext bits and transforming it into the same length of ciphertext bits using a key. Blowfish uses a variable key length between 8 and 448 bits long. It is at the choice of the two parties exchanging data to agree what length they use. Most other block cipher encryption methods have a fixed length key. For example, DES has a 64-bit key (but only uses 56 of them) and Triple DES has a 192-bit key (but only uses 168 of them). IDEA uses a 128-bit key. The Advanced Encryption Algorithm (AES) has a choice of three key lengths: 128, 192 or 256 bits. Public key encryption algorithms like RSA typically have key lengths in the order of 1000-2000 bits. Be careful with the difference in key lengths for block cipher algorithms and public key algorithms. According to SP800-57 Part 1 Table 4, a 192-bit Triple DES key is equivalent in security terms to a 2048-bit RSA key, and an AES-128 key is equivalent to a 3072-bit RSA key.

How relevant is key length?

To crack some ciphertext encrypted with a 64-bit key by the brute-force method of trying every combination of keys possible means you have 2^64 possible combinations or 1.8 x 10^19 (that's 18 followed by 18 naughts). You can expect, on the average, to find a correct answer in half this number of tries. If you have a computer that can carry out one encryption operation every millisecond, it will take about 292 million years to find the correct value. Speed up your computer by a million times and it will still take about 3 centuries to solve. The equivalent brute force technique for a 128-bit key will, in theory, take a "long time", probably past the expected life of the universe. But, in practice, a set of supercomputers operating in parallel can crack a 64-bit key in a relatively short time. If an attacker has access to a large selection of messages all encrypted with the same key, there are other techniques that can be used to reduce the time to derive the key.

Will other people try and crack my ciphertext?

Your competitors in business and people who you suspect log your encrypted credit card number from the Internet don't usually have these facilities at their disposal. However, if an organisation like the NSA wanted to, they may well be prepared to put such resources into it. We doubt that the NSA or anyone else for that matter is routinely decrypting all your PGP emails. You are not that important - see SatireWire's very amusing article Hackers Beg Boring People To Stop Encrypting Email. Security Experts Concur Most of You Have Nothing Worth Encrypting. However, "they" may well be flagging emails that are sent encrypted and keeping a copy of them. If some other incriminating evidence comes up about you, they may then go back and try to decrypt them. On the other hand, they could just arrest you and force you to tell them your password: how long are you prepared to protect your secret love letters when someone is using a thumbscrew on you?

How do encryption schemes fail?

Most encryption schemes are cracked not by brute force trying of all possible combinations of key bits, but by using other knowledge about how the sender derived the key. This could be a faulty random number generator known to used by the system, or knowledge that the user derived the key solely from a password of only the letters a to z, or just used simple English words. Or perhaps by finding out the keystrokes typed on the keyboard by the user with a keystroke logger, or by bribing (or torturing) someone to give them the key, or by reading the post-it note the user has conveniently left on the side of the computer with the password written on it. The traps are many and subtle and even the experts get it wrong. Why spend hours trying to pick the expensive security lock when the owner of the house has left a window open?

Strictly, it's not the length of the key, but the "entropy" in the method used to derive the key. There is approximately one bit of entropy in an normal ascii character. If you derive a 128-bit key from a password or pass phrase, you will need a very long pass phrase to get enough theoretical entropy in the key to match the security of the underlying key length: Bruce Schneier estimates that you need a 98-character English passphrase for a 128-bit key. Most people can't be bothered with such a cumbersome passphrase.

Using AES with a 128-bit key should provide adequate security for most purposes. The longer you intend to keep the encrypted data secret, the longer the key you should use, on the principle that cracking techniques will continue to improve over time. Bruce Schneier recommends a 256-bit key for data you intend to keep for 20-30 years.

No one is going to criticise you for using a key that is too long provided your software still performs adequately. However, in our opinion, the biggest danger in using a key that is too large is the false sense of security it provides to the implementers and users. "Oh, we have n-million-bit security in our system" may sound impressive in a marketing blurb, but the fact that your private key is not adequately protected or your random number generator is not random or you have used an insecure algorithm may mean that the total security is next to useless.

Remember it is the security of the total system that counts, including procedures followed by users.

Choice of Algorithm

Whatever you use, use an accepted algorithm: AES, Blowfish, Triple DES, IDEA, etc. Don't try making up your own algorithm; you aren't that good. The only secret should be in the value of the key. Everything else should be publicly available for scrutiny. Any system that is otherwise is known as "snake oil".

Password, pass phrase and key

People often get confused between "password" and "key". A password is typically a series of ASCII characters typed at a keyboard, e.g. "hello123" or "my secret pass phrase". This makes it easier for users to remember. They are, of course, much easier to crack because there are significantly fewer combinations to choose from.

A pass phrase is simply a password that consists of several words in a string, e.g. "she sells sea shells", so the terms "password" and "pass phrase" are equivalent for our purposes. In principle, a pass phrase makes it easier for a user to remember a long combination of characters. In practice, this adds to security only if the pass phrase is something known only to the user. Don't use quotes from famous literature - hackers read them, too.

A key used by an encryption algorithm is a bit string. A 128-bit key will have exactly 128 bits in it, i.e. 16 bytes. You will often see keys written in hexadecimal format where each character represents 4 bits, e.g. "FEDCBA98765432100123456789ABCDEF" represents 16 bytes or 128 bits. The actual bits in this example are

1111 1110 1101 1100 1011 1010 1001 1000 
0111 0110 0101 0100 0011 0010 0001 0000
0000 0001 0010 0011 0100 0101 0110 0111
1000 1001 1010 1011 1100 1101 1110 1111
There are many, many more combinations of 128 bits than of combinations of a few simple letters of the alphabet. This is where the security comes in.

How should I derive the key?

How you get from the password to the key is one of the tricks to ensure good, underlying security. One method is to use a one-way hash function such as SHA-1 or MD5. This takes a message of varying length and creates a digest of a fixed length. MD5 produces a digest of 128 bits or 16 bytes; SHA-1 produces one of 160 bits or 20 bytes. (Thanks to Jorge Fabregas for pointing out that an MD5 digest is 128 bits, not 64.)

The same message will always give the same digest, but it should be impossible (strictly, computationally infeasible) to derive the message from the digest. Well, maybe not. Remember that hackers know the digest values for all the simple passwords: for example, MD5('password')=5f4dcc3b5aa765d61d8327deb882cf99.

To improve security, we should mix in a few random bytes (a salt) with the message and re-hash it a given number of times to produce a bit string to use as a key. Obviously, whatever method you use, you will need to be able to reproduce the same key given the same password when you come to decrypt your message.

Example 1

We are using an encryption algorithm that uses a 128-bit key. Our user has entered the ascii string "abc" as the password they want to use to encrypt their secret data. The data may involve details of secret liaisons that our client is having with his mistress, or an important business deal, or the medical history of a patient, or the codeword to trigger a world-wide nuclear war. We don't care about the significance; we are going to apply our encryption algorithm that requires a 128-bit key to transform plain text into cipher text. Our problem is how to derive a key.

The ascii string "abc" consists of the three bytes with hexadecimal values 61, 62, 63. We need 16 bytes for our key, we only have 3. We could just do a direct conversion and use the key
61 62 63 00 00 00 00 00 00 00 00 00 00 00 00 00
Obviously this does not give us much security as we are only using 24 out of the 128 bits. The fact that "abc" will only take a few seconds to guess anyway is another issue.

If we hash this with MD5 we get the following 16 bytes:

90 01 50 98 3C D2 4F B0 D6 96 3F 7D 28 E1 7F 72

If we hash these bytes again, we get:

AF 5D A9 F4 5A F7 A3 00 E3 AD ED 97 2F 8F F6 87
and so on. Notwithstanding the poor choice of password in the first place, these bit strings could be used as keys in our encryption algorithm.

If we only needed a 64-bit key, we could just use the first 8 bytes. If we needed a 256-bit key, we could concatenate (join together) the two hash digests and use that.

However, there are limitations in this simple method. True, we have expanded our simple three-character password into a longer bit string, but if an attacker knows the method we have used to derive the key from the password, he, too, can do the same for all possible combinations of letters. The key space is still just as small as we started with. An attacker could just prepare a lookup table of hashed results for all the possible combinations of words in the dictionary and use that.

A good designer should assume that the key derivation method is known to an attacker. It is not good enough just to hope it's well hidden. The only secret should be the password itself.

Example 2

There are two standard methods to hinder an attacker's ability to reproduce our key derivation function. The first is salting and the second is stretching where we introduce an iteration count. Salting involves mixing in some random data with the password. This greatly increases the possible number of keys that could be derived from a given password and prevents an attacker from compiling a dictionary of pre-computed values. The iteration count specifies that the function must be repeated a given number of times, say, a minimum of 1000. This forces an attacker to go to a lot of extra work when trying out alternatives, but is not particularly onerous for a valid user.

You must be able to communicate the salt and iteration count to the other party so they can reproduce the same key from the password. This information can be sent in the clear. The iteration count can be agreed and fixed for all messages, but the seed must be different for each message.

In this example, we start with our password "abc" but select a random set of 8 bytes to use as a salt (S), say,

78 57 8E 5A 5D 63 CB 06

and specify that the iteration count (c) is to be 1000. We concatenate our password and salt and compute the hash of this:-

P || S = 61 62 63 78 57 8E 5A 5D 63 CB 06
H(1) = MD5(P || S) = 0E8BAAEB3CED73CBC9BF4964F321824A
Then we repeat this until we have carried out the hash 1000 times
H(2) = MD5(H(1)) = 1F0554E6F8810739258C9ABC60A782D5   
H(3) = MD5(H(2)) = ABA6FEDB4AD3EFAE8180364E617D9D79
...
H(1000) = MD5(H(999)) = 8FD6158BFE81ADD961241D8E4169D411
We use the final digest 8FD6158BFE81ADD961241D8E4169D411 as the key to encrypt the data.

The next time we send a message, we use the same password but a different salt, say,

7D 60 43 5F 02 E9 E0 AE
and specify the iteration count to be 2048. In this case, using the same method as before, the resulting key will be CC584D1EE305FB7EF65926F62E88DFE3.

Note we are not condoning using such a simple combination as "abc" as a valid password. It is merely used to show the technique.

This password-based encryption algorithm is known as PBKDF1. The Visual Basic code basMD5pbe1.bas shows how to carry out above computations. Thanks to Chris Searl for pointing out an error in an earlier version.

Another complication arises with systems set up for oriental characters. We could write a book on this topic (and many people have). Put simply, if your computer is set up for US-English, all the characters you need to display can be represented by a single 8-bit byte. So it is a simple matter to go from the string "abc" to the three bytes 0x61, 0x62 and 0x63 that we can then use to derive our key bit string. When we start using systems set up for oriental characters, things get complicated, and converting a password string into bytes becomes messy. See Cryptography with International Character Sets and Cross-Platform Encryption for more information on this topic. Just remember that, in Visual Basic, always use the Byte type to do your cryptographic operations and always convert your password String type into a Byte array first.

Password-Based Encryption Algorithms

The RSA Laboratories document PKCS#5: Password-Based Cryptography Specification Version 2.0 describes two secure password-based encryption (PBE) algorithms. This Visual Basic code basPBKDF2.bas demonstrates the use of the PBKDF2 algorithm using the built-in function in our CryptoSys API package.

Hindering brute force password cracking

If we have an application that uses a password to allow users access to sensitive encrypted data, it is exposed to the possibility that someone may just try every conceivable password until they succeed. There are automated programs out there that do just that. Most users if left to themselves will choose simple, easy to remember passwords that considerably reduce the possible number of combinations. For some interesting research on this topic see The Memorability and Security of Passwords by Ross Anderson and others from Cambridge University.

As a designer there are two simple techniques to use to improve security:

  1. Three strikes and you're out
  2. Introduce a delay before responding

In the first technique, you only allow the user three attempts to log in and then shut down the application, or at least the user's access screen. This is easy to implement on an EXE application on a PC, but harder to do with a web application where each attempt to log in is a separate event as far as the server is concerned and people can just keep trying.

In the second technique, you deliberately introduce a short delay of, say, half a second, before confirming or rejecting the user's attempt to log in. The user will hardly notice such a short delay, but this will severely hamper a cracker's automated attempt to try all variations in a short time.

Oh yes, and don't limit your users to ridiculously short passwords.

What's base64 encoding?

"Encoding" simply means converting data from one form to another, not necessarily encrypted. Just remember that Encoding is not encryption. base64 encoding is a method to convert 8-bit characters to 7-bit characters so they can be transferred over the internet. MIME uses this; so does PGP. It's also known as radix 64 encoding. Binary data encoded with base64 encoding looks like

/ty6mHZUMhA=
This represents the 8 bytes
FE DC BA 98 76 54 32 10

A base64 string consists of the 64 characters A-Z, a-z, 0-9 and '/' and '+', and can be terminated by up to two '=' characters for padding. Each four base64 characters represent three bytes. A valid base64 string will always be an exact multiple of four characters long. In practice, programs that decode from base64 will (should) just ignore characters that are not valid; but if a '=' character is found, it should be taken as indicating the end of the data.

Hexadecimal notation

Hexadecimal means a number expressed in base 16. The convention is that each digit can be between 0 and 15 in value and is represented by the characters 0-9 and A-F. It does not matter whether the letters are in upper- or lower-case. The 4-bit number that a hexadecimal digit represents is sometimes referred to as a nibble or nybble ("byte", "nybble", get it?).

HexadecimalDecimalBinary
000000
110001
220010
330011
440100
550101
660110
770111
881000
991001
A101010
B111011
C121100
D131101
E141110
F151111

In the C programming language, hexadecimal values are expressed in the form 0x7ABF, where the "0x" signifies a number following in hexadecimal. In Visual Basic, the format is &H7ABF. In Assembler language, the format is 7ABFH. They all mean the same thing. The usual convention is that bits to the left are the most significant and those to the right are the least significant. This would mean that 7ABF (or 0x7ABF in C or &H7ABF in Visual Basic) would represent the decimal integer 31423. (Hint: use the Calculator that comes with Windows to do these conversions - use it in Programmer mode.)

Try this in your Visual Basic (VB6) debugger:-
? Hex(31423)
? &H7ABF
This shows how to convert between hexadecimal and decimal format in Visual Basic.

Now try this example:-

? Hex(65244)
? &HFEDC

Now why doesn't the second statement give us what we expect? This is because of the sign bit - Visual Basic 6 uses "signed" arithmetic and we want to use "unsigned". This is a topic for another time. (For the solution, see Binary and Byte Operations in Visual Basic on our Cryptography page.)

What's big-endian and little-endian?

How your computer actually stores its data is another issue again - there are big-endian computers that store their data with the most significant byte first, and little-endian computers that store their data with the least significant bytes first. Unix-based systems are often big-endian. Intel-based Windows computers are little-endian. However, this is only of interest to us if we go poking around looking at how the bytes are arranged inside our computer memory and, for practical purposes, only serves to confuse an otherwise messy subject. The usual convention is to write out bit strings with the most significant bit first.

Other Information

For an introduction on how to use padding in cryptography, see Using Padding in Cryptography.

Contact

To comment on this page or ask a question, please send us a message.

This page last updated 14 October 2014

Comments

4 comments so far

[Comments are now closed]

Informative article, although some things seem to be presented more confusing than they have to be. "You must be able to communicate the salt and iteration count to the other party so they can reproduce the same key from the password. This information can be sent in the clear. " --- If it is sent in the clear, how does it improve security of the password it is used to salt? Can't an attacker, we should assume, knowing the derivation method, prepare the same table with short passwords, concatenated with the salt sent in the clear, just as in the previous example without the salt?

S12 | - Mon, Jan 11 2010 04:09 GMT

Creating such a table with the salt does not help. It is equivalent to doing a brute-force attack on the algorithm. A table is only useful if it can be prepared in advance and used many times.

David | Moderator - Sun, Jan 17 2010 03:48 GMT

The link The Memorability and Security of Passwords (http://www.ftp.cl.cam.ac.uk/ftp/users/rja14/tr500.pdf) is broken. I believe it should be http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-500.pdf

BarrRobot | Birmingham, UK - Sat, Sep 25 2010 14:05 GMT

You guyz are great, i mean last night i was stiking my trying to figure out many aspects of cryptography and some things that were not clear to me like the concept of padding, Salt, MD5, PKCS5, and many more but, now they are crystal clear, i was constantly searching the oracle java tutorial for this, but was not getting anythingbut, the moment i switched on to your site, i was over and clear, u guyz are amazing......

thank you very much......

Nikhil

India

Nikhil Patil | India - Sun, Oct 31 2010 14:10 GMT