Storing and representing ciphertext
Ciphertext is not text!
In cryptography programming you have to be very careful to differentiate between binary data and what we'll refer to here as text data. 'Text' consists of readable, printable characters we expect to see on our computer screen or in a book. It might consist of simple US-ASCII/ANSI characters, or it could be Unicode or double-byte (DBCS) East Asian character strings. Text is usually stored in a string type of some kind. 'Binary' data is a string of bits that we conventionally store as bytes or octets.
- Bit strings
- Problems with byte sequences
- Encoding in hexadecimal and base64
- Advantages of hex-encoded strings
- Converting strings to bytes and vice versa
- Input to Encryption and Decryption Processes
- Other Information
Bit strings
The input to and ciphertext output from all modern encryption functions is, strictly speaking, always a bit string. A 'bit string' is an ordered sequence of 'bits', each of value either '0' or '1'.
Most programming languages do not have a convenient 'bit string' type and so we have to work around this. We usually store bit strings as a sequence of bytes, each consisting of 8 bits (an 8-bit byte is sometimes referred to as an octet). So, for example, a 128-bit bit string can be stored in a 16-byte sequence of bytes (since 128/8=16).
Storing bit strings
In VB we use an array of Byte types
Dim abData() As Byte
Dim nLen As Long
nLen = 16
ReDim abData(nLen - 1)
In C we use the unsigned char type (often typedef'd as BYTE)
unsigned char data[16];
or
unsigned char *pdata;
int len = 16;
pdata = (unsigned char *)malloc(len);
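C# and VB.NET have the byte and Byte types respectively. Note that VB.NET declares an array by its upper bound, not its length, so a 16-byte array is declared with 15.
byte[] data = new byte[16];
Dim data(15) As Byte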
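To illustrate how a bit string maps onto such a byte array, here is a minimal sketch in C that reads and sets individual bits. The helper names get_bit and set_bit are made up for this example, and it assumes the common convention that bit 0 is the most significant bit of the first byte; other conventions are possible.

```c
#include <stddef.h>

/* Return bit number i (0 or 1) of a bit string stored in bytes.
   Assumed convention: bit 0 is the most significant bit of bytes[0]. */
static int get_bit(const unsigned char *bytes, size_t i)
{
    return (bytes[i / 8] >> (7 - (i % 8))) & 1;
}

/* Set bit number i to value (0 or 1), using the same convention. */
static void set_bit(unsigned char *bytes, size_t i, int value)
{
    unsigned char mask = (unsigned char)(1u << (7 - (i % 8)));
    if (value)
        bytes[i / 8] |= mask;
    else
        bytes[i / 8] &= (unsigned char)~mask;
}
```

With the first byte equal to 0xFE (binary 11111110), get_bit(data, 0) gives 1 and get_bit(data, 7) gives 0.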
Problems with byte sequences
Sequences of bytes are compact but not very convenient for programmers.
- You can't print them directly - if you do you get garbage.
- They are tricky to manipulate in code.
- They are difficult to debug.
- In C, you have to specify the length with a separate variable.
- Users sometimes treat them as strings and wonder why they have problems.
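The last two points can be seen in a few lines of C. In this sketch the 'ciphertext' bytes are invented for illustration, and the helper naive_length is hypothetical; it only shows what goes wrong when a byte sequence is treated as a zero-terminated string.

```c
#include <string.h>

/* Made-up example "ciphertext": 8 bytes, with a zero byte at index 2 */
static const unsigned char example_ct[8] =
    { 0x3A, 0xF1, 0x00, 0x9C, 0x7E, 0x00, 0x15, 0xB2 };

/* WRONG: treats the bytes as a zero-terminated string,
   so counting stops at the first zero byte. */
static size_t naive_length(const unsigned char *bytes)
{
    return strlen((const char *)bytes);
}
```

Here naive_length(example_ct) returns 2 although the array holds 8 bytes, which is why the true length has to be carried in a separate variable. Worse, if the data happened to contain no zero byte at all, strlen would read past the end of the array.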
Encoding in hexadecimal and base64
A more convenient form is to encode the binary sequence in hexadecimal or base64 format. These encoded forms can easily be stored in a string. Hexadecimal (hex) is particularly convenient because you can easily (well, with practice) see immediately what the value of the underlying ciphertext is. Debugging is much easier. Test vectors for encryption algorithms are usually expressed in hexadecimal form. The only real disadvantage of hex-formatted data is that it takes up twice as much storage space as the decoded bytes.
Base64-encoded data is more compact than hexadecimal (four characters for every three bytes, rather than two characters per byte), but is pretty well impossible to decode by eye.
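Both encodings are simple to implement. The following is a minimal sketch in C, not taken from any particular library: the function names hex_encode and base64_encode are made up for this example, the base64 alphabet is the standard one with '=' padding, and the caller is assumed to supply an output buffer that is large enough.

```c
#include <stddef.h>

/* Encode nbytes of data as uppercase hex into out.
   out must have room for 2*nbytes + 1 characters. */
static void hex_encode(const unsigned char *data, size_t nbytes, char *out)
{
    static const char digits[] = "0123456789ABCDEF";
    size_t i;
    for (i = 0; i < nbytes; i++) {
        out[2 * i]     = digits[data[i] >> 4];
        out[2 * i + 1] = digits[data[i] & 0x0F];
    }
    out[2 * nbytes] = '\0';
}

/* Encode nbytes of data as base64 (standard alphabet, '=' padding).
   out must have room for 4*((nbytes + 2)/3) + 1 characters. */
static void base64_encode(const unsigned char *data, size_t nbytes, char *out)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t i, j = 0;
    for (i = 0; i < nbytes; i += 3) {
        size_t left = nbytes - i;   /* bytes remaining, including this one */
        unsigned long v = (unsigned long)data[i] << 16;
        if (left > 1) v |= (unsigned long)data[i + 1] << 8;
        if (left > 2) v |= data[i + 2];
        out[j++] = alphabet[(v >> 18) & 0x3F];
        out[j++] = alphabet[(v >> 12) & 0x3F];
        out[j++] = (left > 1) ? alphabet[(v >> 6) & 0x3F] : '=';
        out[j++] = (left > 2) ? alphabet[v & 0x3F] : '=';
    }
    out[j] = '\0';
}
```

For 8 bytes of input, hex_encode needs a 17-character buffer and base64_encode a 13-character buffer, which shows the difference in size between the two forms.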
For example, the 64-bit string
11111110 11011100 10111010 10011000 01110110 01010100 00110010 00010000
can be represented in hex by the eight bytes
FE DC BA 98 76 54 32 10
or as the hex-encoded string
"FEDCBA9876543210"
In base64 this is
"/ty6mHZUMhA="
Advantages of hex-encoded strings
When programming with bytes, a lot of your programming time is spent converting from hex format into byte format and then back again for debugging and testing. If your encryption package has the option, you may as well work consistently in hex format all the time. You then only need to convert the original plaintext from 'text' into a hex-encoded string before encryption and then convert back after successful decryption.
The advantages of using hex strings include:
- You can store them in normal string variables, which are usually easier to manage in programs.
- You can pass them between different computer systems and in emails without corruption.
- Printing is straightforward.
- Debugging is easier as the value of each encoded byte is immediately visible.
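If your package only accepts bytes, you will need to convert a hex string back into bytes yourself. The following C sketch shows one way to do it; the names hex_value and hex_decode are made up for this example, and the error handling (returning -1) is just one possible design.

```c
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode a hex string into out (at most maxbytes bytes).
   Returns the number of bytes written, or -1 on invalid input
   (odd length, non-hex character, or output buffer too small). */
static long hex_decode(const char *hex, unsigned char *out, size_t maxbytes)
{
    size_t len = strlen(hex);
    size_t i;
    if (len % 2 != 0 || len / 2 > maxbytes)
        return -1;
    for (i = 0; i < len / 2; i++) {
        int hi = hex_value(hex[2 * i]);
        int lo = hex_value(hex[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        out[i] = (unsigned char)((hi << 4) | lo);
    }
    return (long)(len / 2);
}
```

Because the input is an ordinary zero-terminated string, its length does not need to be passed separately; only the size of the output byte buffer does.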
Converting strings to bytes and vice versa
Use these functions to convert a string of text to an unambiguous array of bytes and vice versa.
VB6/VBA
In VB6/VBA, use the StrConv function.
Dim abData() As Byte
Dim Str As String
Dim i As Long
Str = "Hello world!"
' Convert string to bytes
abData = StrConv(Str, vbFromUnicode)
For i = 0 To UBound(abData)
    Debug.Print Hex(abData(i)); "='" & Chr(abData(i)) & "'"
Next
' Convert bytes to string
Str = StrConv(abData, vbUnicode)
Debug.Print "'" & Str & "'"

48='H'
65='e'
6C='l'
6C='l'
6F='o'
20=' '
77='w'
6F='o'
72='r'
6C='l'
64='d'
21='!'
'Hello world!'
VB.NET
In VB.NET use System.Text.Encoding.
Dim abData() As Byte
Dim Str As String
Dim i As Integer
Str = "Hello world!"
' Convert string to bytes
abData = System.Text.Encoding.Default.GetBytes(Str)
For i = 0 To UBound(abData)
    Console.WriteLine(Hex(abData(i)) & "='" & Chr(abData(i)) & "'")
Next
' Convert bytes to string
Str = System.Text.Encoding.Default.GetString(abData)
Console.WriteLine("'" & Str & "'")

You could be more explicit by replacing .Default with .GetEncoding(1252), and then use the appropriate code page for your character set (1252 is Western European). Be aware that Encoding.Default means the system ANSI code page on .NET Framework but UTF-8 on .NET Core and later.
C#
In C#, use System.Text.Encoding, which has identical behaviour to the function in VB.NET.
byte[] abData;
string Str;
int i;
Str = "Hello world!";
// Convert string to bytes
abData = System.Text.Encoding.Default.GetBytes(Str);
for (i = 0; i < abData.Length; i++)
{
    Console.WriteLine("{0:X}", abData[i]);
}
// Convert bytes to string
Str = System.Text.Encoding.Default.GetString(abData);
Console.WriteLine("'{0}'", Str);
C/C++
In C and C++, the distinction between a string and an array of bytes is often blurred. A string is a zero-terminated sequence of char types and
bytes are stored in the unsigned char type.
A string needs an extra character for the null terminating character;
a byte array does not, but it needs its length to be stored in a separate variable.
A byte array can contain a zero (NUL) value but a string cannot.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
/* Print bytes in hex format + newline */
static void pr_hexbytes(const unsigned char *bytes, size_t nbytes)
{
    size_t i;
    for (i = 0; i < nbytes; i++)
        printf("%02X ", bytes[i]);
    printf("\n");
}
int main(void)
{
    char szStr[] = "Hello world!";
    unsigned char *lpData;
    size_t nbytes;
    char *lpszCopy;
    /* Convert string to bytes */
    /* (a) simply re-cast */
    lpData = (unsigned char *)szStr;
    nbytes = strlen(szStr);
    pr_hexbytes(lpData, nbytes);
    /* (b) make a copy */
    lpData = malloc(nbytes);
    if (lpData == NULL) return 1;
    memcpy(lpData, szStr, nbytes);
    pr_hexbytes(lpData, nbytes);
    /* Convert bytes to a zero-terminated string */
    lpszCopy = malloc(nbytes + 1);
    if (lpszCopy == NULL) { free(lpData); return 1; }
    memcpy(lpszCopy, lpData, nbytes);
    lpszCopy[nbytes] = '\0';
    printf("'%s'\n", lpszCopy);
    free(lpData);
    free(lpszCopy);
    return 0;
}

48 65 6C 6C 6F 20 77 6F 72 6C 64 21
48 65 6C 6C 6F 20 77 6F 72 6C 64 21
'Hello world!'
Plain char may behave like unsigned char on your system (if your compiler makes char unsigned),
or it may not (if char is signed).
We strongly recommend that you explicitly distinguish between strings and byte arrays in your code by using
the correct type and consistently treating them differently.
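One place where the difference shows up is when printing bytes in hex. If plain char is signed on your compiler, a byte value of 0x80 or above is negative and is sign-extended when it is passed to printf. A minimal sketch of the safe form follows; the function name format_byte is made up for this example.

```c
#include <stdio.h>

/* Format one byte as two uppercase hex digits in out (3 chars incl. NUL).
   The cast to unsigned char avoids sign extension if plain char is signed:
   without it, the value 0x80 would typically print as "FFFFFF80". */
static void format_byte(char c, char out[3])
{
    snprintf(out, 3, "%02X", (unsigned int)(unsigned char)c);
}
```

Declaring byte data as unsigned char from the start, as recommended above, makes the cast unnecessary.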
Unicode strings in C
If your string is a Unicode string, then it consists of a sequence of wchar_t types,
which are 2 bytes long on Windows but typically 4 bytes on Linux and other Unix-like systems.
Converting wide-character strings to a sequence of bytes in C is more problematic.
You can either copy the Unicode string directly to a string of bytes
(in which case, with a 2-byte wchar_t, every second byte will be zero for US-ASCII characters),
or use the stdlib wcstombs function or the Windows WideCharToMultiByte function
to convert to a sequence of multi-byte characters (some will be one byte long, some two or more)
and then convert the multi-byte string to bytes (you can do this with a simple cast).
Each party encrypting and decrypting must agree on which way to do it.
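The wcstombs route might look like the following sketch. The function name wide_to_bytes is made up for this example. The bytes produced depend on the current C locale (set with setlocale), so this only gives matching results if both parties use the same multi-byte encoding.

```c
#include <stdlib.h>
#include <wchar.h>

/* Convert a wide-character string to multi-byte characters using the
   current C locale, and return them as a byte array of length *pnbytes
   (the terminating zero is not counted). Returns NULL on failure.
   The caller must free() the result. */
static unsigned char *wide_to_bytes(const wchar_t *ws, size_t *pnbytes)
{
    /* Worst case: MB_CUR_MAX bytes per wide character, plus terminator */
    size_t maxbytes = wcslen(ws) * MB_CUR_MAX + 1;
    char *mbs = malloc(maxbytes);
    size_t n;
    if (mbs == NULL)
        return NULL;
    n = wcstombs(mbs, ws, maxbytes);
    if (n == (size_t)-1) {      /* a character could not be converted */
        free(mbs);
        return NULL;
    }
    *pnbytes = n;
    return (unsigned char *)mbs;
}
```

For a string of plain US-ASCII characters such as L"Hello" the result is one byte per character; for other characters the number of bytes depends on the locale's encoding.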
Input to Encryption and Decryption Processes
The input to an encryption process must be 'binary' data, i.e. a 'bit string'.
We need to convert the text we want to encrypt into
'binary' format first and then encrypt it. The results of encryption are always binary. Do not attempt
to treat raw ciphertext as 'text' or put it directly into a String type.
Store ciphertext either as a raw binary file or convert it to base64 or
hexadecimal format. You can safely put data in base64 or hexadecimal format in a String.
When you decrypt, always start with binary data, decrypt to binary data, and then, and only then, convert back to text, if that is what you are expecting. You can devise your own checks to make sure the decrypted data is what you expect before you do the final conversion.
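The order of operations on decryption can be sketched in C as follows. Everything here is illustrative: toy_xor is a placeholder that stands in for a real cipher and is NOT secure, looks_like_text is just one example of a check you might devise (it accepts only printable US-ASCII), and decrypt_to_text is a hypothetical wrapper with an arbitrary 255-byte limit.

```c
#include <stddef.h>
#include <string.h>

/* PLACEHOLDER ONLY: XOR with one key byte stands in for a real cipher.
   It is NOT secure; substitute your encryption library's functions. */
static void toy_xor(unsigned char *bytes, size_t n, unsigned char key)
{
    size_t i;
    for (i = 0; i < n; i++)
        bytes[i] ^= key;
}

/* Example check: are all bytes printable US-ASCII (0x20 to 0x7E)? */
static int looks_like_text(const unsigned char *bytes, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (bytes[i] < 0x20 || bytes[i] > 0x7E)
            return 0;
    return 1;
}

/* Decrypt n bytes of ciphertext and, only if the result passes the
   check, convert it to a zero-terminated string in text (n + 1 chars).
   Returns 0 on success, -1 if the decrypted bytes are not valid text. */
static int decrypt_to_text(const unsigned char *ct, size_t n,
                           unsigned char key, char *text)
{
    unsigned char work[256];
    if (n >= sizeof work)
        return -1;
    memcpy(work, ct, n);            /* start with binary data    */
    toy_xor(work, n, key);          /* decrypt to binary data    */
    if (!looks_like_text(work, n))  /* check before converting   */
        return -1;
    memcpy(text, work, n);          /* only now convert to text  */
    text[n] = '\0';
    return 0;
}
```

With the wrong key the decrypted bytes fail the check and no string is produced, so garbage never reaches code that expects text.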
On a US-English system set up for ANSI characters, you can probably get away with using a String
type to carry out 'binary' operations. We know: we have done it for years, and a lot of code on this site still
has residual mistakes in it. We spend a lot of time explaining to people why their code doesn't work properly
on their Chinese/Japanese/Korean/Hebrew system.
Other information
See also our pages on:
- Cross-Platform Encryption
- Encryption with International Character Sets
- Changes to Blowfish in Visual Basic in Version 6 which discusses Strings vs Bytes and ANSI vs Unicode
- Binary and byte operations in Visual Basic
- Using Byte Arrays in Visual Basic
- Encrypting variable-length strings with a password.
This document last updated 25 February 2007.
To comment on this page, contact DI Management.