Building the Right Environment to Support AI, Machine Learning and Deep Learning
The following is Chapter 5 from Secure Coding in C and C++ by Robert Seacord. More details about this book are available on the last page of this article.
Chapter 5: Integer Security
Everything good is the transmutation of something evil:
every god has a devil for a father.
—Friedrich Nietzsche. Sämtliche Werke: Kritische
Studienausgabe, vol. 10, selection 5, number 68
Integers represent a growing and underestimated source of vulnerabilities in C and C++ programs. This is primarily because boundary conditions for integers, unlike other boundary conditions in software engineering, have been intentionally ignored. Most programmers emerging from colleges and universities understand that integers have fixed limits, but because these limits were either deemed sufficient, or because testing the results of each arithmetic operation was considered prohibitively expensive, violating integer boundary conditions has gone almost entirely unchecked in commercial software.
Security changes everything. It is no longer acceptable to assume a program will operate normally given a range of expected inputs when an attacker is looking for input values that produce an abnormal effect. Digital integer representations are, of course, imperfect. A software vulnerability may result when a program evaluates an integer to an unexpected value (that is, a value other than the one obtained with pencil and paper) and then uses the value as an array index, size, or loop counter.
Because integer range checking has not been systematically applied in the development of most C and C++ software systems, security flaws involving integers are certain to exist, and some portion of these are likely to be vulnerabilities.
Figure 5-1 contains an example of a vulnerable program. The vulnerability results from a failure in how integer operations are managed. If you are unsure of why this program is vulnerable or how this vulnerability can be exploited to run arbitrary code, you should read the remainder of this chapter. Because integer vulnerabilities result from limitations in how they're represented, integer representation is examined first.
This section describes integer representations, types, and ranges. If you are already familiar with machine-level representation and manipulation of integer values, you can ignore this section and proceed to Section 5.2.
The major consideration in the digital representation of integers is negative integers. Representation methods include signed-magnitude, one's complement, and two's complement [Shiflet 02].
Signed-magnitude representation uses the high-order bit to indicate the sign: 0 for positive, 1 for negative. The remaining low-order bits indicate the magnitude of the value. For example, the binary value 0010 1001 shown in Figure 5-2 represents +41 when the most significant bit is cleared and -41 when it is set.
One's complement representation replaced signed magnitude because thecircuitry required to implement signed-magnitude arithmetic was too complicated. Negative numbers are represented in one's complement form by complementing (taking the opposite value of) each bit, as shown in Figure 5-3 (a). Each 1 is replaced with a 0 and each 0 is replaced with a 1. Even the sign bit is reversed.
Both signed-magnitude and one's complement representations have two representations for zero, which makes programming awkward: tests are required for +0 and -0.
Two's complement representation is the dominant representation and is used almost universally in modern computers. The two's complement form of a negative integer is created by adding one to the one's complement representation, as shown in Figure 5-3 (b). Two's complement representation has a single (positive) value for zero. The sign is still represented by the most significant bit, and the notation for positive integers is identical to their signed-magnitude representations.
C and C++ provide a variety of integer types to allow a close correspondence with the underlying machine architecture. The integer types categories are shown in Table 5-1.
There are two broad categories of integer types: standard and extended. The standard integer types include all the well-known integer types that have existed from the early days of K&R C. Extended integer types are defined in the C99 standard to specify integer types with fixed constraints.
Signed and Unsigned Types. Integers in C and C++ are either signed or unsigned. Both standard and extended integer types include signed and unsigned types; for each signed type there is an equivalent unsigned type. Signed integers are used to represent positive and negative values, the range of which depends on the number of bits allocated to the type and the encoding technique. On a computer using two's complement arithmetic, a signed integer ranges from -2 n-1 through 2 n-1 - 1. When one's complement or sign-magnitude representations are used, the lower bound is -2 n-1 + 1, while the upper bound remains the same.
Figure 5-4 shows the two's complement representation for 4-bit signed integers. Note that incrementing a signed integer at its maximum value (7) results in the minimum value for that type (-8).
Unsigned integer values range from zero to a maximum that depends on the size of the type. This maximum value can be calculated as 2 n - 1, where n is the number of bits used to represent the unsigned type. For each signed integer type, there is a corresponding unsigned integer type.
Figure 5-5 shows the two's complement representation for 4-bit unsigned integers. Again, note that incrementing a signed integer at its maximum value (15) results in the minimum value for that type (0).
Standard and Extended Types. Standard integers include the following types, in increasing length order (for example, long long int cannot be shorter than long int):
- signed char
- short int
- long int
- long long int
Extended integer types are implementation defined and include the following types:
- int#_t, uint#_t, where # represents an exact width (for example, int8_t, uint24_t)
- int_least#_t, uint_least#_t, where # represents a width of at least that value (for example, int_least32_t, uint_least16_t)
- int_fast#_t, uint_fast#_t, where # represents a width of at least that value for fastest integer types (for example, int_fast16_t, uint_fast64_t)
- intptr_t, uintptr_t are integer types wide enough to hold pointers to objects
- intmax_t, uintmax_t are integer types with the greatest width
Compilers that adhere to the C99 standard support all standard types and most extended types.
Other Standard Integer Types. In addition to the standard and extended integer types, the C99 specification also defines a number of standard types that are used for special purposes. For example, the following types are defined in the standard header <stddef.h>:
- ptrdiff_t is the signed integer type of the result of subtracting two pointers
- size_t is the unsigned integer type of the result of the sizeof operator
- wchar_t is an integer type whose range of values can represent distinct codes for all members of the largest extended character set specified among the supported locales
You should use these types where appropriate, but understand how they are defined, particularly when combined in operations with differently typed integers.
Platform-Specific Integer Types. In addition to the integer types defined in the C99 standard types, vendors often define platform-specific integer types. The Microsoft Windows API defines a large number of integer types, including: __int8, __int16, __int32, __int64, ATOM, BOOLEAN, BOOL, BYTE, CHAR, DWORD, DWORDLONG, DWORD32, DWORD64, WORD, INT, INT32, INT64, LONG, LONGLONG, LONG32, LONG64, and so forth.1
As a Windows programmer you will frequently come across these types. Again, it is okay to use these types, but you should understand how they are defined.
The minimum and maximum values for an integer type depend on the type's representation, signedness, and number of allocated bits. Figure 5b6 shows the ranges of some integers that use two's complement representation.
The C99 standard sets minimum requirements for these ranges. Table 5-2 shows the maximum and minimum extents of integer types as required by C99 and as implemented by the Visual C++ .NET and gcc compilers on IA-32. Integer ranges are compiler dependent but greatly influenced by the target machine architecture, as demonstrated by the use of a single column in Table 5-2 to represent integer sizes for both compilers.
|A good example of a software failure resulting from integer limits came on Saturday, December 25, 2004, when Comair halted all operations and grounded 1,100 flights after a crash of its flight crew scheduling software. The software failure was due to a 16-bit counter that limits the number of changes to 32,768 in any given month. Storms earlier in the month caused many crew reassignments, and the 16-bit value was exceeded.|
Compiler- and platform-specific integral limits are documented in the limits.h header file. Familiarize yourself with these limits, but remember that these values are platform specific. For portability, use the named constants and not the actual values in your code.
5.2 Integer Conversions
Type conversions occur explicitly in C and C++ as the result of a cast or implicitly as required by an operation. While conversions are generally required for the correct execution of a program, they can also lead to lost or misinterpreted data. This section describes how and when conversions are performed and identifies their pitfalls.
Implicit conversions are, in part, a consequence of the C language ability to perform operations on mixed types. For example, most C programmers would not think twice before adding an unsigned char to a signed char and storing the result in a short int. This is because the C compiler generates the code required to perform the required conversions implicitly. The C99 standard rules define how C compilers handle conversions. These rules, which are described in the following sections, include integer promotions, integer conversion rank, and usual arithmetic conversions.
Integer types smaller than int are promoted when an operation is performed on them. If all values of the original type can be represented as an int, the value of the smaller type is converted to an int; otherwise, it is converted to an unsigned int.
Integer promotions are applied as part of the usual arithmetic conversions (discussed later in this section) to certain argument expressions, operands of the unary +, -, and ~ operators, and operands of the shift operators. The following code fragment illustrates the use of integer promotions:
char c1, c2; c1 = c1 + c2;
Integer promotions require the promotion value of each variable (c1 and c2) to int size. The two ints are added and the sum truncated to fit into the char type.
Integer promotions are performed to avoid arithmetic errors resulting from the overflow of intermediate values. On line 5 of Figure 5-7, the value of c1 is added to the value of c2. The sum of these values is then added to the value of c3 (according to operator precedence rules). The addition of c1 and c2 would result in an overflow of the signed char type because the result of the operation exceeds the maximum size of signed char. Because of integer promotions, however, c1, c2, and c3 are each converted to integers and the overall expression is successfully evaluated. The resulting value is then truncated and stored in cresult. Because the result is in the range of the signed char type, the truncation does not result in lost data.
Integer Conversion Rank
Every integer type has an integer conversion rank that determines how conversions are performed. The following rules for determining integer conversion rank are defined in C99.
- No two different signed integer types have the same rank, even if they have the same representation.
- The rank of a signed integer type is greater than the rank of any signed integer type with less precision.
- The rank of long long int is greater than the rank of long int, which is greater than the rank of int, which is greater than the rank of short int, which is greater than the rank of signed char.
- The rank of any unsigned integer type is equal to the rank of the corresponding signed integer type, if any.
- The rank of any standard integer type is greater than the rank of any extended integer type with the same width.
- The rank of char is equal to the rank of signed char and unsigned char.
- The rank of any extended signed integer type relative to another extended signed integer type with the same precision is implementation defined but still subject to the other rules for determining the integer conversion rank.
- For all integer types T1, T2, and T3, if T1 has greater rank than T2 and T2 has greater rank than T3, then T1 has greater rank than T3.
The integer conversion rank is used in the usual arithmetic conversions to determine what conversions need to take place to support an operation on mixed integer types.
Conversions from Unsigned Integer Types
Conversions occur between signed and unsigned integer types of any size and can result in lost or misinterpreted data when a value cannot be represented in the new type.
Conversions of smaller unsigned integer types to larger unsigned integer types are always safe and typically accomplished by zero-extending the value.
When a large unsigned integer is converted to a smaller unsigned integer type, the larger value is truncated and the low-order bits are preserved. When a large unsigned integer is converted to a smaller signed integer type, the value is also truncated, and the high-order bit becomes the sign bit. In both cases, data may be lost if the value cannot be represented in the new type.
When unsigned integer types are converted to the corresponding signed integer type (for example, an unsigned char to a char), the bit pattern is preserved, so no data is lost. The high-order bit, however, becomes the sign bit. If the sign bit is set, both the sign and magnitude of the value changes.
Table 5-3 summarizes conversions from unsigned integer types. Conversions that can result in lost data are light gray, and ones that can result in the incorrect interpretation of data are dark gray.
Conversions from Signed Integer Types
When a signed integer is converted to an unsigned integer of equal or greater size and the value of the signed integer is not negative, the value is unchanged. The conversion is typically made by sign-extending the signed integer. A signed integer is converted to a shorter signed integer by truncating the high-order bits.
When signed integer types are converted to unsigned, there is no lost data because the bit pattern is preserved. However, the high-order bit loses its function as a sign bit. If the value of the signed integer is not negative, the value is unchanged. If the value is negative, the resulting unsigned value is evaluated as a large, signed integer. In line 3 of Figure 5-8, the value of c is compared to the value of l. Because of integer promotions, c is converted to an unsigned integer with a value of 0xFFFFFFFF or 4,294,967,295.
Table 5-4 summarizes conversions from signed integer types. Conversions that can result in lost data are light gray, and ones that can result in the incorrect interpretation of data are dark gray.
Signed or Unsigned Characters
An additional complication in C and C++ conversion is that the character type char can be signed or unsigned (depending on the compiler and machine). When a signed char with its high-bit set is saved in an integer, the result is a negative number. In some cases this can lead to an exploitable vulnerability, so in general it is best to use unsigned char instead of char or signed char for buffers, pointers, and casts when dealing with character data that may have values greater than 127 (0x7f). For example, when processing e-mail messages, certain versions of sendmail create tokens from address elements (user, host, domain). The code that performs this function (prescan() in parseaddr.c) contains logic to check that the tokens are not malformed or overly long. In certain cases, a variable in prescan() is set to the special control value b1, which may alter the program logic to skip the length checks. Using an e-mail message with a specially crafted address containing 0xFF, an attacker can bypass the length checks and overwrite the saved instruction pointer on the stack. When prescan() evaluates a character with the value 0xFF as an int, the value is interpreted as -1, causing the length checks to be skipped. This vulnerability is described in VU#897604 and elsewhere.2