
2.3. Floating-Point Types

C also includes special numeric types that can represent nonintegers with a decimal point in any position. The standard floating-point types for calculations with real numbers are as follows:


float
    For variables with single precision

double
    For variables with double precision

long double
    For variables with extended precision

A floating-point value can be stored only with a limited precision, which is determined by the binary format used to represent it and the amount of memory used to store it. The precision is expressed as a number of significant digits. For example, a "precision of six decimal digits" or "six-digit precision" means that the type's binary representation is precise enough to store a real number of six decimal digits, so that its conversion back into a six-digit decimal number yields the original six digits. The position of the decimal point does not matter, and leading and trailing zeros are not counted in the six digits. The numbers 123,456,000 and 0.00123456 can both be stored in a type with six-digit precision.

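In C, arithmetic operations with floating-point numbers may be performed internally with greater precision than the types of the operands. Whether this happens depends on the implementation, which reports its evaluation method in the macro FLT_EVAL_METHOD, defined in float.h. For example, the following product may be calculated using the double type, or a type with even greater precision.

    float height = 1.2345f, width = 2.3456f;  // Float variables have single
                                              // precision.
    double area = height * width;             // The actual calculation may be
                                              // performed with double
                                              // (or greater) precision.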

If you assign the result to a float variable, the value is rounded as necessary. For more details on floating-point math, see the section "math.h" in Chapter 15.

C defines only minimal requirements for the storage size and the binary format of the floating-point types. However, the format commonly used is the one defined by the International Electrotechnical Commission (IEC) in the 1989 standard for binary floating-point arithmetic, IEC 60559. This standard is based in turn on the Institute of Electrical and Electronics Engineers' 1985 standard IEEE 754. Compilers can indicate that they support the IEC floating-point standard by defining the macro __STDC_IEC_559__. Table 2-6 shows the value ranges and the precision of the real floating-point types in accordance with IEC 60559, using decimal notation.

Table 2-6. Real floating-point types

Type          Storage size   Value range   Smallest positive value   Precision
float         4 bytes        ±3.4E+38      1.2E-38                   6 digits
double        8 bytes        ±1.7E+308     2.3E-308                  15 digits
long double   10 bytes       ±1.1E+4932    3.4E-4932                 18 digits


The header file float.h defines macros that allow you to use these values and other details about the binary representation of real numbers in your programs. The macros FLT_MIN, FLT_MAX, and FLT_DIG indicate the value range and the precision of the float type. The corresponding macros for double and long double begin with the prefixes DBL_ and LDBL_. These macros, and the binary representation of floating-point numbers, are described in the section on float.h in Chapter 15.

The program in Example 2-2 starts by printing the typical values for the type float, then illustrates the rounding error that results from storing a floating-point number in a float variable.

Example 2-2. Illustrating the precision of type float
#include <stdio.h>
#include <float.h>

int main(void)
{
  puts("\nCharacteristics of the type float\n");

  printf("Storage size: %zu bytes\n"     // sizeof yields a size_t: use %zu.
         "Smallest positive value: %E\n"
         "Greatest positive value: %E\n"
         "Precision: %d decimal digits\n",
         sizeof(float), FLT_MIN, FLT_MAX, FLT_DIG);

  puts("\nAn example of float precision:\n");
  double d_var = 12345.6;       // A variable of type double.
  float f_var = (float)d_var;   // Initializes the float
                                // variable with the value of d_var.
  printf("The floating-point number    "
         "%18.10f\n", d_var);
  printf("has been stored in a variable\n"
         "of type float as the value   "
         "%18.10f\n", f_var);
  printf("The rounding error is        "
         "%18.10f\n", d_var - f_var);

  return 0;
}

The last part of this program typically generates the following output:

    The floating-point number    12345.6000000000
    has been stored in a variable
    of type float as the value   12345.5996093750
    The rounding error is            0.0003906250

In this example, the representable value nearest to the decimal number 12,345.6 is 12,345.5996093750. This may not look like a round number in decimal notation, but it is exactly representable in the internal binary representation of the type float, whereas 12,345.6 is not.

