
18.6. Optimization

GCC can apply many techniques to make the executable program that it generates faster and/or smaller. These techniques all tend to reduce still further the "word-for-word" correspondence between the C program you write and the machine code that the computer reads. As a result, they can make debugging more difficult, and are usually applied only after a program has been tested and debugged without optimization.

There are two kinds of optimization options. You can apply individual optimization techniques by means of options beginning with -f (for flag), such as -fmerge-constants, which causes the compiler to place identical constants in a common location, even across different source files. You can also use the -O options (-O0, -O1, -O2, and -O3) to set an optimization level that cumulatively enables a number of techniques at once.

18.6.1. The -O Levels

Each of the -O options represents a number of individual optimization techniques. The -O optimization levels are cumulative: -O2 includes all the optimizations in -O1, and -O3 includes -O2. For complete and detailed descriptions of the different levels, and the many -f optimization options that they represent, see the GCC reference manual. The following list offers a brief description of each level:


-O0

Turn off all optimization options.


-O, -O1

Try to make the executable program smaller and faster, but without increasing compiling time excessively. The techniques applied include merging identical constants, basic loop optimization, and grouping stack operations after successive function calls. An -O with no number is interpreted as -O1.


-O2

Apply almost all of the supported optimization techniques that do not involve a tradeoff between program size and execution speed. This option generally increases the time needed to compile. In addition to the optimizations enabled by -O1, the compiler performs common subexpression elimination, or CSE; this process involves detecting mathematically equivalent expressions in the program and rewriting the code to evaluate them only once, saving the value in an unnamed variable for reuse. Furthermore, instructions are reordered to reduce the time spent waiting for data moving between memory and CPU registers. Incidentally, the data flow analysis performed at this level of optimization also allows the compiler to provide additional warnings about the use of uninitialized variables.


-O3

Generate inline functions and enable more flexible allocation of variables to processor registers. Includes the -O2 optimizations.


-Os

Optimize for size. This option is like -O2, but without those performance optimizations that are likely to increase the code size. Furthermore, block reordering and the alignment of functions and other jump destinations on power-of-two byte boundaries are disabled. If you want small executables, you can also compile with the GCC option -s, which instructs the linker to strip all the symbol tables out of the executable output file after all the necessary functions and objects have been linked. This makes the finished program file significantly smaller, and is often used in building a production version.

The following example illustrates how -O options are used:

$ gcc -Wall -O3 -o circle circle.c circulararea.c -lm

This command uses -O3 to enable the majority of the supported optimization techniques.

18.6.2. The -f Flags

GCC's many -f options give you even finer control over optimization. For example, you can set a general optimization level using an -O option, and then turn off a certain technique. An example:

$ gcc -Wall -O3 -fno-inline-functions -o circle circle.c circulararea.c -lm

The options -O3 -fno-inline-functions in this command enable all the optimizations grouped in -O3 except inline compiling of functions.

There are also flags to enable many optimizations that are not included in any -O level, such as -funroll-loops; this option replaces loop statements that have a known, small number of iterations with repetitive, linear code sequences, thus saving jumps and loop-counter operations. A full list of the hundred or so -f options that control GCC's individual optimization flags would be too long for this chapter, but the examples in this section offer a hint of the capabilities available. If you need a certain compiler feature, there's a good chance you'll find it in the manual.

18.6.3. Floating-Point Optimization

Some of the optimization options that are not included in the -O groups pertain to floating-point operations. The C99 floating-point environment supports scientific and mathematical applications with a high degree of numeric accuracy, but for a given application, you might be more interested in speed than in the best floating-point math available. For such cases, the -ffast-math option defines the preprocessor macro __FAST_MATH__, indicating that the compiler makes no claim to conform to IEEE and ISO floating-point math standards. The -ffast-math flag is a group option, which enables the following six individual options:


-fno-math-errno

Disables the setting of the global variable errno by math functions that are compiled to a single floating-point instruction.


-funsafe-math-optimizations

The "unsafe math optimizations" are those that might violate floating-point math standards, or that do away with verification of arguments and results. Using such optimizations may involve linking code that modifies the floating-point processor's control flags.


-fno-trapping-math

Generates "nonstop" code, on the assumption that no math exceptions will be raised that can be handled by the user program.


-ffinite-math-only

Generates executable code that disregards infinities and NaN ("not a number") values in arguments and results.


-fno-rounding-math

This option indicates that your program does not depend on a certain rounding behavior, and does not attempt to change the floating-point environment's default rounding mode. This setting is currently the default, and its opposite, -frounding-math, is still experimental.


-fno-signaling-nans

This option permits optimizations that limit the number of floating-point exceptions that may be raised by signaling NaNs. This setting is currently the default, and its opposite, -fsignaling-nans, is still experimental.

18.6.4. Architecture-Specific Optimization

For certain system architectures, GCC provides options to produce optimized code for specific members of the processor family, taking into account features such as memory alignment, model-specific CPU instructions, stack structures, increased floating-point precision, prefetching and pipelining, and others. These machine-specific options begin with the prefix -m. If you want to compile your code to make the most of a specific target system, read about the available options in the GCC reference manual.

For several processor types, such as the SPARC, ARM, and RS/6000-PowerPC series, the option -mcpu=cpu generates machine code for the specific CPU type's register set, instruction set, and scheduling behavior. Programs compiled with this option may not run at all on a different model in the same CPU family. The GCC manual lists the available cpu abbreviations for each series.

The option -mtune=cpu is more tolerant. Code generated with -mtune=cpu uses optimized scheduling parameters for the given CPU model, but adheres to the family's common instructions and registers, so that it should still run on a related model.

For the Intel x86 series, the -mcpu=cpu option is the same as -mtune=cpu. The option to enable a model-specific instruction set is -march=cpu. An example:

$ gcc -Wall -O -march=athlon-4 -o circle circle.c circulararea.c -lm

This command line compiles a program for the AMD Athlon XP CPU.

18.6.5. Why Not Optimize?

Sometimes there are good reasons not to optimize. In general, compiling with optimization takes longer and requires more memory than without optimization. How much more depends on what techniques are applied. Furthermore, the performance gains obtained by a given optimization technique depend on both the given program and the target architecture. If you really need optimum performance, you need to choose the techniques that will work in your specific circumstances.

You can combine both -O and -f optimization options with GCC's -g option to include debugging information in the compiled program, but if you do, the results may be hard to follow in a debugging program; optimization can change the order of operations, and variables defined in the program may not remain associated with one register, or may even be optimized out of existence. For these reasons, many developers find it easier to optimize only after a program has been debugged.

Some optimization options may also conflict with strict conformance to the ISO C standard, such as merging variables declared with const as if they were constants. If standards-conformance is critical, and sometimes it is, there are certain optimizations you may not wish to pursue.

Another issue you may encounter is that some optimization techniques result in nondeterministic code generation. For example, the compiler may use randomness in guessing which branch of a conditional jump will be taken most often. If you are programming real-time applications, you'll probably want to be careful to ensure deterministic behavior.

In any case, if you want to be sure of getting the greatest possible runtime performance, or if you need to know in detail how GCC is arriving at the exact machine code for your C program, you will need to study the detailed optimization options in the GCC manual.

