Indiana University
University Information Technology Services
  

Compiler flags for IBM XL compiler suites

Following is an overview of available compiler flags for IBM XL compiler suites, with an emphasis on performance optimization and compatibility with GNU compilers on Big Red at Indiana University. This document assumes you have experience with compiling programs at the command line and know how to use basic compiler options (e.g., -I, -L, -g) to compile and link C/C++ or Fortran programs.

Consider the following tips for optimizing:

  1. Start with -O0 (the default for IBM compilers) and raise the optimization level gradually. Compiling at -O0 lets you confirm that your code is algorithmically correct, preserves all debug information, and can expose problems in existing code. You can also specify the -qarch option to take advantage of architecture-specific features, for example: xlc_r -c -O0 -qarch=ppc970 -g example.c

    If a problem occurs, use -qdbg to generate debug information for a symbolic debugger. The default is -qnodbg.

  2. Apply -O2. At -O2 the compiler applies a comprehensive set of low-level transformations at subprogram or compilation-unit scope, which can include some inlining. Optionally, specify -qmaxmem=-1 to let the optimizer use as much memory as it needs. You can still use a debugger at -O2 by adding the -g option.

  3. Apply -O3. -O3 performs more intensive low-level transformations that remove many of the limitations present at -O2. -O3 implies -qnostrict -qmaxmem=-1 -qhot=level=0. If you encounter problems at this level, try adding -qstrict or -qnohot to -O3, for example: xlc_r -c -O3 -qstrict -qarch=ppc970 -qtune=ppc970 example.c

  4. Turn on selected -qhot suboptions. Because -O3 implies only -qhot=level=0, which performs minimal loop transformations, you can add -qhot to -O3 to see whether the application gets an additional performance boost. -qhot is designed to have a neutral effect when no optimization opportunities exist.
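Because -O3 and higher can relax IEEE rules, results from each new optimization step may differ slightly from the -O0 build. One way to check each step is to keep output from the -O0 build as a reference and compare later builds against it with a tolerance. The C function below is a minimal sketch of such a check, assuming results are arrays of doubles; the name first_mismatch and the tolerance parameters are hypothetical choices, not part of any IBM tool.

```c
#include <stddef.h>

/* Absolute value without linking the math library. */
static double abs_d(double x)
{
    return x < 0.0 ? -x : x;
}

/* Hypothetical helper: compares output from an optimized build (test)
 * with reference output saved from the -O0 build (ref). Returns the
 * index of the first element whose difference exceeds
 * atol + rtol * max(|ref|, |test|), or -1 if all elements agree.
 * A NaN in either array also counts as a mismatch. */
long first_mismatch(const double *ref, const double *test, size_t n,
                    double rtol, double atol)
{
    for (size_t i = 0; i < n; i++) {
        double a = abs_d(ref[i]);
        double b = abs_d(test[i]);
        double scale = a > b ? a : b;
        double diff = abs_d(ref[i] - test[i]);
        /* Written as !(diff <= limit) so that a NaN difference is reported too. */
        if (!(diff <= atol + rtol * scale))
            return (long)i;
    }
    return -1;
}
```

For example, save the results of the -O0 build, rerun the -O3 build on the same input, and treat any return value other than -1 as a signal to retry with -qstrict or -qnohot.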

Performance optimization

To help you make effective use of the powerful hardware on Big Red at IU, the IBM XL compilers provide a series of flags that take advantage of specific hardware features:

-q32 and -q64
Big Red can run both 32- and 64-bit binaries. -q32 instructs the compiler to compile the source code into 32-bit object code, while -q64 makes the compiler generate 64-bit object code. On Big Red, the environment variable OBJECT_MODE has the same effect as these options; however, UITS recommends specifying the bit mode explicitly at the command line to avoid confusion.
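In C code, the bit mode mainly changes the size of pointers and of long. The sketch below reports those sizes for the current build, assuming the usual Linux data models (ILP32 for 32-bit code, LP64 for 64-bit code); the function names are hypothetical. Code that, for example, stores a pointer in an int can work under -q32 and break under -q64.

```c
#include <limits.h>

/* Width in bits of a data pointer in this build. Compiled with -q32 this
 * is expected to be 32; compiled with -q64 it is expected to be 64. */
int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* On Linux, long is 32 bits in 32-bit mode (ILP32) and 64 bits in
 * 64-bit mode (LP64), so it matches the pointer width. */
int long_bits(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}

/* int stays 32 bits in both modes. */
int int_bits(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}
```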
-qarch
This option lets the compiler target a specific architecture. Because Big Red has PPC970 processors, you can use -qarch=ppc970, and the compiler will generate PPC970-specific instructions. Note that code generated this way may not run on other architectures. Alternatively, you can use -qarch=ppc or -qarch=ppc64, which make the compiler generate code that runs on all PowerPC or all 64-bit PowerPC architectures, respectively. -qarch corresponds to the -march option in GNU compilers; on PowerPC, GCC uses -mcpu for this purpose.
-qtune
Matching -qtune with -qarch will provide the best performance on the specified architecture. If -qarch is specified, the compiler will default to the matching -qtune option. If -qarch=ppc or -qarch=ppc64 is specified, you can use either -qtune=auto or -qtune=ppc970. This option corresponds to the -mtune option in GNU compilers.
-qhot
This option instructs the compiler to perform high-order loop analysis and transformations during optimization. Its suboptions include the following:
arraypad
This suboption permits the compiler to increase the dimensions of arrays where doing so might improve the efficiency of array-processing loops, because array dimensions that are powers of two can lead to poor cache utilization. Not all arrays will be padded, and different arrays might be padded by different amounts. -qhot=arraypad can be unsafe: the compiler does not check for reshaping or equivalences, which may cause code to break once padding takes place.
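To illustrate what -qhot=arraypad does automatically, the sketch below pads a two-dimensional C array by hand. N, PAD, and the function names are hypothetical; the compiler chooses its own padding amounts, and whether padding helps depends on the cache geometry of the machine.

```c
#include <stddef.h>

/* N is a power of two, the case the text says can hurt cache use.
 * PAD is an arbitrary, hypothetical padding amount for illustration. */
#define N   512
#define PAD 8

static double a[N][N];        /* rows are exactly N doubles apart */
static double b[N][N + PAD];  /* rows are N + PAD doubles apart */

void fill_arrays(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = b[i][j] = (double)(i + j);
}

/* A column sum touches one element per row. With a power-of-two row
 * stride these elements tend to map to the same cache sets; padding
 * each row changes the stride. */
double column_sum_unpadded(int j)
{
    double s = 0.0;
    for (int i = 0; i < N; i++)
        s += a[i][j];
    return s;
}

double column_sum_padded(int j)
{
    double s = 0.0;
    for (int i = 0; i < N; i++)
        s += b[i][j];   /* the PAD extra columns are never read */
    return s;
}

/* Code that assumes rows are exactly N elements apart, such as
 * ((double *)a)[i * N + j], would read the wrong element if the array
 * were padded. This is the kind of reshaping the compiler does not check. */
```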
simd and nosimd
This suboption is useful for applications with significant image-processing needs. It converts certain operations performed in a loop on successive elements of an array into VMX (Vector Multimedia Extension) instructions that can use the AltiVec unit on the PPC970 processor. It is effective only when -qarch is set to an architecture that supports VMX instructions.
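The loop below has the shape -qhot=simd looks for: the same independent operation applied to successive elements of an array, here brightening 8-bit grayscale pixels. It is only an illustrative sketch; the function name is hypothetical, and whether the compiler actually emits VMX code for it depends on its own analysis.

```c
#include <stddef.h>

/* Brighten an 8-bit grayscale image by delta, clamping at 255.
 * Each iteration is independent of the others, and restrict tells the
 * compiler that dst and src do not overlap. */
void brighten(unsigned char *restrict dst, const unsigned char *restrict src,
              size_t n, unsigned char delta)
{
    for (size_t i = 0; i < n; i++) {
        unsigned int v = (unsigned int)src[i] + delta;
        dst[i] = (unsigned char)(v > 255u ? 255u : v);
    }
}
```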
vector and novector
This suboption causes the compiler to convert certain operations performed in a loop on successive array elements into a call to a routine in the IBM MASS (Mathematical Acceleration Subsystem) library, which calculates several results at a time. It supports both single- and double-precision floating-point operations. Note that this option can reduce the precision of results. Specifying -qstrict together with -qhot turns it off.
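The sketch below shows the loop shape -qhot=vector targets: an element-wise floating-point operation over successive array elements. A reciprocal is used on the assumption that it is among the operations the compiler maps to MASS vector routines; the function name is hypothetical, and whether the conversion actually happens is up to the compiler.

```c
#include <stddef.h>

/* Element-wise reciprocal: y[i] = 1 / x[i].
 * Written one element at a time; under -qhot=vector the compiler may
 * replace the whole loop with a call to a MASS vector routine, whose
 * rounding can differ slightly from this scalar form. */
void reciprocal_all(double *restrict y, const double *restrict x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = 1.0 / x[i];
}
```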
level=0 and level=1
-qhot=level=0 is equivalent to -qhot=novector, nosimd, and noarraypad. -qhot=level=1 is equivalent to plain -qhot, which is the same as -qhot=nosimd, -qhot=noarraypad, and -qhot=vector.

The following flags are worth considering alongside the architecture-specific flags above when compiling and optimizing code:

-qstrict and -qnostrict
-qstrict turns off aggressive optimizations that have the potential to alter program semantics, such as rewriting floating-point operations; -qnostrict allows them. -O3 implies -qnostrict.
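Floating-point addition is not associative, which is why regrouping it can change results. The two functions below add the same three numbers with different grouping. For a = 1e16, b = -1e16, c = 1.0, the first returns 1.0 and the second returns 0.0, because -1e16 + 1.0 rounds back to -1e16 in double precision. The volatile temporaries only force each partial sum to be rounded to double for this demonstration; the function names are hypothetical.

```c
/* (a + b) + c: the large terms cancel first, so c survives. */
double sum_left(double a, double b, double c)
{
    volatile double t = a + b;
    return t + c;
}

/* a + (b + c): c is lost when it is added to the large term b. */
double sum_right(double a, double b, double c)
{
    volatile double t = b + c;
    return a + t;
}
```

Under -qnostrict the compiler may regroup ordinary expressions like (a + b) + c in this way; -qstrict keeps the evaluation order written in the source.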
-qipa and -qnoipa
-qipa enables inter-procedural analysis and -qnoipa disables it. -qipa performs automatic inlining, limited alias analysis, and limited call-site tailoring. This is implied when you specify an aggressive optimization level such as -O4 or -O5. UITS recommends you specify -qipa options both when compiling and when linking.
-qmaxmem
This limits the amount of memory that the compiler allocates while performing specific, memory-intensive optimizations to the specified number of kilobytes. A value of -1 allows optimization to take as much memory as it needs without checking for limits.
-O0, -O (same as -O2), -O3, -O4, and -O5
At -O0, the compiler performs only minimum optimizations, such as constant folding and elimination of local common subexpressions.

-O or -O2 provides what the compiler's developers consider the best combination of compilation speed and run-time performance.

-O3 performs additional optimizations that are memory intensive, compile-time intensive, or both. UITS recommends using -O3 when run-time improvement is more important than minimizing compilation resources. -O3 can slightly alter the semantics of a program because it may rewrite floating-point operations and relax conformance to IEEE rules. -O3 implies -qhot=level=0.

-O4 is more aggressive: it automatically sets -qarch and -qtune to the architecture of the machine doing the compiling, and turns on -qhot and -qipa.

-O5 raises the level of inter-procedural analysis.

Compatibility options

-W
UITS recommends linking through the IBM compiler rather than calling the linker directly, so that link-time optimization is not turned off. To pass options from the command line to the linker and other components of the compiler, use the -W option. For example, to pass the option --whole-archive to the linker, specify -Wl,--whole-archive at the command line. For more information, refer to the man pages for the IBM compilers on Big Red.
-qextname and -qnoextname (Fortran compiler only)
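-qextname adds a trailing underscore to the names of all global entities except the main program name. Because GNU Fortran compilers append this underscore by default, -qextname makes it easier to link with code built by the GNU compilers. The default, -qnoextname, leaves names unchanged.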
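A common use of -qextname is calling C routines from Fortran. The sketch below is a C routine meant to be called from Fortran as CALL DSUM(N, X, TOTAL). It assumes the Fortran code is compiled with -qextname, so the external name becomes lowercase dsum with a trailing underscore, and that default INTEGER is 4 bytes; the name dsum is hypothetical.

```c
/* Called from Fortran as CALL DSUM(N, X, TOTAL).
 * Fortran passes arguments by reference, so every parameter is a pointer.
 * The trailing underscore matches the name -qextname (and GNU Fortran
 * by default) generates for the Fortran call. */
void dsum_(const int *n, const double *x, double *total)
{
    double s = 0.0;
    for (int i = 0; i < *n; i++)
        s += x[i];
    *total = s;
}
```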
-qlanglvl
Both IBM Fortran and C/C++ use -qlanglvl to specify the language standard to check against. For the available language levels, refer to the man pages. This option corresponds to the -std option in the GNU compilers.
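As an illustration, the C function below relies on C99 features: line comments, <stdbool.h>, a designated initializer, and a declaration inside a for statement. Building it with XL C would need a C99 language level such as -qlanglvl=stdc99, much as older GNU C compilers needed -std=c99; check the man page for the exact level names on your system. The function and struct names are hypothetical.

```c
#include <stdbool.h>

// Line comments like this one are a C99 feature.
struct limits { int lo; int hi; };

bool all_in_range(const int *v, int n)
{
    const struct limits r = { .lo = 0, .hi = 255 };  // designated initializer
    for (int i = 0; i < n; i++)                      // declaration in for statement
        if (v[i] < r.lo || v[i] > r.hi)
            return false;
    return true;
}
```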
-qintsize/-qrealsize/-qautodbl (Fortran compiler only)
-qintsize specifies the default size for INTEGER and LOGICAL values; -qrealsize specifies the default size for REAL, DOUBLE PRECISION, COMPLEX, and DOUBLE COMPLEX values. -qautodbl converts single-precision floating point calculations to double precision, and double precision calculations to extended precision. The default is -qautodbl=none.
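-qintsize matters when C code is called from Fortran: if the Fortran caller is built with -qintsize=8, default INTEGER arguments arrive as 8-byte values, so the C side has to use a matching type. The sketch below assumes the Fortran caller also uses -qextname (hence the trailing underscore) and uses a hypothetical build macro, FORTRAN_INTEGER8, that you would define when -qintsize=8 is in effect; the routine name iscale is also hypothetical.

```c
#include <stdint.h>

/* Hypothetical build macro: define FORTRAN_INTEGER8 when the Fortran
 * caller is compiled with -qintsize=8; otherwise assume 4-byte INTEGER. */
#ifdef FORTRAN_INTEGER8
typedef int64_t f_int;
#else
typedef int32_t f_int;
#endif

/* Called from Fortran as CALL ISCALE(N, A, K): multiplies the N
 * default-INTEGER elements of A by K in place. */
void iscale_(const f_int *n, f_int *a, const f_int *k)
{
    for (f_int i = 0; i < *n; i++)
        a[i] *= *k;
}
```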
-qmkshrobj (C/C++ compiler only)
This option corresponds to the -shared option in GNU C/C++ compilers. This option tells the linker to create a shared object from the generated object files.
-qpic
-qpic tells the compiler to generate position-independent code that can be used in shared libraries. -qpic=small and -qpic=large correspond to -fpic and -fPIC in GNU C/C++ compilers: -qpic=small assumes the Global Offset Table is smaller than 64KB, while -qpic=large allows a Global Offset Table larger than 64KB.
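Putting -qpic and -qmkshrobj together, the sketch below is a tiny C library intended to be built as a shared object. The build commands in the comment use only the options described above; the file name vecops.c, the library name libvecops.so, and the function vec_dot are hypothetical.

```c
#include <stddef.h>

/* vecops.c: a small library meant to be built as a shared object.
 * Build sketch with the XL options described above:
 *     xlc_r -qpic=small -c vecops.c
 *     xlc_r -qmkshrobj -o libvecops.so vecops.o
 * Rough GNU equivalent:
 *     gcc -fpic -c vecops.c
 *     gcc -shared -o libvecops.so vecops.o
 */

/* Dot product of two length-n vectors. */
double vec_dot(const double *x, const double *y, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i] * y[i];
    return s;
}
```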

For more complete information, see the following IBM compiler references:

This document was developed with support from the National Science Foundation (NSF) under Grant No. 0503697 to the University of Chicago and subcontracted to Indiana University. Additional support was provided by IU through its participation in the TeraGrid, which is supported by the NSF under Grants No. 0833618, SCI451237, SCI535258, and SCI504075. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.

This is document aupa in domains all and tgrid-all.
Last modified on January 15, 2008.