Indiana University
University Information Technology Services
  

Compiler flags for IBM XL compiler suites

Following is an overview of available compiler flags for IBM XL compiler suites, with an emphasis on performance optimization and compatibility with GNU compilers on Big Red at Indiana University. This document assumes you have experience with compiling programs at the command line and know how to use basic compiler options (e.g., -I, -L, -g) to compile and link C/C++ or Fortran programs.

Consider the following tips for optimizing:

  1. Start with -O0 (the default for IBM compilers) and raise the optimization level gradually. Starting at -O0 gives you an unoptimized baseline for checking that your code is algorithmically correct, preserves all debug information, and can expose problems in existing code. You can also specify the -qarch option to take advantage of architectural benefits, for example: xlc_r -c -O0 -qarch=ppc970 -g example.c

    If a problem occurs, -qdbg generates debug information for use by a symbolic debugger. The default is -qnodbg.

  2. Apply -O2. Compilation at -O2 opens your application to a set of comprehensive low-level transformations that apply to subprogram or compilation unit scopes and can include some inlining. Optionally, you can specify -qmaxmem=-1 to allow the optimizer to use as much memory as needed. You can also use the debugger by specifying the -g option at -O2.

  3. Apply -O3. Specifying -O3 initiates more intense low-level transformations that remove many of the limitations present at -O2. -O3 implies -qnostrict -qmaxmem=-1 -qhot=level=0. If you encounter problems at this optimization level, try using -qstrict or -qnohot along with -O3, for example: xlc_r -c -O3 -qstrict -qarch=ppc970 -qtune=ppc970 example.c

  4. Turn on selected -qhot suboptions. -O3 implies only -qhot=level=0, which includes minimal loop transformations to increase performance. You can specify -qhot along with -O3 to see if application performance improves. -qhot is designed to have no effect when no opportunities exist.
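
The four steps above can be sketched as a short shell script. This is only an illustration: show_levels is a hypothetical helper name, example.c is a placeholder source file, and the script prints the command lines instead of running them, so it can be tried on a machine without the XL compilers.

```shell
#!/bin/sh
# Illustrative sketch: print one compile command per optimization step.
# show_levels is a hypothetical helper; nothing is actually compiled.
show_levels() {
    src=$1
    arch="-qarch=ppc970 -qtune=ppc970"
    # Step 1: no optimization, full debug information.
    echo "xlc_r -c -O0 -qarch=ppc970 -g $src"
    # Step 2: -O2, letting the optimizer use as much memory as it needs.
    echo "xlc_r -c -O2 -qmaxmem=-1 $arch $src"
    # Step 3: -O3, with -qstrict as a fallback if results change.
    echo "xlc_r -c -O3 -qstrict $arch $src"
    # Step 4: -O3 plus the full set of -qhot loop transformations.
    echo "xlc_r -c -O3 -qhot $arch $src"
}

show_levels example.c
```

Comparing program output and run time after each step is one way to find the highest level at which the application still behaves correctly.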

Performance optimization

To help you make effective use of the hardware available on Big Red at IU, the IBM XL compilers provide a series of flags that take advantage of specific hardware features:

-q32 and -q64
Big Red can run both 32- and 64-bit binaries. -q32 instructs the compiler to compile the source code into 32-bit object code, while -q64 makes the compiler generate 64-bit object code. On Big Red, the environment variable OBJECT_MODE has the same effect as these options; however, UITS recommends specifying the bit mode explicitly at the command line to avoid confusion.
-qarch
This lets the compiler target a specific architecture for compilation. For Big Red, you can use -qarch=ppc970 to generate instructions specific to Big Red's PPC970 processors. Note that code generated this way may not run on other architectures. Alternatively, you can use -qarch=ppc or -qarch=ppc64, which make the compiler generate code that runs on all PowerPC or all 64-bit PowerPC architectures, respectively. -qarch corresponds to the -march option in GNU compilers (on PowerPC targets, GCC spells this option -mcpu).
-qtune
Using -qtune with -qarch generally provides the best performance on the specified architecture. If only -qarch is specified, the compiler defaults to the matching -qtune option. If -qarch=ppc or -qarch=ppc64 is specified, you can use either -qtune=auto or -qtune=ppc970. This option corresponds to the -mtune option in GNU compilers.
-qhot
This option instructs compilers to perform high-order loop analysis and transformations during optimization. The suboptions for -qhot include:
arraypad
This option permits the compiler to increase the dimensions of arrays where doing so might improve the efficiency of array-processing loops, since arrays whose dimensions are powers of two can lead to decreased cache utilization. Not all arrays will be padded, and different arrays might be padded by different amounts. Using -qhot=arraypad might be unsafe, since there are no checks for reshaping, which may cause code to break.
simd and nosimd
-qhot=simd is useful for applications with significant vector processing needs, such as operations over images. It converts certain operations performed in a loop on successive elements of an array into VMX instructions that run on the AltiVec unit of the PPC970 processor. This is effective only with -qarch set to an architecture that supports VMX instructions. -qhot=nosimd disables the conversion.
vector and novector
-qhot=vector causes the compiler to convert certain operations performed on successive array elements in a loop into calls to IBM MASS library routines, which calculate several results at a time. It supports both single- and double-precision floating-point operations. Note that use of this option could reduce the precision of operations. Using -qstrict together with -qhot turns this vectorization off, as does -qhot=novector.
level=0 and level=1
-qhot=level=0 is equivalent to novector, nosimd, noarraypad. -qhot=level=1 is equivalent to -qhot, and has the same effect as a combination of -qhot=nosimd, -qhot=noarraypad, and -qhot=vector.
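
As a rough illustration of how the flags above combine, the following sketch prints two compile lines: one that targets only the PPC970 and one that runs on any 64-bit PowerPC but is still tuned for the PPC970. The helper arch_flags and the source file names are hypothetical, and the commands are printed rather than executed. Whether SIMD code is actually generated can depend on the compiler version and on other options, so check the man page on your system.

```shell
#!/bin/sh
# Illustrative sketch only: arch_flags is a hypothetical helper and the
# source file names are placeholders. Commands are printed, not executed.
arch_flags() {
    case $1 in
        # Code that uses PPC970-specific instructions.
        native)   echo "-q64 -qarch=ppc970 -qtune=ppc970" ;;
        # Code for any 64-bit PowerPC, tuned for the PPC970.
        portable) echo "-q64 -qarch=ppc64 -qtune=ppc970" ;;
        *)        echo "unknown target: $1" >&2; return 1 ;;
    esac
}

# -qhot=simd needs an architecture with VMX support, so pair it with ppc970.
echo "xlc_r -c -O3 -qhot=simd $(arch_flags native) image_filter.c"
# -qhot=vector relies on MASS library routines rather than VMX instructions.
echo "xlc_r -c -O3 -qhot=vector $(arch_flags portable) solver.c"
```

Stating -q64 explicitly in the helper follows the recommendation above to set the bit mode on the command line instead of relying on OBJECT_MODE.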

Following are flags you might consider using together with the above architecture-specific flags when compiling and optimizing your code:

-qstrict and -qnostrict
-qstrict turns off aggressive optimizations that have the potential to alter program semantics. -qnostrict, which is implied by -O3, permits them.
-qipa and -qnoipa
-qipa enables inter-procedural analysis and -qnoipa disables it. -qipa performs automatic inlining, limited alias analysis, and limited call-site tailoring. This is implied when you specify an aggressive optimization level such as -O4 or -O5. UITS recommends you specify -qipa options both when compiling and when linking.
-qmaxmem
This limits the amount of memory that the compiler allocates while performing specific, memory-intensive optimizations to the specified number of kilobytes. A value of -1 allows optimization to take as much memory as it needs without checking for limits.
-O0, -O (same as -O2), -O3, -O4, and -O5
At -O0, the compiler performs only minimal optimizations, such as constant folding and elimination of local common subexpressions.

-O or -O2 specifies what the compiler developers consider the best combination of compilation speed and run-time performance.

-O3 performs additional optimizations that are memory intensive, compile-time intensive, or both. UITS recommends you use it when run-time improvement is more important than minimizing compilation resources. -O3 could slightly alter the semantics of the program, because it may rewrite floating-point operations and relax conformance to IEEE rules. -O3 implies -qhot=level=0.

-O4 is more aggressive; it automatically sets the -qarch and -qtune options to the compiling architecture, and sets -qhot and -qipa.

-O5 raises the level of inter-procedural analysis.
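
The recommendation to give -qipa at both compile and link time can be sketched as follows. The helper ipa_build, the program name app, and the source file names are hypothetical, and the script only prints the commands it would run.

```shell
#!/bin/sh
# Illustrative sketch: print a two-stage build that passes -qipa to both
# the compile steps and the link step. ipa_build is a hypothetical helper.
ipa_build() {
    out=$1
    shift
    flags="-O3 -qipa -qarch=ppc970 -qtune=ppc970"
    objs=""
    for src in "$@"; do
        # Compile each source file to an object file with -qipa.
        echo "xlc_r -c $flags $src"
        objs="$objs ${src%.c}.o"
    done
    # Link with the same flags so inter-procedural analysis can complete.
    echo "xlc_r $flags -o $out$objs"
}

ipa_build app main.c util.c
```

If results differ from an -O2 build, adding -qstrict to the flags variable is the first thing to try, as described above.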

Compatibility options

-W
UITS recommends using the IBM compiler and linker to avoid turning off link time optimization. The compiler uses the -W option to pass options specified at the command line to the linker and different components of the compiler. For example, to pass the option --whole-archive to the linker, specify -Wl,--whole-archive at the command line. For more information, refer to the man page for IBM compilers on Big Red.
-qlanglvl
Both IBM Fortran and C/C++ use -qlanglvl to specify the language standard to check against. For detailed language level specification, refer to the man pages. This option corresponds to the -std option in the GNU compilers.
-qpic
-qpic tells the compiler to generate position-independent code that can be used in shared libraries. -qpic=small and -qpic=large correspond to -fpic and -fPIC in GNU C/C++ compilers: -qpic=small assumes the Global Offset Table is no larger than 64KB, while -qpic=large allows it to exceed 64KB.

C/C++ compiler only

-qmkshrobj
This option corresponds to the -shared option in GNU C/C++ compilers. This option tells the linker to create a shared object from the generated object files.
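
Combining -qpic with -qmkshrobj, a shared library build might look like the following sketch. The helper shared_lib_cmds, the library name libdemo.so, and the source file names are hypothetical; the commands are printed, not executed.

```shell
#!/bin/sh
# Illustrative sketch: print the commands for building a shared object.
# shared_lib_cmds is a hypothetical helper; nothing is actually compiled.
shared_lib_cmds() {
    lib=$1
    shift
    objs=""
    for src in "$@"; do
        # Position-independent objects (GNU analogue: gcc -c -fpic).
        echo "xlc_r -c -O2 -qpic=small $src"
        objs="$objs ${src%.c}.o"
    done
    # Create the shared object (GNU analogue: gcc -shared).
    echo "xlc_r -qmkshrobj -o $lib$objs"
}

shared_lib_cmds libdemo.so a.c b.c
```

Switching to -qpic=large is only worth considering if the library is big enough for its Global Offset Table to exceed 64KB.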

Fortran compiler only

-qextname and -qnoextname
-qextname adds a trailing underscore to the names of all global entities except for the main program names.
-qintsize/-qrealsize/-qautodbl
-qintsize specifies the default size for INTEGER and LOGICAL values; -qrealsize specifies the default size for REAL, DOUBLE PRECISION, COMPLEX, and DOUBLE COMPLEX values. -qautodbl converts single-precision floating-point calculations to double-precision, and double-precision calculations to extended-precision. The default is -qautodbl=none.
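
The Fortran-only options can be sketched in the same way. This example assumes xlf_r is the thread-safe XL Fortran invocation on your system (by analogy with xlc_r); fortran_cmd, its variant names, and solver.f are hypothetical. The gnu-names variant is meant for code that must link against objects from GNU Fortran compilers, which append a trailing underscore to external names by default.

```shell
#!/bin/sh
# Illustrative sketch: print a Fortran compile line for a given variant.
# fortran_cmd and its variant names are hypothetical; xlf_r is assumed.
fortran_cmd() {
    flags="-O2 -qarch=ppc970 -qtune=ppc970"
    case $1 in
        # Trailing underscores on global names, for GNU-style linkage.
        gnu-names) flags="$flags -qextname" ;;
        # 8-byte default INTEGER/LOGICAL and REAL sizes.
        wide)      flags="$flags -qintsize=8 -qrealsize=8" ;;
        *)         echo "unknown variant: $1" >&2; return 1 ;;
    esac
    echo "xlf_r -c $flags $2"
}

fortran_cmd gnu-names solver.f
fortran_cmd wide solver.f
```

Changing default sizes affects every interface in the program, so any libraries you link against would need to be built with matching settings.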

For more complete information, see the IBM compiler references.

This document was developed with support from the National Science Foundation (NSF) under Grant No. 0503697 to the University of Chicago and subcontracted to Indiana University. Additional support was provided by IU through its participation in the TeraGrid, which is supported by the NSF under Grants No. 0833618, SCI451237, SCI535258, and SCI504075. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.

This is document aupa in domains all and tgrid-all.
Last modified on July 15, 2009.
